How to handle file uploads
There's very brief documentation on file uploads, but it doesn't cover how to store files on disk. Of course I could use native Node modules like fs, but my biggest problem is that everything is done in memory. I've also noticed that when using req.parseBody, if multiple large files are being uploaded, the server response time for other endpoints increases drastically. So what would be the correct way to handle file uploads while avoiding storing files in memory?
I doubt you can avoid storing them in-memory.
Also, to my knowledge, hono doesn't interact with the file system in any way. For example, serveStatic is from another library outside of the core hono package.
to clarify, is your goal to minimize the memory (and maybe cpu) footprint of the upload process?
streaming is a common option
it won't eliminate memory usage entirely, of course, but it should be more performant from a memory perspective
Yes, store a file in a temporary storage, validate and move. Just like multer does, however multer isn't compatible with hono, so I wonder if there's some other package or does it need to be written manually.
hm. it seems like you have two goals, which may be mutually exclusive
- parse files from incoming FormData + validate them
- minimize the memory footprint of parsing + validating incoming files
yup, spot on
afaik, there isn't a way to significantly minimize memory usage if you're using FormData
libraries like multer (busboy) or formidable may help a bit by allowing you to process files one at a time, rather than loading them all into memory + then validating
is that what you're exploring?
Ok, so just to clarify, uploading multiple large files (like 500mb) in Hono using parseBody will increase the memory usage to 3gb+. On the other hand, when using multer in express or nest, memory usage will not exceed 80mb. Also, when Hono is hit with multiple files from several requests, it significantly slows down responses of other API endpoints. Think like from 5ms to 300ms. That doesn't happen in nest. That's the problem I'm facing and wanted to figure out.
gotcha. that scans
and you say you haven't been able to get multer to work with hono?
i haven't worked w that lib specifically, but i'd generally expect it to be possible to integrate most express middleware w hono
heck, you can integrate hono in express (very hacky)
it seems you only need to re-implement this middleware
https://github.com/expressjs/multer/blob/master/lib/make-middleware.js
Do you really need to use multipart formdata?
Streaming a body is much simpler
sometimes you don't have control over the client, or it's not cost-effective to refactor immediately. there may be other requirements/constraints as well. i'd agree that streaming seems like the best solution to the performance problem
but you're right to ask. would definitely help narrow down the right move
That's a stripped-down version of what I do currently.
There is a hono-upload lib which wraps busboy: https://github.com/ps73/hono-upload
Probably this could be used or give you a direction.
I also have this package in my bookmarks but have never used it: https://github.com/mjackson/remix-the-web/tree/main/packages/form-data-parser
Pro: built on standards and does not depend on node streams afaik
Well, besides the files themselves, I also need to send other (text) information in the request, so that's why I used multipart. If there's a better way, I'm down to use that. As for the requirements, nothing is currently set in stone, I'm mostly just researching.
Psi, in your code example above, what is the createWriteStream call? Is it some library or custom function? And pipeline too. Sorry, I don't have much experience with what you're doing there.
createWriteStream is from "node:fs" and pipeline from "stream/promises"
Currently I do async file uploads, meaning I have several types of files in my form, and when the user clicks "save" I first upload each file as a body stream. My backend responds with a uuid representing the file. I then associate the uuid with the current item and store the item.
I'm partly happy with this approach. Good user experience but not so easy to implement
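For anyone reading along, here is a minimal sketch of the "upload each file as a body stream, respond with a uuid" idea described above. It is not the code from the example earlier in the thread, just an illustration under assumptions: saveBodyToDisk and the uploadDir parameter are hypothetical names, and only Node built-ins are used. The body is piped to disk chunk by chunk, so the whole file should not need to sit in memory at once.

```typescript
import { randomUUID } from "node:crypto";
import { createWriteStream } from "node:fs";
import { mkdir } from "node:fs/promises";
import { join } from "node:path";
import { Readable } from "node:stream";
import { pipeline } from "node:stream/promises";
import type { ReadableStream as WebStream } from "node:stream/web";

// Hypothetical helper (not part of hono): streams a web ReadableStream,
// such as a request body, into `<uploadDir>/<uuid>` and returns the uuid.
async function saveBodyToDisk(
  body: WebStream<Uint8Array>,
  uploadDir: string,
): Promise<string> {
  // Make sure the target directory exists before opening the file.
  await mkdir(uploadDir, { recursive: true });

  const id = randomUUID();

  // Convert the web stream into a Node stream so it can be piped.
  const source = Readable.fromWeb(body);

  // pipeline() handles backpressure and rejects if either side fails.
  await pipeline(source, createWriteStream(join(uploadDir, id)));

  return id;
}
```

In a Hono handler on Node, the raw request should be reachable as c.req.raw, so something like saveBodyToDisk(c.req.raw.body, "./uploads") could work after checking that the body is not null (a type cast may be needed depending on your TypeScript setup). The text fields would then travel in a second request that references the returned uuid.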
so you do 2 requests from client? how would you handle a case where you wouldn't receive complete data in the 2nd request, after file was uploaded?
I have a garbage collection job which removes files that do not have references
even with multipart uploads you need something to remove partly uploaded files
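A rough sketch of what such a cleanup could look like, assuming each upload is stored as a file named after its id and that the set of ids still referenced by items can be loaded from the database. removeOrphanedUploads, referencedIds and minAgeMs are hypothetical names, not taken from anyone's actual code in this thread.

```typescript
import { readdir, stat, unlink } from "node:fs/promises";
import { join } from "node:path";

// Hypothetical sweep: deletes files in `uploadDir` whose name (the upload id)
// is not in `referencedIds` and whose last modification is at least
// `minAgeMs` old. The age check is there so that uploads which just finished
// but are not yet attached to an item are left alone.
async function removeOrphanedUploads(
  uploadDir: string,
  referencedIds: ReadonlySet<string>,
  minAgeMs: number,
  now: number = Date.now(),
): Promise<string[]> {
  const removed: string[] = [];

  for (const name of await readdir(uploadDir)) {
    // Still referenced by an item: keep it.
    if (referencedIds.has(name)) continue;

    const path = join(uploadDir, name);
    const info = await stat(path);

    // Skip directories and files that are too recent to judge.
    if (!info.isFile() || now - info.mtimeMs < minAgeMs) continue;

    await unlink(path);
    removed.push(name);
  }

  return removed;
}
```

This could be run on a timer or a cron job. The same sweep would also catch partly uploaded files, since those never get a reference either.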
alright guys thx for your help
Pls share if you come up with a nice multipart solution 😉
I assume s3 would make that a lot better
like minio
https://github.com/minio/minio
+1 for streaming files and uploading form data separately. it makes it easier to do stuff like show % upload progress, or handle file issues
in my experience, aws products only ever make things worse in the long run, but i'm a fang hater to the core
I've never used s3 myself, but I've seen it used a lot for file uploads, and since minio can be self-hosted, I thought it was a good alternative
no doubt. i'm just being salty, lol
Whether you want an object store depends very much on your infrastructure and not so much on your business logic.
Or if you need to scale out
having dealt with code written by others, I have stopped thinking about "infra" and started thinking about how to escape the mess of the others
even though I've never used an object storage before, it does sound appealing bc it is one less responsibility for your code
I like inventing the wheel all the time, but in a team scenario that is hard
I'm a DevOps guy, and you want to use object storage when someone manages it for you because they think it's the better choice for the infrastructure 😉
I mostly implement both variants, disk and S3, but store my metadata in my own DB