Hono•7d ago
Icy

How to handle file uploads

There's very brief documentation on file uploads, but it doesn't cover how to store files on disk. Of course I could use native Node modules like fs, but my biggest problem is that it's all done in memory. I've also noticed that when using req.parseBody, if multiple large files are being uploaded, the server response time for other endpoints increases drastically. So what would be the correct way to handle file uploads while avoiding storing files in memory?
29 Replies
Arjix•7d ago
I doubt you can avoid storing them in-memory. Also, to my knowledge, hono doesn't interact with the file system in any way, for example serveStatic is from another library outside of the core hono package.
ambergristle•7d ago
to clarify, is your goal to minimize the memory (and maybe cpu) footprint of the upload process? streaming is a common option. it won't eliminate memory usage entirely, of course, but it should be more efficient from a memory perspective
IcyOP•7d ago
Yes: store the file in temporary storage, validate it, then move it. Just like multer does. However, multer isn't compatible with hono, so I wonder if there's some other package or whether it needs to be written manually.
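A minimal sketch of the "store in temp, validate, move" flow described above, using only Node built-ins. This is not multer's implementation; `byteLimit` and `storeUpload` are hypothetical helper names, and the only validation shown is a size cap. It assumes the final path is on the same filesystem as the temp directory, since `rename` cannot cross filesystems.

```javascript
import { createWriteStream } from "node:fs";
import { mkdtemp, rename, rm } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { Readable, Transform } from "node:stream";
import { pipeline } from "node:stream/promises";

// Hypothetical helper: passes chunks through and fails once maxBytes is exceeded.
function byteLimit(maxBytes) {
  let seen = 0;
  return new Transform({
    transform(chunk, _encoding, callback) {
      seen += chunk.length;
      if (seen > maxBytes) {
        callback(new Error(`upload exceeds ${maxBytes} bytes`));
      } else {
        callback(null, chunk);
      }
    },
  });
}

// Streams a web ReadableStream to a temp file, then moves it to finalPath.
// The temp directory is removed in every case, so a failed upload leaves
// no partial file behind.
async function storeUpload(webStream, finalPath, maxBytes) {
  const tempDir = await mkdtemp(join(tmpdir(), "upload-"));
  const tempPath = join(tempDir, "incoming");
  try {
    await pipeline(
      Readable.fromWeb(webStream),
      byteLimit(maxBytes),
      createWriteStream(tempPath),
    );
    // Assumption: finalPath is on the same filesystem as the temp dir.
    await rename(tempPath, finalPath);
  } finally {
    await rm(tempDir, { recursive: true, force: true });
  }
  return finalPath;
}
```

Because the data is piped chunk by chunk, memory use should stay near the stream buffer size rather than the file size. Other checks (file type, checksum) could be added as further transform steps in the same pipeline.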
ambergristle•7d ago
hm. it seems like you have two goals, which may be mutually exclusive:
- parse files from incoming FormData + validate them
- minimize the memory footprint of parsing + validating incoming files
IcyOP•7d ago
yup, spot on
ambergristle•7d ago
afaik, there isn't a way to significantly minimize memory usage if you're using FormData. libraries like multer (busboy) or formidable may help a bit by allowing you to process files one at a time, rather than loading them all into memory + then validating. is that what you're exploring?
IcyOP•7d ago
Ok, so just to clarify, uploading multiple large files (like 500 MB each) in Hono using parseBody will increase memory usage to 3 GB+. On the other hand, when using multer in express or nest, memory usage will not exceed 80 MB. Also, when Hono is hit with multiple files from several requests, it significantly slows down responses from other API endpoints. Think from 5ms to 300ms. That doesn't happen in nest. That's the problem I'm facing and wanted to figure out.
ambergristle•7d ago
gotcha. that scans. and you say you haven't been able to get multer to work with hono? i haven't worked w that lib specifically, but i'd generally expect it to be possible to integrate most express middleware w hono
Arjix•6d ago
heck, you can integrate hono in express (very hacky)
Arjix•6d ago
it seems you only need to re-implement this middleware https://github.com/expressjs/multer/blob/master/lib/make-middleware.js
Psi•5d ago
Do you really need to use multipart formdata? Streaming a body is much simpler
ambergristle•5d ago
sometimes you don't have control over the client, or it's not cost-effective to refactor immediately. there may be other requirements/constraints as well. i'd agree that streaming seems like the best solution to the performance problem. but you're right to ask. the answer would definitely help narrow down the right move
Psi•5d ago
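// controller
import { Readable } from "node:stream";
// c.req.raw is the standard Request; its body is a web ReadableStream (null if there is no body)
const bodyStream = c.req.raw.body;
const readableStream = Readable.fromWeb(bodyStream);
const fileMetadata = await fileStorage.upload(metadata, readableStream);

// fileStorage.upload
import { createWriteStream } from "node:fs";
import { pipeline } from "node:stream/promises";
// pipe the body to disk chunk by chunk instead of buffering it in memory
const writeStream = createWriteStream(targetFilePath);
await pipeline(readableStream, writeStream);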
That's a stripped-down version of what I do currently.
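For reference, the stripped-down snippet above could be fleshed out roughly as follows. This is a sketch under assumptions, not Psi's actual code: `saveRequestBody` is a hypothetical name, and it takes any standard `Request` (in a Hono handler that would be `c.req.raw`).

```javascript
import { createWriteStream } from "node:fs";
import { Readable } from "node:stream";
import { pipeline } from "node:stream/promises";

// Writes the body of a standard Request to targetFilePath chunk by chunk,
// so the whole file is never held in memory at once.
async function saveRequestBody(request, targetFilePath) {
  if (!request.body) {
    throw new Error("request has no body");
  }
  // Convert the web ReadableStream into a Node.js Readable.
  const readableStream = Readable.fromWeb(request.body);
  const writeStream = createWriteStream(targetFilePath);
  // pipeline handles backpressure and rejects if either stream errors.
  await pipeline(readableStream, writeStream);
  return { path: targetFilePath, bytesWritten: writeStream.bytesWritten };
}
```

In a handler this would be called as something like `await saveRequestBody(c.req.raw, targetFilePath)`, with the target path chosen by the server rather than taken from client input.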
Psi•5d ago
There is a hono-upload lib which wraps busboy: https://github.com/ps73/hono-upload It could probably be used as-is, or at least give you a direction.
Psi•5d ago
I also have this package in my bookmarks but have never used it: https://github.com/mjackson/remix-the-web/tree/main/packages/form-data-parser Pro: built on standards and does not depend on node streams afaik
GitHub
remix-the-web/packages/form-data-parser at main · mjackson/remix-th...
Open source tools for Remix (or any framework!). Contribute to mjackson/remix-the-web development by creating an account on GitHub.
IcyOP•5d ago
Well, besides the files themselves, I also need to send other (text) information in the request, so that's why I used multipart. If there's a better way, I'm down to use that. As for the requirements, nothing is currently set in stone, I'm mostly just researching. Psi, in your code example above, what is the createWriteStream call? Is it from some library or a custom function? And pipeline too. Sorry, I don't have much experience with what you're doing there.
Psi•5d ago
createWriteStream is from "node:fs" and pipeline is from "stream/promises". Currently I do async file uploads, meaning I have several types of files in my form, and when the user clicks "save" I first upload each file as a body stream. My backend responds with a uuid representing the file. I then associate the uuid with the current item and store the item. I'm partly happy with this approach. Good user experience, but not so easy to implement
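The two-request flow described above might look roughly like this on the backend. It is only a sketch: the in-memory maps stand in for a real database, and `registerUpload` and `saveItem` are hypothetical names, not part of Hono or of Psi's code.

```javascript
import { randomUUID } from "node:crypto";

// In-memory stand-ins for database tables (assumption for illustration).
const files = new Map(); // fileId -> { path, uploadedAt }
const items = new Map(); // itemId -> { title, fileIds }

// Request 1: called once the body stream has been written to `path`.
// The returned uuid is what the backend sends back to the client.
function registerUpload(path) {
  const fileId = randomUUID();
  files.set(fileId, { path, uploadedAt: Date.now() });
  return fileId;
}

// Request 2: called when the form is saved. Rejects ids the server never
// issued, so an item cannot reference a file that was not uploaded.
function saveItem(itemId, title, fileIds) {
  for (const fileId of fileIds) {
    if (!files.has(fileId)) {
      throw new Error(`unknown file id: ${fileId}`);
    }
  }
  items.set(itemId, { title, fileIds: [...fileIds] });
}
```

The text fields travel in the second request as ordinary JSON or form data, so no multipart parsing is needed for the large files.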
IcyOP•5d ago
so you do 2 requests from client? how would you handle a case where you wouldn't receive complete data in the 2nd request, after file was uploaded?
Psi•5d ago
I have a garbage collection job which removes files that do not have references. Even with multipart uploads you need something to remove partly uploaded files
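A cleanup job of the kind mentioned above could be sketched like this. It assumes files are stored under their id as the file name and that the set of referenced ids comes from your own database; `removeOrphans` and its parameters are hypothetical.

```javascript
import { readdir, stat, unlink } from "node:fs/promises";
import { join } from "node:path";

// Deletes files in uploadDir that no item references and that are older
// than maxAgeMs. The age check avoids deleting a file whose second
// (metadata) request simply has not arrived yet.
async function removeOrphans(uploadDir, referencedNames, maxAgeMs, now = Date.now()) {
  const removed = [];
  for (const name of await readdir(uploadDir)) {
    if (referencedNames.has(name)) continue;
    const filePath = join(uploadDir, name);
    const info = await stat(filePath);
    if (info.isFile() && now - info.mtimeMs > maxAgeMs) {
      await unlink(filePath);
      removed.push(name);
    }
  }
  return removed;
}
```

Running this on a timer or a cron schedule, with a grace period comfortably longer than a slow upload plus form submission, would cover both abandoned two-step uploads and partly written files.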
IcyOP•5d ago
alright guys thx for your help
Psi•5d ago
Pls share if you come up with a nice multipart solution 😉
Arjix•5d ago
I assume s3 would make that a lot better
Arjix•5d ago
GitHub
GitHub - minio/minio: MinIO is a high-performance, S3 compatible ob...
MinIO is a high-performance, S3 compatible object store, open sourced under GNU AGPLv3 license. - minio/minio
ambergristle•5d ago
+1 for streaming files and uploading form data separately. it makes it easier to do stuff like show % upload progress, or handle file issues. in my experience, aws products only ever make things worse in the long run, but i'm a fang hater to the core
Arjix•5d ago
I've never used s3 myself, but I've seen it used a lot for file uploads, and since minio can be self-hosted, I thought it was a good alternative
ambergristle•5d ago
no doubt. i'm just being salty, lol
Psi•5d ago
Whether you want an object store depends very much on your infrastructure and not so much on your business logic. Or on whether you need to scale out
Arjix•5d ago
having dealt with code written by others, I have stopped thinking about "infra" and started thinking about how to escape the mess of others. even though I've never used object storage before, it does sound appealing bc it is one less responsibility for your code. I like reinventing the wheel all the time, but in a team scenario that is hard
Psi•5d ago
I'm a DevOps guy, and you want to use object storage when someone manages it for you because they think it's the better choice for the infrastructure 😉 I mostly implement both variants, disk and S3, but store my metadata in my own DB