Hono•2mo ago

How to handle file uploads

There's very brief documentation on file uploads, but it doesn't cover how to store files on disk. Of course I could use native Node modules like fs, but my biggest problem is that it's all done in memory. I've also noticed that when using req.parseBody with multiple large files being uploaded, the server response time for other endpoints increases drastically. So what would be the correct way to handle file uploads while avoiding storing files in memory?

29 Replies

Arjix•2mo ago

I doubt you can avoid storing them in-memory. Also, to my knowledge, hono doesn't interact with the file system in any way, for example serveStatic is from another library outside of the core hono package.

ambergristle•2mo ago

to clarify, is your goal to minimize the memory (and maybe cpu) footprint of the upload process? streaming is a common option. it won't eliminate memory usage entirely, of course, but it should be more performant from a memory perspective

IcyOP•2mo ago

Yes, store a file in a temporary storage, validate and move. Just like multer does, however multer isn't compatible with hono, so I wonder if there's some other package or does it need to be written manually.

ambergristle•2mo ago

hm. it seems like you have two goals, which may be mutually exclusive:
- parse files from incoming FormData + validate them
- minimize the memory footprint of parsing + validating incoming files

IcyOP•2mo ago

yup, spot on

ambergristle•2mo ago

afaik, there isn't a way to significantly minimize memory usage if you're using FormData. libraries like multer (busboy) or formidable may help a bit by allowing you to process files one at a time, rather than loading them all into memory + then validating. is that what you're exploring?

IcyOP•2mo ago

Ok, so just to clarify, uploading multiple large files (like 500mb) in Hono using parseBody will increase the memory usage to 3gb+. On the other hand, when using multer in express or nest, memory usage will not exceed 80mb. Also, when Hono is hit with multiple files from several requests, it will significantly slow down responses of other API endpoints. Think like from 5ms to 300ms. That doesn't happen in nest. That's the problem I'm facing and wanted to figure out.

ambergristle•2mo ago

gotcha. that scans. and you say you haven't been able to get multer to work with hono? i haven't worked w that lib specifically, but i'd generally expect it to be possible to integrate most express middleware w hono

Arjix•2mo ago

heck, you can integrate hono in express (very hacky)

Arjix•2mo ago

it seems you only need to re-implement this middleware https://github.com/expressjs/multer/blob/master/lib/make-middleware.js

Psi•2mo ago

Do you really need to use multipart formdata? Streaming a body is much simpler

ambergristle•2mo ago

sometimes you don't have control over the client, or it's not cost-effective to refactor immediately. there may be other requirements/constraints as well. i'd agree that streaming seems like the best solution to the performance problem. but you're right to ask. would definitely help narrow down the right move

Psi•2mo ago

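// controller
import { Readable } from "node:stream";
const bodyStream = c.req.raw.body; // web ReadableStream (null if the request has no body)
const readableStream = Readable.fromWeb(bodyStream); // convert to a Node.js stream
const fileMetadata = await fileStorage.upload(metadata, readableStream);

// fileStorage.upload
import { createWriteStream } from "node:fs";
import { pipeline } from "node:stream/promises";
const writeStream = createWriteStream(targetFilePath);
await pipeline(readableStream, writeStream); // writes chunk by chunk instead of buffering the whole file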

That's a stripped-down version of what I currently do.

Psi•2mo ago

There is a hono-upload lib which wraps busboy, which you could probably use or at least take direction from: https://github.com/ps73/hono-upload

Psi•2mo ago

I also have this package in my bookmarks but have never used it: https://github.com/mjackson/remix-the-web/tree/main/packages/form-data-parser Pro: built on standards and does not depend on node-streams afaik

IcyOP•2mo ago

Well, besides the files themselves, I also need to send other (text) information in the request, so that's why I used multipart. If there's a better way, I'm down to use that. As for the requirements, nothing is currently set in stone, I'm mostly just researching. Psi, in your code example above, what does the createWriteStream call come from? Is it some library or a custom function? And pipeline too. Sorry, I don't have much experience with what you're doing there.

Psi•2mo ago

createWriteStream is from "node:fs" and pipeline is from "stream/promises". Currently I do async file uploads, meaning I have several types of files in my form, and when the user clicks "save" I first upload each file as a body stream. My backend responds with a uuid representing the file. I then associate the uuid with the current item and store the item. I'm partly happy with this approach: good user experience, but not so easy to implement.
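
A minimal sketch of the first half of that flow (stream the body to disk, answer with a uuid), assuming Node.js. The helper name `saveBodyToDisk` and the `uploadDir` parameter are made up for illustration and are not part of Hono or of Psi's code:

```typescript
import { randomUUID } from "node:crypto";
import { createWriteStream } from "node:fs";
import { mkdir, stat } from "node:fs/promises";
import { join } from "node:path";
import { Readable } from "node:stream";
import { pipeline } from "node:stream/promises";
import { ReadableStream } from "node:stream/web";

// Hypothetical helper: stream a request body to disk chunk by chunk and
// return an id the client can reference in a follow-up request.
export async function saveBodyToDisk(
  body: ReadableStream<Uint8Array>,
  uploadDir: string,
): Promise<{ id: string; path: string; size: number }> {
  await mkdir(uploadDir, { recursive: true });
  const id = randomUUID();
  const path = join(uploadDir, id);
  // pipeline() applies backpressure and destroys both streams on error,
  // so only a small buffer of the file is held in memory at any time.
  await pipeline(Readable.fromWeb(body), createWriteStream(path));
  const { size } = await stat(path);
  return { id, path, size };
}
```

In a Hono handler the body would come from `c.req.raw.body`. It can be null when the request has no body, and depending on your TypeScript lib settings it may need a cast to the `node:stream/web` type.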

IcyOP•2mo ago

so you do 2 requests from client? how would you handle a case where you wouldn't receive complete data in the 2nd request, after file was uploaded?

Psi•2mo ago

I have garbage collection that removes files which have no references. Even with multipart uploads you need something to remove partially uploaded files.
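
One way such a cleanup could look, as a rough sketch only. It assumes uploads are stored as flat files named by their id, and that the set of ids still referenced in the database is loaded by the caller. `removeOrphanedUploads`, `referencedIds`, and `minAgeMs` are hypothetical names, not Psi's actual implementation:

```typescript
import { readdir, stat, unlink } from "node:fs/promises";
import { join } from "node:path";

// Hypothetical cleanup: delete files in uploadDir whose name (the upload id)
// is not referenced anywhere and that are older than minAgeMs.
export async function removeOrphanedUploads(
  uploadDir: string,
  referencedIds: ReadonlySet<string>,
  minAgeMs: number,
): Promise<string[]> {
  const removed: string[] = [];
  for (const name of await readdir(uploadDir)) {
    if (referencedIds.has(name)) continue; // still attached to an item
    const path = join(uploadDir, name);
    const info = await stat(path);
    // Skip directories and recent files: a fresh file may belong to an
    // upload whose second request has not arrived yet.
    if (!info.isFile() || Date.now() - info.mtimeMs < minAgeMs) continue;
    await unlink(path);
    removed.push(name);
  }
  return removed;
}
```

The age threshold is the design choice that matters here: without it, a file could be deleted between the upload request and the request that stores the item referencing it.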

IcyOP•2mo ago

alright guys thx for your help

Psi•2mo ago

Pls share if you come up with a nice multipart solution 😉

Arjix•2mo ago

I assume s3 would make that a lot better

Arjix•2mo ago

like minio https://github.com/minio/minio

ambergristle•2mo ago

+1 for streaming files and uploading form data separately. it makes it easier to do stuff like show % upload progress, or handle file issues. in my experience, aws products only ever make things worse in the long run, but i'm a fang hater to the core

Arjix•2mo ago

I've never used s3 myself, but I've seen it used a lot for file uploads, and since minio can be self-hosted, I thought it was a good alternative

ambergristle•2mo ago

no doubt. i'm just being salty, lol

Psi•2mo ago

Whether you want an object store depends much more on your infrastructure than on your business logic. Or on whether you need to scale out.

Arjix•2mo ago

having dealt with code written by others, I have stopped thinking about "infra" and started thinking about how to escape the mess of the others. even though I've never used an object storage before, it does sound appealing bc it is one less responsibility for your code. I like reinventing the wheel all the time, but in a team scenario that is hard

Psi•2mo ago

I'm a DevOps guy, and you want to use ObjectStorage when someone manages it for you because he thinks it's the better choice for the infrastructure 😉 I mostly implement both variants, Disk and S3, but store my metadata in my own DB
