R2 - Multiple Buckets or One?
We are making a storage service which will allow users to store potentially millions of images and files, and we are using Cloudflare R2 as the storage backend.
Question: Should we create a bucket for each user, or have one "master" bucket which has everything?
API requests are handled by workers so security shouldn't be an issue.
Thanks!
Hey @Raymond - how are you handling security with the workers? Are all of the users in that system part of the same "tenant"?
It's an image sharing service, all images are public, so we only secure write access via Supabase (auth through a Worker)
I would assume you also want to handle use cases for users wanting to delete/update their images - how are you handling access controls to that?
worth keeping in mind too that each R2 bucket is backed by a Durable Object for metadata, with a limited number of ops/s. They're not normal DOs, but can only do a couple hundred writes per second per bucket
I don't think that should be a limiting factor, but good to know. Since r2.dev (subdomain) requests are rate-limited, if we went with having a unique bucket for each user, wouldn't that mean we should assign a subdomain to each bucket too?
either that or presigned links
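For context, a presigned link is a normal bucket URL with a SigV4 query-string signature attached, so it works without any custom domain. A minimal sketch of presigning a GET for an R2 object using only Node's `crypto` module follows; the account ID, bucket, and keys are placeholders, and the object key is assumed to be URL-safe (a real implementation would use `aws4fetch` or the AWS SDK):

```typescript
// Sketch: build a SigV4 query-presigned GET URL for an R2 object.
// All credentials/IDs below are placeholders, not real values.
import { createHash, createHmac } from "node:crypto";

function hmac(key: Buffer | string, data: string): Buffer {
  return createHmac("sha256", key).update(data).digest();
}

function presignGetUrl(
  accountId: string,
  bucket: string,
  key: string,
  accessKeyId: string,
  secretAccessKey: string,
  expiresSeconds = 3600,
  now = new Date()
): string {
  const host = `${accountId}.r2.cloudflarestorage.com`;
  // 20240101T000000Z-style timestamp
  const amzDate = now.toISOString().replace(/[:-]|\.\d{3}/g, "");
  const dateStamp = amzDate.slice(0, 8);
  // R2 accepts "auto" as the region in the credential scope
  const scope = `${dateStamp}/auto/s3/aws4_request`;

  const params = new URLSearchParams({
    "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
    "X-Amz-Credential": `${accessKeyId}/${scope}`,
    "X-Amz-Date": amzDate,
    "X-Amz-Expires": String(expiresSeconds),
    "X-Amz-SignedHeaders": "host",
  });
  params.sort();

  const canonicalRequest = [
    "GET",
    `/${bucket}/${key}`,
    params.toString(),
    `host:${host}\n`, // canonical headers block ends with a blank line
    "host",
    "UNSIGNED-PAYLOAD",
  ].join("\n");

  const stringToSign = [
    "AWS4-HMAC-SHA256",
    amzDate,
    scope,
    createHash("sha256").update(canonicalRequest).digest("hex"),
  ].join("\n");

  // Derive the signing key via the SigV4 HMAC chain
  const kDate = hmac(`AWS4${secretAccessKey}`, dateStamp);
  const kRegion = hmac(kDate, "auto");
  const kService = hmac(kRegion, "s3");
  const kSigning = hmac(kService, "aws4_request");
  const signature = createHmac("sha256", kSigning)
    .update(stringToSign)
    .digest("hex");

  return `https://${host}/${bucket}/${key}?${params.toString()}&X-Amz-Signature=${signature}`;
}
```

The returned URL is valid until `X-Amz-Expires` elapses, so the Worker can hand out short-lived links without exposing the bucket keys.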
Would that mean a new dns record for each bucket, or is there a wildcard option?
dns record for each bucket, and you need to create them through the R2 custom domain tab, which iirc is sadly still undocumented in the bucket custom domain API
Dang, that's definitely not an option then
Same idea, worker controls access to delete. It's either write or delete, no need to update anything.
and where do you keep information on which users have access to either write or delete? do you have a mapping of that somewhere that you have created?
Yeah in supabase
got it - so your workflow is:
1. a user uploads a file
2. You create an entry into the Database on supabase of their id and the object they uploaded to R2
3. When another user tries to delete the object, you check in the CF workers if they have access or the right permission.
Is that about right? I am building a solution to make this a bit easier in the future for developers - so trying to learn a bit more about how developers set things up right now.
Pretty much! It’s not set in stone and lots of testing would be needed, but that’s the plan as of now
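The flow described above (ownership row in Supabase, Worker gatekeeping deletes) can be sketched roughly as below. The Supabase lookup is stubbed out as a callback, and names like `OwnershipRecord` are illustrative, not a real API; a real Worker would also verify the Supabase JWT first:

```typescript
// Illustrative ownership record as it might be stored in Supabase
interface OwnershipRecord {
  objectKey: string;
  ownerId: string;
}

// Pure decision function: only the recorded owner may delete
function authorizeDelete(
  requesterId: string,
  record: OwnershipRecord | null
): boolean {
  return record !== null && record.ownerId === requesterId;
}

// Worker-style handler sketch; lookup and deleteObject stand in for the
// Supabase query and the R2 binding's delete() call
async function handleDelete(
  requesterId: string,
  objectKey: string,
  lookup: (key: string) => Promise<OwnershipRecord | null>,
  deleteObject: (key: string) => Promise<void>
): Promise<number> {
  const record = await lookup(objectKey);
  if (!authorizeDelete(requesterId, record)) return 403; // forbidden
  await deleteObject(objectKey);
  return 204; // deleted
}
```

Keeping the authorization decision in a pure function makes it easy to test without spinning up a Worker.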
Yea - I've been talking to lots of developers and it seems to be a common pattern - so I am trying to build a solution that is a bit more standardized to ease that heavy lifting. Would you be open to chatting about it? I can DM you
Absolutely!
Any idea if this will change in the future (the amount of requests)? Or is this something for becoming an Enterprise user?
it's a limit of the technology
https://discord.com/channels/595317990191398933/940663374377783388/1200109005863911574
Durable Objects are single-threaded V8 Isolates, which is what a bucket is backed by (not for storage of course, just metadata)
can only handle so many requests on a single thread before it just becomes overloaded
Oh hmm, what about Sippy? Is that able to handle more? Is it also going through DOs?
Upload to S3 -> access via R2
the bucket itself, has a Durable Object assigned to just it, that needs to be in the flow of every operation
So this is also the case for reads?
yes but there's more caching/they have a way higher limit is my understanding
Damn so that’s the reason we’re getting so many errors
there's no hard limit or anything, and I think it's just that PUTs (especially multipart) are way more expensive than reads. Especially if you have an R2 custom domain w/ caching on it
We’re at 300 million served images (via custom domain) per month
Is there anything about this to read in the documentation? Would like to get more information about it
not documented, most of the info on it is in that conv: https://discord.com/channels/595317990191398933/940663374377783388/1200109005863911574
like Erisa says, not a hard limit or anything purposeful, just limit of the technology at the moment
if you have cache/decent hit rate you'd probably be fine
Most of it is cached yeah, but some images just don't load because of an error
We have about 1k reads and writes per minute, off peak.
Non cached^
But thanks, will read the chat!
I would imagine the reads would be ok as long as they weren't too diverse, writes might be a problem if you have a ton at once, but again we're talking about like ~400 PUTs per second, not per minute
what errors were you getting?
Ah hmm. Yeah writes are mostly OK, but a lot of users are experiencing that the images just don't load. Sadly not on my pc, so I can't send you the specific logs. Thanks for your time though