R2 - Multiple Buckets or One?

We're building a storage service that will let users store potentially millions of images and files, and we're using Cloudflare R2 as the storage backend. Question: should we create a bucket for each user, or have one "master" bucket that holds everything? API requests are handled by Workers, so security shouldn't be an issue. Thanks!
28 Replies
AbdallahShaban
AbdallahShaban8mo ago
Hey @Raymond - how are you handling security with the Workers? Are all of the users in that system part of the same "tenant"?
Raymond
RaymondOP8mo ago
It's an image-sharing service, all images are public, so we only secure write access via Supabase (auth checked in the Worker)
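A minimal sketch of that write path, assuming a Worker with an R2 bucket binding and token verification against Supabase's auth endpoint. The binding and variable names (BUCKET, SUPABASE_URL, SUPABASE_ANON_KEY) are illustrative, not taken from the actual project:

```ts
export interface Env {
  BUCKET: R2Bucket;          // R2 bucket binding from wrangler.toml
  SUPABASE_URL: string;      // e.g. https://<project>.supabase.co
  SUPABASE_ANON_KEY: string;
}

export default {
  async fetch(req: Request, env: Env): Promise<Response> {
    if (req.method !== "PUT") {
      return new Response("Method not allowed", { status: 405 });
    }

    // Ask Supabase who the bearer token belongs to; reject if it's invalid.
    const token = req.headers.get("Authorization")?.replace("Bearer ", "");
    if (!token) return new Response("Unauthorized", { status: 401 });

    const userRes = await fetch(`${env.SUPABASE_URL}/auth/v1/user`, {
      headers: { Authorization: `Bearer ${token}`, apikey: env.SUPABASE_ANON_KEY },
    });
    if (!userRes.ok) return new Response("Unauthorized", { status: 401 });
    const user = (await userRes.json()) as { id: string };

    // Key objects under the user's id so ownership is encoded in the path.
    const key = `${user.id}/${crypto.randomUUID()}`;
    await env.BUCKET.put(key, req.body);
    return Response.json({ key });
  },
};
```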
AbdallahShaban
AbdallahShaban8mo ago
I would assume you also want to handle users wanting to delete/update their images - how are you handling access control for that?
Chaika
Chaika8mo ago
Worth keeping in mind too: each R2 bucket is backed by a Durable Object for metadata, with a limited number of ops/s. They're not normal DOs, but they can only do a couple hundred writes per second per bucket
Raymond
RaymondOP8mo ago
I don't think that should be a limiting factor, but good to know. Since r2.dev (the public development subdomain) requests are rate-limited, if we went with a unique bucket for each user, wouldn't that mean we'd have to assign a subdomain to each bucket too?
Chaika
Chaika8mo ago
either that or presigned links
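For reference, presigning an R2 GET URL with aws4fetch looks roughly like this (the account id, credentials, and bucket are placeholders; a sketch, not production code). The signed URL can be handed to clients without any per-bucket custom domain:

```ts
import { AwsClient } from "aws4fetch";

async function presignGet(
  env: { ACCOUNT_ID: string; R2_ACCESS_KEY_ID: string; R2_SECRET_ACCESS_KEY: string },
  bucket: string,
  key: string
): Promise<string> {
  const r2 = new AwsClient({
    accessKeyId: env.R2_ACCESS_KEY_ID,
    secretAccessKey: env.R2_SECRET_ACCESS_KEY,
  });
  const url = new URL(`https://${env.ACCOUNT_ID}.r2.cloudflarestorage.com/${bucket}/${key}`);
  url.searchParams.set("X-Amz-Expires", "3600"); // link valid for one hour
  const signed = await r2.sign(new Request(url, { method: "GET" }), {
    aws: { signQuery: true }, // signature in the query string -> shareable URL
  });
  return signed.url;
}
```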
Raymond
RaymondOP8mo ago
Would that mean a new DNS record for each bucket, or is there a wildcard option?
Chaika
Chaika8mo ago
A DNS record for each bucket, and you need to create them through the R2 custom domain tab - iirc the bucket custom domain API is still sadly undocumented
Raymond
RaymondOP8mo ago
Dang, that's definitely not an option then. Same idea: the Worker controls access to delete. It's either write or delete, no need to update anything.
AbdallahShaban
AbdallahShaban8mo ago
And where do you keep the information on which users have access to write or delete? Do you have a mapping of that stored somewhere?
Raymond
RaymondOP8mo ago
Yeah, in Supabase
AbdallahShaban
AbdallahShaban8mo ago
Got it - so your workflow is:
1. A user uploads a file.
2. You create an entry in the Supabase database mapping their id to the object they uploaded to R2.
3. When a user tries to delete the object, you check in the CF Worker whether they have the right permission.
Is that about right? I'm building a solution to make this a bit easier for developers in the future, so I'm trying to learn a bit more about how developers set things up right now.
Raymond
RaymondOP8mo ago
Pretty much! It’s not set in stone and lots of testing would be needed, but that’s the plan as of now
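For concreteness, step 3 of that workflow might look something like this in the Worker. The "uploads" table, its columns, and the binding names are hypothetical stand-ins for whatever the real schema is:

```ts
interface Env {
  BUCKET: R2Bucket;
  SUPABASE_URL: string;
  SUPABASE_SERVICE_KEY: string; // server-side key, never exposed to clients
}

async function handleDelete(key: string, userId: string, env: Env): Promise<Response> {
  // Query Supabase's REST API (PostgREST) for a row matching this key + owner.
  const res = await fetch(
    `${env.SUPABASE_URL}/rest/v1/uploads?key=eq.${encodeURIComponent(key)}` +
      `&owner_id=eq.${encodeURIComponent(userId)}&select=key`,
    {
      headers: {
        apikey: env.SUPABASE_SERVICE_KEY,
        Authorization: `Bearer ${env.SUPABASE_SERVICE_KEY}`,
      },
    }
  );
  const rows = (await res.json()) as unknown[];
  if (rows.length === 0) return new Response("Forbidden", { status: 403 });

  await env.BUCKET.delete(key); // remove the object itself
  // (Deleting the matching DB row to keep the mapping consistent is omitted.)
  return new Response(null, { status: 204 });
}
```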
AbdallahShaban
AbdallahShaban8mo ago
Yeah - I've been talking to lots of developers and it seems to be a common pattern, so I'm trying to build a solution that's a bit more standardized to ease that heavy lifting. Would you be open to chatting about it? I can DM you
Raymond
RaymondOP8mo ago
Absolutely!
Sam
Sam8mo ago
Any idea if this will change in the future (the request limit)? Or is this something that improves for Enterprise users?
Chaika
Chaika8mo ago
It's a limit of the technology: https://discord.com/channels/595317990191398933/940663374377783388/1200109005863911574 - Durable Objects are single-threaded V8 isolates, and that's what a bucket is backed by (not for storage of course, just metadata). A single thread can only handle so many requests before it becomes overloaded.
Sam
Sam8mo ago
Oh hmm, what about Sippy? Is that able to handle more, or does it also go through DOs? (Upload to S3 -> access via R2)
Chaika
Chaika8mo ago
The bucket itself has a Durable Object assigned to just it, which needs to be in the flow of every operation
Sam
Sam8mo ago
So this is also the case for reads?
Chaika
Chaika8mo ago
Yes, but my understanding is there's more caching and a way higher limit for reads
Sam
Sam8mo ago
Damn, so that's the reason we're getting so many errors
Chaika
Chaika8mo ago
There's no hard limit or anything; I think it's just that PUTs (especially multipart) are way more expensive than reads. Especially if you have an R2 custom domain with caching on it
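That caching can also be done explicitly in a Worker with the Cache API, so repeat reads are served from Cloudflare's cache and never touch the bucket's Durable Object at all. A sketch assuming a binding named BUCKET:

```ts
export default {
  async fetch(req: Request, env: { BUCKET: R2Bucket }, ctx: ExecutionContext) {
    if (req.method !== "GET") {
      return new Response("Method not allowed", { status: 405 });
    }

    const cache = caches.default;
    const hit = await cache.match(req);
    if (hit) return hit; // cache hit: no R2 operation at all

    const key = new URL(req.url).pathname.slice(1);
    const obj = await env.BUCKET.get(key);
    if (!obj) return new Response("Not found", { status: 404 });

    const headers = new Headers();
    obj.writeHttpMetadata(headers);           // propagate stored content-type etc.
    headers.set("etag", obj.httpEtag);
    headers.set("Cache-Control", "public, max-age=86400"); // cache for a day

    const res = new Response(obj.body, { headers });
    ctx.waitUntil(cache.put(req, res.clone())); // populate cache off the hot path
    return res;
  },
};
```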
Sam
Sam8mo ago
We're at 300 million served images (via custom domain) per month. Is there anything about this to read in the documentation? I'd like to get more information about it
Chaika
Chaika8mo ago
Not documented - most of the info on it is in that conversation: https://discord.com/channels/595317990191398933/940663374377783388/1200109005863911574. Like Erisa says, it's not a hard limit or anything purposeful, just a limit of the technology at the moment. If you have caching and a decent hit rate you'd probably be fine.
Sam
Sam8mo ago
Most of it is cached, yeah, but sometimes images just don't load because of an error. We have about 1k non-cached reads and writes per minute, off peak. But thanks, will read the chat!
Chaika
Chaika8mo ago
I would imagine the reads would be OK as long as they weren't too diverse; writes might be a problem if you have a ton at once, but again we're talking about ~400 PUTs per second, not per minute. What errors were you getting?
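If write bursts do brush up against the per-bucket limit, a simple retry with exponential backoff can smooth them out. A sketch, not something from this thread; note the body must be replayable, so no one-shot streams:

```ts
async function putWithRetry(
  bucket: R2Bucket,
  key: string,
  body: ArrayBuffer | string, // streams can't be re-read on retry
  attempts = 4
): Promise<void> {
  for (let i = 0; i < attempts; i++) {
    try {
      await bucket.put(key, body);
      return;
    } catch (err) {
      if (i === attempts - 1) throw err; // out of retries, surface the error
      // back off 100ms, 200ms, 400ms, ... before trying again
      await new Promise((r) => setTimeout(r, 100 * 2 ** i));
    }
  }
}
```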
Sam
Sam8mo ago
Ah hmm. Yeah, writes are mostly OK; a lot of users are experiencing that images just don't load. Sadly I'm not on my PC, so I can't send you the specific logs. Thanks for your time though