Drizzle and multi-tenancy
I've been given a project that requires a degree of multi-tenancy. The system database will contain users, organizations, sub-organizations and all the links between these entities, then all tenant-specific entities would reside in the separate database. The client is asking for a separate database per tenant for two reasons:
1. to prevent data leakage between tenants
2. to allow tenants to pick their region
To me, 1 isn't much of an issue as it can be alleviated with testing queries and rigorous testing in a staging environment.
2 is a little more complicated, and I'm asking for further clarification on if this is a legal thing or not.
Personally, I'm mostly concerned about the prospect of needing to run migrations on dozens of identical but separate databases. While it looks like it can be done rather easily with Drizzle by just looping over the tenants and building a new instance of the client, I've done multi-database multi-tenancy before and it was a bit of a pain, not something I would like to repeat.
Right now I'm considering using Turso as it has a high limit on database count and databases can be created programmatically via their API.
On the other hand, just using a single MySQL/Postgres instance, switching databases via the
USE
statement, and keeping everything within SQL sounds easier than having another layer via the Turso API.
I'm also unsure how Turso would handle constant re-creations of the Drizzle client every time separate tenants require a connection, instead of maintaining the same one connection.
Any recommendations on approach or if I'm missing anything I should be considering?2 Replies
I feel hesitant to respond here because of my short experience in multi-tenancy. But my two cents might help you a little bit. I believe there are many questions to answer before you can consider selecting the database/database provider to use. Will it run in a dedicated server, do you need to optimize for cold starts, what type of requests will be the most commons, how often a new tenant will be onboarded, what type of resources will you be working with.
After that, changing databases within backend server environment will come to be a question of, does your driver support it?
I feel like I will be easier/faster to have a separate drizzle instance per database, I'm not sure how bad the system resources will be demanded but this way you'll have a connection pool ready per request per tenant
Will it run in a dedicated serverthey seem set on using serverless, so the app would be deployed on vercel
do you need to optimize for cold startsperformance is not a big concern here, so i don't believe so
how often a new tenant will be onboardedcurrent estimation is a few dozen for 2024, scaling to low hundreds per year
what type of resources will you be working withdynamic form templates, filled in forms, calculations based on those forms
what type of requests will be the most commonson the system side: just loading what organizations and sub-organizations a user is linked to on the tenant side: the resources mentioned above, nothing super complicated that sql can't breeze through i got confirmation that tenants need to be able to pick their region for legal reasons, so the options for db here is basically - use Turso, the list of locations they offer have been acceptable - spin up new database servers manually and build a service to coordinate tenant onboarding between all these different servers cloudflare d1 looks like a solid turso alternative but not being able to explicitly choose a region makes it difficult to justify when the region selection is a legal requirement
After that, changing databases within backend server environment will come to be a question of, does your driver support it?in theory, starting with sqlite and switching to mysql or pg is only gaining features. imo the driver type isn't the complicated part here, it's the whole hosting of the database server and creating new databases programmatically in different regions that makes it complicated. using turso allows one auth key to connect to X number of database URLs, compared to spinning up mysql/pg servers myself and building a service to coordinate all of these, potentially requiring another API layer to interact with those databases, is just far more complex
I feel like I will be easier/faster to have a separate drizzle instance per databaseagreed. i did just remember i've done a sort of cached connection in my pg side project that could work here, just instead of returning a single instance it has a map of connections and returns the applicable one passed into it, or something like that thanks for the reply, sorry for the wall of text lol