How does database sharding work

I understand the concept of load balancing where for example Service.myApi redirects the request to a server like s1, s2, etc.myApi or even media servers that balance the media into mulitple servers like media.myapi -> m3.myApi But in the case of the media server the data where a media file is stored, is stored in the database so it's just a matter of asking the database for such data But then how can I know what database is targeted when there is db.myapi which would have db1.myApi and db2.myApi since I can't just ask the database for data without knowing where it's stored
38 Replies
Pobiega
Pobiega2mo ago
The "sharded" databases I've worked with all rely on the concept of a partition key all items with the same partition key are guaranteed to be on the same shard, and you must know the partition key to be able to query the database
Somgör from Human Resources
What exactly is this "partition key"
Pobiega
Pobiega2mo ago
essentially just a column like any other, except it must have a known value My most recent example would be for a car manufacturer, where we used a sharded database to store all the cars they ever produced we used the market the car was produced for as the partitionkey there that was using AWS DynamoDB, for the record analysing your access patterns and knowing what data you will have is super important when working with document databases or something like this
Somgör from Human Resources
Let's say I want to load a bunch of messages from a database I'd make an API call to my server which then fetches the data from the database to return it Now if I understand your approach correctly I need a collection of where things are but how will I know that if that requires a database call too? Because that approach sounds like I'm just cloning a huge record of everything just less detailed onto every server which would destroy the porpuse of having multiple servers (in terms of storage)
Pobiega
Pobiega2mo ago
What database engine are we talking here? Implementing shard on your own in something like MSSQL is quite different from doing it in a database that has it from scratch If we assume you are doing it on your own in a relational engine like MSSQL or Postgres, you would have a "master" database that keeps track of what other databases contain what, on an overarching level. Something like "Customer 1246234 has its data in database-6" you should not split/shard/partition data willy nilly, if they belong together, keep them together
Somgör from Human Resources
Our options are PSQL or MariaDB But I tend to go with MariaDB cause I find it more flexible to use and query to
Pobiega
Pobiega2mo ago
so relational dbs not something you would normally partition
Somgör from Human Resources
My plan is that each "community" has its own database with tables and stuff, on the user's account there is a record of all communities they are in But that doesn't help that messages might end up with a huge size
Pobiega
Pobiega2mo ago
Okay, so you're doing multi-tenancy
Somgör from Human Resources
My biggest concern is storage really We have dedicated servers for media and stuff but messages are :SCgetoutofmyhead:
Pobiega
Pobiega2mo ago
¯\_(ツ)_/¯ No good ideas from me, I've never worked at quadrillion-items scale a single relational DB with a single table can easily handle billions of messages as long as you have indexes on the things you will query on
Somgör from Human Resources
Things like message history will be fetched by getting the last few messages along with a token that I can query to load the previous messages
Pobiega
Pobiega2mo ago
thats easy enough, if you have some kind of incremental id so the database can easily "order" the items, thats just a SELECT * FROM messages limit 10;
Somgör from Human Resources
Also what do you think of the idea that the database manager for the backend software is a seperate package the backend uses This way we can keep the database connector stuff as a seperate "service" for easier maintaining
Pobiega
Pobiega2mo ago
im not sure what that means
Somgör from Human Resources
Uh Basically the code that handles database queries is a NuGet package the backend calls
Pobiega
Pobiega2mo ago
are you doing manual ADO.NET queries in 2024? 😛
Somgör from Human Resources
I have no clue what ADO.NET is :SCgetoutofmyhead:
Pobiega
Pobiega2mo ago
SqlConnection and friends
Pobiega
Pobiega2mo ago
I'd uh... recommend not doing that.
Somgör from Human Resources
What's a more modern approach
Pobiega
Pobiega2mo ago
EF Core, or Linq2Db
Somgör from Human Resources
I make the SQL queries and then map the result to objects using Dapper.
Pobiega
Pobiega2mo ago
Right
Somgör from Human Resources
Which of those do you personally recommend I should take a look at primarily
Pobiega
Pobiega2mo ago
Dapper is good, and a few years ago that was the best way to get good performance EF Core probably its the defacto standard for all .NET development
Pobiega
Pobiega2mo ago
the nice thing with it is that you no longer need to write sql
Somgör from Human Resources
Can you give me an example of how a SELECT query would look
Pobiega
Pobiega2mo ago
you just do var messages = await _dbContext.Messages.Where(x => x.SenderId == myUserId).TakeAsync(50);
Pobiega
Pobiega2mo ago
you work with a type called IQueryable<T> that represents the query before it being sent to the database so you can gradually build up a query with more and more filtering or mapping etc when you are done and want to issue the command, you run a finalizer like ToListAsync or SingleAsync etc and then the query runs and you get a List<T> or T back depending on what finalizer you ran
Somgör from Human Resources
1. Does T in this case represent the object I want to map 2. Will this also be safe from SQL injection and such
Pobiega
Pobiega2mo ago
1: T would be your domain model. 2: yes.
Somgör from Human Resources
:soPortuguese: @Pobiega question As I said before every community gets it's own database Now if I want to query all users for example I'd do something like SELECT * FROM myCommunityDb.myUsersTable; Is it possible to query like that on EF Core
Pobiega
Pobiega2mo ago
yes, but its not trivial. look up "multi tenancy in EF Core"
Somgör from Human Resources
Oki Thanks a lot! I'll take a look and come back some time later