Topics

C#•11mo ago

The Fog from Human Resources

How does database sharding work

I understand the concept of load balancing where for example Service.myApi redirects the request to a server like s1, s2, etc.myApi or even media servers that balance the media into mulitple servers like media.myapi -> m3.myApi But in the case of the media server the data where a media file is stored, is stored in the database so it's just a matter of asking the database for such data But then how can I know what database is targeted when there is db.myapi which would have db1.myApi and db2.myApi since I can't just ask the database for data without knowing where it's stored

38 Replies

Pobiega•11mo ago

The "sharded" databases I've worked with all rely on the concept of a partition key all items with the same partition key are guaranteed to be on the same shard, and you must know the partition key to be able to query the database

The Fog from Human ResourcesOP•11mo ago

What exactly is this "partition key"

Pobiega•11mo ago

essentially just a column like any other, except it must have a known value My most recent example would be for a car manufacturer, where we used a sharded database to store all the cars they ever produced we used the market the car was produced for as the partitionkey there that was using AWS DynamoDB, for the record analysing your access patterns and knowing what data you will have is super important when working with document databases or something like this

The Fog from Human ResourcesOP•11mo ago

Let's say I want to load a bunch of messages from a database I'd make an API call to my server which then fetches the data from the database to return it Now if I understand your approach correctly I need a collection of where things are but how will I know that if that requires a database call too? Because that approach sounds like I'm just cloning a huge record of everything just less detailed onto every server which would destroy the porpuse of having multiple servers (in terms of storage)

Pobiega•11mo ago

What database engine are we talking here? Implementing shard on your own in something like MSSQL is quite different from doing it in a database that has it from scratch If we assume you are doing it on your own in a relational engine like MSSQL or Postgres, you would have a "master" database that keeps track of what other databases contain what, on an overarching level. Something like "Customer 1246234 has its data in database-6" you should not split/shard/partition data willy nilly, if they belong together, keep them together

The Fog from Human ResourcesOP•11mo ago

Our options are PSQL or MariaDB But I tend to go with MariaDB cause I find it more flexible to use and query to

Pobiega•11mo ago

so relational dbs not something you would normally partition

The Fog from Human ResourcesOP•11mo ago

My plan is that each "community" has its own database with tables and stuff, on the user's account there is a record of all communities they are in But that doesn't help that messages might end up with a huge size

Pobiega•11mo ago

Okay, so you're doing multi-tenancy

The Fog from Human ResourcesOP•11mo ago

My biggest concern is storage really We have dedicated servers for media and stuff but messages are :SCgetoutofmyhead:

Pobiega•11mo ago

¯\_(ツ)_/¯ No good ideas from me, I've never worked at quadrillion-items scale a single relational DB with a single table can easily handle billions of messages as long as you have indexes on the things you will query on

The Fog from Human ResourcesOP•11mo ago

Things like message history will be fetched by getting the last few messages along with a token that I can query to load the previous messages

Pobiega•11mo ago

thats easy enough, if you have some kind of incremental id so the database can easily "order" the items, thats just a SELECT * FROM messages limit 10;

The Fog from Human ResourcesOP•11mo ago

Also what do you think of the idea that the database manager for the backend software is a seperate package the backend uses This way we can keep the database connector stuff as a seperate "service" for easier maintaining

Pobiega•11mo ago

im not sure what that means

The Fog from Human ResourcesOP•11mo ago

Uh Basically the code that handles database queries is a NuGet package the backend calls

Pobiega•11mo ago

are you doing manual ADO.NET queries in 2024? 😛

The Fog from Human ResourcesOP•11mo ago

I have no clue what ADO.NET is :SCgetoutofmyhead:

Pobiega•11mo ago

SqlConnection and friends

The Fog from Human ResourcesOP•11mo ago

Ah Yes.

Pobiega•11mo ago

I'd uh... recommend not doing that.

The Fog from Human ResourcesOP•11mo ago

What's a more modern approach

Pobiega•11mo ago

EF Core, or Linq2Db

The Fog from Human ResourcesOP•11mo ago

I make the SQL queries and then map the result to objects using Dapper.

Pobiega•11mo ago

Right

The Fog from Human ResourcesOP•11mo ago

Which of those do you personally recommend I should take a look at primarily

Pobiega•11mo ago

Dapper is good, and a few years ago that was the best way to get good performance EF Core probably its the defacto standard for all .NET development

The Fog from Human ResourcesOP•11mo ago

Oki

Pobiega•11mo ago

the nice thing with it is that you no longer need to write sql

The Fog from Human ResourcesOP•11mo ago

Can you give me an example of how a SELECT query would look

Pobiega•11mo ago

you just do var messages = await _dbContext.Messages.Where(x => x.SenderId == myUserId).TakeAsync(50);

The Fog from Human ResourcesOP•11mo ago

Ah

Pobiega•11mo ago

you work with a type called IQueryable<T> that represents the query before it being sent to the database so you can gradually build up a query with more and more filtering or mapping etc when you are done and want to issue the command, you run a finalizer like ToListAsync or SingleAsync etc and then the query runs and you get a List<T> or T back depending on what finalizer you ran

The Fog from Human ResourcesOP•11mo ago

1. Does T in this case represent the object I want to map 2. Will this also be safe from SQL injection and such

Pobiega•11mo ago

1: T would be your domain model. 2: yes.

The Fog from Human ResourcesOP•11mo ago

:soPortuguese: @Pobiega question As I said before every community gets it's own database Now if I want to query all users for example I'd do something like SELECT * FROM myCommunityDb.myUsersTable; Is it possible to query like that on EF Core

Pobiega•11mo ago

yes, but its not trivial. look up "multi tenancy in EF Core"

The Fog from Human ResourcesOP•11mo ago

Oki Thanks a lot! I'll take a look and come back some time later

We are a programming server aimed at coders discussing everything related to C# (CSharp) and .NET.

56KMembers

View on Discord

Did you find this page helpful?