refactor database to stop using UNIQUEIDENTIFIER as PK
So I didn't know enough of databases to know that it is not a good idea to use uniqueidentifiers as PK in MSSQL database. Now that I have a lot of records my queries are starting to become slow. I'm using Entity Framework.
What steps should I take to migrate my database to INT PK bearing in mind all the relationships between tables that already exists? I can't loose the relationships. I would still have a column like, "referenceKey" that I would use to access specific records in my database so I can prevent the guessing PK attack. But having INT as PK would allow me to create more efficient indices and whatnot, right?
Has anyone ever done such a thing? How should I proceed?
34 Replies
What makes you think using integer instead of guid is going to be faster?
It can be faster. But have you done any analysis to determine it will be faster?
What is a lot of records?
Also, how are they generated? By the database or by the code?
GUID's are generated by the database when record is inserted. I do not generate them by the code. For example, I have this table with 303 records. Table name is GenericFields. Query is taking 25 seconds to complete. IIRC, I'm running RDS in small or micro instance, it's the smallest one. Table has 303 records.
Here is the EF query log
data:image/s3,"s3://crabby-images/e88d0/e88d0dfa64877968a20821e020ef29e03acf3ea0" alt="No description"
303 is not a lot of records, so no clue why it took so long.
Oh, that has nothing to do with PKs then
I'd guess a non-indexed column, but even so.
I have to go for now though, sorry.
No problem. GPT told me to do something about indexes like this
data:image/s3,"s3://crabby-images/7a8de/7a8dec5b8467428340c94ca759e29e794a1ba922" alt="No description"
but I'm not sure. I replaced another paginated query to use 100% dapper and I added "Covering Indexes"
and it still take 4seconds to query
Query: SELECT Id, StatusEnum, Descricao, StartDate, EndDate, CreationTime, ReportType FROM RelatoriosPdf WHERE CompanyId = @CompanyId
ORDER BY CreationTime OFFSET @Offset ROWS
FETCH NEXT @PageSize ROWS ONLY
Index:
CREATE INDEX IX_RelatoriosPdf_CompanyId_CreationTime
ON RelatoriosPdf (CompanyId, CreationTime)
INCLUDE (Id, StatusEnum, Descricao, StartDate, EndDate, ReportType);
it seems to not have speed up anything
In execution planner, it seems to search only the index.
data:image/s3,"s3://crabby-images/b7c97/b7c972e7cab2654bd132169da202f14c7b853824" alt="No description"
this is another query that I tried to optmize. In SSMS query runs in less than one second, but when called from frontend and everything else, takes more than 4segs
and I'm not really doing anything fancy in C# to add such time
except using Mapster to map the sql result to DTO and give it back to frontend.
the sql performance seems to have degraded over time. One time when I refactored all the queries, it wasn't taking 3ms to run them all. Now the same batch takes more than 20secs to run. Maybe if I restart RDS time will go back to normal. Maybe the RDS instance is too small? This is driving me crazy with all sorts of problems in production. Any help would he heavily appreciated
these are other queries unrelated to the ones I posted here, but it's funny how one time they go fast and all of a sudden it starts to become slow
How wide are the tables?
I go with @Yawnder - [▰/▱] your db is probably missing indexing here
Check the execution on db side
Analyze an actual execution plan - SQL Server
Learn how to analyze actual graphical execution plans, which contain runtime information, by using SQL Server Management Studio Plan Analysis feature.
When it takes long on the „index seek“ part you are missing an index
They only have 303 rows though, I agree indexes are good for read, but something else is going on
https://discord.com/channels/143867839282020352/1338019303382843432/1338032815220199465
do i spy eav?
Leandro
GUID's are generated by the database when record is inserted. I do not generate them by the code. For example, I have this table with 303 records. Table name is GenericFields. Query is taking 25 seconds to complete. IIRC, I'm running RDS in small or micro instance, it's the smallest one. Table has 303 records.
Quoted by
<@347434717022257162> from #refactor database to stop using UNIQUEIDENTIFIER as PK (click here)
React with ❌ to remove this embed.
sorry, I dont understand
Eav is a database design tool anti pattern, looking at the columns in your query, you have one called content, and your table is called GenericFields
Medium
Entity Attribute Value Model
Today, we are going to talk about a Database design model, known as the Entity Attribute Value model aka EAV model. EAV Database models are…
Are you doing something like this?
Regardless, circling back to your initial question, I have doubts that restriping your tables to users int is going to give you much gain. It seems like you've got other problems potentially going on.
I dont think so, content column is a json string
oic
that avoids the need for eav tbh
I think Hangfire was degrading performance. After I disabled it, some queries are running faster now. Maybe I'll look for a different solution.
SQL Profiler was very active before I disabled
yeah, it was definately hangfire. I make like 15 queries to gather data to build a PDF report. Before it could take as long as 20secs. Now with Hangffire disabled, everything ran in less than 2 seconds.
Well, I was already warned that this lib is harsh on database. Maybe some obscure configuration to make it less of a bottleneck.. Will have to study more
Maybe use mass transit?
I was thinking a out lambda and event bridges? But I will check this for sure
it still needs Hangfire. Sad
data:image/s3,"s3://crabby-images/c8053/c805304b69ec2411efdc81fdf7a3692bce9cb543" alt="No description"
How many jobs do you have?
I'm inferring just one?
a couple, 5 or 6 . They don't run everytime
but still it completely disgraced my database xD
maybe there is some obscure pooling settings in hangfire or something..
but I'm considering just removing this crazy lib and try to do with lambdas and event bridges
Can you use another db for hangfire or switch to quartz?
unfortunately I cannot, my budget is very low. Quartz still needs a db right? It might be worth a shot to try but I'm inclined to think that a solution with 0 database usage would be better
No
You can use a database
You don't have to
You do however need to use somethig for permanent storage.
There's a memory provider iirc
I don't know what the budget here is
Memory storage only works in the simplest cases