C
C#•10mo ago
Prae

Designing a business automated task and trying to optimize the queries performed to the database

Hi, I'm building an automated task, that read a file everyday, perform operations on every line, and insert rows in database. While performing operations, the naive approche is to iterate over each line and query the database to get the necessary datas I originally wanted to limit the number of query to the database, by grouping the queries necessaries for each row into multiple big queries. Is it a good approach ? What are the limits ? Because now, instead of querying the database on each line and getting the necessary data directly, i need to retrieve in memory the data that i need for the current line, and it feels really bad honestly. My "where" conditions in my queries are not that trivial, and i find myself having to duplicate it in my data access layer and in my task. One idea i had was to somehow tell the database which result row was for which line in my file, so i could only iterate on this line number, it would feel a bit better, but i dont know how Am i overthinking ? I heard premature optimization is the root of all evil 😮 Thank you for your help, i really appreciate it
7 Replies
matter
matter•10mo ago
"it depends" (Standard disclaimer) Could you load the entire file into a staging table, then execute batch operations on that?
FestivalDelGelato
FestivalDelGelato•10mo ago
to help you $itdepends
FestivalDelGelato
FestivalDelGelato•10mo ago
it's kinda hard to help without knowing more of what/how you are querying
Prae
PraeOP•10mo ago
Hello guys, I understand it was hard to give definitive answer. "Could you load the entire file into a staging table, then execute batch operations on that?" => Yes, i load everything in memory and then loop over chunked arrays of 100 rows. For each chunk, i query everything i need for those 100 rows, and then i feed a dictionnary for each query i performed, using as key a value type (so a struct if multiple elements were involved) of what was used to filter the entities in my repository. Then, i loop over the hundred rows and perform my task. I like the simplicity of use of dictionnaries to retrieve the datas I need for the current row, i didnt have to duplicate every conditions. For example i had to search a list of customers based on a name, so in my cache i have a dictionary with the name as key and all the customers i had in db as value. Thanks to you both anyway
FestivalDelGelato
FestivalDelGelato•10mo ago
"Could you load the entire file into a staging table" i feel like table is a wrong term here also how can you determine "100 rows" without any kind of analysis/bechmark the principle of batching - plus eventually some caching - is ok, i've nothing against that
Prae
PraeOP•10mo ago
to be honest the 100 row number was determinated by my technichal lead. It's a setting we'll be ready to adjust after we enter in production if we see any problem in fact every 100 row i perform a savechanges, and we thought it was fine to send insert request for 100 rows at once. It's not that critical so we didnt benchmark more...

Did you find this page helpful?