C
C#2y ago
schwartzmj

❔ Optimizing EF query

I'm learning ASP.NET and C# and playing around with a "Spreadsheet" application. I have the following models:
- Spreadsheet
- Column
- Row
- Cell
- Spreadsheet
- Column
- Row
- Cell
The short of it is that a Cell belongs to a Row and a Column. The Cell also contains a string Value and a method GetValueAsType which looks up the Column.Type and tries to parse the string into the correct value and returns it. When I'm querying all my Rows and Includeing the Cells, I get an Object reference error when I try to Cell.GetValueAsType, I assume because the Column isn't in memory to reference. I could ThenInclude the Column when getting the Cells, but it makes the query extremely inefficient because I'm fetching the same few columns from the database over and over for every cell. Anyway here's my question:
21 Replies
schwartzmj
schwartzmjOP2y ago
await _dbContext.Columns
.Where(c => c.SpreadsheetId == spreadsheetId)
.ToListAsync();
await _dbContext.Columns
.Where(c => c.SpreadsheetId == spreadsheetId)
.ToListAsync();
I found that if I just fetch the Spreadsheet's columns before performing my Row query, I do not get the object reference error. This seems like a hacky workaround. Is it? I tried a bunch of ways to include the Columns in my Rows query but I wasn't sure how to get it to work properly (I kept converting the query in a Column or Cell query, or I'd lose my pagination based on rows (it'd paginate based on columns or cells) , etc)
await _dbContext.Columns
.Where(c => c.SpreadsheetId == spreadsheetId)
.ToListAsync();

var rowsQuery = _dbContext.Rows
.Where(r => r.SpreadsheetId == spreadsheetId)
.Include(r => r.Cells)
.Skip(startRow)
.Take(endRow - startRow);
await _dbContext.Columns
.Where(c => c.SpreadsheetId == spreadsheetId)
.ToListAsync();

var rowsQuery = _dbContext.Rows
.Where(r => r.SpreadsheetId == spreadsheetId)
.Include(r => r.Cells)
.Skip(startRow)
.Take(endRow - startRow);
That's where I'm at right now So for example if I try:
var rowsQuery = _dbContext.Spreadsheets.Where(s => s.Id == spreadsheetId)
.Include(s=>s.Columns)
.Include(s=>s.Rows)
.ThenInclude(r=>r.Cells)
.SelectMany(s => s.Rows)
.OrderBy(r => r.Id)
.Skip(startRow)
.Take(endRow - startRow);
var rowsQuery = _dbContext.Spreadsheets.Where(s => s.Id == spreadsheetId)
.Include(s=>s.Columns)
.Include(s=>s.Rows)
.ThenInclude(r=>r.Cells)
.SelectMany(s => s.Rows)
.OrderBy(r => r.Id)
.Skip(startRow)
.Take(endRow - startRow);
It looks like I convert them and paginate Rows correctly, but start getting the object reference error again I know this is long winded, but I suppose to summarize: I do not want to get the Column for every single Cell. I want to just get the Columns one time up front so that I can call the Cell.GetValueAsType() which needs the Column in memory to know what type the column is
JakenVeina
JakenVeina2y ago
so, we're saying that
_context.Spreadsheets.Rows
.Include(row => row.Cells)
.ThenInclude(cell => cell.Column)
_context.Spreadsheets.Rows
.Include(row => row.Cells)
.ThenInclude(cell => cell.Column)
is inefficient? I'm guessing it's doing N+1 querying for the columns? have you verified this via logs, or are you just assuming? like, your end goal here is that you want one EF query that will translate to
SELECT *
FROM ROWS
WHERE ...

SELECT *
FROM COLUMNS
WHERE ...

SELECT *
FROM CELLS
WHERE ...
SELECT *
FROM ROWS
WHERE ...

SELECT *
FROM COLUMNS
WHERE ...

SELECT *
FROM CELLS
WHERE ...
(order notwithstanding) and then everything gets collated together in .NET yes?
Jimmacle
Jimmacle2y ago
i would expect it to just join, the query is dead simple my question is why does it sound like you're using a database to create a database? if all cells in a column have the same type that's just a database table
JakenVeina
JakenVeina2y ago
uhh that's not how JOIN works in EF Core .Include() does not translate to a JOIN for one-to-many relationships it translates to 2 queries one for the parents, one for the children and collates them in .NET
Jimmacle
Jimmacle2y ago
i don't have logs handy but that doesn't sound like the queries i get for what i'm doing at work could be thinking of something else, it's friday evening and my brain is done for the week
JakenVeina
JakenVeina2y ago
fair enough lemme see if I can confirm alright, let's see if I remember how to use the ef designer tools damn, probably need to start with re-installing postgres alright, so fixed is correct at least for some scenarios maybe it's a versioning difference too, I've been out of the loop on EF Core since 5.0 or maybe it depends on schema complexity but for the following schema
public class MyParentEntity
{
public required long Id { get; init; }
public required string Name { get; set; }
public ICollection<MyChildEntity>? Children { get; init; }
}

public class MyChildEntity
{
public required long Id { get; init; }
public required long ParentId { get; set; }
public MyParentEntity? Parent { get; set; }
public required string Name { get; set; }
}
public class MyParentEntity
{
public required long Id { get; init; }
public required string Name { get; set; }
public ICollection<MyChildEntity>? Children { get; init; }
}

public class MyChildEntity
{
public required long Id { get; init; }
public required long ParentId { get; set; }
public MyParentEntity? Parent { get; set; }
public required string Name { get; set; }
}
this query
var parents = await dbContext.Set<MyParentEntity>()
.Include(parent => parent.Children)
.ToListAsync();
var parents = await dbContext.Set<MyParentEntity>()
.Include(parent => parent.Children)
.ToListAsync();
compiles to
SELECT p."Id", p."Name", c."Id", c."Name", c."ParentId"
FROM "Parents" AS p
LEFT JOIN "Children" AS c ON p."Id" = c."ParentId"
ORDER BY p."Id"
SELECT p."Id", p."Name", c."Id", c."Name", c."ParentId"
FROM "Parents" AS p
LEFT JOIN "Children" AS c ON p."Id" = c."ParentId"
ORDER BY p."Id"
and the inverse query
var children = await dbContext.Set<MyChildEntity>()
.Include(child => child.Parent)
.ToListAsync();
var children = await dbContext.Set<MyChildEntity>()
.Include(child => child.Parent)
.ToListAsync();
compiles to
SELECT c."Id", c."Name", c."ParentId", p."Id", p."Name"
FROM "Children" AS c
INNER JOIN "Parents" AS p ON c."ParentId" = p."Id"
SELECT c."Id", c."Name", c."ParentId", p."Id", p."Name"
FROM "Children" AS c
INNER JOIN "Parents" AS p ON c."ParentId" = p."Id"
so, bottom line what is your issue @schwartzmj ?
schwartzmj
schwartzmjOP2y ago
Yes, the query is 350ms by doing .Include(cell => cell.Column) and ~16ms if I load the columns upfront and do not do the include it's more like the user is creating their own columns of their own types. but yes i had the same thought about "using a database to create a database" 😆
Jimmacle
Jimmacle2y ago
i mean, if that's the goal of the application you should ditch EF and use something like dapper because you don't have a well defined (as in, entirely user defined) model
schwartzmj
schwartzmjOP2y ago
I can certainly look into that. I'm very green to C# and of course ASP.NET
Jimmacle
Jimmacle2y ago
dapper is essentially raw SQL queries with some minimal mapping functionality while EF gets upset easily if you don't follow its conventions
JakenVeina
JakenVeina2y ago
sounds like what you might want instead is a projection if you wanna stick with EF you don't need to pull the ENTIRE column entity, just the one Type value for each cell if your query is just compiling to one giant JOINed SQL query, then the difference in time is probably just all the duplicated data limit that to just the Type column with a projection
schwartzmj
schwartzmjOP2y ago
the type is on the Column, so what i want to do is just grab the columns upfront and ignore them on the Cell level. i guess my current method of "get columns, ignore columns when getting rows/cells" works, it just seemed odd that it did work
JakenVeina
JakenVeina2y ago
it's working because you're doing those all as tracking queries so all those column entities are being tracked in memory when a Cell comes in, EF recognizes that it has the Column for that in memory already and fills it in, even if you didn't explicitly request it on that query
schwartzmj
schwartzmjOP2y ago
ok cool. that's what i assumed and what i half-confirmed in my head reading some docs. just didnt want some bug or edge case to pop up. also was very curious on other ways of doing it I'm not at my work computer right now, but later I plan on reading through these comments more in depth. I very much appreciate the help given so far
JakenVeina
JakenVeina2y ago
also was very curious on other ways of doing it
projected queries
schwartzmj
schwartzmjOP2y ago
have some of the docs up for that right now 🙂
Saber
Saber2y ago
unless you do AsSplitQuery() it will translate into joins
JakenVeina
JakenVeina2y ago
well, there you go that could be an easy fix force EF to do the multiple-queries approach, and eliminate redundant data in the transfer do you mean the opposite of what you said? cause testing seems to indicate the default behavior is to use JOINs and the name .AsSplitQuery() would suggest to me that it's going to... split the query docs seem to support this or okay, I see what you mean
Saber
Saber2y ago
does adding a comma after AsSplitQuery() clear up the message? Yes it will not do joins adding it in
JakenVeina
JakenVeina2y ago
I misread that a bit yeah
Accord
Accord2y ago
Was this issue resolved? If so, run /close - otherwise I will mark this as stale and this post will be archived until there is new activity.
Want results from more Discord servers?
Add your server