C#•2y ago

❔ Optimizing EF query

I'm learning ASP.NET and C# and playing around with a "Spreadsheet" application. I have the following models:

- Spreadsheet
- Column
- Row
- Cell

- Spreadsheet
- Column
- Row
- Cell

The short of it is that a Cell belongs to a Row and a Column. The Cell also contains a string Value and a method GetValueAsType which looks up the Column.Type and tries to parse the string into the correct value and returns it. When I'm querying all my Rows and Includeing the Cells, I get an Object reference error when I try to Cell.GetValueAsType, I assume because the Column isn't in memory to reference. I could ThenInclude the Column when getting the Cells, but it makes the query extremely inefficient because I'm fetching the same few columns from the database over and over for every cell. Anyway here's my question:

21 Replies

schwartzmjOP•2y ago

        await _dbContext.Columns
            .Where(c => c.SpreadsheetId == spreadsheetId)
            .ToListAsync();

        await _dbContext.Columns
            .Where(c => c.SpreadsheetId == spreadsheetId)
            .ToListAsync();

I found that if I just fetch the Spreadsheet's columns before performing my Row query, I do not get the object reference error. This seems like a hacky workaround. Is it? I tried a bunch of ways to include the Columns in my Rows query but I wasn't sure how to get it to work properly (I kept converting the query in a Column or Cell query, or I'd lose my pagination based on rows (it'd paginate based on columns or cells) , etc)

        await _dbContext.Columns
            .Where(c => c.SpreadsheetId == spreadsheetId)
            .ToListAsync();
        
        var rowsQuery = _dbContext.Rows
            .Where(r => r.SpreadsheetId == spreadsheetId)
            .Include(r => r.Cells)
            .Skip(startRow)
            .Take(endRow - startRow);

        await _dbContext.Columns
            .Where(c => c.SpreadsheetId == spreadsheetId)
            .ToListAsync();
        
        var rowsQuery = _dbContext.Rows
            .Where(r => r.SpreadsheetId == spreadsheetId)
            .Include(r => r.Cells)
            .Skip(startRow)
            .Take(endRow - startRow);

That's where I'm at right now So for example if I try:

        var rowsQuery = _dbContext.Spreadsheets.Where(s => s.Id == spreadsheetId)
            .Include(s=>s.Columns)
            .Include(s=>s.Rows)
            .ThenInclude(r=>r.Cells)
            .SelectMany(s => s.Rows)
            .OrderBy(r => r.Id)
            .Skip(startRow)
            .Take(endRow - startRow);

        var rowsQuery = _dbContext.Spreadsheets.Where(s => s.Id == spreadsheetId)
            .Include(s=>s.Columns)
            .Include(s=>s.Rows)
            .ThenInclude(r=>r.Cells)
            .SelectMany(s => s.Rows)
            .OrderBy(r => r.Id)
            .Skip(startRow)
            .Take(endRow - startRow);

It looks like I convert them and paginate Rows correctly, but start getting the object reference error again I know this is long winded, but I suppose to summarize: I do not want to get the Column for every single Cell. I want to just get the Columns one time up front so that I can call the Cell.GetValueAsType() which needs the Column in memory to know what type the column is

JakenVeina•2y ago

so, we're saying that

_context.Spreadsheets.Rows
    .Include(row => row.Cells)
    .ThenInclude(cell => cell.Column)

_context.Spreadsheets.Rows
    .Include(row => row.Cells)
    .ThenInclude(cell => cell.Column)

is inefficient? I'm guessing it's doing N+1 querying for the columns? have you verified this via logs, or are you just assuming? like, your end goal here is that you want one EF query that will translate to

SELECT *
FROM ROWS
WHERE ...

SELECT *
FROM COLUMNS
WHERE ...

SELECT *
FROM CELLS
WHERE ...

SELECT *
FROM ROWS
WHERE ...

SELECT *
FROM COLUMNS
WHERE ...

SELECT *
FROM CELLS
WHERE ...

(order notwithstanding) and then everything gets collated together in .NET yes?

Jimmacle•2y ago

i would expect it to just join, the query is dead simple my question is why does it sound like you're using a database to create a database? if all cells in a column have the same type that's just a database table

JakenVeina•2y ago

uhh that's not how JOIN works in EF Core .Include() does not translate to a JOIN for one-to-many relationships it translates to 2 queries one for the parents, one for the children and collates them in .NET

Jimmacle•2y ago

i don't have logs handy but that doesn't sound like the queries i get for what i'm doing at work could be thinking of something else, it's friday evening and my brain is done for the week

JakenVeina•2y ago

fair enough lemme see if I can confirm alright, let's see if I remember how to use the ef designer tools damn, probably need to start with re-installing postgres alright, so fixed is correct at least for some scenarios maybe it's a versioning difference too, I've been out of the loop on EF Core since 5.0 or maybe it depends on schema complexity but for the following schema

public class MyParentEntity
{
    public required long Id { get; init; }
    public required string Name { get; set; }
    public ICollection<MyChildEntity>? Children { get; init; }
}

public class MyChildEntity
{
    public required long Id { get; init; }
    public required long ParentId { get; set; }
    public MyParentEntity? Parent { get; set; }
    public required string Name { get; set; }
}

public class MyParentEntity
{
    public required long Id { get; init; }
    public required string Name { get; set; }
    public ICollection<MyChildEntity>? Children { get; init; }
}

public class MyChildEntity
{
    public required long Id { get; init; }
    public required long ParentId { get; set; }
    public MyParentEntity? Parent { get; set; }
    public required string Name { get; set; }
}

this query

var parents = await dbContext.Set<MyParentEntity>()
    .Include(parent => parent.Children)
    .ToListAsync();

var parents = await dbContext.Set<MyParentEntity>()
    .Include(parent => parent.Children)
    .ToListAsync();

compiles to

SELECT p."Id", p."Name", c."Id", c."Name", c."ParentId"
FROM "Parents" AS p
LEFT JOIN "Children" AS c ON p."Id" = c."ParentId"
ORDER BY p."Id"

SELECT p."Id", p."Name", c."Id", c."Name", c."ParentId"
FROM "Parents" AS p
LEFT JOIN "Children" AS c ON p."Id" = c."ParentId"
ORDER BY p."Id"

and the inverse query

var children = await dbContext.Set<MyChildEntity>()
    .Include(child => child.Parent)
    .ToListAsync();

var children = await dbContext.Set<MyChildEntity>()
    .Include(child => child.Parent)
    .ToListAsync();

compiles to

SELECT c."Id", c."Name", c."ParentId", p."Id", p."Name"
FROM "Children" AS c
INNER JOIN "Parents" AS p ON c."ParentId" = p."Id"

SELECT c."Id", c."Name", c."ParentId", p."Id", p."Name"
FROM "Children" AS c
INNER JOIN "Parents" AS p ON c."ParentId" = p."Id"

so, bottom line what is your issue @schwartzmj ?

schwartzmjOP•2y ago

Yes, the query is 350ms by doing .Include(cell => cell.Column) and ~16ms if I load the columns upfront and do not do the include it's more like the user is creating their own columns of their own types. but yes i had the same thought about "using a database to create a database" 😆

Jimmacle•2y ago

i mean, if that's the goal of the application you should ditch EF and use something like dapper because you don't have a well defined (as in, entirely user defined) model

schwartzmjOP•2y ago

I can certainly look into that. I'm very green to C# and of course ASP.NET

Jimmacle•2y ago

dapper is essentially raw SQL queries with some minimal mapping functionality while EF gets upset easily if you don't follow its conventions

JakenVeina•2y ago

sounds like what you might want instead is a projection if you wanna stick with EF you don't need to pull the ENTIRE column entity, just the one Type value for each cell if your query is just compiling to one giant JOINed SQL query, then the difference in time is probably just all the duplicated data limit that to just the Type column with a projection

schwartzmjOP•2y ago

the type is on the Column, so what i want to do is just grab the columns upfront and ignore them on the Cell level. i guess my current method of "get columns, ignore columns when getting rows/cells" works, it just seemed odd that it did work

JakenVeina•2y ago

it's working because you're doing those all as tracking queries so all those column entities are being tracked in memory when a Cell comes in, EF recognizes that it has the Column for that in memory already and fills it in, even if you didn't explicitly request it on that query

schwartzmjOP•2y ago

ok cool. that's what i assumed and what i half-confirmed in my head reading some docs. just didnt want some bug or edge case to pop up. also was very curious on other ways of doing it I'm not at my work computer right now, but later I plan on reading through these comments more in depth. I very much appreciate the help given so far

JakenVeina•2y ago

also was very curious on other ways of doing it

projected queries

schwartzmjOP•2y ago

have some of the docs up for that right now 🙂

Saber•2y ago

unless you do AsSplitQuery() it will translate into joins

JakenVeina•2y ago

well, there you go that could be an easy fix force EF to do the multiple-queries approach, and eliminate redundant data in the transfer do you mean the opposite of what you said? cause testing seems to indicate the default behavior is to use JOINs and the name .AsSplitQuery() would suggest to me that it's going to... split the query docs seem to support this or okay, I see what you mean

Saber•2y ago

does adding a comma after AsSplitQuery() clear up the message? Yes it will not do joins adding it in

JakenVeina•2y ago

I misread that a bit yeah

Accord•2y ago

Was this issue resolved? If so, run /close - otherwise I will mark this as stale and this post will be archived until there is new activity.

Gaming

Programming

❔ Optimizing EF query

Did you find this page helpful?