❔ EFCore Sqlite Async: RAM and CPU Problem
So my code does the following:
30 targets are being processed at once via async operations. Each second, for each target, I create 20 new children of the target. So per second I have 20 * 30 new children. Every 20 children I call the "SaveAsync" method.
However before I add a child to the database I want to ensure that the child does not exist yet, if it exists already I update its properties.
I'm doing it with this code. I am assuming that EF Core loads all the children objects into memory, which leads to very high memory usage, how can I prevent this? It should operate on the database level each time and not load all objects into RAM.
It is important because I have millions of child objects.
The second issue is CPU. Without the database actions, such as SaveAsync and FirstOrDefaultAsync, the CPU is around 1% (normal for my task), however with those methods after around 2 mins the CPU is over 90%.
249 Replies
No, EF does not load the whole database into memory
It generates an SQL query with a WHERE and LIMIT 1 in this case
Hmmm whats the issue with the memory leaks then? Without DB usage I have consistently around 400 MB
and with it just goes up and up
I can send all operations that I do on Database side
Well basically I only do these two that I just said
Why not try profiling it to see which objects are doing the damage
Huh, well, the only way I can see it being an issue is if maybe EF cannot translate that
.Equals()
You should just take a look at what query gets generated
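One way to do that (a sketch, not from the thread: BotContext is the context name used later here, the file name and logging setup are assumptions) is EF Core's simple logging:

```csharp
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Logging;

public class BotContext : DbContext
{
    protected override void OnConfiguring(DbContextOptionsBuilder options)
        => options
            .UseSqlite("Data Source=bot.db")                 // hypothetical file name
            .LogTo(Console.WriteLine, LogLevel.Information); // prints every generated SQL statement
}
```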
so this code looks fine at first glance?
Yep
so the DBSet "Children" is not loaded into memory?
Could probably use ExecuteUpdateAsync() instead, but this is valid
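For reference, a hedged sketch of the ExecuteUpdateAsync shape (EF Core 7+; ScrapedUsers and RestID are names that appear later in this thread, the updated property is hypothetical):

```csharp
// Issues a single UPDATE ... WHERE statement; nothing is loaded or tracked.
var rowsAffected = await ctx.ScrapedUsers
    .Where(x => x.RestID == scrapedUser.RestID)
    .ExecuteUpdateAsync(s => s
        .SetProperty(u => u.FollowerCount, scrapedUser.FollowerCount)); // hypothetical property
```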
Nope, it should not be loaded unless you explicitly load it all
I only do that + SaveChangesAsync after each 20 new children that I add to the DbSet
nothing else
so whats my best approach to solve this ?
Well, at a glance, there's... nothing to solve, everything seems fine
Unless you're doing something really stupid like new-ing up a new DbContext with every loop
I dont actually, I have 30 DbContexts as otherwise I get an exception for concurrency.
One BotContext per async task that generates new children for a target
How do you get those contexts?
or what do you mean?
Yeah, that seems fine-ish
_db is path to sqlite
I thought you might be not disposing of them, hence the memory issues
Huh
I mean they stay alive, these tasks that I was speaking of do not exit for hours
so this instance stays alive for hours
30 instances
I'm out of ideas, I'm afraid
Try asking in #database maybe? Link this thread there, provide a short summary
Ok thanks for trying to help.
if you dont dispose the dbcontext then each added Child/User will continue to be tracked. meaning memory usage will increase
same for the updated user, since you seem to be tracking entities by default
so its possible, with your current approach, to load the entire database into memory multiple times (as many times as you have semi persistent dbcontext instances)
the solution is to dispose the dbcontext earlier
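A minimal sketch of what "dispose earlier" could look like here, assuming the per-target loop shape described above (names are from this thread, the loop body is invented):

```csharp
while (stillScraping)
{
    // One short-lived context per batch: its change tracker never grows past one batch.
    await using var ctx = new BotContext();

    foreach (var child in nextTwentyChildren)  // the ~20 new children per second per target
        ctx.Children.Add(child);

    await ctx.SaveChangesAsync();
    // ctx is disposed at the end of each iteration, tracked entities become collectable
}
```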
makes sense
I will defo try that
what about the CPU usage? Might it be because I save too frequently?
30 Tasks save it each 20 new children
so basically each second it saves 30 times
or isnt that a problem
saving and writing to the database in itself isnt cpu intensive, its IO intensive
but your queries can become expensive, given enough data to process
e.g. by having no or outdated indexes
but then I guess I have the issue that I create them too frequently. Like I would have a new context per task each second, is that whats suggested?
dbcontext is pretty cheap to create
Or a lot of indexes...
so its fine if I have it in a loop but dispose after each iteration
it kinda depends, this seems to enter the realm of mass data processing, which is fine for ef core, it becomes troublesome, when you have to update stuff frequently
consider using executeupdate
or switch to something else like linq2db, that supports merge statements
the updates are very very rare
its just about the amount of data that is a lot
i will try now with the new approach for the context
to see if it fixed both issues
my personal threshold is 1k tracked entities per dbcontext, i will try to use a new one, when i surpass that limit
ok got it
because 1k-2k entities can be processed by ef core with ms sql server in a single savechanges call in a reasonable timeframe
its less for sqlite iirc
this is not a limitation of ef core, but of the db driver only supporting a limited number of query parameters per query
sadly that didnt solve it, the CPU gets to 40% after 5mins
and ram still explodes
its probably time to post some actual code, not just the query bit
is vc possible too or no? because i cant actually share the code
sadly not
This is the critical code that does DB operations
in parameterless ScrapeAsync method there is max 30 tasks happening
the dbcontext isnt being disposed there
in the method I mentioned?
the code, you posted, yes
the one in parameterless scrapeasync needs to be disposed after getting the targets from it?
I thought its not as critical because from that context I just get the targets, nothing else. Its not used for anything else
change tracking works for every entity
unless you disable it
change tracking is whats causing your increasing memory
how do i disable it?
also is this better?
not sure if you meant that
you need change tracking for updates, inserts and deletes
so you cant disable it
bruh
but you can reset the changetracker by creating a new dbcontext
or calling a method on the changetracker itself, but thats slower in my experience than just creating a new dbcontext
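Both options sketched (ChangeTracker.Clear() is real EF Core API since 5.0; which one is faster is the experience claim above, not a guarantee):

```csharp
// Option 1: replace the context entirely (fresh, empty change tracker).
await ctx.DisposeAsync();
ctx = new BotContext();

// Option 2: keep the context, detach every tracked entity.
ctx.ChangeTracker.Clear();
```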
am i not doing it properly right now?
creating new db context each time
yes, i dont know what you changed, but nothing relevant seems to have changed in your code, after i explained your problem and why you have it
I moved the using in scrapeasync with parameters
down into the loop
so it does not stay for long
after each iteration its being disposed
now
or im wrong?
you havent shared code, so i cant comment
I have
here?
or did I miss something
I honestly hope you're not making something nasty here ...
i dont see a using there
no no
Seems like a Discord bot that scrapes stuff somewhere, and considering that you're scraping a large amount of data, I'd first like to know a bit more what you're trying to do.
First, your issue is that you're creating the context within the loop.
You should batch your operations.
Context should be created outside the loop and used all the way until SaveChanges.
I had that before
Typically, you wrap your context within a using block.
people told me not to
they had a bunch of semi persistent instances before
no you did not
i had the contexts before the while(true) loop before
the using statements
Also, the fact that you're using SQLite could be part of your issues.
Especially if there's multiple instances of your app/bot accessing same data/database.
batching with ef core assumes, you create a new dbcontext per batch, anything else is pointless
Because SQLite, even though it does support concurrency, it's not meant to be used for large number of connections.
30 is not rlly large ig
30 what?
30 tasks that do DB operations
30 dbcontext instances that stay around
As long as that amounts to a single DB connection, it's not an issue.
Single file
yeah
You're missing the point.
ok tell me
Number of concurrent connections on your database.
which is another issue, but not too relevant atm
single
sqlite file db does not care about the amount of connections you use, because its single threaded internally anyway
you can have 100 connections to it, but only one will write at any given time
Confused right now. I got told that I should dispose the contexts more frequently and not leave them open for hours, right? I closed them at method end but the method ends after hours/days
yeah thought so
Creating the context should be outside the while loop here.
Ok
the issue still there, I tested with both
after 2-3 mins
the CPU is going nuts
and RAM is just increasing and increasing
in first 1-2 minutes CPU is 1-5%
Without the DB operations, it stays at 1-5%
but never goes up
There can be a lot of reasons for both CPU and RAM usage to keep going up.
yeah but its DB Operations
since without them
no issues
i still suspect the change tracker being at fault. i dont see how keeping the dbcontext around will help with that
Because you're batching stuff and it's kind of normal for RAM usage to keep increasing until a context is actually disposed.
What amount of RAM are we talking about?
the amount just increases and increases, never finds an end.
ok lets forget RAM for now
but what about CPU
Can you tell me what amount of RAM are we talking about, Megabytes, Gigabytes, Terabytes?
ok so without DB Operations it was max 400 Megabyte, never increases as GC did its job.
With DB Operations after 3mins I was at 800MB and it was just constantly increasing
That's really not a lot.
yeah but I plan to run the program 24/7 and since it constantly goes up it will fill the RAM completely within a few hours
As long as that gets disposed after SaveChanges, that's fine.
its not
I call savechanges after each iteration in loop
thats the issue
Well, you don't know that, you're only guessing based on current behavior, which is not a lot of RAM.
in the while true
Just because you dispose an object doesn't mean it gets cleared from ram
yeah im assuming it
It will be cleared when GC kicks in
dispose is usually just for unmanaged memory
that the GC cant collect
Not always...
Yeah, that's part of the problem. You can call SaveChanges after the while loop and EF will batch-process everything at once.
there is no after
its a
while(true)
I mean there is
it exits after hours
via
break
hours / days
Then use a counter and call SaveChanges after a certain number of tasks / whatever have been processed.
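The counter idea, sketched with this thread's names (the batch size and the NextScrapedUser() stand-in are assumptions):

```csharp
const int batchSize = 1000;  // tune this; ~1k tracked entities per context is a common rule of thumb
var pending = 0;
var ctx = new BotContext();

while (true)
{
    ctx.ScrapedUsers.Add(NextScrapedUser());  // stand-in for the scraping step

    if (++pending >= batchSize)
    {
        await ctx.SaveChangesAsync();   // flush the batch
        await ctx.DisposeAsync();       // drop the tracked entities
        ctx = new BotContext();
        pending = 0;
    }
}
```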
after certain number of users added?
if it actually exits, then i say, let it run, until you confirmed memory is an issue
Yeah, I agree with Insire.
ok I can do that for the RAM problem. The CPU problem is like much more of a problem for me
You have to test it for a longer time before making an assumption that there is an actual memory leak
it takes after 5mins 90%+
Then profiling your code is the way to go.
You have to figure out exactly what does that.
Perhaps that is an expected behavior, since you're continuously running the whole thing.
So it's going to run as fast as it can, which means it will use as much of available processing power it has.
im pretty sure you can write this a lot more efficiently with ef core
im new to EF so no clue tbh
I freestyled this
And using the .Equals() is totally wrong here.
Use a normal == comparison.
Unless you're comparing complex objects.
doesnt matter if == or .Equals. I compare two strings
does that actually matter for ef core?
== is overridden in string
it does an equals check
Yes, because it will make it run client-side and will not fully translate to SQL, unless EF Core is clever enough to figure that out.
Always use == unless there's a very specific reason to use .Equals().
hm, i'd expect ef core to generate a warning during runtime for that
Yeah, I would expect the same, but this is SQLite, perhaps the provider does something to mitigate this particular case.
i'll have to check that at work some time
.Equals() is typically used to compare complex objects, not simple data types like strings.
sure
Yes, you can override it to do whatever you want, but that's not the point here.
What exactly do you mean by this?
string type overrides == operator
So?
why would it matter if == or not
same operation
What matters is the .Equals(). EF Core may not be able to properly translate it to a SQL expression that gets run on SQLite.
ah yeah im dumb
its not evaluating on the C# side but generates queries
changed it now 👍🏻
Exactly.
It's called "server-side vs client-side evaluation"
Speaking of connection lifetime,
since this is SQLite, you don't really have to worry about it.
unless its the inmemory variant, then you do
i will do tests and let you guys know the results
This is one simple way to batch 100 tasks at a time.
GC should keep up over time.
You may see an increased memory usage, but it should not go beyond certain threshold.
There are ways to make the GC run more frequently or even at specific times.
There's also a project directive which changes this behavior.
once the entire db is in memory, the memory usage increase will stop :P
You could also not use the change tracker, and instead of loading the entry, editing it, saving changes... use
.ExecuteUpdateAsync()
Try adding it in your .csproj file, in the top PropertyGroup section.
If the EF version used allows it
they do inserts as well, so they are not getting away from the changetracker
and inserts seem to be majority of actions
i will try this
thanks
yeah
should I dispose DbContext too and create a new one?
because currently my contexts are open for hours before disposal
around 30 contexts
Are you saying that you have 30 contexts running in parallel?
And to answer your question, no, you should not dispose of DbContexts manually; that's why you're wrapping them in a using() block, to have that done for you.
Best I can say right now - I think you have to rethink your whole workflow.
And before you do that, try to learn a bit more about a few other things
- How SQLite works
- How EF Core DbContext works
- DbContext lifetime
- How things work when using await / async
for async things im pretty good at it already.
But yeah for EF im missing exp
i will solve it and post solution here
this is the critical part
even without savechangesasync it does go up to 100% cpu
after a while
Ideally, if you can, share a GitHub repo for your project. That way we can have a better picture of what's going on in your entire app, because the fact that you're experiencing this when adding database operations still doesn't mean it's EF Core's issue.
I would but I cannot as its NDA partially
if you are down, im open to VC
thats the part where change tracking happens
yeah. So I excluded basically everything, even savechanges.
its this part that causes trouble
if (existingUser != null)
thats the only line where nothing is tracked
how does it track property changes? im curious. Did it inject some kind of CIL code into the setter method?
The moment you get a reference to existingUser it becomes tracked.
Change tracking is the default behavior of EF Core.
yeah I got that
but I just want to know how it tracks the changes
internally
not that its important here
not really
For that, you have to read the docs.
just curious
kk
yeah so this is fucked but i dunno how
and why
how can I see sql queries generates from efcore?
maybe its doing dumb stuff
Change Tracking - EF Core
Overview of change tracking for EF Core
ok just tested, even without the if(existingUser != null) case
its still going nuts
so its this
var existingUser = await ctx
.ScrapedUsers
.FirstOrDefaultAsync(x => x.RestID == scrapedUser.RestID);
or
ctx.ScrapedUsers.Add(scrapedUser);
for CPU im guessing first
Add does nothing particularly useful
all it does is change the entry state to "Added"
so its this
var existingUser = await ctx
.ScrapedUsers
.FirstOrDefaultAsync(x => x.RestID == scrapedUser.RestID);
(i mean its still useful, just not important in terms of cpu time)
well, it also needs to check whether that entity is already being tracked
wouldn't account for a 90% cpu jump 😆
all it does is look at the primary key to see if its set
no database roundtrips
is that an assumption, or did you actually go and look at the code?
It says it in the docs, give me a minute
its like after 5min
because that explanation does not explain the error ef core throws, when you try to track another entity with the same primary key
in one minute it goes like 3% then 11% then 9% then 3% but after sometime it just explodes
just hotspot profile your application
with a snapshot
i will check, never done profiling tbh
DbContext.Add and DbSet<TEntity>.Add do not normally access the database, since these methods inherently just start tracking entities. However, some forms of value generation may access the database in order to generate a key value. The only value generator that does this and ships with EF Core is HiLoValueGenerator<TValue>. Using this generator is uncommon; it is never configured by default. This means that the vast majority of applications should use Add, and not AddAsync. Other similar methods like Update, Attach, and Remove do not have async overloads because they never generate new key values, and hence never need to access the database.
thats not really an answer to my question. it says start tracking, it does not confirm, whether they only check the primary key for it holding the default value
It doesn't. You can easily test it
if you create a new model, set an id to the existing one and "add" then "savechangesasync" it will throw exceptions at you
(at the savechangesasync line)
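That experiment, sketched (assuming an Id key on ScrapedUser; where exactly it throws depends on whether the first instance is still tracked):

```csharp
var existing = await ctx.ScrapedUsers.FirstAsync();

var duplicate = new ScrapedUser { Id = existing.Id };  // same key, new instance
ctx.ScrapedUsers.Add(duplicate);
// If 'existing' is still tracked, Add throws InvalidOperationException here;
// otherwise SaveChangesAsync fails on the database's key/unique constraint.
await ctx.SaveChangesAsync();
```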
i know it does. but you claimed otherwise before and i was asking whether that was an assumption, or if you did look at the code
I didn't claim otherwise..? I said it shouldn't account for a 90% cpu jump because it literally doesn't check the database
idk, it sounds like you did
especially the last part does
i meant
it looks inside the class
not inside the database
my bad
i mean sure, db lookups are IO work, not CPU work
it looks if the property that you have set as a key (with fluent api or annotation) has a value or not
anyways
at the end of the day
it doesn't do DB Trips
in general it doesnt, you are right. there are exceptions tho
add just checks inside its change tracker dictionary if there's another entity with the same id (in which case it throws an exception), otherwise it marks the object as "Added" and it will be inserted at "SaveChangesAsync()"
however
but i digress
if you do this:
if the item and item2 coincidentally end up with the same ID
its gonna throw
yea
having a single context for him isn't a big issue
however yeah ofc the RAM will keep growing
the change-tracker is gonna grow as long as theres new "users" (or whatever) being added
I'd definitely change ScrapeFollowersFromEntries to not receive a dbcontext, and instead create one inside
then all I can recommend is hotspot profiling
thats what i have been saying, but
¯\_(ツ)_/¯
create and dispose a dbcontext every 1k entries being added
why not every function call?
its still gonna throw exceptions
because while dbcontext creation is cheap, its still not free
at this point if you got that much function calls, just switch to plain SQL with Dapper
i dont think dapper is an alternative to ef core
or execute plainsql with efcore and forget about change-tracker
dapper is a micro orm, not a full orm
if you wanted a micro orm, you wouldnt choose ef core
idk man I got like 3 diff opinions on this. Some said its fine to let it just dispose after the loop that takes hours /days
and you say I should dispose it after each 1k
¯\_(ツ)_/¯
there's no answer, you have to test it
it depends on a lot of factors
you get to choose
if you keep 1, you gonna have all the objects in memory
(not instantly, but as you query and add items, they're kept in memory)
all i know is that i have been doing pretty much exactly what you are doing for the past 1-2 years at work, and i am pretty confident that i know the limits of ef core in this regard by now
its the firstordefault
and if you actually want a replacement for ef core, then you can look at something like linq2db for mass updates and inserts - although i just recently started using that, so im not an expert there
i love efcore
but this kinda got annoying
do you still have the equals or did you change it?
changed
var existingUser = ctx
.ScrapedUsers
.FirstOrDefault(x => x.RestID == scrapedUser.RestID);
otherwise, is your sqlite table optimized?
is the RestID indexed?
can you elaborate?
what does it mean to be indexed
did you add an index to the database column, for that table
in sqlite
EXPLAIN QUERY PLAN SELECT * FROM ScrapedUsers WHERE RestId = ?;
run this manually
with a valid restid
and show the output
sqlite creates the table for me
efcore*
bruh
you can still define indexes
basically if you dont have an index
and you request an object from sql
it will scan through the whole table
to find what you want
an "index" works like a hashmap (not exactly)
which is way faster
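An index on RestID can be declared in the model (a sketch using this thread's names; applying it needs a migration or recreating the database):

```csharp
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    // Index so the FirstOrDefault lookup on RestID stops scanning the whole table.
    modelBuilder.Entity<ScrapedUser>()
        .HasIndex(u => u.RestID)
        .IsUnique();  // assumption: RestID should be unique, per the later discussion
}
```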
ah bro
that might be the issue then
yeah you're putting too much pressure on SQLite
this seems fine then now?
yeah
is the RestID the primary key for ScrapedUser?
if so, its already indexed
its not actually
I should have done that as primary key
as it should be unique
that's another discussion lol
yeah better to have a separate PK
anyways, just add the index for now just to test if it's going to work
yea, just because a column should only contain unique values, does not mean it has to be a primary key column (although its a strong indicator)
testing rn
after 5min
5% cpu
thats very good
will keep it running
cpu is fixed
now checking RAM
so less of an ef-core issue and more of a database table optimization issue 😆
i'd advise you to read up on those topics, as they are very important
😄
also congrats on learning how to profile
there's also profiling for RAM usage.. to see which objects take memory
yeah right, missing exp from my side lol
but guys
im somehow still leaking memory
even though im disposing the ctx after each 10th iteration
what can be the issue?
like after 1hr approx its 5GB
and it increases
Also I get a unique constraint failed on RestID here?
how is it even possible
do you still have 30 dbcontext instances?
yeah
then how do you make sure that they dont try to insert a new user for the same id?
i need them due to concurrency
otherwise exception
hm fuk
yes
do I need to semaphore lock?
this region
you could, but you'll also kill whatever advantage you gained by having 30 dbcontext instances
i need them otherwise I get exceptions
concurrency exceptions
then maybe have one dbcontext instead
then you dont need locks
but then I have ram issue as I cannot dispose it
?
one at a time
yeah
still even though I made a simple lock here
or, well you could just accept that, and add retry logic
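The retry approach, sketched (names from this thread; the detach-and-update fallback is one possible shape, and the exception inspection is simplified):

```csharp
try
{
    ctx.ScrapedUsers.Add(scrapedUser);
    await ctx.SaveChangesAsync();
}
catch (DbUpdateException)  // e.g. UNIQUE constraint failed: ScrapedUsers.RestID
{
    // Another context won the race: stop tracking our copy and update the winner instead.
    ctx.Entry(scrapedUser).State = EntityState.Detached;

    var existing = await ctx.ScrapedUsers
        .FirstAsync(x => x.RestID == scrapedUser.RestID);
    // copy the scraped properties onto 'existing' here, then:
    await ctx.SaveChangesAsync();
}
```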
nah im not that dirty
choices 💜
efcore gives me cancer
slowly
thats how it is for any framework, that you dont know
you'll have the same problem with any relational database
and maybe nosql databases too
this seems so illogical to me
xd
ok I understood the problem I guess. In another context I add the scrapeduser but I didnt save it yet, meaning in another context I cannot find the entry as its not in the db and not in that context yet
so yeah one context is the way
imagine having 2 dbcontext instances at the same time, both are asked for the same user id, both say they aint have it
then you try to write both of them to the db
i said, that only one thread writes at any time to the db
so one write succeeds
the other then fails, because the user already exists in the db
pretty simple imo
yeah i solved it
this explains it basically
and the statement with the thread, dont take it too seriously. there is still synchronization, but not necessarily on the same thread
not after 18hours work
my brain is on 30%
¯\_(ツ)_/¯
take a break
get some sleep
yeah now since it works
haha
Was this issue resolved? If so, run
/close
- otherwise I will mark this as stale and this post will be archived until there is new activity.