Help with concurrency blocking issue
I have a small app that runs CPU-bound work on multiple threads (one per processor).
On one PC it works very well, reaching ~800 iterations/second
But on my other PC (which is actually far superior in all of its components), it averages ~200 iterations/second.
I attached a Concurrency Visualizer analysis; as you can see, almost 50% of the time is spent stuck in synchronization (locks).
2nd image shows a zoom in with a blocking thread which blocks all other threads. Seems to be something regarding the GC.
Analysing memory I could not see any weird signs there.. any and all help would be greatly appreciated.
Ephemeral GC (gen 0 and gen 1) pauses all threads.
If you have a lot of such collections happening it could be impacting you.
That would imply your code is doing a lot of memory allocations.
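One way to check whether that is what's happening is to count collections and allocated bytes around a chunk of the work. This is only a minimal sketch: `GcProbe` and `Measure` are made-up names for illustration, while `GC.CollectionCount` and `GC.GetTotalAllocatedBytes` are the real runtime calls (the latter needs .NET Core 3.0 or later). Both counters are process-wide, so other threads' allocations are included.

```csharp
using System;

public static class GcProbe
{
    // Hypothetical helper: runs `work` once and reports how many bytes the
    // process allocated and how many collections happened while it ran.
    // Note: the gen 0 count also includes gen 1 and gen 2 collections,
    // because a higher-generation GC collects the younger generations too.
    public static (long AllocatedBytes, int Gen0, int Gen1, int Gen2) Measure(Action work)
    {
        long bytesBefore = GC.GetTotalAllocatedBytes(precise: true);
        int g0 = GC.CollectionCount(0);
        int g1 = GC.CollectionCount(1);
        int g2 = GC.CollectionCount(2);

        work();

        return (GC.GetTotalAllocatedBytes(precise: true) - bytesBefore,
                GC.CollectionCount(0) - g0,
                GC.CollectionCount(1) - g1,
                GC.CollectionCount(2) - g2);
    }
}
```

Wrapping a single iteration (or a batch of them) in `GcProbe.Measure(...)` on both machines should show whether the slow PC sees noticeably more collections for the same amount of work.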
Is that only true (on workstation GC) if a gen 2 collection is already occurring in the background?
UPDATE: OK so obviously I rubber ducked this Discord
Calling GC.TryStartNoGCRegion with a large enough allocation budget solved the problem.
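For anyone finding this later, a rough sketch of what that workaround can look like. `NoGcBurst`, `Run` and `budgetBytes` are names invented here, not the actual code from the app. The real calls are `GC.TryStartNoGCRegion`, `GC.EndNoGCRegion` and `GCSettings.LatencyMode`. As far as I know, `TryStartNoGCRegion` throws `ArgumentOutOfRangeException` if the budget is larger than the runtime allows and `InvalidOperationException` if a no-GC region is already active, so treat this as a starting point under those assumptions.

```csharp
using System;
using System.Runtime;

public static class NoGcBurst
{
    // Hypothetical helper: runs `work` inside a no-GC region when the runtime
    // can reserve `budgetBytes` up front. Returns true only if the region was
    // entered and was still active when the work finished.
    public static bool Run(long budgetBytes, Action work)
    {
        bool entered = GC.TryStartNoGCRegion(budgetBytes);
        try
        {
            work();
            // If the work allocated more than the budget, the runtime has
            // already left the region and a GC may have happened.
            return entered && GCSettings.LatencyMode == GCLatencyMode.NoGCRegion;
        }
        finally
        {
            // EndNoGCRegion throws if the region is no longer active, so check first.
            if (entered && GCSettings.LatencyMode == GCLatencyMode.NoGCRegion)
            {
                GC.EndNoGCRegion();
            }
        }
    }
}
```

A `false` return is a hint that the budget was too small for the burst, in which case the usual GC pauses were back in play for at least part of the run.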
Well, gen 2 is different.
Gen0 and Gen1 always pause all threads as far as I know.
Gen2 can do background collections on separate dedicated threads.
TryStartNoGCRegion
:tense:
I was trying to confirm that, but all I found so far was confirmation that it pauses all threads during a 'foreground' gen 0 or gen 1 collection (meaning a background gen 2 is already occurring)
The GC docs feel a bit inconsistent. They have contemporary verbiage indicating some features have replaced others, and then out-of-band warnings claiming some things don't apply if you are using .NET Framework. It's really hard to tell what applies in a bog standard .NET 5.0+ project by default
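Rather than guessing from the docs, you can ask the runtime what it is actually using on each machine. A small sketch (`GcConfigReport` and `Describe` are invented names; the properties read are real). My understanding is that a latency mode of `Interactive` corresponds to concurrent (background) GC being enabled and `Batch` to it being disabled, but take that mapping as an assumption to verify.

```csharp
using System;
using System.Runtime;
using System.Runtime.InteropServices;

public static class GcConfigReport
{
    // Hypothetical helper: summarizes the GC flavor and environment of the
    // current process so the two machines can be compared side by side.
    public static string Describe() =>
        $"ServerGC={GCSettings.IsServerGC}, " +
        $"LatencyMode={GCSettings.LatencyMode}, " +
        $"LogicalProcessors={Environment.ProcessorCount}, " +
        $"Runtime={RuntimeInformation.FrameworkDescription}";
}
```

Logging `GcConfigReport.Describe()` at startup on both PCs would at least rule out a difference in GC mode or runtime version between them.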
lol yeah I agree but it's a dedicated slave to do bursts of multithreaded map reduction
You might look at the code and see if you can allocate less. Sounds like you're doing a lot of small allocations.
Yeah it's very complex code I inherited, so unfortunately this is the most cost effective workaround
I pooled a lot of the smaller allocations where it wasn't deadly scary.
uncovered legacy code 🥲
btw, if I wanted to profile this to see where the issue is, which profiler should I use?
I usually use PerfView
The 'fundamentals' GC page mentions this offhand. I bet that for CPU-bound, allocation-heavy code, the threshold before it's worth partitioning the data and handing it over IPC to child processes is probably a lot lower than most devs expect (assuming the inputs and outputs are reasonable to serialize).
The classic advice in performance talks for .NET is "Objects you allocate should either be extremely short lived or live for the entire lifetime of your process." Techniques like pre-allocating everything on startup and using pooling are one way to avoid these kinds of GC pause issues. Easier said than done.
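To make the pooling part concrete, here is a minimal sketch using `ArrayPool<T>.Shared` from `System.Buffers`. The `PooledWork.SumOfSquares` method is a made-up stand-in for a per-iteration computation that needs a scratch buffer; it is not from the code discussed above.

```csharp
using System;
using System.Buffers;

public static class PooledWork
{
    // Hypothetical per-iteration work: squares the input into a rented
    // scratch buffer and sums it, without allocating a new array per call.
    public static long SumOfSquares(ReadOnlySpan<int> input)
    {
        int[] scratch = ArrayPool<int>.Shared.Rent(input.Length);
        try
        {
            // Rent may hand back a larger array than requested, and its
            // contents are not cleared, so only use the slice we fill.
            Span<int> buf = scratch.AsSpan(0, input.Length);
            for (int i = 0; i < input.Length; i++)
            {
                buf[i] = input[i] * input[i];
            }

            long sum = 0;
            foreach (int v in buf)
            {
                sum += v;
            }
            return sum;
        }
        finally
        {
            // Always return the buffer, even if the work throws.
            ArrayPool<int>.Shared.Return(scratch);
        }
    }
}
```

For example, `PooledWork.SumOfSquares(new[] { 1, 2, 3 })` returns 14, and repeated calls on the same thread tend to reuse the same backing array instead of producing fresh gen 0 garbage each time.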
@mtreit I agree, I just meant that the pauses increase in relative expense with the number of active threads you have. There is some threshold (in allocation rate and active threads) where it costs less to split the work across processes...and I bet it's a lower threshold than we would want to know about; probably at a level where most of us would claim, in a hand-wavy way, that it's a premature optimization or too annoying/complicated to support.
Luckily I'm primarily a web / business / app dev, and 'all cores and CPU bound' work is pretty rare.
I'm working on a codebase that is CPU bound for pretty much the first time in my career - most of my previous work was all I/O bound (network calls, database calls.)
It's an interesting change.
I wish I could get onto a project like that. I can't even get MS to review my job applications.