C# · 2y ago
Super

❔ Parallel.ForEach is actually slow

I have a process that I need to run very fast. When running with Parallel.ForEach (MaxDegreeOfParallelism = 8; my CPU is 16) it takes 2 minutes. When splitting the data and running it from two separate EXE processes, it takes 1 minute for both (half the time, same data). Does anyone know why? Or how can I fix it without running multiple processes? Thanks 🙂 Note: changing the degree to 16 makes it even slower.
Parallel.ForEach(sfs, new ParallelOptions { MaxDegreeOfParallelism = 8 }, sf =>
{
    // Process one item per iteration using the loop variable.
    var success = PerformFromDatabase(sf);
});
49 Replies
Henkypenky · 2y ago
$code
MODiX · 2y ago
To post C# code, type the following: ```cs // code here ``` Get an example by typing $codegif in chat. If your code is too long, post it to: https://paste.mod.gg/
Henkypenky · 2y ago
are u using List<T>?
Super (OP) · 2y ago
List<String> sfs
Doombox · 2y ago
is whatever you're doing actually parallelisable? and are you doing enough work enough times to justify the overheads?
Super (OP) · 2y ago
yes.. it's something that will take 50 days without parallelism, I am trying to reduce it to one day
Doombox · 2y ago
generally a better idea to use tasks directly though, spin up tasks and let the TPL take care of it for you. Never been a huge fan of Parallel.ForEach, especially when you're doing the sort of heavy lifting you're talking about
Henkypenky · 2y ago
are u scanning the file system or something? 50 days? wth, what are u doing?
Super (OP) · 2y ago
it is 4000 database SELECTs over 2000 different WHERE clauses and running ML on each one (8,000,000 in total)...
Doombox · 2y ago
TPL will generally give you a solid return assuming you generate sufficient task granularity, and it takes away having to manage the threading yourself
Henkypenky · 2y ago
Task.WhenAll should do it
Doombox · 2y ago
why should I? :pepetense:
Henkypenky · 2y ago
as our friend here :kekw:
Super (OP) · 2y ago
:] thanks I will check those out..
Doombox · 2y ago
you can create a task for your DB stuff, get those all running, then yeah, just Task.WhenAll. Or you can consume tasks as they complete and immediately begin an ML task (depends on how much of the previous tasks need to be completed).
maximising the amount of "code" running at any one time on the TPL will get you as much parallelism as is feasible. If it's still too slow then it's generally time for another language
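A minimal sketch of the second option described above: start the ML step for each item as soon as its query finishes, then await everything with Task.WhenAll. `LoadRowsAsync` and `Train` are hypothetical stand-ins for the real query and ML code, and the `SemaphoreSlim` throttle is an added assumption to keep the number of in-flight items bounded. None of these names come from the original code.

```cs
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for one database SELECT (I/O-bound).
static async Task<int[]> LoadRowsAsync(string whereClause)
{
    await Task.Delay(10); // simulated database latency
    return Enumerable.Range(0, whereClause.Length).ToArray();
}

// Hypothetical stand-in for the ML step (CPU-bound).
static int Train(int[] rows) => rows.Sum();

// Runs load -> train for every clause, with at most maxConcurrent items in flight.
static async Task<int[]> RunAllAsync(IReadOnlyList<string> clauses, int maxConcurrent)
{
    using var gate = new SemaphoreSlim(maxConcurrent);

    var tasks = clauses.Select(async clause =>
    {
        await gate.WaitAsync();
        try
        {
            var rows = await LoadRowsAsync(clause);
            // Hand the CPU-bound step to the thread pool as soon as its data is ready.
            return await Task.Run(() => Train(rows));
        }
        finally
        {
            gate.Release();
        }
    }).ToArray();

    // Task.WhenAll returns the results in the same order as the input tasks.
    return await Task.WhenAll(tasks);
}

var clauses = Enumerable.Range(1, 200).Select(i => new string('x', i % 7 + 1)).ToList();
var results = await RunAllAsync(clauses, maxConcurrent: 16);
Console.WriteLine($"Completed {results.Length} items");
```

This only helps if the query path is genuinely asynchronous; if the loader blocks the calling thread, the tasks mostly sit waiting and the timing is likely to look much like Parallel.ForEach.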
Henkypenky · 2y ago
the problem can also be the database. Not in-memory = slow
Doombox · 2y ago
some also don't allow parallelisation on a single connection but allow a functionally infinite number of connections, so worth looking into it
Henkypenky · 2y ago
oh yeah, the pool thingy, that's a good one
Doombox · 2y ago
someone somewhere will have a blog post on how to absolutely torture whatever DB you're using
Henkypenky · 2y ago
another thing, if you are using EF Core it has some overhead also. ADO.NET is generally faster at the cost of less functionality, so maybe start getting some performance gains wherever you can. But i don't know what you are using, so if u give more details maybe it's easier
Super (OP) · 2y ago
it's the PostgreSQL protocol on QuestDB, the implementation is done by ML.NET (their code actually sends the query through the database loader into an IDataView)
Doombox · 2y ago
I know sweet FA about ML.NET tbh, all I know is almost nobody uses it so specific implementation stuff is beyond me
Super (OP) · 2y ago
yeah lol no one is using it..
Henkypenky · 2y ago
ML.NET is only good (fast) with data prediction, cause it uses the same algorithms as other implementations, that's what i know. I used it for data prediction and it was fast and the results were comparable to other ones. As for other stuff like image or pattern recognition, i have no idea
Super (OP) · 2y ago
Store Sales - Time Series Forecasting
Use machine learning to predict grocery sales
Super (OP) · 2y ago
I am in place 96 😉
Henkypenky · 2y ago
yeah you might get a little bit more out of ML.NET but don't expect much. Probably the first places use enterprise grade ML and you won't ever get that with ML.NET
Super (OP) · 2y ago
what do you mean by "enterprise grade"?
Henkypenky · 2y ago
Azure Machine Learning or any public cloud solution. They probably have more power than you, it's all about resources
Super (OP) · 2y ago
yeah this is why I am parallelizing the shit out of it ;] I thought most people on Kaggle use the Kaggle CPUs
Doombox · 2y ago
only so much you can do with CPU parallelisation, almost all the big gains are on the GPU end
Henkypenky · 2y ago
we are talking several servers working concurrently here, not a single cpu parallelized to jesus
Super (OP) · 2y ago
yeah but ml.net isn't using that yeah :\ I understand
Doombox · 2y ago
yeah but ML.NET is probably only used on racks with two 128 thread AMD Threadrippers per 2U. For hobbyists and solo devs, Python is a much better option, the ecosystem is more geared towards the sort of stuff you can do with consumer grade hardware
Super (OP) · 2y ago
btw I am trying now with Task.WaitAll(task1, task2); and it takes the same time as Parallel.ForEach :[
Henkypenky · 2y ago
we said WhenAll. Not 2 tasks, all tasks
Super (OP) · 2y ago
oh right right :]
Doombox · 2y ago
yeah you need like 100+ tasks going to overcome the overhead, assuming your workload is actually parallelisable too. If you're interacting with non-threadsafe APIs then you're just wasting time even trying. Or if the APIs are threadsafe but rely heavily on locks for a very small amount of shared resources, that's also not going to do you much good
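To illustrate the point about locks, here is a small sketch under stated assumptions: the loop body is simulated with `Thread.Sleep`, and `MaxObservedConcurrency` is a hypothetical helper that records how many loop bodies were ever inside the work section at the same time. When every iteration has to take the same lock, that number cannot exceed 1, no matter what MaxDegreeOfParallelism is set to.

```cs
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Returns the highest number of loop bodies observed inside the work section at once.
static int MaxObservedConcurrency(bool useSharedLock)
{
    var sharedLock = new object();
    int inside = 0, maxInside = 0;

    void Work()
    {
        int now = Interlocked.Increment(ref inside);
        // Record the peak without racing other threads.
        int seen;
        while (now > (seen = Volatile.Read(ref maxInside)))
            Interlocked.CompareExchange(ref maxInside, now, seen);
        Thread.Sleep(5); // simulated work
        Interlocked.Decrement(ref inside);
    }

    Parallel.ForEach(Enumerable.Range(0, 64),
        new ParallelOptions { MaxDegreeOfParallelism = 8 },
        _ =>
        {
            // With the shared lock, only one iteration can do its work at a time.
            if (useSharedLock) { lock (sharedLock) Work(); }
            else Work();
        });

    return maxInside;
}

Console.WriteLine($"with shared lock:    {MaxObservedConcurrency(true)}");
Console.WriteLine($"without shared lock: {MaxObservedConcurrency(false)}");
```

The unlocked figure depends on the machine and the thread pool, so treat it as indicative only. If a library serializes internally like the locked case, separate processes would each get their own lock, which would be consistent with the two-EXE result, though that is only a guess about the cause here.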
Super (OP) · 2y ago
ok trying now with WhenAll
Henkypenky · 2y ago
i'm out, mother's day, good luck
Super (OP) · 2y ago
enjoy ;]
WhenAll also slow :[
tpl also not :[
Henkypenky · 2y ago
are u sure you are doing it right? how about some $code
MODiX · 2y ago
To post C# code, type the following: ```cs // code here ``` Get an example by typing $codegif in chat. If your code is too long, post it to: https://paste.mod.gg/
Henkypenky · 2y ago
also, the db is in memory, yes? please tell me it is
Super (OP) · 2y ago
Hi again :] I have created a ticket so the code is there if you want to take a look https://github.com/dotnet/runtime/issues/86218
Super (OP) · 2y ago
the database is doing its own thing... so I don't know about being in memory. I think it has some kind of cache
oe · 2y ago
@Super I've also had similar issues with Parallel.For in the past. Have you tried just launching threads manually in a for loop? If you want 100 threads, just for loop 100 times. When using Parallel.For and the process finishes, the thread is closed and a new thread is launched, which can also slow stuff down. With my approach you just have X worker threads that will constantly process data without changing state
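A rough sketch of the fixed-worker idea described above: a set number of long-lived threads pull item indexes from a shared queue until it is empty. `Process` is a hypothetical stand-in for the real per-item work, and using `ConcurrentQueue<int>` as the work queue is an assumption, not something taken from the original code.

```cs
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;

// Hypothetical stand-in for one unit of work (query + ML).
static int Process(int item) => item * 2;

// Starts workerCount long-lived threads that keep pulling indexes until the queue is empty.
static int[] RunWithWorkers(int[] items, int workerCount)
{
    var queue = new ConcurrentQueue<int>(Enumerable.Range(0, items.Length));
    var results = new int[items.Length];

    var workers = new Thread[workerCount];
    for (int w = 0; w < workerCount; w++)
    {
        workers[w] = new Thread(() =>
        {
            // Each index is dequeued by exactly one worker, so each slot is written once.
            while (queue.TryDequeue(out int index))
                results[index] = Process(items[index]);
        });
        workers[w].Start();
    }

    // Wait for every worker to drain the queue before reading the results.
    foreach (var worker in workers) worker.Join();
    return results;
}

var output = RunWithWorkers(Enumerable.Range(1, 1000).ToArray(), workerCount: 8);
Console.WriteLine(output.Sum());
```

Note that Parallel.For normally runs its iterations on pooled threads that are reused between iterations, so dedicated workers are not guaranteed to be faster; it is worth measuring both on the real workload.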
Super (OP) · 2y ago
yes I have tried :{
Accord · 15mo ago
Was this issue resolved? If so, run /close - otherwise I will mark this as stale and this post will be archived until there is new activity.