❔ Parallel.ForEach is actually slow
I have a process which I need to run very fast. When running with Parallel.ForEach (MaxDegreeOfParallelism = 8, out of 16 on my CPU), it takes 2 minutes.
When I split the data and run it as two separate EXE processes, both finish in 1 minute (half the time, same data).
Does anyone know why, or how I can fix it without running multiple processes? Thanks 🙂
Note: changing MaxDegreeOfParallelism to 16 makes it even slower.
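A minimal sketch of the setup being described, assuming the work is a plain Parallel.ForEach capped with ParallelOptions.MaxDegreeOfParallelism = 8. ProcessItem is a hypothetical CPU-bound stand-in for the real query-plus-ML work, and the item count is arbitrary.

```cs
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for the real per-item work (DB query + ML run).
static double ProcessItem(int item)
{
    double acc = 0;
    for (int i = 1; i <= 10_000; i++) acc += Math.Sqrt(item + i);
    return acc;
}

int[] items = Enumerable.Range(0, 1_000).ToArray();
var results = new ConcurrentDictionary<int, double>();

// Cap the loop at 8 concurrent workers, as in the question.
var options = new ParallelOptions { MaxDegreeOfParallelism = 8 };

var sw = Stopwatch.StartNew();
Parallel.ForEach(items, options, item =>
{
    results[item] = ProcessItem(item);
});
sw.Stop();

Console.WriteLine($"Processed {results.Count} items in {sw.ElapsedMilliseconds} ms");
```

Changing the 8 to 16 (or Environment.ProcessorCount) is the only knob being turned in the comparison above.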
$code
To post C# code type the following:
```cs
// code here
```
Get an example by typing $codegif in chat
If your code is too long, post it to: https://paste.mod.gg/
are you using List<T>?
is whatever you're doing actually parallelisable? And are you doing enough work, enough times, to justify the overhead?
Yes. It's something that would take 50 days without it.
I am trying to reduce it to one day.
generally a better idea to work with tasks directly, though: spin up tasks and let the TPL take care of scheduling them for you. Never been a huge fan of Parallel.ForEach, especially when you're doing the sort of heavy lifting you're talking about
are you scanning the file system or something?
50 days? WTH, what are you doing?
It's 4,000 database SELECTs across 2,000 different WHERE clauses, running ML on each one (8,000,000 in total)...
The TPL will generally give you a solid return, assuming you split the work into tasks of sufficient granularity, and it saves you from having to manage the threading yourself
Task.WhenAll should do it
why should I?
as our friend here
:] Thanks, I will check those out.
you can create a task for your DB stuff, get those all running, then yeah, just Task.WhenAll, or you can consume tasks as they complete and immediately begin an ML task (depends on how much of the earlier work needs to be completed first)
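The "consume tasks as they complete" option can be sketched with Task.WhenAny: start every query, then hand each result to a CPU-bound ML step as soon as its query finishes. QueryAsync and RunModel are hypothetical stand-ins (a Task.Delay and an average), not ML.NET or database APIs.

```cs
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical async "query": simulated latency, returns some fake rows.
static async Task<int[]> QueryAsync(int clause)
{
    await Task.Delay(10 + clause % 7);            // pretend DB round trip
    return Enumerable.Range(clause, 5).ToArray();  // pretend result rows
}

// Hypothetical CPU-bound "ML" step over the rows.
static double RunModel(int[] rows) => rows.Average();

// Start all queries up front.
List<Task<int[]>> pending = Enumerable.Range(0, 50).Select(c => QueryAsync(c)).ToList();
var mlTasks = new List<Task<double>>();

// As each query finishes, immediately start its ML step on the thread pool.
while (pending.Count > 0)
{
    Task<int[]> finished = await Task.WhenAny(pending);
    pending.Remove(finished);
    int[] rows = await finished; // already complete; rethrows if the query failed
    mlTasks.Add(Task.Run(() => RunModel(rows)));
}

double[] scores = await Task.WhenAll(mlTasks);
Console.WriteLine($"Got {scores.Length} model results");
```

Calling Task.WhenAny in a loop rescans the pending list each time, so for millions of items you would want to start queries in bounded batches rather than all at once.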
maximising the amount of "code" running at any one time on the TPL will get you as much parallelism as is feasible; if it's still too slow, then it's generally time for another language
the problem can also be the database
not in-memory = slow
some also don't allow parallelisation on a single connection
but allow a functionally infinite number of connections
so worth looking into it
oh yeah
the pool thingy
that's a good one
someone somewhere will have a blog post on how to absolutely torture whatever DB you're using
another thing: if you are using EF Core, it also has some overhead; ADO.NET is generally faster at the cost of less functionality, so maybe start picking up performance gains wherever you can
but I don't know what you are using,
so if you give more details, it may be easier to help
It's the PostgreSQL protocol on QuestDB; the implementation is done by ML.NET (their code actually sends the query through DatabaseLoader into an IDataView)
I know sweet FA about ML.NET tbh, all I know is almost nobody uses it
so specific implementation stuff is beyond me
yeah lol no one is using it..
ML.NET is only good (fast) for data prediction, because it uses the same algorithms as other implementations; that's what I know. I used it for data prediction, and it was fast, with results comparable to other libraries
as for other stuff like image or pattern recognition i have no idea
I am actually trying this contest https://www.kaggle.com/competitions/store-sales-time-series-forecasting/overview
Store Sales - Time Series Forecasting
Use machine learning to predict grocery sales
I am in 96th place 😉
yeah, you might get a little bit more out of ML.NET, but don't expect much; the top places probably use enterprise-grade ML
and you won't ever get that with ML.NET
what do you mean by "enterprise grade"?
Azure Machine Learning or any public cloud solution
they have more power than you
probably
it's all about resources
yeah this is why I am parallelizing the shit out of it ;]
I thought most people on Kaggle use the Kaggle CPUs
only so much you can do with CPU parallelisation
almost all the big gains are on the GPU end
we are talking several servers working concurrently here, not a single CPU parallelised to Jesus
yeah but ml.net isn't using that
yeah :\ I understand
yeah, but ML.NET is probably only used on racks with two 128-thread AMD Threadrippers per 2U
for hobbyists and solo devs, python is a much better option, the ecosystem is more geared towards the sort of stuff you can do with consumer grade hardware
btw I am now trying Task.WaitAll(task1, task2); and it takes the same time as Parallel.ForEach :[
we said WhenAll
not 2 tasks
all tasks
oh right right :]
yeah you need like 100+ tasks going to overcome the overhead, assuming your workload is actually parallelisable too
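A sketch of "all tasks, not 2 tasks": one task per work item, all awaited with a single Task.WhenAll, with a SemaphoreSlim capping how many run at once so a few million items don't hit the database simultaneously. ProcessAsync, the limit of 16 and the 200 items are hypothetical placeholders.

```cs
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical per-item work: an async "query" followed by a CPU-bound "ML" step.
static async Task<double> ProcessAsync(int item, SemaphoreSlim limiter)
{
    await limiter.WaitAsync();                        // wait for a free slot
    try
    {
        await Task.Delay(5);                          // stand-in for the DB round trip
        return await Task.Run(() => Math.Sqrt(item)); // stand-in for the ML step
    }
    finally
    {
        limiter.Release();                            // free the slot for the next item
    }
}

using var gate = new SemaphoreSlim(16); // at most 16 items in flight

// One task per item (not just task1 and task2), awaited together.
Task<double>[] tasks = Enumerable.Range(0, 200)
    .Select(i => ProcessAsync(i, gate))
    .ToArray();

double[] results = await Task.WhenAll(tasks);
Console.WriteLine($"{results.Length} results, last = {results[^1]}");
```

Task.WhenAll returns results in the same order as the input tasks, regardless of which finished first.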
if you're interacting with non-thread-safe APIs, then you're just wasting time even trying
or if the APIs are thread-safe but rely heavily on locks around a small number of shared resources, that's also not going to do you much good
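To illustrate the lock point with your own code rather than a library: the first loop funnels every iteration through one lock, while the second uses the Parallel.For overload with localInit/localFinally so each worker keeps a private subtotal and touches shared state only once. Both compute the same sum; the second usually scales much better because workers aren't serialized on the lock. This is a generic illustration, not a diagnosis of what ML.NET does internally.

```cs
using System;
using System.Threading;
using System.Threading.Tasks;

const int N = 1_000_000;

// Contended version: every iteration takes the same lock.
long lockedTotal = 0;
object sync = new object();
Parallel.For(0, N, i =>
{
    lock (sync) { lockedTotal += i; }
});

// Low-contention version: per-worker subtotals, merged once per worker.
long localTotal = 0;
Parallel.For(0, N,
    () => 0L,                                                // localInit: start a private subtotal
    (i, state, subtotal) => subtotal + i,                    // body: no shared state touched
    subtotal => Interlocked.Add(ref localTotal, subtotal));  // localFinally: merge once

Console.WriteLine($"{lockedTotal} {localTotal}");
```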
ok trying now with WhenAll
I'm out, it's Mother's Day, good luck
enjoy ;]
WhenAll also slow :[
TPL didn't help either :[
are you sure you are doing it right?
how about some $code
also, the DB is in memory, yes?
please tell me it is
Hi again :] I have created an issue, so the code is there if you want to take a look: https://github.com/dotnet/runtime/issues/86218
The database is doing its own thing, so I don't know whether it's in memory. I think it has some kind of cache
@Super I've also had similar issues with Parallel.For in the past. Have you tried just launching threads manually in a for loop?
If you want 100 threads, just loop 100 times
When using Parallel.For, once a work item finishes, the thread is closed and a new thread is launched, which can also slow stuff down. With my approach, you just have X worker threads that constantly process data without changing state
yes I have tried :{
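Worth knowing: Parallel.For and Parallel.ForEach run iterations on reused thread-pool threads rather than creating a new thread per iteration. Still, the dedicated-worker approach suggested here is easy to try. A minimal sketch, assuming a BlockingCollection as the shared queue; WorkerCount and the Math.Sqrt work are hypothetical placeholders.

```cs
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;

const int WorkerCount = 8; // hypothetical; tune to your machine
var queue = new BlockingCollection<int>();
var results = new ConcurrentBag<double>();

// Long-lived workers: each drains the shared queue until adding is completed.
Thread[] workers = Enumerable.Range(0, WorkerCount).Select(_ =>
{
    var worker = new Thread(() =>
    {
        foreach (int item in queue.GetConsumingEnumerable())
            results.Add(Math.Sqrt(item)); // stand-in for the real per-item work
    }) { IsBackground = true };
    worker.Start();
    return worker;
}).ToArray();

for (int i = 0; i < 1_000; i++) queue.Add(i); // producer
queue.CompleteAdding();                        // lets the workers' loops end

foreach (Thread w in workers) w.Join();
Console.WriteLine($"{results.Count} items processed by {WorkerCount} threads");
```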
Was this issue resolved? If so, run /close; otherwise, I will mark this as stale and this post will be archived until there is new activity.