Advice for Multi-Threading
Want some advice for multi-threading
Basically, I had a scheduled job running every 7 seconds. It would pick up 5 items from a queue and process them one by one. This was slow, and considering we had 900,000 items in the queue, we decided that multi-threading was the way to go.
The way we implemented it: we kept the item count at 5, but we have 16 threads, so we request 5 x 16 = 80 items from our database and then partition the big list of 80 items into 16 lists of 5 items each, which are each executed by their own thread. This was a massive improvement; we went from 10k records a day to just over 150k records a day.
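Roughly, the dispatch logic looks like this (just a sketch, not our actual code — WorkItem, FetchItemsAsync and ProcessItemAsync are placeholder names):

```cs
// Sketch of the current approach: fetch 80 items, split into 16 chunks of 5,
// and give each chunk to its own task. Assumes .NET 6+ with implicit usings.
List<WorkItem> items = await FetchItemsAsync(5 * 16);      // 80 items from the DB

IEnumerable<WorkItem[]> chunks = items.Chunk(5);           // 16 lists of 5 items each

var tasks = chunks.Select(chunk => Task.Run(async () =>
{
    foreach (var item in chunk)
        await ProcessItemAsync(item);                      // each task works through its own list
}));

await Task.WhenAll(tasks);
```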
We're on the tail end of this intensive first-time task; we have just over 7k items remaining before we're done with our queue (which had 900k items).
But in the future, we might be getting another 500k-1 million items in the queue that will need to be processed.
Any suggestions for how I can improve this logic / best practices for partitioning and allocating items to threads?
One idea was that I don't split the list at all, and each thread simply picks an item from a global list whenever it's free. Though I have no idea how to do this without two threads picking the same item at the same time.
10 Replies
As for that last idea, I'll probably figure it out if I actually work on it, but I wanted to know if there are better practices first, instead of dedicating my time to figuring this out.
if you wanted to do that you'd use a thread-safe collection like ConcurrentBag/ConcurrentQueue instead of a List
but besides that it's not far off. i would recommend using the Task/Parallel APIs normally, but it sounds like these are very long-running jobs, so you wouldn't want those consuming thread pool threads unless the application does nothing but process these jobs
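something like this is what i mean — one shared queue, workers pulling from it (just a sketch; WorkItem, FetchItemsAsync and ProcessItemAsync are placeholders for your own types, and it assumes .NET 6+ implicit usings plus System.Collections.Concurrent):

```cs
// One shared ConcurrentQueue, 16 worker tasks pulling from it.
// TryDequeue is atomic, so two workers can never pick up the same item.
var queue = new ConcurrentQueue<WorkItem>(await FetchItemsAsync(80));

var workers = Enumerable.Range(0, 16).Select(_ => Task.Run(async () =>
{
    while (queue.TryDequeue(out var item))
        await ProcessItemAsync(item);
}));

await Task.WhenAll(workers);
```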
The application has other purposes as well, but when these jobs are running, we don't use it for anything else, since we want it to focus entirely on the jobs
I'll look into ConcurrentBag, never heard of this before
are they IO bound or compute bound?
Technically both, but more IO bound than compute, I think; that's where the bulk of the time goes
in that case you'd definitely benefit from using tasks and async code where possible; that will increase throughput without increasing the number of threads required
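e.g. with Parallel.ForEachAsync (.NET 6+) you can cap how many items are in flight rather than how many threads you use — rough sketch only, FetchItemsAsync/ProcessItemAsync are placeholder names:

```cs
// Sketch: IO-bound work, so limit concurrent operations rather than threads.
var items = await FetchItemsAsync(500);

await Parallel.ForEachAsync(
    items,
    new ParallelOptions { MaxDegreeOfParallelism = 32 },   // up to 32 items in flight at once
    async (item, cancellationToken) => await ProcessItemAsync(item));
```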
Yup, we already have asynchronous code everywhere
The whole application was built with that in mind, so everything has async await
then why isn't async/await enough, and why do you need to care about threads?
(also have you considered dataflow?)
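dataflow being roughly this — an ActionBlock from the System.Threading.Tasks.Dataflow NuGet package handles the queuing, throttling and parallelism for you (sketch only, placeholder names again):

```cs
using System.Threading.Tasks.Dataflow;   // NuGet: System.Threading.Tasks.Dataflow

// Sketch: post items into an ActionBlock and let it manage the worker pipeline.
var block = new ActionBlock<WorkItem>(
    async item => await ProcessItemAsync(item),
    new ExecutionDataflowBlockOptions
    {
        MaxDegreeOfParallelism = 16,   // at most 16 items processed concurrently
        BoundedCapacity = 100          // back-pressure: SendAsync waits while the buffer is full
    });

foreach (var item in await FetchItemsAsync(80))
    await block.SendAsync(item);

block.Complete();          // no more items coming
await block.Completion;    // wait for everything in flight to finish
```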
Mainly due to the sheer number of operations
Was not fast enough
sounds strange to me
i can make over 2000 async http/sec to elastic no problem, for example
async is slower than sync, but we're talking nanosec