❔ Batching
Let's say I have a class like so:
I then have a list of thousands of those and need to batch it into chunks of max 100 items (can be fewer, but not more), but such that two items with the same Id do not end up in separate batches (there are at most around 10 items with the same Id, so there's no risk of the number of items sharing an Id exceeding the maximum batch size). What would be the most efficient way to do this?
Is the duplicate id occurrence guaranteed not to exceed 10?
Or is that just an observation?
Because if one id had 101 instances, I'm not sure it would be possible to fulfill your requirements
Other than that, I suppose there isn't much you can do performance-wise. You will still have to traverse the whole list of items and group them by id
Then when creating batches of 100 items, you iterate through the groups and add each one's items to the batch
Meanwhile you'd check whether adding the next group of items would still fit within your batch size.
So I suppose this has O(n) complexity; hopefully someone corrects me if I'm horribly wrong
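A minimal sketch of that approach (the original class isn't shown in the thread, so the Item record below is a hypothetical stand-in with an Id property): group by Id, then greedily fill batches of up to 100 items, starting a new batch whenever the next whole group wouldn't fit.

```cs
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical item shape; the real class definition isn't shown in the thread.
public record Item(int Id, string Payload);

public static class Batcher
{
    // Groups items by Id, then greedily packs whole groups into batches of at most maxSize.
    // Relies on the guarantee above that no single Id occurs more than maxSize times (~10 vs 100).
    public static IEnumerable<List<Item>> BatchKeepingIdsTogether(IEnumerable<Item> items, int maxSize = 100)
    {
        var batch = new List<Item>(maxSize);

        foreach (var group in items.GroupBy(i => i.Id))
        {
            var groupItems = group.ToList();

            // Start a new batch if this whole group wouldn't fit in the current one.
            if (batch.Count > 0 && batch.Count + groupItems.Count > maxSize)
            {
                yield return batch;
                batch = new List<Item>(maxSize);
            }

            batch.AddRange(groupItems);
        }

        if (batch.Count > 0)
            yield return batch;
    }
}
```

Grouping is a single pass and the packing loop visits each grouped item once, so the whole thing stays roughly linear; batches simply come out under 100 whenever the next group wouldn't fit.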
guaranteed
yeah that's what I was gonna do
was just wondering if there was a better approach possibly
use .Chunk() from LINQ?
use observables and buffering?
.Chunk() doesn't have the behaviour where it guarantees the two to five items with the same ID won't end up across chunk boundaries
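For reference (using the same assumed Item shape as the sketch above, with items as an IEnumerable&lt;Item&gt;), Chunk slices purely by position, so a run of equal Ids can still be split across a boundary even if you sort by Id first:

```cs
// Enumerable.Chunk produces fixed-size slices with no regard for Id,
// so a slice boundary can fall in the middle of an Id group.
IEnumerable<Item[]> naive = items.OrderBy(i => i.Id).Chunk(100);
```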
Are the items with the same ID consecutive? i.e., as soon as you see a new ID, you know that any previous group is "done"?
No
Ah, that's a shame... but upon seeing an ID and then reading X (10?) items, you are guaranteed to have seen all items of that ID?
I'm really liking Tvde1's suggestion of using Observables here though
It's an excellent fit for it, but you could of course get similar behaviour using a more traditional IEnumerable approach
still tricky how you'd instruct it to place all similar ids in one batch
yeah, I was working on a proof of concept but it does get tricky
You might wanna look into the bin packing problem
If you treat identical IDs as being a single object with weight=count, you should be able to find a good solution
It would basically just come down to grouping by ID into Lists and then distributing those Lists into N buckets of size 100, where List.Count is the weight or size of the item
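A sketch of that idea (same assumed Item type as above), using first-fit decreasing: sort the Id groups by descending size and drop each one into the first batch that still has room. Optimal bin packing is NP-hard in general, but this heuristic tends to produce fewer, fuller batches than packing the groups in their original order.

```cs
using System;
using System.Collections.Generic;
using System.Linq;

public static class BinPackingBatcher
{
    // First-fit decreasing: sort Id groups by descending size, then place each group
    // into the first batch that still has room; open a new batch if none does.
    public static List<List<Item>> Pack(IEnumerable<Item> items, int maxSize = 100)
    {
        var batches = new List<List<Item>>();

        var groups = items
            .GroupBy(i => i.Id)
            .Select(g => g.ToList())
            .OrderByDescending(g => g.Count);

        foreach (var group in groups)
        {
            // Find the first batch with enough remaining capacity for the whole group.
            var target = batches.FirstOrDefault(b => b.Count + group.Count <= maxSize);
            if (target is null)
            {
                target = new List<Item>(maxSize);
                batches.Add(target);
            }
            target.AddRange(group);
        }

        return batches;
    }
}
```

This scans existing batches for each group, so it's O(groups × batches) rather than strictly linear, but it usually packs batches much closer to the 100-item limit.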