Trace
Trace17mo ago

❔ Batching

Lets say I have a class like so:
class Item
{
    public string Id { get; set; }
    public SomeClass Data { get; set; }
}
I then have a list of thousands of those and need to batch it into chunks of max 100 items (can be less, but not more), but such that two items with the same Id do not end up in separate batches. There are at most around 10 items sharing the same Id, so there's no risk of the number of items with one Id exceeding the max batch size. What would be the most efficient way to do this?
11 Replies
Denis
Denis17mo ago
Is the duplicate Id occurrence guaranteed not to exceed 10, or is that just an observation? Because if there were 101 instances, I'm not sure it would be possible to fulfill your requirements. Other than that, I don't think there's much you can do performance-wise: you still have to traverse the whole list of items and group them by Id. Then, when creating batches of 100 items, you iterate through the groups and add each one's items to the batch, checking whether the next group of items would still fit within your batch size. So I suppose this has O(n) complexity; hopefully someone corrects me if I'm horribly wrong.
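A minimal sketch of that greedy approach (assuming the Item class from the question; the method name and the 100-item default are illustrative, not from the thread):

```csharp
using System.Collections.Generic;
using System.Linq;

static class Batcher
{
    // Greedy batching: group by Id, then pack whole groups into batches,
    // starting a new batch whenever the next group would overflow it.
    // Assumes no single Id group exceeds batchSize (guaranteed above: ~10 max per Id).
    public static List<List<Item>> BatchByGroup(IEnumerable<Item> items, int batchSize = 100)
    {
        var batches = new List<List<Item>>();
        var current = new List<Item>(batchSize);

        foreach (var group in items.GroupBy(i => i.Id))
        {
            var groupItems = group.ToList();
            // Whole group wouldn't fit: close the current batch first.
            if (current.Count + groupItems.Count > batchSize && current.Count > 0)
            {
                batches.Add(current);
                current = new List<Item>(batchSize);
            }
            current.AddRange(groupItems);
        }

        if (current.Count > 0)
            batches.Add(current);

        return batches;
    }
}
```

Since GroupBy materializes all groups before the first batch is yielded, this is a single O(n) pass over the data plus the grouping overhead, matching the complexity estimate above.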
Trace
Trace17mo ago
Guaranteed, yeah. That's what I was gonna do; I was just wondering if there was possibly a better approach.
Tvde1
Tvde117mo ago
Use .Chunk() from LINQ? Or use observables and buffering?
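Plain .Chunk() slices the sequence positionally, so it can split an Id across batches (as noted below). One way to still use it, not stated in the thread but a sketch leaning on the "at most 10 per Id" guarantee, is to chunk the groups rather than the raw items:

```csharp
using System.Linq;

// Each chunk holds 10 Id-groups of at most 10 items each, so no batch
// exceeds 100 items and no Id is ever split across batches.
// Caveat: batches will often be well under 100 items, unlike greedy packing.
var batches = items
    .GroupBy(i => i.Id)
    .Chunk(10)                                         // 10 groups per batch
    .Select(groups => groups.SelectMany(g => g).ToList())
    .ToList();
```

This trades batch fullness for simplicity; if minimizing the number of batches matters, the greedy or bin-packing approaches discussed in this thread do better.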
Trace
Trace17mo ago
.Chunk() doesn't have the behaviour where it guarantees that the two to five items with the same Id won't end up across chunk boundaries.
Pobiega
Pobiega17mo ago
Are items with the same Id consecutive? I.e., as soon as you see a new Id, you know that any previous group is "done"?
Trace
Trace17mo ago
No
Pobiega
Pobiega17mo ago
Ah, that's a shame... but upon seeing an Id and then reading the next X (10?) items, are you guaranteed to have seen all items with that Id? I'm really liking Tvde1's suggestion of using observables here, though; it's an excellent fit. But you could of course get similar behaviour using a more traditional IEnumerable approach.
Tvde1
Tvde117mo ago
still tricky how you'd instruct it to place all similar ids in one batch
Pobiega
Pobiega17mo ago
yeah, I was working on a proof of concept but it does get tricky
hengi
hengi17mo ago
You might wanna look into the bin packing problem. If you treat identical Ids as being a single object with weight = count, you should be able to find a good solution. It would basically come down to grouping by Id into a List and then distributing the Lists into N buckets of size 100, where List.Count is the weight (size) of the item.
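A sketch of that bin-packing view, using the first-fit decreasing heuristic on the per-Id groups (the 100-item capacity is from the thread; the method name and structure are illustrative assumptions):

```csharp
using System.Collections.Generic;
using System.Linq;

static class Packer
{
    // First-fit decreasing: sort Id-groups by size (largest first), then place
    // each group into the first batch with enough remaining capacity, opening
    // a new batch when none fits. Duplicate Ids stay together by construction.
    public static List<List<Item>> PackByFirstFit(IEnumerable<Item> items, int capacity = 100)
    {
        var groups = items
            .GroupBy(i => i.Id)
            .Select(g => g.ToList())
            .OrderByDescending(g => g.Count);

        var batches = new List<List<Item>>();
        foreach (var group in groups)
        {
            var target = batches.FirstOrDefault(b => b.Count + group.Count <= capacity);
            if (target == null)
            {
                target = new List<Item>(capacity);
                batches.Add(target);
            }
            target.AddRange(group);
        }
        return batches;
    }
}
```

Compared with the greedy single-pass approach, this tends to fill batches more tightly (fewer batches overall), at the cost of scanning existing batches for each group and losing the original item order.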
Accord
Accord17mo ago
Was this issue resolved? If so, run /close - otherwise I will mark this as stale and this post will be archived until there is new activity.