Trace
Trace17mo ago

❔ Batching

Lets say I have a class like so:
class Item
{
    public string Id { get; set; }
    public SomeClass Data { get; set; }
}
I then have a list of thousands of those and need to batch it into chunks of max 100 items (can be less, but not more), but such that two items with the same Id do not end up in separate batches. There are at most around 10 items sharing the same Id, so there's no risk of the number of items with one Id exceeding the max batch size. What would be the most efficient way to do this?
11 Replies
Denis
Denis17mo ago
Is the duplicate Id occurrence guaranteed not to exceed 10, or is that just an observation? Because if there were 101 instances, I'm not sure it would be possible to fulfill your requirements. Other than that, I don't think there's much you can do performance-wise: you still have to traverse the whole list of items and group them by Id. Then, when creating batches of 100 items, you iterate through the groups and add each one's items to the batch, checking whether the next group of items would still fit within your batch size. So I suppose this has O(n) complexity; hopefully someone corrects me if I'm horribly wrong.
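A minimal sketch of that greedy approach (assuming the Item class from the question; the method name and the 100-item default are illustrative, not from the thread):

```csharp
using System.Collections.Generic;
using System.Linq;

static class Batcher
{
    // Greedy batching: group by Id, then pack whole groups into batches,
    // starting a new batch whenever the next group would overflow it.
    // Assumes no single Id group exceeds batchSize (guaranteed above: ~10 max per Id).
    public static List<List<Item>> BatchByGroup(IEnumerable<Item> items, int batchSize = 100)
    {
        var batches = new List<List<Item>>();
        var current = new List<Item>(batchSize);

        foreach (var group in items.GroupBy(i => i.Id))
        {
            var groupItems = group.ToList();
            // Whole group wouldn't fit: close the current batch first.
            if (current.Count + groupItems.Count > batchSize && current.Count > 0)
            {
                batches.Add(current);
                current = new List<Item>(batchSize);
            }
            current.AddRange(groupItems);
        }

        if (current.Count > 0)
            batches.Add(current);

        return batches;
    }
}
```

Since GroupBy materializes all groups before the first batch is yielded, this is a single O(n) pass over the data plus the grouping overhead, matching the complexity estimate above.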
Trace
Trace17mo ago
Guaranteed, yeah. That's what I was gonna do; I was just wondering if there was possibly a better approach.
Tvde1
Tvde117mo ago
Use .Chunk() from LINQ? Or use observables and buffering?
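Plain .Chunk() slices the sequence positionally, so it can split an Id across batches (as noted below). One way to still use it, not stated in the thread but a sketch leaning on the "at most 10 per Id" guarantee, is to chunk the groups rather than the raw items:

```csharp
using System.Linq;

// Each chunk holds 10 Id-groups of at most 10 items each, so no batch
// exceeds 100 items and no Id is ever split across batches.
// Caveat: batches will often be well under 100 items, unlike greedy packing.
var batches = items
    .GroupBy(i => i.Id)
    .Chunk(10)                                         // 10 groups per batch
    .Select(groups => groups.SelectMany(g => g).ToList())
    .ToList();
```

This trades batch fullness for simplicity; if minimizing the number of batches matters, the greedy or bin-packing approaches discussed in this thread do better.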
Trace
Trace17mo ago
.Chunk() doesn't have the behaviour where it guarantees that the two to five items with the same Id won't end up across chunk boundaries.
Pobiega
Pobiega17mo ago
Are items with the same Id consecutive? I.e., as soon as you see a new Id, you know that any previous group is "done"?
Trace
Trace17mo ago
No
Pobiega
Pobiega17mo ago
Ah, that's a shame... but upon seeing an Id and then reading the next X (10?) items, are you guaranteed to have seen all items with that Id? I'm really liking Tvde1's suggestion of using observables here, though; it's an excellent fit. But you could of course get similar behaviour using a more traditional IEnumerable approach.
Tvde1
Tvde117mo ago
still tricky how you'd instruct it to place all similar ids in one batch
Pobiega
Pobiega17mo ago
yeah, I was working on a proof of concept but it does get tricky
hengi
hengi17mo ago
You might wanna look into the bin packing problem. If you treat identical Ids as being a single object with weight = count, you should be able to find a good solution. It would basically come down to grouping by Id into a List and then distributing the Lists into N buckets of size 100, where List.Count is the weight (size) of the item.
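A sketch of that bin-packing view, using the first-fit decreasing heuristic on the per-Id groups (the 100-item capacity is from the thread; the method name and structure are illustrative assumptions):

```csharp
using System.Collections.Generic;
using System.Linq;

static class Packer
{
    // First-fit decreasing: sort Id-groups by size (largest first), then place
    // each group into the first batch with enough remaining capacity, opening
    // a new batch when none fits. Duplicate Ids stay together by construction.
    public static List<List<Item>> PackByFirstFit(IEnumerable<Item> items, int capacity = 100)
    {
        var groups = items
            .GroupBy(i => i.Id)
            .Select(g => g.ToList())
            .OrderByDescending(g => g.Count);

        var batches = new List<List<Item>>();
        foreach (var group in groups)
        {
            var target = batches.FirstOrDefault(b => b.Count + group.Count <= capacity);
            if (target == null)
            {
                target = new List<Item>(capacity);
                batches.Add(target);
            }
            target.AddRange(group);
        }
        return batches;
    }
}
```

Compared with the greedy single-pass approach, this tends to fill batches more tightly (fewer batches overall), at the cost of scanning existing batches for each group and losing the original item order.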
Accord
Accord17mo ago
Was this issue resolved? If so, run /close - otherwise I will mark this as stale and this post will be archived until there is new activity.