C
C#2y ago
ero

❔ Lazily evaluating Dictionary in update loop with good performance

Situation I have an update loop which receives some data which is Dictionary-like. Due to a version difference, I receive two types of data. The receiving of this data is not a concern and is relatively cheap. The two types of data I receive are as follows: - an array of type (TKey, TValue)[], - two arrays of types TKey[] and TValue[]. I cannot change these input types (no, I cannot change the array of tuples to an array of KeyValuePair<TKey, TValue>s). I know with certainty that the keys are unique, and I know with certainty that they map to exactly one value. Goal I want to use this received data just like a Dictionary (an IReadOnlyDictionary<TKey, TValue> to be precise) without actually building the Dictionary the moment I receive it. By this I mean, I do not want to populate a Dictionary<TKey, TValue> with the data, which may be quite large. Doing this would potentially delay the update loop unnecessarily (not actually needing all the data, since only a selection of it will be accessed). Problem I can safely have two types of implementations for the two different types of data received. The issue now is making accesses performant (like the hashing in Dictionaries usually is). My current idea is the naïve way of making the evaluation linear and caching anything we've already seen. For the ContainsKey implementation, it may be possible to implement a HashSet<TKey>. This, however, would add an extra allocation, would need to do potentially unnecessary computation, and would require some pretty bad logic for the tuple array data. I'm simply not sure how to implement this very well such that the lazy evaluation is not simply linear with a cache. Any ideas?
61 Replies
bighugemassive3
Not sure if I fully understand what your problem is but would Dictionary<TKey, int> keyToIndex; TValue[] values; work?
ero
eroOP2y ago
I was considering that actually, yeah. But that would again construct a complete dictionary of keys that may not be accessed. And if I do that, I may as well put the values into their proper slots.
cap5lut
cap5lut2y ago
do u know beforehand which keys u want to access?
ero
eroOP2y ago
I do not
DaVinki
DaVinki2y ago
How many keys are you receiving with the data?
ero
eroOP2y ago
it can vary wildly. it could be a handful, or hundreds
DaVinki
DaVinki2y ago
Is it possible you will query the dictionary for keys that may not exist
ero
eroOP2y ago
potentially, yes in which case i would throw a KeyNotFoundException like a normal dictionary would
nukleer bomb
nukleer bomb2y ago
How about sorting underlying data by a key? This would allow you to use a binary search algorithm for ContainsKey(), getters and setters Sorting is not so fast O(N*log N), but you need to do it only once after retrieving the underlying data, and access will become O(log N), which is really fast
ero
eroOP2y ago
I don't really know how to do any of those
nukleer bomb
nukleer bomb2y ago
there are built-in functions for that in .NET let me write some PoC
nukleer bomb
nukleer bomb2y ago
Pastebin
class Dictionary1 : IReadOnlyDictionary{ priv - Pastebin.com
Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.
nukleer bomb
nukleer bomb2y ago
TKey should be something comparable for this, or you can define a custom comparison for it
cap5lut
cap5lut2y ago
where does the input come from? maybe there is already some logic to it so u can write "just" a small wrapper around it to calculate the index in the array?
ero
eroOP2y ago
how come you're sorting the arrays?
cap5lut
cap5lut2y ago
binary search doesnt work without sorted arrays else u would need a linear search and end up with O(n)
ero
eroOP2y ago
hm i would like to implement a way to store things that have been accessed in a cache so that subsequent accesses are faster, and then we only binarysearch the remaining things but i'm not really sure how to do that well
nukleer bomb
nukleer bomb2y ago
Binary search need arrays to be sorted, so I sort them in constructor If sorting array everytime you retrieve data is acceptable for you, this seems to be the best solution
cap5lut
cap5lut2y ago
so on each update loop iteration u get the new as-is set of data? will these keys be accessed via that dictionary more than once within the same set of data? we do not know enough about the semantics behind the data to give any hints on how to do it right now
ero
eroOP2y ago
It can be used like any other dictionary. That includes accessing only select amounts of keys, accessing certain keys more than once, anything that comes with any other normal dictionary I get the data, create my buffered dictionary, then send that dictionary off to the user who is using it in their update loop The user decides where the data in the dictionary comes from, so the contents can vary vastly
cap5lut
cap5lut2y ago
so the input data is only once on construction or will be regenerated on each update loop iteration?
ero
eroOP2y ago
the dictionary contents may change every update iteration, the contents need to be re-fetched every time the fetching is really not an issue though, it's either one or two reads (depending on the 2 versions) with just a pointer buffer and a size
cap5lut
cap5lut2y ago
then sorting + binary search isnt really helpful
ero
eroOP2y ago
hm
nukleer bomb
nukleer bomb2y ago
Well, re-sorting on every update seems heavy But depending on how much and now often, it can be acceptable
cap5lut
cap5lut2y ago
this would be my rough draft that needs some adjustments/multiple implementations
public class EroDictView<TKey, TValue> : IReadOnlyDictionary<TKey, TValue>
where TKey : notnull
{
private readonly IEqualityComparer<TKey> _comparer;

private readonly Dictionary<TKey, TValue> _cachedValues;
private readonly HashSet<TKey> _cachedMissing;

private readonly TKey[] _keys;
private readonly TValue[] _values;


public EroDictView(TKey[] keys, TValue[] values, IEqualityComparer<TKey> comparer)
{
_comparer = comparer;

_cachedValues = new(comparer);
_cachedMissing = new(comparer);

_keys = keys;
_values = values;
}

public bool TryGetValue(TKey key, [MaybeNullWhen(false)] out TValue value)
{
if (_cachedValues.TryGetValue(key, out value))
{
return true;
}

if (_cachedMissing.Contains(key))
{
return false;
}

for (int i = 0; i < _keys.Length; i++)
{
if (_comparer.Equals(key, _keys[i]))
{
value = _values[i];
_cachedValues[key] = value;
return true;
}
}

_cachedMissing.Add(key);

return false;
}
// rest of the implementation
}
public class EroDictView<TKey, TValue> : IReadOnlyDictionary<TKey, TValue>
where TKey : notnull
{
private readonly IEqualityComparer<TKey> _comparer;

private readonly Dictionary<TKey, TValue> _cachedValues;
private readonly HashSet<TKey> _cachedMissing;

private readonly TKey[] _keys;
private readonly TValue[] _values;


public EroDictView(TKey[] keys, TValue[] values, IEqualityComparer<TKey> comparer)
{
_comparer = comparer;

_cachedValues = new(comparer);
_cachedMissing = new(comparer);

_keys = keys;
_values = values;
}

public bool TryGetValue(TKey key, [MaybeNullWhen(false)] out TValue value)
{
if (_cachedValues.TryGetValue(key, out value))
{
return true;
}

if (_cachedMissing.Contains(key))
{
return false;
}

for (int i = 0; i < _keys.Length; i++)
{
if (_comparer.Equals(key, _keys[i]))
{
value = _values[i];
_cachedValues[key] = value;
return true;
}
}

_cachedMissing.Add(key);

return false;
}
// rest of the implementation
}
this is for the worst case scenario that a lot of keys will be accessed multiple times
ero
eroOP2y ago
okay let me be more specific, perhaps
cap5lut
cap5lut2y ago
it needs some fixing tho: 1) turning it into a struct 2) using object pools 3) split it up into multiple structs with different implementations
ero
eroOP2y ago
(just don't make a fuss, otherwise i'll just ask in #allow-unsafe-blocks which literally never complains about this)
Ticket Tool
Ticket Tool2y ago
System Failure: Reached the max retries per request limit (which is 20). Refer to "maxRetriesPerRequest" option for details.
ero
eroOP2y ago
what WhatChamp
cap5lut
cap5lut2y ago
well just ignore them ;p
ero
eroOP2y ago
well anyway so, c# processes can contain c# dictionaries i'm trying to read such a dictionary from the memory of said other process (again, don't make a fuss about this, else i'm just gonna ask in #allow-unsafe-blocks which has been very willing to help with this kind of stuff and is very aware of what i'm making) older .net versions had TKey[] keys and TValue[] values arrays as their storage. newer ones have a KeyValuePair<TKey, TValue>[]. these are the two types of data i receive. the user specifies where the dictionary is located, i read the data, and return the new dictionary. the user wants to receive this dictionary data on every iteration of their update loop. it's also very obvious why the dictionary contents may change. however, evaluating and building an entire dictionary on every iteration would be incredibly slow and detrimental to performance
cap5lut
cap5lut2y ago
well i thought so, but wasnt sure thats why i asked. but as u dunno about the changes in the array, basically u need to rebuild the cache each time because u said u will only access a small portion of the whole data thus u search for it linearly or figure out how unity's dictionary/map whatever it is is computing the array so u can predict the index reliable
ero
eroOP2y ago
it's often possible the user does not need to access every entry in the dictionary. it entirely depends on their use case
cap5lut
cap5lut2y ago
the caching itself will only make sense if u try quite often to access the same kvp
ero
eroOP2y ago
say for example it's a Dictionary<int, bool> in a game, where the key is some sort of chapterId and the value is a completed representation in that case the user may iterate the entire thing to check for completion status i think accessing a key more than once is less likely it would certainly be bad practice, but i don't trust my users to be very well-versed in c# in general (very beginner-oriented environment)
cap5lut
cap5lut2y ago
then forget about caching as u have to rebuild it on every update
ero
eroOP2y ago
now here's the thing though, reading a Dictionary<string, TValue> is very different
cap5lut
cap5lut2y ago
i guess the best - but very annoying - solution would be a sg that analyzes their code to build the cache on each iteration
ero
eroOP2y ago
in memory, strings are, obviously, references
cap5lut
cap5lut2y ago
ah damn
ero
eroOP2y ago
and string reads are expensive so i would not like to perform a string read more than once, which is where caching could be beneficial
cap5lut
cap5lut2y ago
to build a cache efficiently u would need to know beforehand which keys will be accessed
ero
eroOP2y ago
yeah, unfortunately i can't know that
cap5lut
cap5lut2y ago
but if thats only runtime info its impossible a simple linear scan seems like the only reliable way here tbh
ero
eroOP2y ago
i kinda also wanna try to minimize allocations, but i'm not sure i can safely do that here
cap5lut
cap5lut2y ago
well, the wrapper(s) can be readonly struct
nukleer bomb
nukleer bomb2y ago
Rebuilding cache to correspond to dictionary changes will be hard to implement and (maybe) slow Then accessing the keys linearly or performing sorting seem to be the only solutions
cap5lut
cap5lut2y ago
yeah, and there it then depends on how slow the comparison of keys are and how many entries there are in total as well because of possible changes the array will have to be walked once anyway if u want to build a cache and that per iteration of the update loop
ero
eroOP2y ago
oh yeah i might also need to mention that for the BufferedDictionary<TKey, TValue> version, we have TKey : unmanaged and TValue : unmanaged but i don't have access to vectorization lmao i mean i do, but IsHardwareAccelerated = false
cap5lut
cap5lut2y ago
why did u even bring up this then ... string isnt an unmanaged type
ero
eroOP2y ago
it would be a BufferedDictionary<TValue> : IReadOnlyDictionary<string, TValue> where TValue : unmanaged different implementation entirely
cap5lut
cap5lut2y ago
u claimed it otherwise just some minutes ago T_T
ero
eroOP2y ago
BufferedDictionary<TKey, TValue> != BufferedDictionary<TValue>
cap5lut
cap5lut2y ago
tbh this all sounds like u want a generic solution for nieche performance tweaks, u wont get both at the same time brb ciggy while thinking about it (and dont use strings as keys D:)
ero
eroOP2y ago
not really my decision
nukleer bomb
nukleer bomb2y ago
as i remember you are injecting into unity right?
ero
eroOP2y ago
if the game has a Dictionary<string, int> and the user wants to read that, then nothing should stop them i'm not injecting, no maybe later but injection is also not possible all the time so i need solutions for both (or something)
cap5lut
cap5lut2y ago
best u can do is having 2 implementations: - linear search only - cache + linear search the rest is up to the user to benchmark whats better oh actually 3, forgot about the sort + binary search for a moment + the 4th where thats cached as well and u will most likely need custom implementations for the dictionary and/or set caches which will use object buffers, else the garbage collector will explode sooner or later
Accord
Accord2y ago
Was this issue resolved? If so, run /close - otherwise I will mark this as stale and this post will be archived until there is new activity.
Want results from more Discord servers?
Add your server