❔ Lazily evaluating Dictionary in update loop with good performance
Situation
I have an update loop which receives some data which is Dictionary-like. Due to a version difference, I receive two types of data. The receiving of this data is not a concern and is relatively cheap.
The two types of data I receive are as follows:
- an array of type
(TKey, TValue)[]
,
- two arrays of types TKey[]
and TValue[]
.
I cannot change these input types (no, I cannot change the array of tuples to an array of KeyValuePair<TKey, TValue>
s).
I know with certainty that the keys are unique, and I know with certainty that they map to exactly one value.
Goal
I want to use this received data just like a Dictionary (an IReadOnlyDictionary<TKey, TValue>
to be precise) without actually building the Dictionary the moment I receive it.
By this I mean, I do not want to populate a Dictionary<TKey, TValue>
with the data, which may be quite large. Doing this would potentially delay the update loop unnecessarily (not actually needing all the data, since only a selection of it will be accessed).
Problem
I can safely have two types of implementations for the two different types of data received.
The issue now is making accesses performant (like the hashing in Dictionaries usually is).
My current idea is the naïve way of making the evaluation linear and caching anything we've already seen. For the ContainsKey implementation, it may be possible to implement a HashSet<TKey>
. This, however, would add an extra allocation, would need to do potentially unnecessary computation, and would require some pretty bad logic for the tuple array data.
I'm simply not sure how to implement this very well such that the lazy evaluation is not simply linear with a cache. Any ideas?61 Replies
Not sure if I fully understand what your problem is but would
Dictionary<TKey, int> keyToIndex; TValue[] values;
work?I was considering that actually, yeah. But that would again construct a complete dictionary of keys that may not be accessed. And if I do that, I may as well put the values into their proper slots.
do u know beforehand which keys u want to access?
I do not
How many keys are you receiving with the data?
it can vary wildly. it could be a handful, or hundreds
Is it possible you will query the dictionary for keys that may not exist
potentially, yes
in which case i would throw a KeyNotFoundException
like a normal dictionary would
How about sorting underlying data by a key? This would allow you to use a binary search algorithm for
ContainsKey()
, getters and setters
Sorting is not so fast O(N*log N)
, but you need to do it only once after retrieving the underlying data, and access will become O(log N)
, which is really fastI don't really know how to do any of those
there are built-in functions for that in .NET
let me write some PoC
Pastebin
class Dictionary1 : IReadOnlyDictionary{ priv - Pastebin.com
Pastebin.com is the number one paste tool since 2002. Pastebin is a website where you can store text online for a set period of time.
TKey
should be something comparable for this, or you can define a custom comparison for itwhere does the input come from? maybe there is already some logic to it so u can write "just" a small wrapper around it to calculate the index in the array?
how come you're sorting the arrays?
binary search doesnt work without sorted arrays
else u would need a linear search and end up with O(n)
hm
i would like to implement a way to store things that have been accessed in a cache so that subsequent accesses are faster, and then we only binarysearch the remaining things
but i'm not really sure how to do that well
Binary search need arrays to be sorted, so I sort them in constructor
If sorting array everytime you retrieve data is acceptable for you, this seems to be the best solution
so on each update loop iteration u get the new as-is set of data?
will these keys be accessed via that dictionary more than once within the same set of data?
we do not know enough about the semantics behind the data to give any hints on how to do it right now
It can be used like any other dictionary. That includes accessing only select amounts of keys, accessing certain keys more than once, anything that comes with any other normal dictionary
I get the data, create my buffered dictionary, then send that dictionary off to the user who is using it in their update loop
The user decides where the data in the dictionary comes from, so the contents can vary vastly
so the input data is only once on construction or will be regenerated on each update loop iteration?
the dictionary contents may change every update iteration, the contents need to be re-fetched every time
the fetching is really not an issue though, it's either one or two reads (depending on the 2 versions) with just a pointer buffer and a size
then sorting + binary search isnt really helpful
hm
Well, re-sorting on every update seems heavy
But depending on how much and now often, it can be acceptable
this would be my rough draft that needs some adjustments/multiple implementations
this is for the worst case scenario that a lot of keys will be accessed multiple times
okay let me be more specific, perhaps
it needs some fixing tho:
1) turning it into a struct
2) using object pools
3) split it up into multiple structs with different implementations
(just don't make a fuss, otherwise i'll just ask in #allow-unsafe-blocks which literally never complains about this)
System Failure: Reached the max retries per request limit (which is 20). Refer to "maxRetriesPerRequest" option for details.
what
well just ignore them ;p
well anyway
so, c# processes can contain c# dictionaries
i'm trying to read such a dictionary from the memory of said other process
(again, don't make a fuss about this, else i'm just gonna ask in #allow-unsafe-blocks which has been very willing to help with this kind of stuff and is very aware of what i'm making)
older .net versions had
TKey[] keys
and TValue[] values
arrays as their storage. newer ones have a KeyValuePair<TKey, TValue>[]
. these are the two types of data i receive. the user specifies where the dictionary is located, i read the data, and return the new dictionary.
the user wants to receive this dictionary data on every iteration of their update loop. it's also very obvious why the dictionary contents may change.
however, evaluating and building an entire dictionary on every iteration would be incredibly slow and detrimental to performancewell i thought so, but wasnt sure thats why i asked.
but as u dunno about the changes in the array, basically u need to rebuild the cache each time because u said u will only access a small portion of the whole data
thus u search for it linearly or figure out how unity's dictionary/map whatever it is is computing the array so u can predict the index reliable
it's often possible the user does not need to access every entry in the dictionary. it entirely depends on their use case
the caching itself will only make sense if u try quite often to access the same kvp
say for example it's a
Dictionary<int, bool>
in a game, where the key is some sort of chapterId
and the value is a completed
representation
in that case the user may iterate the entire thing to check for completion status
i think accessing a key more than once is less likely
it would certainly be bad practice, but i don't trust my users to be very well-versed in c# in general (very beginner-oriented environment)then forget about caching as u have to rebuild it on every update
now here's the thing though, reading a
Dictionary<string, TValue>
is very differenti guess the best - but very annoying - solution would be a sg that analyzes their code to build the cache on each iteration
in memory, strings are, obviously, references
ah damn
and string reads are expensive
so i would not like to perform a string read more than once, which is where caching could be beneficial
to build a cache efficiently u would need to know beforehand which keys will be accessed
yeah, unfortunately i can't know that
but if thats only runtime info its impossible
a simple linear scan seems like the only reliable way here tbh
i kinda also wanna try to minimize allocations, but i'm not sure i can safely do that here
well, the wrapper(s) can be readonly struct
Rebuilding cache to correspond to dictionary changes will be hard to implement and (maybe) slow
Then accessing the keys linearly or performing sorting seem to be the only solutions
yeah, and there it then depends on how slow the comparison of keys are and how many entries there are in total as well
because of possible changes the array will have to be walked once anyway if u want to build a cache
and that per iteration of the update loop
oh yeah i might also need to mention that for the
BufferedDictionary<TKey, TValue>
version, we have TKey : unmanaged
and TValue : unmanaged
but i don't have access to vectorization lmao
i mean i do, but IsHardwareAccelerated = false
why did u even bring up this then ...
string
isnt an unmanaged typeit would be a
BufferedDictionary<TValue> : IReadOnlyDictionary<string, TValue>
where TValue : unmanaged
different implementation entirelyu claimed it otherwise just some minutes ago T_T
BufferedDictionary<TKey, TValue>
!= BufferedDictionary<TValue>
tbh this all sounds like u want a generic solution for nieche performance tweaks, u wont get both at the same time
brb ciggy while thinking about it
(and dont use strings as keys D:)
not really my decision
as i remember you are injecting into unity right?
if the game has a
Dictionary<string, int>
and the user wants to read that, then nothing should stop them
i'm not injecting, no
maybe later
but injection is also not possible all the time
so i need solutions for both (or something)best u can do is having 2 implementations:
- linear search only
- cache + linear search
the rest is up to the user to benchmark whats better
oh actually 3, forgot about the sort + binary search for a moment
+ the 4th where thats cached as well
and u will most likely need custom implementations for the dictionary and/or set caches which will use object buffers, else the garbage collector will explode sooner or later
Was this issue resolved? If so, run
/close
- otherwise I will mark this as stale and this post will be archived until there is new activity.