new Span vs stackalloc
I'm working on a library that is able to read and write memory from a remote computer and was wondering which of these two is better practice?
I feel as though these would produce extremely similar, if not exactly the same, memory layouts; both create a Span.
They are fundamentally different. In the first one you're allocating 4 bytes on the stack (because sizeof(float) == 4), and the span basically points to that memory
In the second, you're first getting a float* to the parameter value and then reinterpreting it as byte*
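For readers following along, here is a minimal sketch of what the two options might look like. The original snippets are not shown in this thread, so the class and method names are hypothetical. The second method uses MemoryMarshal instead of raw float*/byte* pointers so that it compiles without AllowUnsafeBlocks; it still views the parameter's own storage as bytes.

```csharp
using System;
using System.Buffers.Binary;
using System.Runtime.InteropServices;

public static class SpanOptions
{
    // Option 1: reserve 4 fresh bytes on the stack and write into them.
    public static byte[] ViaStackalloc(float value)
    {
        Span<byte> buffer = stackalloc byte[sizeof(float)];
        BinaryPrimitives.WriteSingleBigEndian(buffer, value);
        return buffer.ToArray(); // copied out only so the bytes can be inspected
    }

    // Option 2: view the parameter's own storage as bytes and overwrite it.
    // (Pointer-free stand-in for the float* -> byte* cast discussed above.)
    public static byte[] ViaReinterpret(float value)
    {
        Span<byte> buffer = MemoryMarshal.AsBytes(MemoryMarshal.CreateSpan(ref value, 1));
        BinaryPrimitives.WriteSingleBigEndian(buffer, value);
        return buffer.ToArray();
    }
}
```

Both methods should yield the same four bytes; the difference is only whose memory gets overwritten along the way.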
You should go with the first option
It makes it much, much clearer what you're doing
Sure, but the memory is free to be clobbered
that's why the first feels wasteful, it's an extra stack allocation for stack memory
In both of these, the WriteSingleBigEndian produces the same memory if the host system is big endian (writing value to itself)
Allocating memory on the stack is basically just changing a value of a register, and 4 bytes is pretty much free
Sure, but a reinterpret cast is free
at least in languages like C++
In C# it'll pretty much be free after JIT'ing as well
Still, I don't get why you'd reinterpret-cast a passed-in parameter (even in C++) instead of just allocating 4 bytes of memory on the stack
This won't impact your performance at all
A passed in parameter is already in a register (in x86_64)
Have you benchmarked it in a hot path and found that it actually affects your performance?
Also, a parameter might not be in a register, depending on its size and the other passed-in parameters
In this case yes, it'll be in a register
My main concern is stack size
this call is fine
other calls dealing with larger segments of memory, like float[256]
could overflow the stack
on the stackalloc
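One common way to address the stack-size concern is to stackalloc only below a fixed threshold and fall back to the heap above it. The sketch below assumes a hypothetical 1024-byte cutoff and hypothetical names; it is an illustration of the idiom, not code from this thread.

```csharp
using System;
using System.Buffers.Binary;

public static class BlockWriter
{
    // Hypothetical threshold; choose one that suits your call depth.
    private const int MaxStackBytes = 1024;

    public static byte[] ToBigEndian(ReadOnlySpan<float> values)
    {
        int byteCount = values.Length * sizeof(float);

        // Small blocks live on the stack; anything larger falls back to the heap.
        Span<byte> scratch = byteCount <= MaxStackBytes
            ? stackalloc byte[MaxStackBytes]
            : new byte[byteCount];
        scratch = scratch.Slice(0, byteCount);

        for (int i = 0; i < values.Length; i++)
        {
            BinaryPrimitives.WriteSingleBigEndian(
                scratch.Slice(i * sizeof(float), sizeof(float)), values[i]);
        }

        return scratch.ToArray(); // copied out only so the bytes can be inspected
    }
}
```

With this cutoff, a block of 256 floats (1024 bytes) still fits on the stack, while anything larger is heap-allocated instead of risking an overflow.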
I'm kinda confused, 256 floats would be passed in to the method?
Not in the float-only method
but other methods like taking a block of floats
could take in N, and N could be less than, equal to, or greater than 256
a single stack alloc of float could be used if the data is moved between iterations
but that also seems like it would be slower than in-place ops
In that case a reinterpret cast makes more sense. Still, I would use some sort of memory pool and rent from it instead of writing directly to the passed-in values Span
the underlying API I send the memory to takes byte[]
so I'm trying to avoid any extra overhead before the eventual heap allocation and memcpy
I mean, you can use ArrayPool to rent byte[]
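A rough sketch of the ArrayPool suggestion, assuming the underlying API can be modelled as a callback that takes a byte[] plus a length. The `send` delegate and the class name are hypothetical stand-ins, not the real remote-memory API.

```csharp
using System;
using System.Buffers;
using System.Buffers.Binary;

public static class PooledWriter
{
    // 'send' is a hypothetical stand-in for the API that consumes byte[].
    public static int WriteFloats(ReadOnlySpan<float> values, Action<byte[], int> send)
    {
        int byteCount = values.Length * sizeof(float);

        // Rent may hand back an array LARGER than requested, so track byteCount.
        byte[] rented = ArrayPool<byte>.Shared.Rent(byteCount);
        try
        {
            for (int i = 0; i < values.Length; i++)
            {
                BinaryPrimitives.WriteSingleBigEndian(
                    rented.AsSpan(i * sizeof(float), sizeof(float)), values[i]);
            }
            send(rented, byteCount);
            return byteCount;
        }
        finally
        {
            // Always hand the buffer back, even if 'send' throws.
            ArrayPool<byte>.Shared.Return(rented);
        }
    }
}
```

One caveat with this approach: if the consuming API only accepts an exact-length byte[] with no separate length argument, the oversized rented array would not fit as-is.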
That's fair. The span in this case isn't guaranteed to be clobberable, but that's only a concern if someone passes in data that they expect to use afterwards for some reason
size isn't constant, but I do know that in the usual case max size is 512 MiB and absolute max is 1024 MiB
for the remote system
no one should be writing all of the memory of the remote system, those are just the amounts it has
Right
I personally think benchmarking the different options would be the best way to know whether these optimizations are worth it. I would also suggest asking people in #allow-unsafe-blocks; they are much, much more knowledgeable about low-level stuff
Also my only other hang-up with the arraypool is that I want the end-user to deal with how they use memory themselves
(since you may also have more than one remote system that you are connected to for various things) but I'm sure there's some middle ground
"if these optimizations are worth it or not"
It's a project to learn more about the newer low-level C# stuff (and because the existing implementations of this are garbage); they are worth it to me, but I get what you mean. I've been in C++ land for a few years, so there's a lot of new C# stuff
Yeah C# did change a lot and allows for much more low-level coding now
Honestly, reinterpret casting will work 100%. My only concern is that once someone has passed in a Span and didn't realize it will change (not necessarily you, but maybe someone else using the library, for example), it'll cause too many bugs
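To make the clobbering concern concrete, here is a small sketch of an in-place byte swap over a caller-supplied span. The names are hypothetical, and the swap is done unconditionally (regardless of host endianness) purely to show the effect on the caller's data.

```csharp
using System;
using System.Runtime.InteropServices;

public static class InPlaceSwap
{
    // Reverses the bytes of every float directly inside the caller's memory.
    public static void ReverseEachFloat(Span<float> values)
    {
        Span<byte> bytes = MemoryMarshal.AsBytes(values);
        for (int offset = 0; offset < bytes.Length; offset += sizeof(float))
        {
            bytes.Slice(offset, sizeof(float)).Reverse();
        }
    }
}
```

After one call, an array that held 1.0f no longer does, which is exactly the surprise a library user could run into. Calling it a second time restores the original values, since the bytes are only reordered and never lost.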
Yeah, the optimal solution might end up being the middle ground with a single stack float used as temporary, clobberable storage.
but I'm not liking that solution just because it'll turn into
which is very wasteful
I can probably write an overload for byte[] directly to avoid that cost
that might be the best option tbh
that or unclobbering it after writing
I guess
An overload for WriteFloats?
it's not like we lose the actual data, we just swap from little to big endian if the system is little endian
SetMemory
Oh, right
yeah, I think I'm going to do the middle ground and just write an overload for SetMemory
I don't trust users to read documentation about it clobbering the input if I publish this, so I'll just create a copy that I clobber.
Thanks!
Never trust users
People don't even read official language docs
yeah, and the last thing I need is someone confused because it works on a big endian machine but not on their personal computer on little endian
because they won't even know what is wrong other than "don't work"
@Kouhai Compiled both on SharpLab: the reinterpret cast is fewer instructions in the JIT assembly in debug, but the stackalloc is fewer instructions in release
Interesting results
It is additionally interesting as they're doing the same thing (writing to some stack memory); time to dive into the rabbit hole.
(One just reuses the arg while the other uses new stack)
You might also want to try examining the assembly on Godbolt; it has better code generation compared to SharpLab
Huh, I didn't know Godbolt supported C#
I use them for C++ stuff
stackalloc
re-use existing heap
from Godbolt on .NET 8, interesting.
stackalloc's code gen seems to be better
probably
for many values I ended up adding an overload for taking a byte[] directly
There is a method to reverse the endianness of N values from a span.
so I create a copy if I need to reverse the endianness and send that in as byte[] (to avoid another copy)
otherwise I just pipe it directly in
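A minimal sketch of the copy-then-swap flow described above, with hypothetical names. It reverses each 4-byte group by hand; on .NET 8 the span overloads of BinaryPrimitives.ReverseEndianness could probably replace the loop, but that is an assumption about which method is meant here.

```csharp
using System;
using System.Runtime.InteropServices;

public static class EndianCopy
{
    // Returns a big-endian byte[] copy; the caller's floats are never modified.
    public static byte[] ToBigEndianBytes(ReadOnlySpan<float> values)
    {
        // Single copy: the floats' raw bytes go straight into the byte[] to send.
        byte[] copy = MemoryMarshal.AsBytes(values).ToArray();

        if (BitConverter.IsLittleEndian)
        {
            // Swap each 4-byte group inside the copy only.
            for (int offset = 0; offset < copy.Length; offset += sizeof(float))
            {
                copy.AsSpan(offset, sizeof(float)).Reverse();
            }
        }
        return copy;
    }
}
```

Because the swap happens on the copy, the same call behaves identically on little and big endian hosts from the user's point of view.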
still need to test it but should work.
further cleanup can still happen once I've tested it.
I may be able to avoid a copy for already big endian systems if I am able to add extension methods for COM imports (it's a COM class)