new Span vs stackalloc
I'm working on a library that is able to read and write memory from a remote computer and was wondering which of these two is better practice?
I feel as though these would produce extremely similar, if not exactly the same, memory layouts; both create a Span.
28 Replies
Both are fundamentally different. In the first one you're allocating 4 bytes on the stack (because sizeof(float) == 4) and the span basically points to that memory.
In the second, you're first getting a float* to the parameter value and then reinterpreting it as byte*.
You should go with the first option
It makes it much, much more clear what you're doing
Sure, but the memory is free to be clobbered
that's why the first feels wasteful, it's an extra stack allocation for stack memory
In both of these, the WriteSingleBigEndian produces the same memory if the host system is big endian (writing value to itself)
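For reference, the two options under discussion might look roughly like the sketch below. The original snippets are not reproduced in the thread, so the class and method names are made up, and returning a byte[] is only there to make the result easy to inspect. It assumes a .NET version that ships BinaryPrimitives.WriteSingleBigEndian, and option 2 needs unsafe code enabled.

```csharp
using System;
using System.Buffers.Binary;

static class FloatWriter // hypothetical name, not from the thread
{
    // Option 1: reserve a separate 4-byte scratch buffer on the stack.
    public static byte[] WriteWithStackalloc(float value)
    {
        Span<byte> buffer = stackalloc byte[sizeof(float)];
        BinaryPrimitives.WriteSingleBigEndian(buffer, value);
        return buffer.ToArray();
    }

    // Option 2: view the parameter's own storage as bytes and overwrite it.
    // A by-value parameter is a fixed variable, so &value needs no 'fixed' block.
    public static unsafe byte[] WriteInPlace(float value)
    {
        Span<byte> buffer = new Span<byte>(&value, sizeof(float));
        // 'value' is copied into the call before the span is written,
        // so aliasing the parameter here does not corrupt the input.
        BinaryPrimitives.WriteSingleBigEndian(buffer, value);
        return buffer.ToArray();
    }
}
```

Both methods return the bytes 3F 80 00 00 for 1.0f. The difference is only where the four bytes live while they are being written.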
Allocating memory on the stack is basically just changing a value of a register, and 4 bytes is pretty much free
Sure, but a reinterpret cast is free
at least in languages like C++
In C# it'll pretty much be free after JIT'ing as well
Still, I don't get why you'd reinterpret cast a passed-in parameter (even in C++) instead of just allocating 4 bytes of memory on the stack
This won't impact your performance at all
A passed in parameter is already in a register (in x86_64)
Have you benchmarked it in a hot path and found out that it'll actually affect your performance?
Also a parameter might not be in a register depending on its size and the other passed-in parameters
In this case yes, it'll be in a register
My main concern is stack size
this call is fine
other calls dealing with larger segments of memory, like float[256]
could overflow the stack
on the stackalloc
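A common way to keep stackalloc from overflowing the stack is to cap it at a fixed size and fall back to the heap above that. This is only a sketch: the 1024-byte limit (256 floats) and all names are assumptions, not something from the thread.

```csharp
using System;
using System.Buffers.Binary;

static class StackLimitExample // hypothetical name
{
    // Assumed threshold: 1024 bytes, i.e. room for 256 floats.
    private const int MaxStackBytes = 1024;

    public static byte[] EncodeBigEndian(ReadOnlySpan<float> values)
    {
        int byteCount = checked(values.Length * sizeof(float));

        // Small inputs use a fixed-size stack buffer; larger ones go to the heap.
        Span<byte> buffer = byteCount <= MaxStackBytes
            ? stackalloc byte[MaxStackBytes]
            : new byte[byteCount];
        Span<byte> dest = buffer.Slice(0, byteCount);

        for (int i = 0; i < values.Length; i++)
        {
            BinaryPrimitives.WriteSingleBigEndian(dest.Slice(i * sizeof(float)), values[i]);
        }

        // Copy out only so the result can outlive the stack frame.
        return dest.ToArray();
    }
}
```

The stackalloc size is a constant rather than byteCount, so the stack usage of the method is bounded no matter what N the caller passes.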
I'm kinda confused, 256 floats would be passed in to the method?
Not in the float-only method
but other methods like taking a block of floats
could take in N, and N could be <=> 256
a single stack alloc of float could be used if the data is moved between iterations
but that also seems like it would be slower than in-place ops
That case makes more sense for a reinterpret cast. Still, I would use some sort of memory pool and rent from it instead of writing directly into the passed-in Span
the underlying API I send the memory to takes byte[]
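For the block case, the reinterpret cast can be done without pointers through MemoryMarshal. The sketch below uses made-up names and overwrites the caller's floats, which is exactly the clobbering trade-off being discussed.

```csharp
using System;
using System.Buffers.Binary;
using System.Runtime.InteropServices;

static class InPlaceExample // hypothetical name
{
    // Converts the floats to big-endian byte order in place and returns
    // a byte view over the same memory. The input is clobbered.
    public static Span<byte> ToBigEndianInPlace(Span<float> values)
    {
        if (BitConverter.IsLittleEndian)
        {
            // View each float as a 32-bit word and swap its bytes.
            Span<uint> words = MemoryMarshal.Cast<float, uint>(values);
            for (int i = 0; i < words.Length; i++)
            {
                words[i] = BinaryPrimitives.ReverseEndianness(words[i]);
            }
        }

        // No copy: the returned span aliases 'values'.
        return MemoryMarshal.AsBytes(values);
    }
}
```

On a big-endian host nothing is swapped and the call reduces to the cast, which matches the earlier point about writing the value to itself.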
so I'm trying to avoid any extra overhead before the eventual heap allocation and memcpy
I mean, you can use ArrayPool to rent byte[]
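Renting from ArrayPool could look something like this. The SendBytes delegate is a hypothetical stand-in for the underlying byte[] API, which the thread does not show. Note that Rent may hand back an array longer than requested, so the real API would need to accept a length (or an offset and count) for this to work.

```csharp
using System;
using System.Buffers;
using System.Buffers.Binary;

static class BlockWriter // hypothetical name
{
    // Hypothetical stand-in for the underlying API that takes byte[].
    public delegate void SendBytes(byte[] buffer, int count);

    public static void WriteFloats(ReadOnlySpan<float> values, SendBytes send)
    {
        int byteCount = checked(values.Length * sizeof(float));

        // The rented array is at least byteCount long, possibly longer.
        byte[] rented = ArrayPool<byte>.Shared.Rent(byteCount);
        try
        {
            Span<byte> dest = rented.AsSpan(0, byteCount);
            for (int i = 0; i < values.Length; i++)
            {
                BinaryPrimitives.WriteSingleBigEndian(dest.Slice(i * sizeof(float)), values[i]);
            }

            send(rented, byteCount);
        }
        finally
        {
            // Always hand the array back, even if send throws.
            ArrayPool<byte>.Shared.Return(rented);
        }
    }
}
```

This leaves the caller's floats untouched and avoids a fresh heap allocation per call, at the cost of one pass that copies and byte-swaps into the rented array.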
that's fair, span in this case isn't guaranteed to be clobberable, but that's only a concern if someone is passing in data that they expect to use after for some reason
size isn't constant, but I do know that in the usual case max size is 512 MiB and absolute max is 1024 MiB
for the remote system
no one should be writing all of the memory of the remote system, those are just the amounts it has
Right
I personally think benchmarking the different options would be the best way to know whether these optimizations are worth it. Also, I would suggest asking people in #allow-unsafe-blocks; they are much, much more knowledgeable about low-level stuff