C#•4mo ago
substitute

new Span vs stackalloc

I'm working on a library that can read and write memory on a remote computer, and I was wondering which of these two is better practice:
public void WriteFloat(uint address, float value)
{
    // Option 1: copy the value into 4 fresh bytes of stack memory.
    Span<byte> memory = stackalloc byte[sizeof(float)];
    BinaryPrimitives.WriteSingleBigEndian(memory, value);
    SetMemory(address, memory, out _);
}
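public void WriteFloat(uint address, float value)
{
    // Option 2: view the parameter's own storage as bytes and overwrite it in place
    // (the new Span<T>(ref T) constructor requires .NET 7 or later).
    var memory = MemoryMarshal.Cast<float, byte>(new Span<float>(ref value));
    BinaryPrimitives.WriteSingleBigEndian(memory, value);
    SetMemory(address, memory, out _);
}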
I feel as though these would produce extremely similar, if not identical, memory layouts, since both create a Span.
28 Replies
Kouhai•4mo ago
They're fundamentally different. In the first one you're allocating 4 bytes on the stack (because sizeof(float) == 4), and the span points to that memory. In the second, you're first getting a float* to the parameter value and then reinterpreting it as a byte*. You should go with the first option; it makes it much, much clearer what you're doing.
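To check the claim that both versions end up with the same bytes, here is a minimal console-app sketch (C# top-level statements; .NET 7 or later for `new Span<T>(ref T)`). `EncodeWithStackalloc` and `EncodeWithReinterpret` are hypothetical names, and each one returns a copy of its buffer in place of the original `SetMemory(address, memory, out _)` call.

```csharp
using System;
using System.Buffers.Binary;
using System.Runtime.InteropServices;

// Option 1: copy the value into 4 fresh bytes of stack memory.
static byte[] EncodeWithStackalloc(float value)
{
    Span<byte> memory = stackalloc byte[sizeof(float)];
    BinaryPrimitives.WriteSingleBigEndian(memory, value);
    return memory.ToArray(); // stand-in for SetMemory(address, memory, out _)
}

// Option 2: view the parameter's own storage as bytes and overwrite it in place.
static byte[] EncodeWithReinterpret(float value)
{
    Span<byte> memory = MemoryMarshal.Cast<float, byte>(new Span<float>(ref value));
    BinaryPrimitives.WriteSingleBigEndian(memory, value);
    return memory.ToArray(); // stand-in for SetMemory(address, memory, out _)
}

Console.WriteLine(Convert.ToHexString(EncodeWithStackalloc(1.0f)));  // 3F800000
Console.WriteLine(Convert.ToHexString(EncodeWithReinterpret(1.0f))); // 3F800000
```

Both produce identical big-endian bytes; the difference Kouhai describes is only where those 4 bytes live (fresh stack memory vs. the storage of `value` itself).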
substitute OP•4mo ago
Sure, but the parameter's memory is free to be clobbered, and that's why the first feels wasteful: it's an extra stack allocation for something that's already stack memory. In both of these, WriteSingleBigEndian produces the same memory if the host system is big endian (it just writes value onto itself).
Kouhai•4mo ago
Allocating memory on the stack is basically just changing the value of a register (the stack pointer), and 4 bytes is pretty much free.
substitute OP•4mo ago
Sure, but a reinterpret cast is free, at least in languages like C++.
Kouhai•4mo ago
In C# it'll be pretty much free after JIT'ing as well. Still, I don't get why you'd reinterpret cast a passed-in parameter (even in C++) instead of just allocating 4 bytes of memory on the stack. This won't impact your performance at all.
substitute OP•4mo ago
A passed-in parameter is already in a register (on x86_64).
Kouhai•4mo ago
Have you benchmarked it in a hot path and found that it actually affects your performance? 😅 Also, a parameter might not be in a register, depending on its size and the other passed-in parameters. In this case, yes, it'll be in a register.
substitute OP•4mo ago
My main concern is stack size. This call is fine, but other calls dealing with larger segments of memory, like float[256], could overflow the stack with the stackalloc.
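A common .NET pattern for this concern is to stackalloc only up to a fixed size and fall back to a heap array above it, so stack usage never grows with N. Below is a minimal sketch; the 1 KiB cap (exactly 256 floats) and the `EncodeFloatsBigEndian` name are assumptions, and `ToArray()` stands in for the `SetMemory` call.

```csharp
using System;
using System.Buffers.Binary;

// Encodes floats as big-endian bytes without letting stack usage grow with N.
static byte[] EncodeFloatsBigEndian(ReadOnlySpan<float> values)
{
    // Hypothetical cap: 1 KiB of stack, i.e. exactly 256 floats.
    const int MaxStackBytes = 1024;
    int byteCount = values.Length * sizeof(float);

    // Small payloads use a constant-size stackalloc; larger ones fall back to the heap.
    Span<byte> buffer = byteCount <= MaxStackBytes
        ? stackalloc byte[MaxStackBytes]
        : new byte[byteCount];
    buffer = buffer.Slice(0, byteCount);

    for (int i = 0; i < values.Length; i++)
    {
        BinaryPrimitives.WriteSingleBigEndian(buffer.Slice(i * sizeof(float)), values[i]);
    }

    // Stand-in for SetMemory(address, buffer, out _).
    return buffer.ToArray();
}

Console.WriteLine(Convert.ToHexString(EncodeFloatsBigEndian(new float[] { 1.0f, -2.5f }))); // 3F800000C0200000
```

Because the stackalloc size is a compile-time constant, a large N can no longer overflow the stack; it just costs one heap allocation instead.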
Kouhai•4mo ago
I'm kinda confused: 256 floats would be passed into the method?
substitute OP•4mo ago
Not in the float-only method
public void WriteFloats(uint address, Span<float> values)
{
    // Swaps each float to big endian in place, so this modifies the caller's span.
    foreach (ref var value in values)
    {
        BinaryPrimitives.WriteSingleBigEndian(MemoryMarshal.Cast<float, byte>(new Span<float>(ref value)), value);
    }
    SetMemory(address, MemoryMarshal.Cast<float, byte>(values), out _);
}
But other methods, like ones taking a block of floats, could take in N values, and N could be smaller or larger than 256. A single stackalloc'd float could be used if the data is moved through it on each iteration, but that also seems like it would be slower than in-place ops.
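The "reuse one stack buffer between iterations" idea can be generalized to a small fixed-size chunk: convert the floats through one stackalloc scratch buffer and send each chunk to the next address. This is a minimal sketch that assumes the remote write can be split into several calls at consecutive addresses; the 64-float chunk size, `SendChunk`, and `WriteFloatsChunked` are all hypothetical.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;

// Records each chunk the stand-in below receives (for illustration only).
var sent = new List<(uint Address, byte[] Bytes)>();

// Hypothetical stand-in for SetMemory(address, memory, out _).
void SendChunk(uint address, ReadOnlySpan<byte> chunk) => sent.Add((address, chunk.ToArray()));

// Converts the caller's floats through one fixed-size stack buffer, so stack
// usage is constant for any N and the caller's span is never modified.
void WriteFloatsChunked(uint address, ReadOnlySpan<float> values)
{
    const int ChunkFloats = 64; // hypothetical chunk size: 256 bytes of stack
    Span<byte> scratch = stackalloc byte[ChunkFloats * sizeof(float)];

    while (!values.IsEmpty)
    {
        int count = Math.Min(values.Length, ChunkFloats);
        for (int i = 0; i < count; i++)
        {
            BinaryPrimitives.WriteSingleBigEndian(scratch.Slice(i * sizeof(float)), values[i]);
        }

        int byteCount = count * sizeof(float);
        SendChunk(address, scratch.Slice(0, byteCount));

        address += (uint)byteCount; // the next chunk lands right after this one
        values = values.Slice(count);
    }
}
```

The trade-off is one remote call per chunk, which may well cost more than the extra copying it avoids; that is worth measuring before choosing it.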
Kouhai•4mo ago
That case makes more sense for a reinterpret cast. Still, I would use some sort of memory pool and rent from it instead of writing directly into the passed-in values span.
substitute OP•4mo ago
The underlying API I send the memory to takes a byte[] 😒, so I'm trying to avoid any extra overhead before the eventual heap allocation and memcpy.
Kouhai•4mo ago
I mean, you can use ArrayPool to rent a byte[] :Thonkers:
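Renting fits this API because the COM call already takes an explicit length next to the array (`unchecked((uint)memory.Length)` in the SetMemory wrapper), and `ArrayPool<byte>.Shared.Rent` may return an array longer than requested. A minimal sketch, assuming the COM side only reads `length` bytes of the array; `SendToTarget` is a hypothetical stand-in for `_com.DebugTarget.SetMemory`, and `WriteFloatsPooled` is a hypothetical name.

```csharp
using System;
using System.Buffers;
using System.Buffers.Binary;

// Records what the stand-in below received (for illustration only).
byte[] lastSent = Array.Empty<byte>();

// Hypothetical stand-in for the COM call: a byte[] plus an explicit length.
void SendToTarget(uint address, uint length, byte[] buffer) =>
    lastSent = buffer.AsSpan(0, (int)length).ToArray();

void WriteFloatsPooled(uint address, ReadOnlySpan<float> values)
{
    int byteCount = values.Length * sizeof(float);

    // Rent may hand back a larger array than requested, which is why the
    // exact length travels alongside it.
    byte[] rented = ArrayPool<byte>.Shared.Rent(byteCount);
    try
    {
        for (int i = 0; i < values.Length; i++)
        {
            BinaryPrimitives.WriteSingleBigEndian(rented.AsSpan(i * sizeof(float)), values[i]);
        }

        SendToTarget(address, (uint)byteCount, rented);
    }
    finally
    {
        // Return the array so later calls can reuse it instead of allocating.
        ArrayPool<byte>.Shared.Return(rented);
    }
}
```

If the library would rather not use the process-wide `ArrayPool<byte>.Shared`, one option is to accept an `ArrayPool<byte>` from the caller, which leaves the memory policy to the end user.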
substitute OP•4mo ago
That's fair. The span in this case isn't guaranteed to be clobberable, but that's only a concern if someone passes in data they expect to use afterwards for some reason. The size isn't constant, but I do know that for the remote system the usual max size is 512 MiB and the absolute max is 1024 MiB. No one should be writing all of the remote system's memory; those are just the amounts it has 😅
Kouhai•4mo ago
Right 😅 I personally think benchmarking the different options would be the best way to know whether these optimizations are worth it. I'd also suggest asking the people in #allow-unsafe-blocks; they are much, much more knowledgeable about low-level stuff.
substitute OP•4mo ago
Also, my only other hang-up with ArrayPool is that I want end users to manage how they use memory themselves (since you may be connected to more than one remote system for various things), but I'm sure there's some middle ground.
if these optimizations are worth it or not
It's a project to learn more about the newer low-level C# features (and because the existing implementations of this are garbage), so they're worth it to me 😂, but I get what you mean. I've been in C++ land for a few years, so there's a lot of new C# stuff for me.
Kouhai•4mo ago
Yeah, C# did change a lot and allows for much more low-level coding now. Honestly, reinterpret casting will work 100%; my only concern is that once someone passes in a Span without realizing it will be modified (not necessarily you, but maybe someone else using the library), it'll cause too many bugs 😅
substitute OP•4mo ago
Yeah, the optimal solution might end up being the middle ground, with a single stack float used as temporary, clobberable storage. But I don't like that solution, just because it'll turn into
x -> allocate y -> pass y as span to SetMemory -> allocate z from y via .ToArray()
which is very wasteful. I can probably write an overload that takes a byte[] directly to avoid that cost; that might be the best option tbh. That, or unclobbering it after writing, I guess.
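The "unclobbering it after writing" option can be sketched as: swap the caller's floats to big endian in place, send them, then swap them back in a `finally`. This is a minimal sketch; `SendToTarget` is a hypothetical stand-in for `SetMemory`, and `SwapIfLittleEndian` is a hypothetical helper built on the scalar `BinaryPrimitives.ReverseEndianness(int)` overload.

```csharp
using System;
using System.Buffers.Binary;
using System.Runtime.InteropServices;

// Records what the stand-in below received (for illustration only).
byte[] lastSent = Array.Empty<byte>();

// Hypothetical stand-in for SetMemory(address, memory, out _).
void SendToTarget(uint address, ReadOnlySpan<byte> memory) => lastSent = memory.ToArray();

// Byte-swapping is its own inverse, so calling this twice restores the data.
// On a big-endian host the bytes are already in target order, so it does nothing.
static void SwapIfLittleEndian(Span<int> bits)
{
    if (!BitConverter.IsLittleEndian)
        return;
    for (int i = 0; i < bits.Length; i++)
        bits[i] = BinaryPrimitives.ReverseEndianness(bits[i]);
}

// Swaps the caller's floats to big endian in place, sends them, then swaps them
// back, so the caller's buffer holds its original values once this returns.
void WriteFloatsSwapAndRestore(uint address, Span<float> values)
{
    // Work on the raw bits as ints so the swapped patterns are never read as floats.
    Span<int> bits = MemoryMarshal.Cast<float, int>(values);
    SwapIfLittleEndian(bits);
    try
    {
        SendToTarget(address, MemoryMarshal.AsBytes(values));
    }
    finally
    {
        SwapIfLittleEndian(bits); // undo the swap even if sending throws
    }
}
```

This avoids any extra buffer, but the caller's data is temporarily modified during the call, so it is not safe if another thread reads that buffer at the same time.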
Kouhai•4mo ago
An overload for WriteFloats?
substitute OP•4mo ago
It's not like we lose the actual data; we just swap from little to big endian if the system is little endian.
SetMemory
Kouhai•4mo ago
Oh, right
substitute OP•4mo ago
public void SetMemory(uint address, Span<byte> memory, out uint wrote)
{
    // The COM API only accepts byte[], so ToArray() copies the span into a new heap array.
    _com.DebugTarget.SetMemory(address, unchecked((uint)memory.Length), memory.ToArray(), out wrote);
}
Yeah, I think I'm going to go with the middle ground and just write an overload for SetMemory. I don't trust users to read documentation about it clobbering the input if I publish this, so I'll just create a copy that I clobber. Thanks!
Kouhai•4mo ago
Never trust users 😅 People don't even read official language docs.
substitute OP•4mo ago
Yeah, and the last thing I need is someone confused because it works on a big-endian machine but not on their little-endian personal computer, since they won't even know what's wrong other than "don't work". @Kouhai I compiled both on SharpLab: the reinterpret cast is fewer instructions in the JIT assembly in Debug, but the stackalloc is fewer instructions in Release. Interesting results. It's additionally interesting because they're doing the same thing (writing to some stack memory); time to dive into the rabbit hole. (One just reuses the arg, while the other uses new stack space.)
Kouhai•4mo ago
You might also wanna try examining the assembly on Godbolt; it has better code generation compared to SharpLab.
substitute OP•4mo ago
Huh, I didn't know Godbolt supported C#. I use it for C++ stuff.
stackalloc:
Program:<<Main>$>g__WriteFloat2|0_2(uint,float) (FullOpts):
G_M4409_IG01: ;; offset=0x0000
sub rsp, 24
vzeroupper
xor eax, eax
mov qword ptr [rsp+0x08], rax
mov qword ptr [rsp+0x10], 0xCBAE90
G_M4409_IG02: ;; offset=0x0017
lea rax, [rsp+0x08]
vmovd ecx, xmm0
bswap ecx
mov dword ptr [rax], ecx
cmp qword ptr [rsp+0x10], 0xCBAE90
je SHORT G_M4409_IG03
call CORINFO_HELP_FAIL_FAST
G_M4409_IG03: ;; offset=0x0034
nop
G_M4409_IG04: ;; offset=0x0035
add rsp, 24
ret
reuse the existing arg:
Program:<<Main>$>g__WriteFloat|0_1(uint,float) (FullOpts):
G_M5640_IG01: ;; offset=0x0000
push rbp
sub rsp, 16
vzeroupper
lea rbp, [rsp+0x10]
vmovss dword ptr [rbp-0x04], xmm0
G_M5640_IG02: ;; offset=0x0012
lea rdi, bword ptr [rbp-0x04]
mov eax, 4
vmovss xmm0, dword ptr [rbp-0x04]
vmovd ecx, xmm0
bswap ecx
cmp eax, 4
jb SHORT G_M5640_IG05
mov dword ptr [rdi], ecx
G_M5640_IG03: ;; offset=0x002D
add rsp, 16
pop rbp
ret
G_M5640_IG04: ;; offset=0x0033
call CORINFO_HELP_OVERFLOW
G_M5640_IG05: ;; offset=0x0038
mov edi, 40
call [System.ThrowHelper:ThrowArgumentOutOfRangeException(int)]
int3
From Godbolt on .NET 8. Interesting.
Kouhai•4mo ago
stackalloc's codegen seems to be better 😅
substitute OP•4mo ago
Probably. For many values, I ended up adding an overload that takes a byte[] directly:
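public void WriteFloats(uint address, Span<float> values)
{
    if (BitConverter.IsLittleEndian)
    {
        // Byte-swap into a fresh array, leaving the caller's span untouched
        // (this span overload of ReverseEndianness requires .NET 8 or later).
        var memory = new byte[values.Length * sizeof(float)];
        BinaryPrimitives.ReverseEndianness(
            MemoryMarshal.Cast<float, int>(values),
            MemoryMarshal.Cast<byte, int>(memory));
        SetMemory(address, memory, out _); // byte[] overload: no extra copy
        return;
    }
    // Already big endian: send the floats' bytes as-is.
    SetMemory(address, MemoryMarshal.Cast<float, byte>(values), out _);
}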
There is a method to reverse the endianness of N values from a span, so I create a copy if I need to reverse the endianness and send that in as a byte[] (to avoid another copy); otherwise I just pipe it directly in. I still need to test it, but it should work, and further cleanup can happen once I've tested it. I may be able to avoid a copy on already big-endian systems if I'm able to add extension methods for COM imports (it's a COM class).