BinarySerialization - need some design help
Hi, this is my first foray into lower level programming and I could use some help.
I am writing a DOCSIS-compliant binary serializer for C#. It will take as input a .cfg or .txt file that resembles JSON, but is a bit different:
This is a barebones, minified example of what these files will contain. Every line is either a property or a collection of properties.
I will refer to these properties as DocsisProperty<T>, since they will be deserialized into a C# type and then encoded into byte[] later on.
Now the issue I am facing is that there are 25 ish encoding methods and 25 decoding methods, but hundreds of different properties. All unique properties are stored in a property map with name, id and a reference to the correct method for encoding or decoding.
This can be seen here: https://github.com/rlaager/docsis/blob/master/src/docsis_symtable.h
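To make the "few encoders, many properties" idea concrete, here is a minimal sketch of such a property map in C#. All type and member names (DocsisSymbol, EncodeFunc, Encoders, SymbolTable) are hypothetical, the two entries are only illustrative, and the encoders take the raw string purely to keep the sketch short. It is not a translation of docsis_symtable.h.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;

// Hypothetical delegate: turns the raw text value into its binary form.
public delegate byte[] EncodeFunc(string rawValue);

// One row of the property map: name, TLV id and the shared encoder to use.
public sealed record DocsisSymbol(string Name, byte Id, EncodeFunc Encode);

public static class Encoders
{
    // A handful of encoders like these are reused by hundreds of properties.
    public static byte[] EncodeUChar(string raw) => new[] { byte.Parse(raw) };

    public static byte[] EncodeUInt(string raw)
    {
        var bytes = new byte[4];
        BinaryPrimitives.WriteUInt32BigEndian(bytes, uint.Parse(raw));
        return bytes;
    }
}

public static class SymbolTable
{
    // Illustrative entries only; the real table has many more rows.
    public static readonly IReadOnlyDictionary<string, DocsisSymbol> ByName =
        new Dictionary<string, DocsisSymbol>
        {
            ["NetworkAccess"] = new("NetworkAccess", 3, Encoders.EncodeUChar),
            ["DownstreamFrequency"] = new("DownstreamFrequency", 1, Encoders.EncodeUInt),
        };
}
```

A lookup then looks like SymbolTable.ByName["NetworkAccess"].Encode("1").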
My original idea was that when I am parsing a file, I can initialize a new DocsisProperty<T> based on the identifier of the value, then later on when I want to encode it I call docsisProperty.Encode(), and it will look up some static dictionary to figure out which encoding method is the appropriate one. However that gets complicated quickly because of the generic type on DocsisProperty<T>. The dictionary would then have to use object or dynamic as its return type, and I am not a fan of either approach.
character limit^
The other solution could be to remove the generic type, and instead store the value as the raw string that was read from the input file. Then the en-/decoding functions would have to first parse the string then encode the parsed result, which kind of would give them an additional responsibility (string deserializing) in addition to binary serialization.
So what I was thinking was something like this, but I don't like introducing dynamic into any code, so it feels very wrong lol
instead of removing the generic, u could simply add an additional non-generic interface i think
Oh interesting. Alright. Can you tell me what that would do?
well, u could then use DocsisProperty in the dictionary. but from the outside u could still have the "concrete generic version"
the DocsisDecoder.Decode(...) methods would still return the DocsisProperty<T>
and as u know the correct type u could simply cast it back:
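A minimal sketch of that suggestion, assuming a non-generic interface called IDocsisProperty (a made-up name) that the generic class implements. The dictionary only needs the interface, and the caller casts back when it knows the concrete type.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical non-generic view: enough for storage and encoding.
public interface IDocsisProperty
{
    string Name { get; }
    byte[] Encode();
}

// The generic version keeps the strongly typed value.
public sealed class DocsisProperty<T> : IDocsisProperty
{
    private readonly Func<T, byte[]> _encoder;

    public DocsisProperty(string name, T value, Func<T, byte[]> encoder)
    {
        Name = name;
        Value = value;
        _encoder = encoder;
    }

    public string Name { get; }
    public T Value { get; }

    public byte[] Encode() => _encoder(Value);
}

public static class InterfaceDemo
{
    public static byte ReadNetworkAccess()
    {
        // No object or dynamic needed: the dictionary stores the interface.
        var properties = new Dictionary<string, IDocsisProperty>
        {
            ["NetworkAccess"] = new DocsisProperty<byte>("NetworkAccess", 1, v => new[] { v }),
            ["MaxCPE"] = new DocsisProperty<byte>("MaxCPE", 4, v => new[] { v }),
        };

        // The caller knows the concrete type and casts back to it.
        var typed = (DocsisProperty<byte>)properties["NetworkAccess"];
        return typed.Value;
    }
}
```

Encoding never needs the cast, since Encode() is part of the interface.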
i would also not use byte[] for that but something like ROS<byte>
where the decoder method would look then something like
the ref or out generally to keep track of how many bytes are already consumed, because then u could also write a decode method for variable length stuff like ClassOfService
and generally the ReadOnlySpan<byte> (or ReadOnlyMemory<byte>) so u dont have to copy data
Generally the actual encoding / decoding methods are already implemented in C, so I'm just translating them to C#.
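A decode method of the shape described above (span in, bytes consumed out) could be sketched like this. The class and method names are hypothetical, and the layout assumed here is a simple TLV with a 1-byte type, a 1-byte length and a big-endian 4-byte value.

```csharp
using System;
using System.Buffers.Binary;

public static class DocsisDecoderSketch
{
    // Reads one TLV with a uint32 value from the start of the buffer.
    // bytesConsumed tells the caller how far to advance.
    public static bool TryDecodeUInt(
        ReadOnlySpan<byte> buffer, out byte type, out uint value, out int bytesConsumed)
    {
        type = 0;
        value = 0;
        bytesConsumed = 0;

        if (buffer.Length < 2) return false;          // no room for type + length
        byte length = buffer[1];
        if (length != 4 || buffer.Length < 2 + length) return false;

        type = buffer[0];
        value = BinaryPrimitives.ReadUInt32BigEndian(buffer.Slice(2, 4));
        bytesConsumed = 2 + length;
        return true;
    }
}
```

The caller advances with buffer = buffer[bytesConsumed..]; after each successful call, which moves the view without copying anything.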
How come you'd prefer ROS over byte[]? I don't know enough about this to have an opinion, but I'll start reading up on ROS if you think its a good idea
Span and ReadOnlySpan (and the ..Memory) structs are simply views over something else's memory. for example an array (or some native memory)
with the example decoding methods u could basically do
slicing memory/spans like buffer = buffer[4..]; doesnt make an actual copy of the data
because they simply point to some start element and contain a length
they reflect contiguous memory
depending on how big that data is in general/how u read it, ReadOnlySequence<byte> might be better, because this thingy can be "spanned" over multiple buffers
Ah, alright. Whereas an array would dedicate actual memory to it and so on, ROS will be better, is that it?
(eg if u read via System.IO.Pipelines from some network stream, or lets say a 5GB binary file)
if u would slice an array directly, the resulting slice wouldnt be the same memory as the original array. it would be a copy of that part of the array
Gotcha, okay that makes a lot of sense. Then that means byte[] is actually rather inefficient
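The difference between slicing a span and slicing an array can be seen in a few lines. This is a small sketch with a made-up SliceDemo class: the span is a view onto the original array, while the range operator on an array allocates a new one.

```csharp
using System;

public static class SliceDemo
{
    public static (byte ViaSpan, byte ViaArrayRange) Run()
    {
        byte[] data = { 1, 2, 3, 4, 5, 6 };

        Span<byte> view = data.AsSpan(4); // view over data[4] and data[5], no copy
        byte[] copy = data[4..];          // range on an array allocates a new array

        data[4] = 99;                     // change the original afterwards

        // The view observes the change, the copy does not.
        return (view[0], copy[0]);        // (99, 5)
    }
}
```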
But a ROS cannot be a return type, right? it'd have to be a ref arg
it can be a return type, it just cannot be a field of any type (besides other ref structs) and it is somewhat limited in async code, but thats what u can use ROM for
Ah gotcha, I just cannot use it as an out param
here is a little example how i do reading on ReadOnlySequence<byte>, where i internally use Span<byte>/ReadOnlySpan<byte> for actual reading:
and the span reader:
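The snippets shared at this point are not preserved in the log. As a stand-in, here is a minimal sketch of what a span reader built on BinaryPrimitives might look like. The SpanReader type and its members are assumptions for illustration, not the code that was posted. For the sequence side, .NET also ships SequenceReader<byte> in System.Buffers.

```csharp
using System;
using System.Buffers.Binary;

// ref struct because it holds a span; it can only live on the stack.
public ref struct SpanReader
{
    private ReadOnlySpan<byte> _remaining;

    public SpanReader(ReadOnlySpan<byte> buffer) => _remaining = buffer;

    public int Remaining => _remaining.Length;

    public byte ReadByte()
    {
        byte value = _remaining[0];
        _remaining = _remaining[1..];   // advance the view, no copy
        return value;
    }

    public ushort ReadUInt16BigEndian()
    {
        ushort value = BinaryPrimitives.ReadUInt16BigEndian(_remaining);
        _remaining = _remaining[2..];
        return value;
    }

    public uint ReadUInt32BigEndian()
    {
        uint value = BinaryPrimitives.ReadUInt32BigEndian(_remaining);
        _remaining = _remaining[4..];
        return value;
    }

    // Hands out a slice of the underlying memory for variable-length values.
    public ReadOnlySpan<byte> ReadBytes(int count)
    {
        ReadOnlySpan<byte> slice = _remaining[..count];
        _remaining = _remaining[count..];
        return slice;
    }
}
```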
Oh I wasn't aware of BinaryPrimitives! Makes sense something like that'd exist
it's just missing UInt24, but I can implement that myself
isnt that hard either ;p
Okay so is this better than operating directly on a byte array? Then you'd just achieve what you wanted with bitwise operations. Remember I will be encoding to memory that hasn't been written yet, as in the buffer array doesn't exist before I call my methods
basically u would just create a 4 byte buffer, copy the 3 bytes at the appropriate position, and read the uint from that position
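The 4-byte-buffer trick described above could be sketched as follows, assuming big-endian (network) byte order. The UInt24 class name is made up; a plain bit-shift variant is included for comparison.

```csharp
using System;
using System.Buffers.Binary;

public static class UInt24
{
    // Copies the 3 source bytes into the low 3 bytes of a 4-byte scratch
    // buffer and reads that as a big-endian uint.
    public static uint ReadBigEndian(ReadOnlySpan<byte> source)
    {
        Span<byte> scratch = stackalloc byte[4];
        scratch[0] = 0;                          // high byte is always zero
        source.Slice(0, 3).CopyTo(scratch.Slice(1));
        return BinaryPrimitives.ReadUInt32BigEndian(scratch);
    }

    // Same result using shifts, without a scratch buffer.
    public static uint ReadBigEndianShifted(ReadOnlySpan<byte> source) =>
        (uint)((source[0] << 16) | (source[1] << 8) | source[2]);

    // Writes the low 24 bits of value as 3 big-endian bytes.
    public static void WriteBigEndian(Span<byte> destination, uint value)
    {
        destination[0] = (byte)(value >> 16);
        destination[1] = (byte)(value >> 8);
        destination[2] = (byte)value;
    }
}
```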
Okay. And using stackalloc, ROS etc is better suited for that purpose than byte arrays?
Remember I ask cause I don't know 🙂
stackalloc is only good for the method u r in, after u leave the method, that memory is gone again.
but why give the gc something to do, for a small 8 byte buffer u need for less than one millisecond
Good point!
ROS/ROM are abstractions over arbitrary memory, so either on the stack, managed or unmanaged heap, its mainly to reduce the number of copying u would need to do
writing to memory is quite similar
but u have to pre-allocate sufficient memory
but for writing u would need to pre-allocate ur byte array as well ;p
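Writing mirrors reading: the caller pre-allocates the memory and the write method reports how many bytes it used. A minimal sketch, with a hypothetical TlvWriter and the same assumed layout of 1-byte type, 1-byte length and big-endian value:

```csharp
using System;
using System.Buffers.Binary;

public static class TlvWriter
{
    // Writes type, length (4) and a big-endian uint32 value.
    // Returns the number of bytes written so the caller can advance.
    public static int WriteUInt(Span<byte> destination, byte type, uint value)
    {
        destination[0] = type;
        destination[1] = 4;
        BinaryPrimitives.WriteUInt32BigEndian(destination.Slice(2), value);
        return 6;
    }

    public static byte[] WriteTwo()
    {
        byte[] buffer = new byte[64];   // pre-allocated, sized generously
        int written = 0;

        written += WriteUInt(buffer.AsSpan(written), 1, 88_000_000);
        written += WriteUInt(buffer.AsSpan(written), 2, 7);

        // Only the written part is the actual payload.
        return buffer.AsSpan(0, written).ToArray();
    }
}
```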
Okay, excellent. I'll try and implement this. UInt24 and such is quite easily done, but this domain I believe gets a bit involved in some usecases haha. You ok if I poke you later on with some specific usecase if I got questions?
sure
im just procrastinating writing this god damned source generator anyway xD
😄
I'm so out of my depth here haha, I'm just used to high level stuff
(these readers/writers ive shown will be the building blocks for the reader/writer generation for the packets - im writing some network code right now, which has a reaaaally ugly protocol 😂 )
the sg will generate readers and writers for example for this packet:
readonly record struct, that's a new one for me
😄
so records are readonly by default, but structs aren't as it's a value type, but records can be reference types with value semantics.. i'm confused xD
well, for a record class the positional properties are init-only.
but for record structs, that still would just produce a struct and not a readonly struct, so the positional properties get normal setters unless u write readonly record struct
basically ill write some quick and dirty boxing code to make it run, and then im gonna fine tune it later to use the structs non-boxed, so i save some allocations where possible
Ah gotcha
Did you have a SequenceWriter also?
dont need one, so nope 😂
the thing is, i use System.IO.Pipelines, there u allocate the buffer like Memory<byte> buffer = _pipe.Writer.GetMemory(minBufferSize);, so u dont even have a writable Sequence<byte> or similar
basically, after writing to that memory buffer, u call _pipe.Writer.Advance(bytesWritten),
for the next stuff u write, u simply request a new buffer
a pipe has always 2 things, the mentioned writer, with which u write into the pipe (eg reading from a network stream, file stream or simply writing into it)
and then a pipe reader, that guy gets the ReadOnlySequence<byte> where all these buffer segments from the writer are put together
and from there u read and do what ever u want to do with it (parse packets, send data to a network or file stream, etc)
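The writer/reader flow described above can be sketched end to end with a single in-memory Pipe. PipeDemo and its method names are made up for illustration, and a real setup would run the writer and reader as separate loops instead of one after the other.

```csharp
using System;
using System.Buffers;
using System.Buffers.Binary;
using System.IO.Pipelines;
using System.Threading.Tasks;

public static class PipeDemo
{
    public static async Task<uint> RoundTripAsync()
    {
        var pipe = new Pipe();

        // Writer side: request memory, write into it, report the byte count.
        Memory<byte> memory = pipe.Writer.GetMemory(4);
        BinaryPrimitives.WriteUInt32BigEndian(memory.Span, 0xDEADBEEF);
        pipe.Writer.Advance(4);
        await pipe.Writer.FlushAsync();      // makes the data visible to the reader
        await pipe.Writer.CompleteAsync();

        // Reader side: everything written so far arrives as one sequence.
        ReadResult result = await pipe.Reader.ReadAsync();
        ReadOnlySequence<byte> sequence = result.Buffer;
        uint value = ReadUInt32(sequence);

        pipe.Reader.AdvanceTo(sequence.End); // mark the data as consumed
        await pipe.Reader.CompleteAsync();
        return value;
    }

    // Spans cannot be locals in an async method, so the read lives here.
    private static uint ReadUInt32(ReadOnlySequence<byte> sequence)
    {
        Span<byte> scratch = stackalloc byte[4];
        ReadOnlySequence<byte> head = sequence.Slice(0, 4);
        head.CopyTo(scratch);                // works across segment boundaries
        return BinaryPrimitives.ReadUInt32BigEndian(scratch);
    }
}
```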
a sequence writer would be something like
but there is a lot of handling still missing regarding handling the asynchronous state between reader and writer
and that would make the whole thing quite complex, so other approaches are better here
and that totally depends on what happens with the buffer after u have written it