C#•16mo ago

BinarySerialization - need some design help

Hi, this is my first foray into lower level programming and I could use some help. I am writing a DOCSIS-compliant binary serializer for C#. It will take as input a .cfg or .txt file that resembles JSON, but is a bit different:

Main 
{
    DownstreamFrequency 130000000;
    UpstreamChannelId 123;
    NetworkAccess 1;
    ClassOfService
    {
        ClassID 5;
        MaxRateDown 512000;
        MaxRateUp 64000;
        PriorityUp 3;
        GuaranteedUp 32000;
        MaxBurstUp 54314;
        PrivacyEnable 1;
    }
    MaxCPE 13;
    SwUpgradeServer 10.1.1.1;
    /* CmMic e241803a16fa62269f90d6e1619a59d3; */
    /* CmtsMic 41141948116bcc38f6a20ec485fcd0f2; */
    /*EndOfDataMkr*/
}

This is a barebones, minified example of what these files will contain. Every line is either a property or a collection of properties. I will refer to these properties as DocsisProperty<T>, since they will be deserialized into a C# type and then encoded into byte[] later on. Now the issue I am facing is that there are 25-ish encoding methods and 25-ish decoding methods, but hundreds of different properties. All unique properties are stored in a property map with name, id and a reference to the correct method for encoding or decoding. This can be seen here: https://github.com/rlaager/docsis/blob/master/src/docsis_symtable.h. My original idea was that when I am parsing a file, I can initialize a new DocsisProperty<T> based on the identifier of the value, then later on when I want to encode it I call docsisProperty.Encode(), and it will look up some static dictionary to figure out which encoding method is the appropriate one. However, that gets complicated quickly because of the generic type on DocsisProperty<T>. The dictionary would then have to use object or dynamic as its return type, and I am not a fan of either approach.

30 Replies

EsaOP•16mo ago

character limit^ The other solution could be to remove the generic type, and instead store the value as the raw string that was read from the input file. Then the en-/decoding functions would have to first parse the string then encode the parsed result, which kind of would give them an additional responsibility (string deserializing) in addition to binary serialization. So what I was thinking was something like this, but I don't like introducing dynamic into any code, so it feels very wrong lol

Dictionary<string, Func<byte[], string, dynamic>> DecodingDict = new() 
{
  { "IpAddress", (bytes, propertyName) => DocsisDecoder.DecodeToUInt24(bytes, propertyName) }
};

cap5lut•16mo ago

instead of removing the generic, u could simply add an additional non-generic interface i think

public interface DocsisProperty {}
public interface DocsisProperty<T> : DocsisProperty {}

EsaOP•16mo ago

Oh interesting. All right. Can you tell me about what that would do?

cap5lut•16mo ago

well, u could then use DocsisProperty in the dictionary, but from the outside u could still have the "concrete generic version". the DocsisDecoder.Decode(...) methods would still return the DocsisProperty<T>, and as u know the correct type u could simply cast it back:

Dictionary<string, Func<byte[], string, DocsisProperty>> DecodingDict = new();

public DocsisProperty<T> Decode<T>(byte[] data, string name)
{
  DocsisProperty decoded = DecodingDict[name].Invoke(data, name);
  // plain downcast between reference types, so no boxing; throws InvalidCastException if T is wrong
  return (DocsisProperty<T>)decoded;
}
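
To make the idea concrete, here is a minimal, self-contained sketch of the non-generic base plus generic wrapper combined with a decoder dictionary. The names IDocsisProperty, PropertyDecoders and Map are made up for this illustration (the thread calls the base type DocsisProperty), the two registered properties are just examples, and a record is used only to keep the wrapper short.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;

// Look up a decoder by property name and get the typed value back.
byte[] raw = { 0x07, 0xBF, 0xA4, 0x80 }; // 130000000 as a big-endian uint
uint frequency = PropertyDecoders.Decode<uint>("DownstreamFrequency", raw).Value;
Console.WriteLine(frequency); // prints 130000000

// Non-generic base (hypothetical name) so differently typed properties fit in one collection.
public interface IDocsisProperty
{
    string Name { get; }
}

// Generic wrapper that carries the typed value.
public sealed record DocsisProperty<T>(string Name, T Value) : IDocsisProperty;

public static class PropertyDecoders
{
    // One entry per property name; each entry points at one of the few decoding methods.
    private static readonly Dictionary<string, Func<byte[], string, IDocsisProperty>> Map = new()
    {
        ["DownstreamFrequency"] = (bytes, name) =>
            new DocsisProperty<uint>(name, BinaryPrimitives.ReadUInt32BigEndian(bytes)),
        ["NetworkAccess"] = (bytes, name) =>
            new DocsisProperty<byte>(name, bytes[0]),
    };

    public static DocsisProperty<T> Decode<T>(string name, byte[] data)
    {
        IDocsisProperty decoded = Map[name](data, name);
        return (DocsisProperty<T>)decoded; // throws InvalidCastException if T is wrong
    }
}
```

The dictionary only ever sees the non-generic interface, so neither object nor dynamic is needed; the caller that knows the expected type asks for it through the generic Decode<T>.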

i would also not use byte[] for that but something like ROS<byte> (ReadOnlySpan<byte>)

public delegate DocsisProperty Decoder(ReadOnlySpan<byte> buffer, string propertyName, out int readBytes);
// or
public delegate DocsisProperty Decoder(ref ReadOnlySpan<byte> buffer, string propertyName);

where the decoder method would look then something like

public static DocsisProperty<uint> DecodeToUInt32(ReadOnlySpan<byte> buffer, string propertyName, out int readBytes)
{
  uint value = ...;
  readBytes = 4;
  return new DocsisProperty<uint>(value);
}
// or
public static DocsisProperty<uint> DecodeToUInt32(ref ReadOnlySpan<byte> buffer, string propertyName)
{
  uint value = ...;
  buffer = buffer[4..];
  return new DocsisProperty<uint>(value);
}

the ref or out is generally there to keep track of how many bytes have already been consumed, because then u could also write a decode method for variable length stuff like ClassOfService. and the ReadOnlySpan<byte> (or ReadOnlyMemory<byte>) is there so u dont have to copy data
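
As a rough illustration of why passing the span by ref helps with variable-length data, the sketch below walks a buffer of type/length/value records and advances the span after each one. It assumes a one-byte type followed by a one-byte length, which is how DOCSIS configuration settings are commonly laid out; TlvReader and TryReadTlv are hypothetical names, and the type codes in the sample bytes are only illustrative.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;

// Two records: type 1 with a 4-byte value, then type 3 with a 1-byte value.
byte[] data = { 0x01, 0x04, 0x07, 0xBF, 0xA4, 0x80, 0x03, 0x01, 0x01 };
ReadOnlySpan<byte> buffer = data;

var seen = new List<(byte Type, int Length)>();
while (TlvReader.TryReadTlv(ref buffer, out byte type, out ReadOnlySpan<byte> value))
{
    seen.Add((type, value.Length));
    if (type == 1)
        Console.WriteLine(BinaryPrimitives.ReadUInt32BigEndian(value)); // prints 130000000
}
Console.WriteLine(seen.Count); // prints 2

public static class TlvReader
{
    // Reads one type/length/value record and advances `buffer` past it.
    // Returns false when the buffer is empty or the record is truncated.
    public static bool TryReadTlv(ref ReadOnlySpan<byte> buffer, out byte type, out ReadOnlySpan<byte> value)
    {
        type = 0;
        value = default;
        if (buffer.Length < 2) return false;

        int length = buffer[1];
        if (buffer.Length < 2 + length) return false;

        type = buffer[0];
        value = buffer.Slice(2, length);  // a view into the same memory, no copy
        buffer = buffer[(2 + length)..];  // consume the record
        return true;
    }
}
```

A nested group such as ClassOfService could be handled by calling the same method again on the returned value span, since that span is just a window into the original bytes.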

EsaOP•16mo ago

Generally the actual encoding / decoding methods are already implemented in C, so I'm just translating them to C#. How come you'd prefer ROS over byte[]? I don't know enough about this to have an opinion, but I'll start reading up on ROS if you think it's a good idea

cap5lut•16mo ago

Span and ReadOnlySpan (and the ..Memory) structs are simply views over something else's memory, for example an array (or some native memory). with the example decoding methods u could basically do

byte[] data = GetDataFromSomewhere();
ReadOnlySpan<byte> buffer = data;
var a = DecodeToUInt32(ref buffer, "a");
var b = DecodeToUInt32(ref buffer, "b");
// and so on

slicing memory/spans like buffer = buffer[4..]; doesnt make an actual copy of the data, because they simply point to some start element and contain a length. they reflect contiguous memory. depending on how big that data is in general/how u read it, ReadOnlySequence<byte> might be better, because this thingy can be "spanned" over multiple buffers

EsaOP•16mo ago

Ah, all right. Whereas an array would dedicate actual memory to it and so on, ROS will be better, is that it?

cap5lut•16mo ago

(eg if u read via System.IO.Pipelines from some network stream, or lets say a 5GB binary file) if u would slice an array directly with a range, the resulting slice wouldnt be the same memory as the original array. it would be a copy of that part of the array
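
The difference is easy to see in a few lines. This is only a small demonstration with made-up values: a range applied to an array allocates a new array, while a span over the same range keeps pointing at the original memory.

```csharp
using System;

byte[] data = { 1, 2, 3, 4, 5 };

byte[] copy = data[1..3];                    // range on an array allocates a new array
ReadOnlySpan<byte> view = data.AsSpan(1, 2); // span over the same two elements, no allocation

data[1] = 42;

Console.WriteLine(copy[0]); // prints 2  (the copy did not change)
Console.WriteLine(view[0]); // prints 42 (the view sees the write)
```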

EsaOP•16mo ago

Gotcha, okay that makes a lot of sense. Then that means byte[] is actually rather inefficient. But a ROS cannot be a return type, right? It'd have to be a ref arg

cap5lut•16mo ago

it can be a return type, it just can not be a member of any type (besides other ref structs) and it is somewhat limited in async code, but thats what u can use ROM (ReadOnlyMemory) for

EsaOP•16mo ago

Ah gotcha, I just cannot use it as an out param

cap5lut•16mo ago

here is a little example how i do reading on ReadOnlySequence<byte>, where i internally use Span<byte>/ReadOnlySpan<byte> for actual reading:

    // needs: using System.Buffers; using System.Buffers.Binary; using System.Runtime.CompilerServices;
    public struct SequenceReader(ReadOnlySequence<byte> buffer)
    {
        private ReadOnlySequence<byte> _buffer = buffer;
        private int _read = 0;

        public int BytesRead => _read;

        // drops the consumed bytes from the front of the sequence
        private void Seek(int offset)
        {
            _buffer = _buffer.Slice(offset);
            _read += offset;
        }

        private void Seek<T>() where T : unmanaged => Seek(Unsafe.SizeOf<T>());

        // uses the first segment directly if it is big enough,
        // otherwise copies the bytes that straddle segments into temp
        private static void GetBuffer(ReadOnlySequence<byte> source, out ReadOnlySpan<byte> dest, Span<byte> temp)
        {
            if (source.FirstSpan.Length >= temp.Length)
            {
                dest = source.FirstSpan[..temp.Length];
            }
            else
            {
                source.Slice(0, temp.Length).CopyTo(temp);
                dest = temp;
            }
        }

        public short ReadInt16()
        { 
            GetBuffer(_buffer, out var buffer, stackalloc byte[sizeof(short)]);
            short value = BinaryPrimitives.ReadInt16BigEndian(buffer);
            Seek<short>();
            return value;
        }
    }

and the span reader:

    public ref struct SpanReader(ReadOnlySpan<byte> buffer)
    {
        private ReadOnlySpan<byte> _buffer = buffer;
        private int _read = 0;

        public int BytesRead => _read;

        // re-slices the span so it always starts at the next unread byte
        private void Seek(int offset)
        {
            _buffer = _buffer[offset..];
            _read += offset;
        }

        private void Seek<T>() where T : unmanaged => Seek(Unsafe.SizeOf<T>());

        public short ReadInt16()
        {
            short value = BinaryPrimitives.ReadInt16BigEndian(_buffer);
            Seek<short>();
            return value;
        }
    }

EsaOP•16mo ago

Oh I wasn't aware of BinaryPrimitives! Makes sense something like that'd exist. It's just missing UInt24, but I can implement that myself

cap5lut•16mo ago

isnt that hard either ;p

        public ulong ReadUInt40()
        {
            Span<byte> temp = stackalloc byte[sizeof(ulong)];
            _buffer.Slice(0, 5).CopyTo(temp[3..]);
            Seek(5);
            return BinaryPrimitives.ReadUInt64BigEndian(temp);
        }

EsaOP•16mo ago

Okay so is this better than operating directly on a byte array? Then you'd just achieve what you wanted with bitwise operations. Remember I will be encoding to memory that hasn't been written yet, as in the buffer array doesn't exist before I call my methods

cap5lut•16mo ago

basically u would just create a 4 byte buffer, copy the 3 bytes at the appropriate position (the last 3 bytes for big endian), and read the uint from that buffer
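
Spelled out for the 24-bit case, that approach might look like the sketch below. The static class UInt24 and its Read/Write methods are hypothetical helpers written for this example, not part of BinaryPrimitives, and big-endian byte order is assumed to match the other readers in this thread.

```csharp
using System;
using System.Buffers.Binary;

Span<byte> wire = stackalloc byte[3];
UInt24.Write(wire, 0x010203);
Console.WriteLine(Convert.ToHexString(wire)); // prints 010203
Console.WriteLine(UInt24.Read(wire));         // prints 66051

public static class UInt24
{
    public const uint MaxValue = 0xFFFFFF;

    // Reads 3 big-endian bytes by padding them to 4 and reusing the uint reader.
    public static uint Read(ReadOnlySpan<byte> source)
    {
        Span<byte> temp = stackalloc byte[sizeof(uint)];
        temp.Clear();                   // make the zero pad byte explicit
        source[..3].CopyTo(temp[1..]);  // payload goes into the 3 low-order positions
        return BinaryPrimitives.ReadUInt32BigEndian(temp);
    }

    // Writes the low 3 bytes of value in big-endian order.
    public static void Write(Span<byte> destination, uint value)
    {
        if (value > MaxValue) throw new ArgumentOutOfRangeException(nameof(value));
        Span<byte> temp = stackalloc byte[sizeof(uint)];
        BinaryPrimitives.WriteUInt32BigEndian(temp, value);
        temp[1..].CopyTo(destination);  // drop the always-zero high byte
    }
}
```

The same thing can be done with shifts and masks; the padded-buffer version mainly has the advantage of reusing an existing, well-tested primitive.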

EsaOP•16mo ago

Okay. And using stackalloc, ROS etc is better suited for that purpose than byte arrays? Remember I ask cause I don't know 🙂

cap5lut•16mo ago

stackalloc is only good for the method u r in, after u leave the method, that memory is gone again. but why give the gc something to do for a small 8 byte buffer u need for less than one millisecond?

EsaOP•16mo ago

Good point!

cap5lut•16mo ago

ROS/ROM are abstractions over arbitrary memory, so either on the stack, managed or unmanaged heap. its mainly to reduce the amount of copying u would need to do. writing to memory is quite similar

    public ref struct SpanWriter(Span<byte> buffer)
    {
        private Span<byte> _buffer = buffer;
        private int _written = 0;

        public int BytesWritten => _written;

        // re-slices the span so it always starts at the next free byte
        private void Seek(int offset)
        {
            _buffer = _buffer[offset..];
            _written += offset;
        }

        private void Seek<T>() where T : unmanaged => Seek(Unsafe.SizeOf<T>());

        public void WriteInt16(short value)
        {
            BinaryPrimitives.WriteInt16BigEndian(_buffer, value);
            Seek<short>();
        }
    }

but u have to pre-allocate sufficient memory. then again, for writing to a byte array u would need to pre-allocate that as well ;p

EsaOP•16mo ago

Okay, excellent. I'll try and implement this. UInt24 and such is quite easily done, but this domain I believe gets a bit involved in some usecases haha. You ok if I poke you later on with some specific usecase if I got questions?

cap5lut•16mo ago

sure im just procrastinating writing this god damned source generator anyway xD

EsaOP•16mo ago

😄 I'm so out of my depth here haha, I'm just used to high level stuff

cap5lut•16mo ago

(these readers/writers ive shown will be the building blocks for the reader/writer generation for the packets - im writing some network code right now, which has a reaaaally ugly protocol 😂 ) the sg will generate readers and writers for example for this packet:

public readonly record struct AuthenticationPacket(int UnknownInt, string UserName, string EncodedToken) : IClientPacket
{
    public static ClientPacketType PacketType => ClientPacketType.Authenticate;
}

EsaOP•16mo ago

readonly record struct, that's a new one for me

cap5lut•16mo ago

😄

EsaOP•16mo ago

so records are readonly by default, but structs aren't as it's a value type, but records can be reference types with value semantics.. i'm confused xD

cap5lut•16mo ago

well, for a record class the positional properties are init-only, but for a plain record struct they are get/set, so that still would just produce a mutable struct and not a readonly struct. thats why the readonly is there. basically ill write some quick and dirty boxing code to make it run, and then im gonna fine tune it later to use the structs non-boxed, so i save some allocations where possible
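
Since the mutability rules for records are easy to mix up, here is a short comparison. MutablePoint and FrozenPoint are made-up types for this example; the point is only that positional properties on a plain record struct can be assigned after construction, while on a readonly record struct they are init-only.

```csharp
using System;

var mutable = new MutablePoint(1, 2);
mutable.X = 10;                      // compiles: positional properties are get/set
Console.WriteLine(mutable.X);        // prints 10

var frozen = new FrozenPoint(1, 2);
// frozen.X = 10;                    // would not compile: init-only property
var moved = frozen with { X = 10 };  // non-destructive copy instead
Console.WriteLine(frozen.X);         // prints 1
Console.WriteLine(moved.X);          // prints 10

public record struct MutablePoint(int X, int Y);
public readonly record struct FrozenPoint(int X, int Y);
```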

EsaOP•16mo ago

Ah gotcha. Did you have a SequenceWriter also?

cap5lut•16mo ago

dont need one, so nope 😂 the thing is, i use System.IO.Pipelines. there u allocate the buffer like Memory<byte> buffer = _pipe.Writer.GetMemory(minBufferSize);, so u dont even have a writable Sequence<byte> or similar. basically, after writing to that memory buffer, u call _pipe.Writer.Advance(bytesWritten), and for the next stuff u write, u simply request a new buffer. a pipe always has 2 things: the mentioned writer, with which u write into the pipe (eg reading from a network stream, file stream or simply writing into it), and then a pipe reader. that guy gets the ReadOnlySequence<byte> where all these buffer segments from the writer are put together, and from there u read and do what ever u want to do with it (parse packets, send data to a network or file stream, etc). a sequence writer would be something like

public class SequenceWriter(PipeWriter writer)
{
  private readonly PipeWriter _writer = writer;

  public async ValueTask WriteInt32(int value)
  {
    // memory handed out before Advance() must not be reused, so request it per write
    Memory<byte> buffer = _writer.GetMemory(sizeof(int));
    BinaryPrimitives.WriteInt32BigEndian(buffer.Span, value);
    _writer.Advance(sizeof(int));
    // makes the written bytes available to the PipeReader
    await _writer.FlushAsync();
  }
}

but there is still a lot of handling missing regarding the asynchronous state between reader and writer, and that would make the whole thing quite complex, so other approaches are better here. that totally depends on what happens with the buffer after u have written it
