C#2y ago
Samuel

❔ ✅ Splitting string without allocation

I'm writing a JsonConverter for Vector3 and I'd like to do this without allocating too much memory. I've written this extension method to split the string as ReadOnlyMemory. I wanted to use ReadOnlySpan but I'm unable to use this type in IEnumerable<T>.
public static IEnumerable<ReadOnlyMemory<T>> Split<T>(this ReadOnlyMemory<T> memory, T separator)
    where T : IEquatable<T>
{
    int start = 0;
    for (int end = 0; end <= memory.Length; end++)
    {
        // Yield a slice at every separator and once more at the end of the input,
        // so the final segment is not dropped.
        if (end == memory.Length || memory.Span[end].Equals(separator))
        {
            yield return memory.Slice(start, end - start);
            start = end + 1;
        }
    }
}
This is my implementation of Read for JsonConverter<Vector3>
public override Vector3 Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
{
    ReadOnlyMemory<char> vectorString = reader.GetString().AsMemory();
    var vectorValues = vectorString.Split(',').Select(s => float.Parse(s)).ToArray(); // Unable to parse `s` as it's not a string. What do?
    return new Vector3 { X = vectorValues[0], Y = vectorValues[1], Z = vectorValues[2] };
}
I'd like to know if there are any ways to make my Split method better, and to somehow get my Read method to work. Thanks for reading
11 Replies
FusedQyou (2y ago)
Can't you just use the built-in Split?
// Built-in span split (MemoryExtensions.Split, .NET 8 or later): writes segment ranges into a caller-provided buffer.
Span<Range> ranges = stackalloc Range[3];
int count = str.AsSpan().Split(ranges, ',');
Samuel (OP, 2y ago)
I think the version of C# I'm using doesn't allow you to split a string with a span
Anton (2y ago)
Since IEnumerable<T> of a span doesn't exist, you can't do it with spans.
I've heard ReadOnlyMemory is significantly slower than span, because it has to check the type of the object it stores.
If you're satisfied with an enumerator or just a for loop, write an iterator and make Current return a span.
ero (2y ago)
Also the use of Linq and IEnumerable makes me think you don't truly care about performance and allocations
Anton (2y ago)
GitHub - cathei/LinqGen: Alloc-free and fast replacement for Linq, with code generation
MODiX (2y ago)
Ero#1111
REPL Result: Failure
string vecStr = "1.23,4.56,7.89";
var vecSp = vecStr.AsSpan();

int i;
float x = float.Parse(vecSp[..(i = vecSp.IndexOf(','))]);
float y = float.Parse(vecSp[++i..(i += vecSp[i..].IndexOf(','))]);
float z = float.Parse(vecSp[(i + 1)..]);

Console.WriteLine(x);
Console.WriteLine(y);
Console.WriteLine(z);
Exception: CompilationErrorException
- Field or auto-implemented property cannot be of type 'ReadOnlySpan<char>' unless it is an instance member of a ref struct.
Compile: 617.981ms | Execution: 0.000ms
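The REPL failure above is most likely a limitation of the scripting host rather than of the code: top-level script variables are compiled as fields, and a ReadOnlySpan<char> cannot be a field of a regular class. Inside a method the same IndexOf-and-slice approach should compile. Below is a minimal sketch of the converter built that way, assuming Vector3 is System.Numerics.Vector3 and the JSON value is a string such as "1.23,4.56,7.89". The class name Vector3Converter and the error message are made up for illustration.

```csharp
using System;
using System.Globalization;
using System.Numerics;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical converter name; assumes System.Numerics.Vector3.
public sealed class Vector3Converter : JsonConverter<Vector3>
{
    public override Vector3 Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
    {
        // GetString still allocates one string for the JSON value;
        // everything after this line works on slices of it.
        ReadOnlySpan<char> text = reader.GetString().AsSpan();

        int first = text.IndexOf(',');
        if (first < 0) throw new JsonException("Expected a value of the form \"x,y,z\".");

        ReadOnlySpan<char> rest = text.Slice(first + 1);
        int second = rest.IndexOf(',');
        if (second < 0) throw new JsonException("Expected a value of the form \"x,y,z\".");

        // The span overloads of float.Parse avoid creating substrings.
        float x = float.Parse(text.Slice(0, first), NumberStyles.Float, CultureInfo.InvariantCulture);
        float y = float.Parse(rest.Slice(0, second), NumberStyles.Float, CultureInfo.InvariantCulture);
        float z = float.Parse(rest.Slice(second + 1), NumberStyles.Float, CultureInfo.InvariantCulture);
        return new Vector3(x, y, z);
    }

    public override void Write(Utf8JsonWriter writer, Vector3 value, JsonSerializerOptions options)
    {
        // Writing allocates a string; only the read path is kept allocation-light here.
        writer.WriteStringValue(FormattableString.Invariant($"{value.X},{value.Y},{value.Z}"));
    }
}
```

The converter would be registered through JsonSerializerOptions.Converters. Parsing with CultureInfo.InvariantCulture is a design choice of this sketch, so that "1.23" is read the same way regardless of the machine's locale.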
ero (2y ago)
okay dude
ero (2y ago)
Samuel (OP, 2y ago)
That's awesome. The whole point of this exercise is to learn more about how to use Span, Memory, and the associated types; I guess I need to keep practicing. The next issue I have is supporting this functionality in a .NET Framework project. I've tried changing the LangVersion to 8.0 to support ranges and indices, but no luck. I also cannot parse a float from a span. Since this is not related to the original problem, I'll just end this here. Thanks
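On the .NET Framework point: range and index syntax needs the System.Index and System.Range types, which .NET Framework does not ship, so changing LangVersion alone is not enough. float.Parse also has no ReadOnlySpan<char> overload there. A rough sketch of a fallback, assuming Span support comes from the System.Memory NuGet package, is to use Slice and IndexOf instead of ranges and to accept one small string per component when parsing. The names VectorParsing, ParseComponents, and ParseFloat are hypothetical.

```csharp
using System;
using System.Globalization;

// Hypothetical helper; not part of any library.
public static class VectorParsing
{
    public static void ParseComponents(string text, out float x, out float y, out float z)
    {
        ReadOnlySpan<char> span = text.AsSpan();

        // Slice/IndexOf work without the range syntax (no System.Range needed).
        int first = span.IndexOf(',');
        int offset = span.Slice(first + 1).IndexOf(',');
        if (first < 0 || offset < 0)
            throw new FormatException("Expected a value of the form \"x,y,z\".");
        int second = first + 1 + offset;

        x = ParseFloat(span.Slice(0, first));
        y = ParseFloat(span.Slice(first + 1, second - first - 1));
        z = ParseFloat(span.Slice(second + 1));
    }

    private static float ParseFloat(ReadOnlySpan<char> value)
    {
#if NETCOREAPP
        // .NET Core 2.1 and later: parse directly from the span.
        return float.Parse(value, NumberStyles.Float, CultureInfo.InvariantCulture);
#else
        // .NET Framework: no span overload, so this allocates a short string per component.
        return float.Parse(value.ToString(), NumberStyles.Float, CultureInfo.InvariantCulture);
#endif
    }
}
```

This still allocates on .NET Framework, just far less than string.Split plus LINQ would. A fully allocation-free parse there would need a hand-written float parser, which is outside the scope of this sketch.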
ero (2y ago)
This is faster without AI (AggressiveInlining) everywhere, for me.
? What's confusing ??
Why would you remove that? I'm sure they would have benefited greatly from that code.
Completely braindead.
using System;
using System.Runtime.CompilerServices;

public static class StringSplitExtensions
{
    public static Enumerable Split(ReadOnlySpan<char> str, ReadOnlySpan<char> split)
    {
        return new Enumerable(str, split);
    }

    // Wrapper so the result can be used directly in a foreach loop.
    public ref struct Enumerable
    {
        private readonly ReadOnlySpan<char> _span;
        private readonly ReadOnlySpan<char> _split;

        public Enumerable(ReadOnlySpan<char> span, ReadOnlySpan<char> split)
        {
            _span = span;
            _split = split;
        }

        public Enumerator GetEnumerator() => new Enumerator(_span, _split);
    }

    public ref struct Enumerator
    {
        private readonly ReadOnlySpan<char> _span;
        private readonly ReadOnlySpan<char> _split;
        private Range? _current;

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        internal Enumerator(ReadOnlySpan<char> span, ReadOnlySpan<char> split)
        {
            _span = span;
            _split = split;
        }

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public bool MoveNext()
        {
            var startIndex = _current is null
                ? 0 // No split has been done yet
                : _current.Value.End.Value + _split.Length; // Past the end of the split

            if (startIndex > _span.Length)
                return false; // No more
            var foundIndex = _span[startIndex..].IndexOf(_split);
            foundIndex = foundIndex is -1 ? _span.Length : foundIndex + startIndex;
            _current = startIndex..foundIndex;
            return true;
        }

        public ReadOnlySpan<char> Current
        {
            [MethodImpl(MethodImplOptions.AggressiveInlining)]
            get => _span[_current!.Value];
        }
    }
}
Here you go @Samuel_. @Diddy wrote this one.
This is a ref struct enumerator with 0 allocations. It works great and is probably really fast for what you want.
You could benchmark this with and without AI (AggressiveInlining). It's faster without for me, since the runtime probably knows best what to inline and what not to.
using System;

public static class StringSplitExtensions
{
    public static SplitSpanEnumerable SplitFast(this ReadOnlySpan<char> value, ReadOnlySpan<char> split)
    {
        return new(value, split);
    }
}

public readonly ref struct SplitSpanEnumerable
{
    private readonly ReadOnlySpan<char> _span;
    private readonly ReadOnlySpan<char> _split;

    public SplitSpanEnumerable(ReadOnlySpan<char> span, ReadOnlySpan<char> split)
    {
        _span = span;
        _split = split;
    }

    public SplitSpanEnumerator GetEnumerator() => new(_span, _split);
}

public ref struct SplitSpanEnumerator
{
    private readonly ReadOnlySpan<char> _span;
    private readonly ReadOnlySpan<char> _split;

    private int _end;     // Index just past the last separator that was consumed
    private Range _range; // Range of the current segment within _span

    internal SplitSpanEnumerator(ReadOnlySpan<char> span, ReadOnlySpan<char> split)
    {
        _span = span;
        _split = split;
    }

    public ReadOnlySpan<char> Current => _span[_range];

    public bool MoveNext()
    {
        int start = _end;
        if (start >= _span.Length)
        {
            return false;
        }

        // Search only the unread part of the span; the result is relative to start.
        int end = _span[start..].IndexOf(_split);
        if (end == -1)
        {
            end = _span.Length;
        }
        else
        {
            end += start;
        }

        _end = end + _split.Length;
        _range = start..end;

        return true;
    }
}
Here's the same, slightly optimized.
I made some more versions of the enumerables that take only a single char as the _split parameter, since Span<T>.IndexOf is implemented differently for those.
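The single-char versions are not shown in the thread. As a rough idea of what one could look like, here is a minimal sketch that relies on the IndexOf overload taking a single value. The type and class names (CharSplitEnumerator, CharSplitExtensions) are invented for this example, and unlike the version above it also yields a trailing empty segment, which is an assumption about the desired behavior.

```csharp
using System;

public static class CharSplitExtensions
{
    // Hypothetical counterpart of SplitFast that takes one separator char.
    public static CharSplitEnumerator SplitFast(this ReadOnlySpan<char> value, char separator)
        => new CharSplitEnumerator(value, separator);
}

public ref struct CharSplitEnumerator
{
    private ReadOnlySpan<char> _remaining; // Part of the input not yet returned
    private readonly char _separator;
    private bool _done;

    public CharSplitEnumerator(ReadOnlySpan<char> span, char separator)
    {
        _remaining = span;
        _separator = separator;
        _done = false;
        Current = default;
    }

    public ReadOnlySpan<char> Current { get; private set; }

    // Returning itself lets foreach consume the enumerator directly.
    public CharSplitEnumerator GetEnumerator() => this;

    public bool MoveNext()
    {
        if (_done)
        {
            return false;
        }

        int index = _remaining.IndexOf(_separator);
        if (index < 0)
        {
            // No separator left: the rest of the input is the last segment.
            Current = _remaining;
            _remaining = default;
            _done = true;
        }
        else
        {
            Current = _remaining.Slice(0, index);
            _remaining = _remaining.Slice(index + 1);
        }

        return true;
    }
}
```

It would be used as `foreach (ReadOnlySpan<char> part in text.AsSpan().SplitFast(',')) { ... }`. Whether it actually beats the span-separator version is something to benchmark rather than assume.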
