Probably missing the null terminator
do you mean String's init is not adding it?
My guess is that the span is being converted to a list, and that’s being used for the string construction. If that Span’s last element is not a null terminator, then you’d see the clipping that you’re seeing now
Try constructing a list from the span, and append a 0 to it. Then create the string from that list
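A toy Python model of the behaviour guessed at above may make the clipping easier to see. The function `string_from_buffer` is hypothetical and is not Mojo's actual String constructor; it simply assumes the last element of the buffer is a null terminator and drops it.

```python
def string_from_buffer(buf: list[int]) -> str:
    """Hypothetical constructor: assumes buf ends with a 0 terminator."""
    # The final byte is dropped unconditionally, the way a constructor
    # that trusts the caller to supply a terminator would behave.
    return bytes(buf[:-1]).decode("utf-8")


span = list(b"hello")  # raw bytes with no terminator

# Without a terminator the last real character is lost.
clipped = string_from_buffer(span)

# Appending 0 first gives the constructor a terminator to drop.
fixed = string_from_buffer(span + [0])

print(repr(clipped), repr(fixed))
```

In this model the workaround is the same as in the thread: append a 0 to the list before building the string.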
lol ok
I don’t like the null terminator shenanigans for strings either lol
I mean, regardless of that, it should be automated
but yeah...
yep, appending 0 works
gonna dust off the PDP-11
Yeah, automating it would be nice. I’ve shot myself in the foot plenty of times while missing the null terminator
so making a string slice doesn't need the append
@toasty good catch, thanks!
Yeah! I try to use string slice where I can too
a safe version would be really useful
It's only unsafe in the sense that you don't know if the data is valid UTF-8 (and StringSlice is supposed to always be valid UTF-8). The constructor should have a debug_assert(_is_valid_utf8(span)), but there are issues with the function at compile time, so it's not active.
So there is no way to "safely" get data from the wire where you're sure it's valid UTF-8 from a generic API perspective
I meant a version that throws
Oh yeah that's totally on the roadmap. Once we have a Bytes type defined we'll have Bytes.decode() (or something of the like). At least I'm hoping to be able to implement it and get it merged 😄
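For comparison, this is roughly what a "version that throws" looks like in Python, where `bytes.decode` validates the input and raises `UnicodeDecodeError` on invalid UTF-8. It is only a sketch of the idea: the wrapper name `decode_from_wire` is made up here, and the eventual Mojo API may look quite different.

```python
def decode_from_wire(data: bytes) -> str:
    """Decode bytes as UTF-8, raising UnicodeDecodeError if they are invalid."""
    # "strict" is already the default error mode; it is spelled out for clarity.
    return data.decode("utf-8", errors="strict")


# Valid UTF-8 decodes normally.
valid = decode_from_wire("h\u00e9llo".encode("utf-8"))

# 0xFF can never appear in well-formed UTF-8, so this input is rejected.
try:
    decode_from_wire(b"\xff\xfe")
    rejected = False
except UnicodeDecodeError:
    rejected = True

print(valid, rejected)
```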
Speaking of the roadmap for StringSlice: will StringSlice eventually support slicing, i.e. returning a range of characters using slice syntax like a[2:5]?
100%, I'm leaving that off to the future because I'm waiting until I can make the switch to full unicode to not break too much code. The current String type is sliced by bytes, but Python's is by unicode codepoints. The current len(String) also returns byte_length() instead of unicode codepoint length, Python's is unicode codepoints. len(StringSlice) works by unicode codepoints, so adding slicing will confuse many people.
I'm hoping to land this before the next stable release in 2025 but we'll see
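The byte versus codepoint difference discussed above can be seen directly in Python, using `str` for the codepoint view and `bytes` for the byte view. This only illustrates the two semantics; it says nothing about how Mojo will end up implementing them.

```python
text = "na\u00efve"        # "naïve": 5 codepoints
raw = text.encode("utf-8")  # the same text as UTF-8 bytes

codepoint_len = len(text)   # counts codepoints
byte_len = len(raw)         # counts bytes; "ï" takes two of them

by_codepoints = text[2:5]   # slices by codepoint
by_bytes = raw[2:5]         # slices by byte, covering fewer characters

# A byte slice can also cut a multi-byte character in half.
try:
    raw[:3].decode("utf-8")
    split_is_valid = True
except UnicodeDecodeError:
    split_is_valid = False

print(codepoint_len, byte_len, by_codepoints, by_bytes, split_is_valid)
```

The same a[2:5] covers three characters in one view and only two in the other, which is the kind of mismatch that mixing the two conventions would expose.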