Iterating over Strings containing Unicode characters?
Has anyone figured out how to do this? Maxim's code works perfectly for getting the count of printable characters in a String.
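Maxim's code isn't reproduced in the thread. As an illustration only, here is a Python sketch of one common way to count characters in UTF-8 data: count every byte that is not a continuation byte. This counts code points, which is not quite the same as printable characters, because combining marks and control characters are still counted. The name count_code_points is hypothetical.

```python
def count_code_points(data: bytes) -> int:
    """Count the Unicode code points in UTF-8 encoded `data`.

    Every code point starts with exactly one byte that is not a
    continuation byte (continuation bytes look like 0b10xxxxxx),
    so counting the non-continuation bytes counts the code points.
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)


encoded = "héllo, 米".encode("utf-8")
print(len(encoded))                # 11 (bytes)
print(count_code_points(encoded))  # 8 (code points)
```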
And the ord and chr changes from the nightly branch help with converting a single character. But I've been unable to iterate through the code points of a String, since iterating through the bytes doesn't work for this use case.
It's all fairly new to me, so I'm curious whether anyone has already solved this problem.
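Converting a single character in both directions, which is what the ord and chr changes cover, comes down to the UTF-8 bit layout. The following is a Python sketch of that layout, not the nightly implementation; encode_utf8 and decode_utf8_char are hypothetical helper names, and the decoder does not reject overlong encodings or surrogates.

```python
def encode_utf8(cp: int) -> bytes:
    """chr()-style direction: one code point -> its UTF-8 bytes."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:
        return bytes([cp])                                   # 1 byte
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])  # 2 bytes
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])                   # 3 bytes
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])                       # 4 bytes


def decode_utf8_char(data: bytes) -> int:
    """ord()-style direction: UTF-8 bytes of one character -> code point."""
    lead = data[0]
    if lead < 0x80:               # 0xxxxxxx: ASCII
        length, cp = 1, lead
    elif lead >> 5 == 0b110:      # 110xxxxx: 2-byte sequence
        length, cp = 2, lead & 0x1F
    elif lead >> 4 == 0b1110:     # 1110xxxx: 3-byte sequence
        length, cp = 3, lead & 0x0F
    elif lead >> 3 == 0b11110:    # 11110xxx: 4-byte sequence
        length, cp = 4, lead & 0x07
    else:
        raise ValueError(f"invalid lead byte: {lead:#x}")
    if len(data) != length:
        raise ValueError("expected exactly one encoded character")
    for b in data[1:]:
        if (b & 0xC0) != 0x80:
            raise ValueError(f"invalid continuation byte: {b:#x}")
        cp = (cp << 6) | (b & 0x3F)  # append 6 payload bits
    return cp


print(encode_utf8(0x7C73))                          # b'\xe7\xb1\xb3'
print(hex(decode_utf8_char("米".encode("utf-8"))))  # 0x7c73
```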
I am looking into proper Unicode support for String, but it might take a while and will probably start as a proposal first.
You can have a peek at the ord implementation I submitted in the nightly. It should give you an idea of how to write a function that iterates over the runes.
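A minimal sketch of that idea in Python, assuming the input is valid UTF-8: read the lead byte to learn how many bytes the current code point occupies, take that many bytes, and advance. iter_code_points is a hypothetical name, and this is not the nightly ord implementation itself.

```python
from typing import Iterator


def iter_code_points(data: bytes) -> Iterator[str]:
    """Yield each code point ("rune") of UTF-8 `data` as a 1-char str."""
    i = 0
    while i < len(data):
        lead = data[i]
        # The lead byte encodes the length of the sequence.
        if lead < 0x80:
            n = 1                       # 0xxxxxxx
        elif lead >> 5 == 0b110:
            n = 2                       # 110xxxxx
        elif lead >> 4 == 0b1110:
            n = 3                       # 1110xxxx
        elif lead >> 3 == 0b11110:
            n = 4                       # 11110xxx
        else:
            raise ValueError(f"invalid lead byte {lead:#x} at offset {i}")
        # Decoding the slice also validates the continuation bytes.
        yield data[i:i + n].decode("utf-8")
        i += n


for rune in iter_code_points("a米\U0001F9B5".encode("utf-8")):
    print(rune)
```

The loop replaces a loop over byte indexes: each step consumes a whole code point instead of a single byte.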
Thanks for responding! I've been taking a look through those changes; I'll post here if I figure out how to iterate over runes.
Hey, here is an implementation for a char iterator:
As you can see in the second example, it works for runes but not for grapheme clusters: 🦵🏼 is iterated over as 🦵 + the skin tone.
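The leg emoji splits because it is two code points: U+1F9B5 (LEG) followed by U+1F3FC (EMOJI MODIFIER FITZPATRICK TYPE-3). The Python sketch below glues skin-tone modifiers (U+1F3FB to U+1F3FF) onto the preceding code point. It is a deliberately narrow heuristic, not grapheme cluster segmentation as defined in Unicode UAX #29: ZWJ sequences, flags and combining accents are still split. For full segmentation in Python, the third-party regex module's \X pattern is one option. iter_with_skin_tones is a hypothetical name.

```python
from typing import Iterator

# Emoji skin-tone modifiers U+1F3FB..U+1F3FF (Fitzpatrick types 1-2 to 6).
SKIN_TONE_MODIFIERS = range(0x1F3FB, 0x1F400)


def iter_with_skin_tones(text: str) -> Iterator[str]:
    """Yield code points, keeping a skin-tone modifier with its base emoji.

    Narrow heuristic only; this is not UAX #29 grapheme segmentation.
    """
    cluster = ""
    for ch in text:  # iterating a Python str yields code points
        if cluster and ord(ch) in SKIN_TONE_MODIFIERS:
            cluster += ch  # attach the modifier to the previous code point
        else:
            if cluster:
                yield cluster
            cluster = ch
    if cluster:
        yield cluster


print(list(iter_with_skin_tones("a\U0001F9B5\U0001F3FCb")))
```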
This, however:
string_iterator("...a long run of Japanese text...", print_str)
works as you would expect, since there are no grapheme clusters.
That's awesome, thank you! I'll give it a go in a bit.
Works well. I was trying to iterate over Strings with Unicode characters and ANSI escape sequences, and I was able to drop in the logic as a replacement for a for loop over the range of len(src).
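A Python sketch of that pattern, assuming only CSI sequences such as "\x1b[31m" need skipping: walk the string code point by code point and drop everything from ESC "[" up to and including the final byte (0x40 to 0x7E). Other escape types (OSC and so on) are not handled, and iter_visible is a hypothetical name. Looping over iter_visible(src) then takes the place of a loop over range(len(src)).

```python
from typing import Iterator


def iter_visible(text: str) -> Iterator[str]:
    """Yield the code points of `text`, skipping ANSI CSI escape sequences.

    A CSI sequence is ESC '[' followed by parameter/intermediate bytes
    and ends with one final byte in 0x40-0x7E (e.g. 'm' in '\x1b[31m').
    """
    i = 0
    while i < len(text):
        if text[i] == "\x1b" and i + 1 < len(text) and text[i + 1] == "[":
            i += 2
            # Skip parameters until the final byte of the sequence.
            while i < len(text) and not ("\x40" <= text[i] <= "\x7e"):
                i += 1
            i += 1  # skip the final byte itself
            continue
        yield text[i]
        i += 1


src = "\x1b[31m米\x1b[0m!"
print(len(src))                     # 11
print("".join(iter_visible(src)))   # 米!
```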