C# • 16mo ago
__dil__

✅ How to pattern-match a `Rune`?

is this the best I can do?
```csharp
myRune switch
{
    _ when myRune == (Rune)'A' => 3,
    _ when myRune == (Rune)'B' => 4,
    // ...
}
```
That's visually very noisy, and it's unclear to me whether the compiler is smart enough to perform the `(Rune)` cast on char literals at compile time. Is there no way to specify a constant/literal `Rune` value?
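A side note on the question above: C# only permits `const` values of built-in types (the numeric types, `char`, `bool`, `string`, plus enums), so `Rune` has no literal form, and `(Rune)'A'` calls `Rune`'s explicit conversion operator rather than producing a compile-time constant. One workaround not raised in the thread is to switch on `Rune.Value`, which holds the scalar value as an `int`; `char` constants convert implicitly to `int`, so they work as ordinary constant patterns. A minimal sketch, where the class name `RuneMatching` and the return values are illustrative only:

```csharp
using System.Text;

// Hypothetical helper class; the name and return values are illustrative.
internal static class RuneMatching
{
    // Rune.Value is the Unicode scalar value as an int. Char literals convert
    // implicitly to int, so each arm is a plain constant pattern (no `when` guards).
    public static int Classify(Rune rune) => rune.Value switch
    {
        'A' => 3,
        'B' => 4,
        0x1F602 => 5, // non-BMP scalars can be written as hex literals (U+1F602)
        _ => 0,       // everything else
    };
}
```

Because the arms are constants, the compiler can also flag duplicate or unreachable arms, which it cannot do for the `when`-guard version.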
Thinker • 16mo ago
Rune is, firstly, very old I think, and secondly it has no kind of compiler/runtime support, so no, it has no constant representation
__dil__ (OP) • 16mo ago
One would think that, since it's the way to represent valid Unicode scalar values and it is old (according to you; I have no idea), it would have decent language support, no? 🤔 Do C# devs just not care about supporting basic UTF-16?
Thinker • 16mo ago
char is a UTF-16 code unit
__dil__ (OP) • 16mo ago
a Unicode scalar value takes one or two chars in UTF-16, and that's what Rune represents. Is that not correct?
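To make the char/Rune distinction concrete: `string.Length` counts UTF-16 code units (`char`s), while `string.EnumerateRunes()` yields one `Rune` per Unicode scalar value. A small sketch; the `Utf16Counts` class is a hypothetical helper, not something from the thread:

```csharp
using System.Text;

// Hypothetical helper for illustration.
internal static class Utf16Counts
{
    // Returns how many UTF-16 code units (chars) and how many
    // Unicode scalar values (Runes) the string contains.
    public static (int Chars, int Runes) Count(string s)
    {
        int runes = 0;
        foreach (Rune rune in s.EnumerateRunes())
        {
            runes++;
        }
        return (s.Length, runes);
    }
}
```

For `"A\U0001F602"` (the letter A followed by 😂) this returns `(3, 2)`: U+1F602 is a single scalar value but needs a surrogate pair, i.e. two `char`s.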
Jimmacle • 16mo ago
at least for me, my parsing tasks just don't have to handle non-ASCII characters
__dil__ (OP) • 16mo ago
that's fair, but in a world of international communication things tend to be more complex than that in general
Jimmacle • 16mo ago
is there something specific you're trying to achieve?
__dil__ (OP) • 16mo ago
I think that's stated in the original question; let me know if I can clarify 🙂
Jimmacle • 16mo ago
i mean the overall goal
__dil__ (OP) • 16mo ago
I need to parse strings that may contain any valid Unicode scalar value
Jimmacle • 16mo ago
do you need to specifically handle characters that take 2 code units? or is the fact that they may exist in the input irrelevant to the actual parsing logic?
reflectronic • 16mo ago
here is something you can do:
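```csharp
using System.Text;

internal static class RuneExtensions
{
    // Enables positional patterns such as Rune('A'): yields the single UTF-16
    // code unit for BMP runes, or null for runes that need a surrogate pair.
    public static void Deconstruct(this Rune rune, out char? character)
    {
        if (rune.Utf16SequenceLength == 1)
        {
            // For a non-surrogate, the UTF-16 code unit has the value of the code point
            character = (char)rune.Value;
        }
        else
        {
            character = null;
        }
    }
}
```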
Thinker • 16mo ago
yeah was about to suggest that
reflectronic • 16mo ago
then you can pattern match as follows:
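```csharp
var m = rune switch
{
    Rune('A') => 100,
    Rune('N') => 200,
    Rune(null) => -1, // rune needs a surrogate pair
    _ => 0            // any other single-unit rune; without this arm the switch throws SwitchExpressionException
};
```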
i assume you do not need to match on actual surrogate pairs, though you can add another overload of Deconstruct to do that
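On the extra overload mentioned above: a second extension `Deconstruct` with the same number of `out` parameters would likely make positional patterns ambiguous, so one option is a two-parameter overload that exposes the high and low surrogates. The sketch below assumes that design; `RunePairExtensions` and `Classify` are hypothetical names, and returning `null`s for the "other" case is an arbitrary choice.

```csharp
using System;
using System.Text;

// Hypothetical helper class illustrating one- and two-element Deconstruct overloads.
internal static class RunePairExtensions
{
    // One element: the single UTF-16 code unit for BMP runes, otherwise null.
    public static void Deconstruct(this Rune rune, out char? character)
    {
        character = rune.IsBmp ? (char)rune.Value : (char?)null;
    }

    // Two elements: the surrogate pair for non-BMP runes, otherwise (null, null).
    public static void Deconstruct(this Rune rune, out char? high, out char? low)
    {
        if (rune.IsBmp)
        {
            high = null;
            low = null;
            return;
        }
        Span<char> buffer = stackalloc char[2];
        rune.EncodeToUtf16(buffer); // writes high surrogate, then low surrogate
        high = buffer[0];
        low = buffer[1];
    }

    public static int Classify(Rune rune) => rune switch
    {
        Rune('A') => 100,
        Rune('N') => 200,
        Rune('\uD83D', '\uDE02') => 300, // U+1F602 FACE WITH TEARS OF JOY
        _ => -1,
    };
}
```

The arity of the pattern picks the overload: `Rune('A')` uses the one-parameter `Deconstruct`, while `Rune('\uD83D', '\uDE02')` uses the two-parameter one.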
__dil__ (OP) • 16mo ago
interesting!
Thinker • 16mo ago
didn't know Deconstruct allowed you to use that pattern matching syntax
__dil__ (OP) • 16mo ago
Bear in mind I come from a place where you don't have to ask this question since char is a whole scalar, not half of it. So the answer is "I don't know because I don't know which characters are part of the extended set or not". Now, I'm aware the pragmatic answer is "if you don't know, then you probably don't need it". Nonetheless, I've been learning a lot by studying these more complex situations. It allows me to see different corners of the languages and the techniques that are used to work with them. reflectronic's answer looks like it covers exactly what I need in this regard
reflectronic • 16mo ago
there are not many commonly used symbols outside the Basic Multilingual Plane (the set of characters that can be encoded with one UTF-16 code unit), so most people in C# will cook up the most cursed string processing algorithms in existence, largely because the APIs are not very good, and nobody will notice
__dil__ (OP) • 16mo ago
yeah that makes sense
reflectronic • 16mo ago
it is mostly historical scripts outside of the BMP. there are some mathematical symbols, which are probably used more often than those. there are also many emoji; 😂 is probably the most common non-BMP character
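To check which side of the BMP a character falls on, `Rune.IsBmp` (equivalently `Utf16SequenceLength == 1`) can be used. A sketch with a hypothetical `BmpScan` helper that collects the non-BMP scalar values in a string:

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration.
internal static class BmpScan
{
    // Returns the scalar values in s that lie outside the BMP,
    // i.e. those that need a surrogate pair (two chars) in UTF-16.
    public static List<int> NonBmpScalars(string s)
    {
        var result = new List<int>();
        foreach (Rune rune in s.EnumerateRunes())
        {
            if (!rune.IsBmp) // equivalently: rune.Utf16SequenceLength == 2
            {
                result.Add(rune.Value);
            }
        }
        return result;
    }
}
```

For `"\u2211 \U0001D538 \U0001F602"` (∑, 𝔸, 😂) this returns `[0x1D538, 0x1F602]`: the summation sign is in the BMP, while the double-struck mathematical letter and the emoji are not.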
__dil__ (OP) • 16mo ago
That is good to know! Definitely niche in the context of parsing, but good to know nonetheless. In the context of general string manipulation this is paramount though since you need to be careful not to break up surrogate pairs. @reflectronic if you don't mind me asking, how long have you been learning C#, and do you work professionally with it?
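As an illustration of not breaking up surrogate pairs, here is a sketch that truncates a string to a maximum number of `char`s without cutting between a high and a low surrogate. `SafeTruncate.Truncate` is a hypothetical helper; it assumes a well-formed string and a non-negative limit, and it works at the scalar level only, so grapheme clusters (emoji ZWJ sequences, letters with combining marks) can still be split. `System.Globalization.StringInfo` is the place to look for that.

```csharp
// Hypothetical helper for illustration.
internal static class SafeTruncate
{
    // Cuts s to at most maxChars UTF-16 code units. If the cut would land
    // between a high surrogate and its low surrogate, it backs off by one char
    // so the pair is dropped whole instead of being split.
    public static string Truncate(string s, int maxChars)
    {
        if (s.Length <= maxChars)
        {
            return s;
        }
        int cut = maxChars;
        if (cut > 0 && char.IsHighSurrogate(s[cut - 1]))
        {
            cut--;
        }
        return s.Substring(0, cut);
    }
}
```

With `"ab\U0001F602"` (four chars), a limit of 3 yields `"ab"` rather than a string ending in a lone high surrogate.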
reflectronic • 16mo ago
i'm not a professional, just a college sophomore :) as for how long, it's hard to say, it's been at least three or four years, before that i was sort of on-and-off with programming
__dil__ (OP) • 16mo ago
You've been answering a bunch of my more obscure questions, so I figure you're pretty familiar with the language. I was curious how long it might take to develop said familiarity.
reflectronic • 16mo ago
one could say that, uh, my life skill tree is not very balanced
__dil__ (OP) • 16mo ago
wdym? lots of CS?
reflectronic • 16mo ago
many people have been using C# for far longer than me, but are probably more familiar with very different, more practical, things about the language
__dil__ (OP) • 16mo ago
ah I see. Well, I can relate to you in a sense. I'm not a dev by trade (I actually graduated in physics), so programming is more like a fun puzzle/hobby for me. Recently I've been thinking about becoming a dev, though, which is part of why I'm learning C#. Not a ton of Rust jobs in my area.
reflectronic • 16mo ago
i like to understand how things work and read about the reasons why things are the way they are, it is very fun for me, and there is certainly a great depth of knowledge that comes from that. i am not sure how useful it is, or if other people like to do the same thing
