Big Chungus (OP) · C# · 23h ago

Fast regex that takes in a span instead of a string

I'm trying to port a lexer I wrote in C++ to C# and want the same functionality as RE2's Consume method. However, it seems like none of the C# wrappers for RE2 accept a ReadOnlySpan<char> as input, only strings, which seems like it would be ludicrously slow when run repeatedly compared to using a ReadOnlySpan<char> (the C# equivalent of C++'s std::string_view). Is there a way around this, or am I forced to suffer through slow code?
25 Replies
jcotton42 · 22h ago
Why would parsing from a string be slower than a ROSpan? @Big Chungus
What's RE2, anyhow?
Big Chungus (OP) · 22h ago
Because if I then want to do another search on a smaller part of the string (i.e. after consuming a token from the front), I have to pass in a whole new string.
RE2 is Google's regex library; it's quite fast.
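To make that cost concrete: with a string-only API, "the rest of the input" has to be materialised with Substring, which copies it, whereas a span slice is only a view. A minimal sketch; SliceCost and its methods are hypothetical names used purely for illustration.

```csharp
using System;

// Hypothetical helper contrasting two ways of getting "the rest of the input".
static class SliceCost
{
    // Substring allocates a new string and copies every remaining character,
    // so doing this after each token costs O(remaining length) per token.
    public static string RestAsString(string source, int consumed) =>
        source.Substring(consumed);

    // AsSpan only records where the text starts and how long it is (much like
    // std::string_view), so advancing past a token is O(1) and allocation-free.
    public static ReadOnlySpan<char> RestAsSpan(string source, int consumed) =>
        source.AsSpan(consumed);
}
```

For "let x = 1;" with 4 characters consumed, both describe the text "x = 1;", but only RestAsString copies it.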
Asher · 18h ago
you could in theory import it from a DLL (i don't know if RE2 is a DLL of functions, enlighten me on that one) and convert your span to whatever it uses. if it is a DLL, i'd assume the function signature can be found somewhere in the docs. if you compile it to a DLL, what i said should be possible, but keep in mind it will be platform specific (one native build per OS you target). it may be more complicated than i'm making it out to be. this is the signature i think you're looking for, though: https://github.com/google/re2/blob/6dcd83d60f7944926bfd308cc13979fc53dd69ca/re2/re2.cc#L423
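One caveat with the DLL route: RE2's API is C++ (the RE2 class and functions such as RE2::Consume), and P/Invoke can only call plain C exports, so in practice you would compile a small C shim around RE2 yourself. The sketch below assumes such a shim; the library name re2_shim and the function re2_shim_consume are hypothetical and not part of RE2. It also illustrates the encoding mismatch: RE2 matches UTF-8 (or Latin-1) bytes, while .NET strings are UTF-16, so the input is encoded once and then passed as slices of a ReadOnlySpan<byte>.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

// Sketch only: assumes a hypothetical native shim you build yourself, e.g.
//   extern "C" ptrdiff_t re2_shim_consume(const RE2* re, const char* text, ptrdiff_t len);
// which wraps RE2::Consume and returns the number of bytes consumed, or -1 on no match.
static class Re2Shim
{
    [DllImport("re2_shim")] // hypothetical library name
    private static extern nint re2_shim_consume(IntPtr re, ref byte text, nint len);

    // RE2 works on UTF-8 bytes while .NET strings are UTF-16,
    // so encode the whole input once instead of once per token.
    public static byte[] ToUtf8(ReadOnlySpan<char> source)
    {
        byte[] buffer = new byte[Encoding.UTF8.GetByteCount(source)];
        Encoding.UTF8.GetBytes(source, buffer);
        return buffer;
    }

    // Passes a slice of the UTF-8 buffer without copying it; the marshaller
    // pins the referenced byte for the duration of the call.
    // `compiledRe` is assumed to come from another (hypothetical) shim export.
    public static nint Consume(IntPtr compiledRe, ReadOnlySpan<byte> remaining) =>
        re2_shim_consume(compiledRe, ref MemoryMarshal.GetReference(remaining), remaining.Length);
}
```

A lexer would then keep a byte offset into the buffer, call Consume(re, utf8.AsSpan(offset)) and advance by the returned count. Note that these offsets are in bytes, not chars.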
sibber · 9h ago
did you benchmark? i'm curious. compiled .net regex is quite fast, and there's a non-backtracking engine too
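For reference, both engines mentioned here are selected through RegexOptions. Compiled generates IL for the backtracking engine up front. NonBacktracking (.NET 7 and later) uses an automata-based engine whose matching time is linear in the input length, but it rejects some constructs, such as backreferences. A minimal sketch; the Engines class and the identifier pattern are only illustrative.

```csharp
using System.Text.RegularExpressions;

static class Engines
{
    // Backtracking engine compiled to IL: slower to construct, faster per match.
    // Construct once and reuse (e.g. a static readonly field) rather than per call.
    public static readonly Regex IdentCompiled =
        new Regex(@"^[A-Za-z_]\w*", RegexOptions.Compiled);

    // Non-backtracking engine (.NET 7+): linear-time matching, fewer features.
    public static readonly Regex IdentLinear =
        new Regex(@"^[A-Za-z_]\w*", RegexOptions.NonBacktracking);
}
```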
Big Chungus (OP) · 8h ago
I don't have any benchmarks against .NET, but RE2 tends to do well in most benchmarks (e.g. https://lh3lh3.users.sourceforge.net/reb.shtml)
sibber · 8h ago
time to benchmark then. you can start with this one: https://github.com/Treit/MiscBenchmarks/tree/main/Regex
Big Chungus (OP) · 8h ago
I don't need performance so badly that I'm going to benchmark it. I'm also losing a significant amount of performance to poor string handling anyway
sibber · 8h ago
i mean, you seem to need it so badly that you're using an unmaintained library for a third-party regex engine
Big Chungus (OP) · 8h ago
that's more just bc I wanted to use something fast
sibber · 8h ago
you can just use that benchmark, it doesn't take much time. but you do you
Big Chungus (OP) · 8h ago
not because I needed it
sibber · 8h ago
use .net's regex then
Big Chungus (OP) · 8h ago
I am now lol. Still a shame that no one has made a proper wrapper for RE2 tho, it's a really well-made library
sibber · 8h ago
because .net regex is very fast, there's no reason to use re2 for like 99% of use cases
Big Chungus (OP) · 8h ago
wait, why is there IsMatch() for ROS but not Match()? I really hate C#'s regex
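A plausible reason there is no Match(ReadOnlySpan<char>): Match is a heap object that keeps a reference to the input string (its Value and Groups are read from it), and a span cannot be stored on the heap. What .NET 7 added instead are span overloads of IsMatch and Count, plus EnumerateMatches, which yields ValueMatch structs carrying only Index and Length (no capture groups). That is enough for an RE2::Consume-style loop. A minimal sketch, assuming every token pattern starts with ^ and is built without RegexOptions.Multiline; the SpanLexer class, its patterns and the toy token set are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Sketch of an RE2::Consume-like helper over ReadOnlySpan<char> (.NET 7+).
// Assumption: token patterns start with ^ and do not use RegexOptions.Multiline,
// so they can only match at index 0 of the span they are given.
static class SpanLexer
{
    // Hypothetical toy token set: whitespace, identifiers, integers.
    static readonly Regex Whitespace = new Regex(@"^\s+");
    static readonly Regex Identifier = new Regex(@"^[A-Za-z_]\w*");
    static readonly Regex Number = new Regex(@"^\d+");

    // If `re` matches at the front of `input`, returns true, reports the match
    // length and advances `input` past it. Slicing copies no characters.
    public static bool TryConsume(Regex re, ref ReadOnlySpan<char> input, out int length)
    {
        // ValueMatch only carries Index and Length (no Value, no Groups).
        foreach (ValueMatch m in re.EnumerateMatches(input))
        {
            if (m.Index != 0)
                break; // first match is further along: nothing to consume
            length = m.Length;
            input = input.Slice(m.Length);
            return true;
        }
        length = 0;
        return false;
    }

    public static List<string> Tokenize(string source)
    {
        var tokens = new List<string>();
        ReadOnlySpan<char> rest = source;
        while (!rest.IsEmpty)
        {
            ReadOnlySpan<char> before = rest;
            if (TryConsume(Whitespace, ref rest, out _))
                continue;
            if (TryConsume(Identifier, ref rest, out int n) ||
                TryConsume(Number, ref rest, out n))
            {
                // Converted to strings only so the demo has something to return.
                tokens.Add(before.Slice(0, n).ToString());
                continue;
            }
            throw new FormatException($"Unexpected character '{rest[0]}'");
        }
        return tokens;
    }
}
```

For example, Tokenize("foo 42 bar_1") returns foo, 42 and bar_1. A pattern that can match the empty string would make TryConsume succeed without advancing, so a real lexer should reject zero-length matches.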
sibber · 8h ago
that's a bit weird, but you can pass a beginning and a length; it creates a span internally. not sure why they haven't added a public api for it yet
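The overload meant here is the instance method Match(string input, int beginning, int length), which predates the span APIs. It avoids Substring: the range is treated as the whole input, while the returned Index is still relative to the full string. A minimal consume helper on top of it (RangeConsume is a hypothetical name). Each call still allocates a Match object, and reading m.Value would allocate a new string, so the helper uses Index and Length only.

```csharp
using System.Text.RegularExpressions;

static class RangeConsume
{
    // Scans input[pos..] without calling Substring. Returns the position just
    // past the match, or -1 if `re` does not match starting exactly at `pos`.
    public static int Consume(Regex re, string input, int pos)
    {
        Match m = re.Match(input, pos, input.Length - pos);
        if (!m.Success || m.Index != pos)
            return -1;
        // Index is relative to the full string; m.Value would allocate.
        return m.Index + m.Length;
    }
}
```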
Big Chungus (OP) · 5h ago
yes, but in my testing, it doesn't consider this the beginning of the string, so I actually have to create a whole new string
sibber · 5h ago
wdym, it literally creates a span starting at the start index you pass
Big Chungus (OP) · 5h ago
yes, but if you have the string "test string" and you tell it to start from index 4, patterns like ^\s will not be matched
MODiX · 5h ago (run by sibber)
REPL Result: Success
Regex.Match("test string", @"^\s").Value.Length
Result: int
0
Compile: 225.606ms | Execution: 19.219ms
sibber · 5h ago
oh oops i forgot to set the start lmao
MODiX · 5h ago (run by sibber)
REPL Result: Success
new Regex(@"^\s").Match("test string", 4, 7).Value
Result: string
" "
Compile: 251.285ms | Execution: 17.394ms
MODiX · 5h ago (run by sibber)
REPL Result: Success
new Regex(@"^\s").Match("test string", 4, 7).Value.Length
Result: int
1
Compile: 264.584ms | Execution: 18.245ms
sibber · 5h ago
it does match
Big Chungus (OP) · 4h ago
odd
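The two observations are probably consistent and likely come from two different overloads. Match(input, startat) starts scanning at startat, but ^ (without RegexOptions.Multiline) still only matches at the start of the whole string. Match(input, beginning, length), used in the REPL above, treats the range as the entire input. With the startat overload, \G is the anchor that means "where this scan started". A small sketch (AnchorDemo is a hypothetical name):

```csharp
using System.Text.RegularExpressions;

static class AnchorDemo
{
    const string Input = "test string"; // index 4 is the space

    // startat overload: scanning starts at index 4, but ^ still means
    // "start of the whole string" (index 0), so this does not match.
    public static bool CaretWithStartAt() =>
        new Regex(@"^\s").Match(Input, 4).Success;

    // beginning/length overload: the range [4, 11) is treated as the whole
    // input, so ^ matches at index 4 (same call as in the REPL above).
    public static bool CaretWithRange() =>
        new Regex(@"^\s").Match(Input, 4, Input.Length - 4).Success;

    // \G anchors at the position where the scan started, i.e. startat.
    public static bool GAnchorWithStartAt() =>
        new Regex(@"\G\s").Match(Input, 4).Success;
}
```

CaretWithStartAt() returns false, while CaretWithRange() and GAnchorWithStartAt() return true.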
