Fast regex that takes in a span instead of a string
I'm trying to port a lexer I wrote in C++ to C# and want to get the same functionality as RE2's Consume method. However, it seems like none of the C# wrappers for RE2 support using ReadOnlySpan<char> instead of strings as input, which seems like it would be ludicrously slow when run repeatedly compared to using a ReadOnlySpan<char> (or the equivalent of C++'s string_view). Is there a way around this or am I forced to suffer through slow code?
Why would parsing from a string be slower than ROSpan? @Big Chungus
What’s RE2 anyhow?
because if I want to then do another search on a smaller part of the string (ie if I consume a token from the front), then I have to pass a whole new string
google's regex library, it's quite fast
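For reference, a minimal sketch of the "consume a token from the front" loop without allocating a new string per step. It assumes .NET 7 or later, where `Regex.EnumerateMatches` accepts a `ReadOnlySpan<char>`. The token pattern and the input are made up for illustration and are not taken from the lexer being ported.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical token pattern: an identifier, an integer, or a run of whitespace.
// The leading ^ pins every match to the start of the current slice.
var token = new Regex(@"^(?:[A-Za-z_]\w*|\d+|\s+)");

var tokens = new List<string>();
ReadOnlySpan<char> rest = "foo 42 bar";

while (!rest.IsEmpty)
{
    int consumed = 0;
    foreach (ValueMatch m in token.EnumerateMatches(rest))
    {
        consumed = m.Length; // anchored pattern, so the match starts at index 0
        break;               // only the first match is needed
    }

    if (consumed == 0)
        throw new FormatException("no token matches at the current position");

    tokens.Add(rest[..consumed].ToString()); // copy only the token text
    rest = rest[consumed..];                 // slicing is O(1) and allocation-free
}

Console.WriteLine(string.Join("|", tokens)); // prints "foo| |42| |bar"
```

Slicing the span plays the role that advancing a string_view plays with RE2's Consume: the remaining input shrinks without the original string being copied.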
you could in theory import it from a DLL (i don't know if RE2 is a DLL of functions, enlighten me on that one) and convert your span to whatever it uses
if it is a DLL i'd assume the function signature is somewhere to be found in the docs
if you compile this to a DLL, what i said should be possible, but keep in mind it might be platform specific, tho i doubt it. it sounds more complicated than what i make it out to be. this is the signature i think you're looking for though:
https://github.com/google/re2/blob/6dcd83d60f7944926bfd308cc13979fc53dd69ca/re2/re2.cc#L423
did you benchmark?
im curious
compiled .net regex is quite fast
and theres a non backtracking engine too
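A small sketch of the two options mentioned here, assuming .NET 7 or later for `RegexOptions.NonBacktracking`. The pattern and input are placeholders; no claim is made about which option is faster for a given lexer.

```csharp
using System;
using System.Text.RegularExpressions;

// Compiled: the pattern is turned into IL once, trading startup cost for faster matching.
var compiled = new Regex(@"\d+", RegexOptions.Compiled);

// NonBacktracking: an automaton-based engine that avoids backtracking,
// at the cost of not supporting constructs such as backreferences and lookarounds.
var linear = new Regex(@"\d+", RegexOptions.NonBacktracking);

string input = "abc 123 def";
string fromCompiled = compiled.Match(input).Value;
string fromLinear = linear.Match(input).Value;

Console.WriteLine($"{fromCompiled} {fromLinear}"); // prints "123 123"
```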
I don't have any benchmarks against .net, but it tends to do well on most benchmarks (eg: https://lh3lh3.users.sourceforge.net/reb.shtml)
time to benchmark then
https://github.com/Treit/MiscBenchmarks/tree/main/Regex
you can start with this one
I don't need performance so badly that I'm going to benchmark it
I'm also losing a significant amount of performance to poor string handling anyways
i mean you seem to need it so badly that youre using an unmaintained library for a third party regex engine
thats more just bc I wanted to use something fast
you can just use that benchmark it doesnt take much time
but you do you
not because I needed it
use .net's regex then :when:
I am now lol
still a shame that no one has made a proper wrapper for re2 tho
it's a really well made library
because .net regex is very fast, theres no reason to use re2
for like 99% of use cases
wait why is there IsMatch() for ROS, but not Match()?
I really hate C#'s regex
thats a bit weird
but you can pass a beginning and a length, it creates a span internally
not sure why they havent added a public api for it yet
yes, but in my testing, it doesn't consider this the beginning of the string, so I actually have to create a whole new string
wdym, it literally creates a span starting at the start index you pass
yes, but if you have the string "test string" and you tell it to start from index 4, patterns like
^\s
will not be matched
sibber: [REPL result: Success, type int; evaluated code not preserved]
oh oops i forgot to set the start lmao
sibber: [REPL result: Success, type string; evaluated code not preserved]
sibber: [REPL result: Success, type int; evaluated code not preserved]
it does match
odd
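One possible explanation for the disagreement above, sketched under assumptions: the REPL code was not preserved, so which overload each person called is a guess. As documented, `Match(string, int)` only sets where the search starts, and `^` still anchors to index 0 of the whole string, while `Match(string, int, int)` treats the given window as the entire input. The span overload of `IsMatch` (.NET 7+) behaves like the window case.

```csharp
using System;
using System.Text.RegularExpressions;

var leadingSpace = new Regex(@"^\s");
string input = "test string"; // index 4 is the space

// startat overload: search begins at 4, but ^ still refers to index 0 of the string.
Match fromStartAt = leadingSpace.Match(input, 4);

// beginning + length overload: the window [4, end) is treated as the whole input,
// so ^ is satisfied at index 4. Match.Index stays relative to the full string.
Match fromWindow = leadingSpace.Match(input, 4, input.Length - 4);

// Span overload: the slice is the whole input, so ^ matches at its first char.
bool fromSpan = leadingSpace.IsMatch(input.AsSpan(4));

Console.WriteLine(fromStartAt.Success); // expected False per the documented anchor behavior
Console.WriteLine(fromWindow.Success);  // expected True
Console.WriteLine(fromWindow.Index);    // expected 4
Console.WriteLine(fromSpan);            // expected True
```

If the anchor should follow the start index with the `startat` overload, `\G` is the anchor documented to be satisfied at `startat`.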