Best way to implement a regex based lexer [Answered]
In a regex-based lexer, you can loop over every pattern and try to match each one against the input at the current position.
The problem with this is that, unlike other regex implementations, if the C# Regex sees that the first char does not match the pattern, it does not return null/false. Instead it keeps scanning until a match is found, meaning the if statement is needed and extra Match calls are made.
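A minimal sketch of the loop described above, for reference. The token rules, the NaiveLexer class, and the use of the Match(input, startat) overload are assumptions made for illustration; the original snippet may have looked different.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NaiveLexer
{
    // Hypothetical token rules, chosen only for illustration.
    private static readonly (string Kind, Regex Pattern)[] Rules =
    {
        ("space",  new Regex(@"\s+")),
        ("number", new Regex(@"\d+")),
        ("ident",  new Regex(@"[A-Za-z_]\w*")),
    };

    public static List<(string Kind, string Text)> Tokenize(string input)
    {
        var tokens = new List<(string Kind, string Text)>();
        int pos = 0;
        while (pos < input.Length)
        {
            bool matched = false;
            foreach (var (kind, pattern) in Rules)
            {
                // Match scans forward from pos, so it can succeed
                // somewhere later in the string than pos itself.
                Match m = pattern.Match(input, pos);

                // The extra check: only accept a match that starts at pos.
                if (m.Success && m.Index == pos)
                {
                    tokens.Add((kind, m.Value));
                    pos += m.Length;
                    matched = true;
                    break;
                }
            }
            if (!matched)
                throw new FormatException(
                    $"Unexpected character '{input[pos]}' at index {pos}");
        }
        return tokens;
    }
}
```

With these rules, NaiveLexer.Tokenize("foo 42") tries the space and number patterns first, and both find a match further along in the string that then has to be thrown away by the Index check. That is the wasted work in question.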
Is there any way to get this to work without the extra calls? I could take the index of each match and add the matches to a list ordered by index position, but that seems needlessly complicated if there is a better way.
10 Replies
An example of how I'd want regex matching to work is this JS lexer (https://gist.github.com/pepasflo/4afa5813606b6ee73526a0d21d0d1035#file-lexer-js), where a pattern that doesn't match at the first char returns null. So, for example, the pattern " " (a single space character) would return null for the string "hello world" in JS, while in C# it would match the space between o and w (index 5).
@nekodjin (sorry for the ping)
can you help me with this? You seem to be knowledgeable about C# regex
put ^ in the beginning of your regex pattern to match the beginning of the string, then it will only match the pattern if the pattern occurs at the very beginning. however this will require you to take substrings of the input string.
If that's the best solution I'll do it
it's not my favorite though :|
yeah
it's not ideal
however that is what the JS solution does
that you posted
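A rough sketch of the ^ plus substring approach, in case it helps. The AnchoredMatch class and MatchAt method are hypothetical names used only for illustration. The pattern is wrapped in a non-capturing group so that ^ applies to the whole pattern and not just the first branch of an alternation.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchoredMatch
{
    // Tries to match `pattern` at exactly position `pos` of `input`.
    // The caller checks Match.Success on the result.
    public static Match MatchAt(string pattern, string input, int pos)
    {
        // "^" anchors to the start of whatever string is passed in,
        // so the remaining input is passed as a fresh substring.
        // Substring allocates a new string on every call.
        var anchored = new Regex("^(?:" + pattern + ")");
        return anchored.Match(input.Substring(pos));
    }
}
```

With this, AnchoredMatch.MatchAt(" ", "hello world", 0) fails instead of finding the space at index 5, which is the behavior asked for. In a real lexer the anchored Regex objects would presumably be built once per rule instead of once per call.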
i looked in the API and unfortunately it doesn't seem as if there is such an option
that is unfortunate because taking substrings can be expensive
oh well..
¯\_(ツ)_/¯
it'd be nice if it had that option
but alas
the closest thing is you can specify an index to stop looking
but that's ever-so-slightly different
and doesn't work with arbitrarily long tokens like identifiers
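One thing that may be worth trying, though it isn't covered above: .NET regexes have a \G anchor, which only matches at the position where the search started, and Regex.Match has an overload that takes a start index. Combined, they should behave much like the JS version without taking substrings. (Plain ^ would not do the job with a start index, since it still anchors to the start of the whole string.) The sketch below assumes that combination; the StickyMatch class and its method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StickyMatch
{
    // Wraps a pattern so that it can only match where the search starts.
    public static Regex Sticky(string pattern)
        => new Regex(@"\G(?:" + pattern + ")");

    // Tries to match at exactly `pos`; no substring is created.
    // Match.Index on the result is relative to the whole input.
    public static Match MatchAt(Regex sticky, string input, int pos)
        => sticky.Match(input, pos);
}
```

For example, StickyMatch.MatchAt(StickyMatch.Sticky(" "), "hello world", 0) fails, while the same call with pos 5 succeeds with Index 5. Since the token length is not fixed up front, this should also work for arbitrarily long tokens like identifiers.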
I've heard antlr is good for this
you could try that
yeah if you care about really optimizing stuff there are other options
but for a hobby project you generally don't
and for things more complicated than hobby projects, lexers and parsers are practically never the bottleneck in langdev
ok thanks