C# (2y ago)
Thinker (OP)

✅ Regex which matches strings containing things other than specific patterns

This is probably a very weird question (and also isn't strictly related to C#). I need a regex that matches strings containing anything other than a specific set of patterns. For instance, a string should match if it contains anything other than substrings matching either <\d{5}> or \s+. I don't even know if this is possible, just wanted to ask.
LPeter1997 (2y ago)
So you want to match string literals that don't contain <5 digits> or multiple consecutive spaces?
Thinker (OP, 2y ago)
Yes. Or actually, it's fine with just anything that doesn't match <5 digits>.
LPeter1997 (2y ago)
Totally possible, depending on your definition of a string literal. Allow escapes? Interpolated arbitrary expressions?
Thinker (OP, 2y ago)
I've tried many SO answers, and the closest I've come is ^((?!<\d{5}>).)*$, which matches any string that doesn't contain <\d{5}>. However, <12345>aaa doesn't match.
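A quick check of that tempered-dot pattern, using Python's re purely as a convenient engine (AutoMod's own engine comes up later in the thread). The negative lookahead runs before every single character, so the pattern only accepts strings that never contain <\d{5}> anywhere. That is why <12345>aaa fails, and a string like <12345> a<45678> fails for the same reason. The test strings other than <12345>aaa are made up.

```python
import re

# Thinker's pattern: "." may consume a character only if "<\d{5}>" does not
# start at that position, so the whole string has to avoid the substring.
TEMPERED = re.compile(r"^((?!<\d{5}>).)*$")

def matches(s: str) -> bool:
    return TEMPERED.search(s) is not None

for s in ["hello", "<1234> x", "<12345>", "<12345>aaa", "<12345> a<45678>"]:
    print(f"{s!r:20} -> {matches(s)}")
```

In other words, it answers "does the string avoid the token entirely?", not "does the string contain something besides tokens?".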
LPeter1997 (2y ago)
Why should <12345>aaa match? So all you'd want to exclude is 5 consecutive digits between <>, but even if anything surrounds that, it's fine?
Thinker (OP, 2y ago)
So like... idk how to explain it properly.
It shouldn't match <12345> <45678>
However, it should match <12345> a<45678>
Any string that isn't made up exclusively of <\d{5}> and \s+
LPeter1997 (2y ago)
The proof that such a regex must exist is simple: we can construct an appropriate DFA for it.
Thinker (OP, 2y ago)
Also, the actual pattern is <a?:\w+:\d+>, not <\d{5}>, if that's important.
This is apparently more complex than I initially thought >_<
LPeter1997 (2y ago)
[Image: DFA diagram (not preserved)]
This DFA should match your criteria... I think. Which means there's an equivalent regex.
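The drawing itself did not survive, so here is a hypothetical reconstruction in Python (not LPeter's actual diagram) of a DFA for the original problem. It recognizes strings made only of whitespace and <\d{5}> tokens; swapping its accepting and rejecting states gives the complement, which is exactly the "contains anything else" language. Complementing a DFA always yields another DFA, which is why an equivalent regex has to exist. Treating only ASCII digits as digits and using str.isspace() for whitespace are assumptions of this sketch.

```python
# Hypothetical DFA for the "allowed" language: (whitespace | <ddddd>)*
#   state 0     : between tokens (the only accepting state)
#   state 1 + k : read "<" followed by k digits, for k = 0..5
#   DEAD        : trap state, the input can no longer be completed
DEAD = -1

def step(state: int, ch: str) -> int:
    if state == 0:
        if ch.isspace():
            return 0
        return 1 if ch == "<" else DEAD
    if 1 <= state <= 5:
        # fewer than five digits so far, so only another digit is allowed
        return state + 1 if ch in "0123456789" else DEAD
    if state == 6:
        # five digits read, so the token has to close now
        return 0 if ch == ">" else DEAD
    return DEAD

def only_tokens(s: str) -> bool:
    state = 0
    for ch in s:
        state = step(state, ch)
        if state == DEAD:
            return False  # the trap state never leads back to acceptance
    return state == 0

def should_match(s: str) -> bool:
    # Complement: same transitions, accepting and rejecting states swapped.
    return not only_tokens(s)

for s in ["<12345> <45678>", "<12345> a<45678>", "<12345>aaa", "<1234>", "   "]:
    print(f"{s!r:20} -> {should_match(s)}")
```

With this convention the empty string counts as "only tokens" (zero of them), so it does not match.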
Thinker (OP, 2y ago)
... wow
LPeter1997 (2y ago)
And there's even a construction to convert a DFA into a regex. It's a DP algorithm, so the resulting regex will be a fkin monster. But I'll try.
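The message doesn't name the construction. One standard dynamic-programming construction is Kleene's algorithm, sketched below in Python on the assumption that it is the kind meant here. R[i][j] holds a regex for the paths from state i to state j, and each round k allows state k as an intermediate stop. The two-state DFA (an even number of a's over {a, b}) is a hypothetical toy, not the DFA from the thread. Even at this size the generated regex is far longer than a hand-written equivalent such as (?:b|ab*a)*, which illustrates the "monster" remark.

```python
import re
from itertools import product

EMPTY = None  # the empty language: matches no string at all
EPS = ""      # the language containing only the empty string

def union(a, b):
    if a is EMPTY:
        return b
    if b is EMPTY or a == b:
        return a
    return f"(?:{a}|{b})"

def concat(a, b):
    if a is EMPTY or b is EMPTY:
        return EMPTY
    return a + b  # alternations are always wrapped, so juxtaposition is safe

def star(a):
    if a is EMPTY or a == EPS:
        return EPS  # both of these stars contain only the empty string
    return f"(?:{a})*"

def dfa_to_regex(n, alphabet, delta, start, accepting):
    """Kleene's algorithm. After the round for k, R[i][j] describes every
    path from i to j whose intermediate states all lie in 0..k."""
    R = [[EMPTY] * n for _ in range(n)]
    for i in range(n):
        for c in alphabet:
            j = delta[(i, c)]
            R[i][j] = union(R[i][j], re.escape(c))
        R[i][i] = union(R[i][i], EPS)  # the zero-length path
    for k in range(n):
        # The comprehension reads the previous table before R is rebound.
        R = [[union(R[i][j], concat(R[i][k], concat(star(R[k][k]), R[k][j])))
              for j in range(n)] for i in range(n)]
    result = EMPTY
    for f in accepting:
        result = union(result, R[start][f])
    return result  # None means the DFA accepts nothing

# Hypothetical toy DFA over {a, b}: state 0 = even number of a's (accepting).
delta = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}
rx = dfa_to_regex(2, "ab", delta, start=0, accepting=[0])
print(len(rx), rx)

# Brute-force check against the DFA's meaning for all strings up to length 6.
for n in range(7):
    for chars in product("ab", repeat=n):
        s = "".join(chars)
        assert (re.fullmatch(rx, s) is not None) == (s.count("a") % 2 == 0)
```

The simplifications in union, concat and star only drop obviously empty pieces; real implementations usually add many more rewrite rules to keep the output readable.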
Thinker (OP, 2y ago)
Okay, before you do that, can I throw in another wrench and say the actual pattern (I think) is <a?:[^\W@#:`]{2,32}:\d+>? It's meant to match Discord emoji strings.
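Two observations about this pattern, checked with Python's re (an assumption; AutoMod runs a different engine). First, Discord's custom-emoji markup has the form <:name:id>, or <a:name:id> for animated emoji, which is what the optional a covers. Second, @, #, : and the backtick are not word characters, so the class [^\W@#:`] accepts exactly the same characters as \w and the exclusions are redundant. The emoji names and IDs below are made up.

```python
import re

EMOJI = re.compile(r"<a?:[^\W@#:`]{2,32}:\d+>")

# Made-up examples of the two forms the optional "a" distinguishes.
print(EMOJI.fullmatch("<:thonk:123456789>") is not None)  # True (static)
print(EMOJI.fullmatch("<a:wave:987654321>") is not None)  # True (animated)
print(EMOJI.fullmatch("<:x:1>") is not None)              # False (name too short)

# None of the excluded characters is a word character in the first place...
assert not any(re.fullmatch(r"\w", c) for c in "@#:`")

# ...so the class accepts exactly what \w accepts (checked over the BMP here).
CLASS = re.compile(r"[^\W@#:`]")
WORD = re.compile(r"\w")
assert all(
    (CLASS.fullmatch(chr(cp)) is None) == (WORD.fullmatch(chr(cp)) is None)
    for cp in range(0x10000)
)
```

For a DFA construction this means the name part is simply \w{2,32}; the hard part is the counted repetition, not the character class.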
LPeter1997 (2y ago)
Ah, this is not entirely correct; this will need more states.
Are you trying to enlarge messages in your Discord clone that consist only of emojis?
Thinker (OP, 2y ago)
Nope, it's for Discord's AutoMod. In short, I'm a mod in a server, and for April Fools I had the idea to get AutoMod to block any message that contains text other than emojis (or whitespace).
Although hmmmmm, it'd probably be easier to just write a bot to do this lmao
LPeter1997 (2y ago)
Representing [^\W@#:`]{2,32} in a DFA is an absolute monster lol.
Does it have to be a regex, no "algorithm" allowed?
Thinker (OP, 2y ago)
Well, AutoMod only allows plain strings or regexes, but if I write a bot and keep it online for the whole day, then it could be an algorithm.
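If a bot does the filtering, no lookaround or DFA construction is needed: match the allowed language in full and negate the result in code. A minimal Python sketch, assuming the bot has the message text as a plain string; should_block is a hypothetical helper, and wiring it into a bot library is left out because the thread doesn't name one. Like the pattern above, this only recognizes custom emoji tags, so standard Unicode emoji characters would be blocked, and an empty message counts as emoji-only.

```python
import re

# Whitespace and custom emoji tags only (zero or more of each).
EMOJI = r"<a?:[^\W@#:`]{2,32}:\d+>"
ONLY_EMOJIS = re.compile(rf"\s*(?:{EMOJI}\s*)*")

def should_block(text: str) -> bool:
    """Hypothetical helper: True if the text contains anything other than
    custom emoji tags and whitespace."""
    return ONLY_EMOJIS.fullmatch(text) is None

for msg in ["<:thonk:123> <a:wave:456>", "hi <:thonk:123>", "<:thonk:123>aaa"]:
    print(f"{msg!r}: block={should_block(msg)}")
```

The same "full match, then negate" idea is what the regex-only route cannot express directly, because AutoMod acts when a pattern matches rather than when it fails to.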
ero (2y ago)
I mean, this might be a little naive of me, so excuse me if I got something wrong, but... can't you just do this? <a?:\w+:\d+>.*\w.*<a?:\w+:\d+>
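A quick Python check of this suggestion, assuming a message is flagged when the pattern occurs anywhere in it (the test strings are made up). The pattern needs two emojis with a word character somewhere between them, so plain text with no emoji, or text next to a single emoji, is never flagged. Conversely, three emojis in a row are flagged, because the middle emoji's name supplies the \w.

```python
import re

SUGGESTION = re.compile(r"<a?:\w+:\d+>.*\w.*<a?:\w+:\d+>")

def flagged(s: str) -> bool:
    # Assumed AutoMod-style check: does the pattern occur anywhere in the text?
    return SUGGESTION.search(s) is not None

for s in ["hello", "hi <:aa:1>", "<:aa:1> hi <:bb:2>",
          "<:aa:1> <:bb:2>", "<:aa:1> <:bb:2> <:cc:3>"]:
    print(f"{s!r:28} flagged={flagged(s)}")
```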
Thinker (OP, 2y ago)
also why didn't I ask this in #discord-dev harold
LPeter1997 (2y ago)
OK, my problem is that the contents of <...> no longer being just 5 digits complicates it a lot, and I mean a lot, for a DFA construction.
So this would be the fixed DFA for your original problem:
[Image: corrected DFA diagram (not preserved)]
Ah, that fkin circle, I hate these tablet things.
So that top row essentially consumes your <...>s.
The problem with that is that any construct {2,32} is essentially a 32x multiplier on the number of states required to represent the repeated construct.
I'm not saying there's no simpler approach; the DP conversion is hella inefficient, with tons of redundancy.
I'll look up whether Ruby regexes could do this with negative matches or something.
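To see where the multiplier comes from, it helps to expand a counted repetition into regex syntax without {m,n}: each of the up to 32 positions needs its own copy of the repeated piece, and in an automaton every copy becomes its own set of states. A small Python sketch; unroll is a hypothetical helper that expects a single regex atom such as a character class, and the equivalence check uses a reduced {2,5} so the brute-force search stays small.

```python
import re
from itertools import product

def unroll(atom: str, lo: int, hi: int) -> str:
    """Rewrite atom{lo,hi} without a counted quantifier:
    lo mandatory copies followed by (hi - lo) nested optional copies."""
    optional = ""
    for _ in range(hi - lo):
        optional = f"(?:{atom}{optional})?"
    return atom * lo + optional

print(unroll("x", 2, 5))  # xx(?:x(?:x(?:x)?)?)?

NAME = r"[^\W@#:`]"
expanded = unroll(NAME, 2, 32)
print(expanded.count(NAME))  # 32 copies of the class

# Check the reduced version against the built-in quantifier.
small = re.compile(unroll(NAME, 2, 5))
ref = re.compile(NAME + "{2,5}")
for n in range(8):
    for chars in product("a_:#", repeat=n):
        s = "".join(chars)
        assert (small.fullmatch(s) is None) == (ref.fullmatch(s) is None)
```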
Denis (2y ago)
[Link: State diagrams | Mermaid. Create diagrams and visualizations using text and code.]
[Link: Online FlowChart & Diagrams Editor - Mermaid Live Editor]
LPeter1997 (2y ago)
Try this @thinker227: ^(?! *(<\d{5}> *)*$).*$
That's for your original problem; if it works, replace the inner pattern.
It's a dirty negative lookahead trick that I haven't used in ages.
Thinker (OP, 2y ago)
Yep, that works!
Oh wow, yeah, it works with the other pattern as well.
LPeter1997 (2y ago)
Really simple trick: essentially, (?!...) means "don't match if what follows matches this". But it doesn't consume any of the input; it's a lookahead pattern.
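Checking the negative-lookahead pattern with Python's re, which (unlike the engine Discord turns out to use) supports lookahead. Because of the ^ anchor, the (?!...) group is tried only at the start of the string: if the whole string is spaces plus tokens, the inner pattern succeeds and the match is rejected; otherwise .*$ consumes the string. The pattern only allows literal spaces between tokens, not all whitespace. The emoji test strings are made up.

```python
import re

ORIGINAL = re.compile(r"^(?! *(<\d{5}> *)*$).*$")
EMOJI = re.compile(r"^(?! *(<a?:[^\W@#:`]{2,32}:\d+> *)*$).*$")

def matches(pattern, s: str) -> bool:
    return pattern.search(s) is not None

for s in ["<12345> <45678>", "<12345> a<45678>", "<12345>aaa"]:
    print(f"{s!r:20} -> {matches(ORIGINAL, s)}")

for s in ["<:thonk:123> <a:wave:456>", "hi <:thonk:123>"]:
    print(f"{s!r:28} -> {matches(EMOJI, s)}")
```

One behavior worth knowing (an observation, not raised in the thread): without the s flag, . stops at a newline, so a multi-line message of ordinary text such as "hello\nworld" does not match. Prefixing the pattern with (?s) avoids that in Python.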
Thinker (OP, 2y ago)
Discord uses the Rust flavor of regex, which doesn't support lookaround ree
LPeter1997 (2y ago)
What the f is the Rust flavor? I didn't even know Rust had its own flavor.
Thinker (OP, 2y ago)
idk, this site is linked from the Discord support page for AutoMod: https://rustexp.lpil.uk/
[Link: Rustexp. A Rust regular expression editor & tester.]
Pasting the pattern ^(?! *(<a?:[^\W@#:`]{2,32}:\d+> *)*$).*$ into it gives a syntax error:
Syntax(
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
regex parse error:
^(?! *(<a?:[^\W@#:`]{2,32}:\d+> *)*$).*$
 ^^^
error: look-around, including look-ahead and look-behind, is not supported
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
)
LPeter1997 (2y ago)
Ah, I see. These regexes only support the truly regular subset, meaning that underneath they compile the regex to a state machine, and a few constructs aren't supported by that approach.
This doesn't mean there's no regex for your problem, though; it just needs to be handrolled (back to the DFA lol).
So the inner pattern is <a?:[^\W@#:`]{2,32}:\d+>
I'll start the construction with a much smaller repetition count, definitely not 32, and see if there's some recurring pattern in the results we can simplify or something.
Ah, I also need to find my old textbook on the construction, so this will likely be a late-afternoon thing (in practice you rarely construct regexes from DFAs).
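A different route from the full DFA-to-regex conversion, sketched for the original <\d{5}> token only (this construction is not from the thread): a string falls outside (whitespace | <\d{5}>)* exactly when some valid run of tokens is followed by a point where no token can continue. Spelling out those failure points gives a pattern with no lookaround, which is the kind of pattern an automaton-based engine such as Rust's should accept, although it has only been checked with Python's re here. It also assumes AutoMod flags a message when the pattern matches anywhere in it. The emoji token has more ways to fail partway through (the optional a, the 2 to 32 character name, the digit run), so each of those would need its own branch.

```python
import re
from itertools import product

# Flags any string that is NOT made up only of whitespace and <ddddd> tokens.
# After a valid prefix (?:\s|<\d{5}>)*, one of three failure points follows:
#   1. a character that cannot start a token
#   2. "<" with fewer than 5 digits, then a non-digit or the end of the text
#   3. "<" with 5 digits, then something other than ">" or the end of the text
NO_LOOKAROUND = re.compile(
    r"^(?:\s|<\d{5}>)*"
    r"(?:[^\s<]|<\d{0,4}(?:\D|$)|<\d{5}(?:[^>]|$))"
)

# Reference definition: the "only tokens" language, matched in full.
ALLOWED = re.compile(r"(?:\s|<\d{5}>)*")

def flagged(s: str) -> bool:
    return NO_LOOKAROUND.search(s) is not None

# Brute-force comparison over a small alphabet covering every character class
# that matters here: "<", ">", a digit, a space, and an ordinary letter.
for n in range(8):
    for chars in product("<>1 x", repeat=n):
        s = "".join(chars)
        assert flagged(s) == (ALLOWED.fullmatch(s) is None), s
```

The branches stay small because the token is fixed-length; with {2,32} in the emoji name, the same idea still works but needs one branch per way the name or the digit run can go wrong.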
