Regex which matches strings containing things other than specific patterns
This is probably a very weird question (and also isn't strictly related to C#). I need a regex which matches strings which contain anything but a specific set of patterns. For instance, a string should match if it contains anything but substrings which match either <\d{5}> or \s+. I don't even know if this is possible, just wanna ask.
So you want to match string literals that don't contain <5 digits> or multiple consecutive spaces?
yes
Or actually, it's fine with just anything that doesn't match <5 digits>
Totally possible, depending on your definition of string literal
Allow escapes? Interpolated arbitrary expressions?
I've tried many SO answers, and the closest I've come is ^((?!<\d{5}>).)*$ which matches a string which doesn't contain <\d{5}>, however <12345>aaa doesn't match.
Why should <12345>aaa match?
So all you'd want to exclude is 5 consecutive digits between <>, but even if anything surrounds that, it's fine?
So like... idk how to explain it properly
It shouldn't match <12345> <45678>
However it should match <12345> a<45678>
Any string which isn't made up of exclusively <\d{5}> and \s+
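To pin the requirement down, here is a minimal Python sketch. It is not from the thread, and the names ONLY_TOKENS, TEMPERED and should_match are made up for illustration. It treats a string as "tokens only" when the whole string is built from <5 digits> tokens and whitespace, and wants a match for everything else. It also shows why the pattern tried earlier rejects <12345>aaa: that pattern refuses any string that contains a token at all.

```python
import re

# Whole string consists only of <5 digits> tokens and whitespace characters.
# A single \s inside the loop is equivalent to \s+ here and avoids nesting
# one unbounded quantifier inside another.
ONLY_TOKENS = re.compile(r"(?:<\d{5}>|\s)*")

# The pattern tried earlier in the thread.
TEMPERED = re.compile(r"^((?!<\d{5}>).)*$")


def should_match(text: str) -> bool:
    """True when the text has something besides tokens and whitespace."""
    return ONLY_TOKENS.fullmatch(text) is None


for sample in ["<12345> <45678>", "<12345> a<45678>", "<12345>aaa"]:
    # Columns: the string, the wanted result, what the tempered pattern says.
    print(repr(sample), should_match(sample), bool(TEMPERED.search(sample)))
```

The second column is the behaviour being asked for; the third shows that the tempered pattern answers a different question.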
The proof that such regex must exist is simple, we can construct an appropriate DFA for it
And also the actual pattern is <a?:\w+:\d+> and not <\d{5}>, if that's important
this is apparently more complex than I initially thought >_<
This DFA should match your criteria... I think
Which means there's an equivalent regex
... wow
And there's even a construction to convert a DFA into a regex
It's a DP algorithm so the resulting regex will be a fkin monster
But I'll try
okay before you do that, can I throw in another wrench and say the actual pattern (I think) is <a?:[^\W@#:`]{2,32}:\d+>, it's meant to match Discord emoji strings
Ah this is not entirely correct, this will need more states
Are you trying to enlarge messages in your discord clone that only consist of emojis?
nope, it's for Discord's AutoMod
in short, I'm a mod in a server, and for April Fools I had an idea to get AutoMod to block any messages which contain text other than emojis (or whitespace)
although hmmmmm it'd probably be easier to just write a bot to do this lmao
Representing [^\W@#:`]{2,32} in a DFA is an absolute monster lol
Does it have to be a regex, no "algorithm" allowed?
well AutoMod only allows plain strings or regexes, but if I write a bot I keep online for the entire duration of the day then it could be an algorithm
I mean this might be a little naive of me, so excuse me if i got something wrong, but... Can't you just do this?
<a?:\w+:\d+>.*\w.*<a?:\w+:\d+>
also why didn't I ask this in #discord-dev
Ok, my problem is that the contents of <...> not being just 5 digits complicates it a lot, and I mean a lot for a DFA construction
So this would be the DFA for your og. problem fixed
Ah that fkin circle, I hate these tablet things
So that top row essentially consumes your <...>s
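The drawn diagram did not survive in this log, so the following is only a guess at an equivalent machine, written as a small Python simulation for the original <5 digits> problem. The state numbering and the names step and contains_other_text are assumptions for illustration, and it only treats the space character as whitespace, like the simplified pattern later in the thread.

```python
# States (hypothetical numbering, not taken from the drawing):
#   0      between tokens
#   1      just saw '<'
#   2..5   saw '<' plus 1..4 digits
#   6      saw '<' plus 5 digits, waiting for '>'
#   7      sink: something other than a token or a space was seen
BETWEEN, SINK = 0, 7
DIGITS = "0123456789"  # ASCII digits only, an assumption of this sketch


def step(state: int, ch: str) -> int:
    """Return the next state after reading one character."""
    if state == SINK:
        return SINK
    if state == BETWEEN:
        if ch == " ":
            return BETWEEN
        return 1 if ch == "<" else SINK
    if 1 <= state <= 5:
        return state + 1 if ch in DIGITS else SINK
    # state == 6: five digits read, only '>' closes the token
    return BETWEEN if ch == ">" else SINK


def contains_other_text(text: str) -> bool:
    """True when the text is not made up solely of tokens and spaces."""
    state = BETWEEN
    for ch in text:
        state = step(state, ch)
    # Ending anywhere except between tokens means a stray or unfinished piece.
    return state != BETWEEN


print(contains_other_text("<12345> <45678>"))
print(contains_other_text("<12345> a<45678>"))
```

States 1 through 6 play the role of the row that consumes one token; every other state exists only to remember that the string is already disqualified.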
The problem with that is any construct {2,32} is essentially a 32 times multiplier for the number of states required to represent the repeated construct
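A rough way to see where that multiplier comes from: a state machine has no counter, so a bounded repetition has to spend one state per possible count. The Python sketch below builds that counting chain for a single repeated symbol class; the function names counter_dfa and accepting are hypothetical, and a real construction would multiply this by whatever states the repeated construct itself needs.

```python
def counter_dfa(lo: int, hi: int) -> dict:
    """Transitions for x{lo,hi} over one symbol class x.

    State n means "n repetitions seen so far". The value is the state reached
    on one more x, or None when another x would exceed the upper bound.
    """
    return {n: (n + 1 if n < hi else None) for n in range(hi + 1)}


def accepting(lo: int, hi: int) -> set:
    """Counts at which the repetition is allowed to stop."""
    return set(range(lo, hi + 1))


dfa = counter_dfa(2, 32)
print(len(dfa))  # prints 33: one state per count from 0 through 32
```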
I'm not saying there's no simpler approach, because the DP conversion is hella inefficient with tons of redundancy
I'll look up if ruby regexes could do this with negative matches or something
Try this @thinker227
^(?! *(<\d{5}> *)*$).*$
For your orig problem
If it works, replace the inner pattern
It's a dirty negative lookahead that I haven't used in ages
yep, that works
oh wow, yeah it works with the other pattern as well
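For anyone who wants to reproduce that check, here is a small Python sketch using the standard re module, which does support lookaheads. The sample strings for the emoji variant are invented for illustration; the two patterns are the ones quoted in this thread.

```python
import re

# The suggested pattern for the simplified <5 digits> case.
SIMPLE = re.compile(r"^(?! *(<\d{5}> *)*$).*$")

# The same shape with the Discord emoji pattern swapped in as the inner pattern.
EMOJI = re.compile(r"^(?! *(<a?:[^\W@#:`]{2,32}:\d+> *)*$).*$")

for sample in ["<12345> <45678>", "<12345> a<45678>", "<12345>aaa"]:
    print(repr(sample), bool(SIMPLE.search(sample)))

# Made-up emoji strings, only to exercise the second pattern.
for sample in ["<:wave:123> <a:party_parrot:456>", "hi <:wave:123>"]:
    print(repr(sample), bool(EMOJI.search(sample)))
```

A string made only of tokens and spaces gives False; anything with extra text gives True.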
Really simple trick, essentially ?! means: don't match if the following matches
But it doesn't consume any of the input, it's a lookahead pattern
Discord uses the Rust flavor of Regex, which doesn't support
What the f is the Rust flavor
I didn't even know Rust had its own flavor
idk, this site is linked from the Discord support page for AutoMod
https://rustexp.lpil.uk/
pasting the pattern
^(?! *(<a?:[^\W@#:`]{2,32}:\d+> *)*$).*$
into it gives a syntax error
Ah I see, these regexes only support the truly regular subset
Meaning that underneath they compile the regexes to a state machine, which a few constructs don't support
This doesn't mean that there's no regex for your problem tho, but it needs to be handrolled
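As a hedged illustration of what "handrolled" could look like, here is a lookaround-free pattern for the simplified <5 digits> case only, checked in Python against the lookahead version. This is my own sketch, not something posted in the thread: it consumes complete tokens and spaces, then requires one of the ways a token can go wrong. It sticks to constructs I believe the Rust regex flavor accepts, but it has not been tried in AutoMod, and it does not cover the longer emoji pattern.

```python
import itertools
import re

# Lookahead version from the thread (simplified <5 digits> case).
WITH_LOOKAHEAD = re.compile(r"^(?! *(<\d{5}> *)*$).*$")

# Hand-rolled alternative without lookarounds (an assumption of this sketch).
NO_LOOKAHEAD = re.compile(
    r"^(?: |<\d{5}>)*"       # valid prefix: spaces and complete tokens
    r"(?:[^ <]"              # a character that cannot start a token
    r"|<\d{0,4}(?:\D|$)"     # '<' followed by fewer than 5 digits
    r"|<\d{5}(?:[^>]|$))"    # '<' plus 5 digits, but no closing '>'
)

for sample in ["<12345> <45678>", "<12345> a<45678>", "<12345>aaa"]:
    print(repr(sample), bool(NO_LOOKAHEAD.search(sample)))

# Brute-force comparison over every string of up to 8 characters drawn from
# a small alphabet; 8 is enough to fit one full token plus one extra character.
for length in range(9):
    for chars in itertools.product("<>1 a", repeat=length):
        text = "".join(chars)
        assert bool(NO_LOOKAHEAD.search(text)) == bool(WITH_LOOKAHEAD.search(text)), text
```

Unlike the lookahead version, this one only matches a prefix of the string, which is enough if all that matters is whether there is a match. Doing the same for the emoji pattern would need a separate branch for every point at which that longer token can break.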
(Back to the DFA lol)
So the inner pattern is <a?:[^\W@#:`]{2,32}:\d+>
I'll start the construction with a more limited number, definitely not 32
And see if there's some recurring pattern in the results we can simplify or something
Ah I also need to find my old textbook on the construction, so this will likely be a late afternoon thing
(in practice you rarely construct regexes from DFAs)