C#, 2y ago
Chris TCC

❔ regex spamfilter

I'm trying to make a spam filter using regex. It's quite the undertaking; is anyone able to give me some pointers? Currently, I'm working on a repeat-word filter. These are the conditions:
- a word repeated more than 3 times
- a group of words repeated more than 3 times
- return false when the repeats have other, irrelevant words in between

Some examples:
something test test test test test test do it now - true (test)
something testing idk testing idk testing idk lol you idiot - true (testing idk)
hello my name is chris and idk why my name is chris but all i can say is that my name is chris - false (there are repeats, but with other irrelevant words in between)
Currently I've got this:
(\b\w+\b)\s+\b\1\b\s\b\1\b
but it treats the 2nd case as false, because it can't detect groups of multiple words.
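
For reference, one way the pattern above could be extended to multi-word groups is to let the captured group span several words and then require the backreference to repeat. This is only a sketch under assumptions: the names `RepeatRegex` and `FindRepeat` are made up for illustration, and it treats three back-to-back occurrences as spam because that is what the examples above mark as true.

```csharp
using System.Text.RegularExpressions;

public static class RepeatRegex
{
    // Group 1: a phrase of one or more words. The lazy quantifier tries the
    // shortest phrase first. The same phrase must then follow at least two
    // more times, separated only by whitespace (three occurrences in total).
    private static readonly Regex Repeated = new Regex(
        @"\b(\w+(?:\s+\w+)*?)(?:\s+\1\b){2,}",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // Hypothetical helper: returns the repeated phrase, or an empty string
    // when nothing repeats back to back.
    public static string FindRepeat(string message)
    {
        Match m = Repeated.Match(message);
        return m.Success ? m.Groups[1].Value : "";
    }
}
```

Because the repeats must be adjacent, the "my name is chris" example finds no match. The pattern backtracks heavily, so the cost grows quickly with message length; on untrusted input it is probably worth capping the message length first.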
9 Replies
Denis, 2y ago
I'm unsure that using solely regex for this task is the right approach
Chris TCC (OP), 2y ago
is there any other way?
Denis, 2y ago
To count how often each word occurs, you can split the message on whitespace using regex. Then you'd process the list of words and count the occurrences of each one.
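
A minimal sketch of that split-and-count step, assuming case-insensitive comparison is wanted (the `WordCounter` and `CountWords` names are hypothetical):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordCounter
{
    // Splits the message on runs of whitespace and tallies each word.
    public static Dictionary<string, int> CountWords(string message)
    {
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        foreach (string word in Regex.Split(message.Trim(), @"\s+"))
        {
            if (word.Length == 0) continue;        // an empty message yields one empty token
            counts.TryGetValue(word, out int seen); // seen stays 0 for a new word
            counts[word] = seen + 1;
        }
        return counts;
    }
}
```

Note that the tally ignores word order, so it only says how often a word appears, not whether the repeats are adjacent.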
Chris TCC (OP), 2y ago
but that doesn't help the 3rd case where words are repeated but used properly in context
Denis, 2y ago
You'd check whether they are repeated in sequence. Iterate through the collection of words and check whether a given word is repeated multiple times in a row. If it is repeated three times, stop the spam check and report the message as spam. Checking a repeated sequence of words is a bit more complicated. Not sure how to do that efficiently rn.
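
One possible way to cover both single words and word groups without regex backreferences is to compare each word with the word `size` positions earlier and count how long the matches continue. This is a sketch, not a tuned solution: `RepeatDetector`, `HasConsecutiveRepeat`, and the default limits (three repeats, phrases of up to five words) are assumptions chosen to fit the examples in the question.

```csharp
using System.Text.RegularExpressions;

public static class RepeatDetector
{
    // True when some phrase of 1..maxPhraseWords words occurs at least
    // minRepeats times back to back, with nothing in between.
    public static bool HasConsecutiveRepeat(string message, int minRepeats = 3, int maxPhraseWords = 5)
    {
        string[] words = Regex.Split(message.Trim().ToLowerInvariant(), @"\s+");

        for (int size = 1; size <= maxPhraseWords; size++)
        {
            // A phrase of `size` words repeated minRepeats times means that
            // size * (minRepeats - 1) words in a row each equal the word
            // `size` positions before them.
            int needed = size * (minRepeats - 1);
            int run = 0;
            for (int i = size; i < words.Length; i++)
            {
                run = words[i] == words[i - size] ? run + 1 : 0;
                if (run >= needed) return true;
            }
        }
        return false;
    }
}
```

Each phrase size needs a single pass over the words, so the work is roughly the number of words times `maxPhraseWords`.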
Denis, 2y ago
Stack Overflow
How to find duplicate phrases in a large string
I am trying to figure out an efficient way to find duplicate phrases in a large string. The string will contain hundreds or thousands of words separated by an empty space. I've included code below ...
Denis, 2y ago
Maybe this helps?
Chris TCC (OP), 2y ago
hm. man this spam filter thing ain't going well