Week 36 — What is a regex and how can it be used in Java?

Question of the Week #36
What is a regex and how can it be used in Java?
dan1st · 16mo ago
accepted answer by saturn5vfive. (1111137361359818803):
A regex, also known as a regular expression, is a string of text, also called a pattern, that is used to match parts of a larger piece of text easily and efficiently. A regex uses special control characters to define rules that describe a specific, yet potentially unbounded, set of strings that can be matched. In Java, a regex can be created using a Pattern object from the java.util.regex package as follows:
Pattern re = Pattern.compile("import .*;$");
The pattern above would match most Java import statements. We can then check whether a given string matches the pattern by using code similar to:
re.matcher(INPUT).matches()
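To see the two snippets above working together, here is a minimal, self-contained sketch. The class name ImportMatcher and the helper isImport are made up for this illustration; only Pattern.compile, matcher and matches come from java.util.regex.

```java
import java.util.regex.Pattern;

class ImportMatcher {
    // Compiled once and reused for every line that is checked.
    private static final Pattern IMPORT = Pattern.compile("import .*;$");

    // Hypothetical helper: true only if the WHOLE line matches the pattern.
    static boolean isImport(String line) {
        return IMPORT.matcher(line).matches();
    }

    public static void main(String[] args) {
        System.out.println(isImport("import java.util.List;"));   // true
        System.out.println(isImport("  import java.util.List;")); // false: the leading spaces are not covered by the pattern
        System.out.println(isImport("int x = 1;"));               // false
    }
}
```

Because matches() checks the entire input, an indented import line is rejected here; the pattern would need something like a leading \s* to accept it.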
The matches() method returns a boolean that states whether the given INPUT matches the regex pattern defined in "re".
best answer by 0x150 (1067125865785344031):
RegEx ("Regular Expressions") is a small sub-language available inside most languages, allowing the programmer to easily parse a text format and extract information from it. It can be used to (partially) validate and parse e-mail addresses, for example. A regex in general looks something like this: i am (\d{1,2}) years old\. (the trailing \. matches a literal period). This regex will match the string "i am [any one- or two-digit number] years old." and capture the age in its own group. In Java, the regex and age extraction would look like this:
Pattern regex = Pattern.compile("i am (\\d{1,2}) years old\\.", Pattern.CASE_INSENSITIVE);
String input = "I am 19 years old.";
Matcher matcher = regex.matcher(input);
if (!matcher.find()) {
    System.out.println("Invalid text format");
} else {
    // group(0) is always the entire matched text; custom groups start at 1
    System.out.printf("You are %s years old%n", matcher.group(1));
}
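The group numbering is easy to get wrong, so here is a small sketch that prints each group for the same pattern. The class name AgeGroups and the helper extractAge (including its -1 "not found" convention) are assumptions made for this example, not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AgeGroups {
    private static final Pattern AGE =
            Pattern.compile("i am (\\d{1,2}) years old\\.", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns the captured age, or -1 if the pattern is not found.
    static int extractAge(String text) {
        Matcher m = AGE.matcher(text);
        if (!m.find()) {
            return -1;
        }
        return Integer.parseInt(m.group(1));
    }

    public static void main(String[] args) {
        Matcher m = AGE.matcher("Well, I am 19 years old. Really.");
        if (m.find()) {
            System.out.println(m.groupCount()); // 1 (group 0 is not counted)
            System.out.println(m.group(0));     // I am 19 years old.
            System.out.println(m.group());      // same as group(0)
            System.out.println(m.group(1));     // 19
        }
        System.out.println(extractAge("no age here")); // -1
    }
}
```

Note that a three-digit age such as "I am 123 years old." is not matched at all, since \d{1,2} allows at most two digits before the space.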
The Matcher seen here is the class responsible for matching a Pattern (a compiled regex) against a given String. Here it is used to find() a match and then to extract the first custom group from that match. matches() can also be used to check whether the regex matches the entire string, but if you just want to use a regex to extract information from a string, find() usually works better. You can also call find() multiple times to search the entire string for multiple occurrences of the pattern, like so:
Pattern regex = Pattern.compile("i am (\\d{1,2}) years old\\.", Pattern.CASE_INSENSITIVE);
String input = "I am 19 years old. I am 50 years old.i am 12 years old. jfskdfjsdkljfls I am 99 years old.";
Matcher matcher = regex.matcher(input);
int count = 0;
while (matcher.find()) {
    System.out.printf("Person %d is %s years old.%n", ++count, matcher.group(1));
}
That code will print this:
Person 1 is 19 years old.
Person 2 is 50 years old.
Person 3 is 12 years old.
Person 4 is 99 years old.
You may have noticed that the input string can be slightly malformed: the regex simply skips text that does not match and moves on to the next match, if there is one. This is especially useful when you want to find a certain pattern inside a larger string, as in this example. Knowing this, you could use a regex to extract the href attribute from an <a> HTML tag: <a.*?href *= *"(.*?)".*?>
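As a rough sketch of how the href regex above could be used from Java, the following collects every captured href with a find() loop. The class name HrefExtractor, the helper extractHrefs and the sample HTML are invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HrefExtractor {
    // The regex from the answer; the double quotes must be escaped inside a Java string.
    private static final Pattern LINK = Pattern.compile("<a.*?href *= *\"(.*?)\".*?>");

    // Hypothetical helper: returns the value of every href found in the given HTML text.
    static List<String> extractHrefs(String html) {
        List<String> hrefs = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            hrefs.add(m.group(1)); // group 1 is the text between the quotes
        }
        return hrefs;
    }

    public static void main(String[] args) {
        String html = "<p>See <a href=\"https://example.com\">this</a> and "
                + "<a class=\"x\" href = \"/docs\">that</a>.</p>";
        System.out.println(extractHrefs(html)); // [https://example.com, /docs]
    }
}
```

This only works for simple, single-line markup: the pattern also accepts other tags starting with "<a" (such as <abbr>), and "." does not match line breaks by default. For real-world HTML, a dedicated HTML parser is usually the safer choice.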
accepted answer by theoneandonlylark (139385988672782336):
Regex means regular expression and can be used in Java to sort or search strings.
best answer by dan1st (358291050957111296):
Regexes (short for regular expressions) are a powerful way to find and match patterns in strings. One can specify what a pattern looks like and then either search for the pattern in a specific string or match the whole string against the pattern. For example, the regex [A-Za-z]+ [A-Za-z]+ matches all strings consisting of one or more letters (A to Z, both lower- and uppercase), followed by a single space and another such sequence of letters. Java allows matching a String against a regex using the String#matches method:
System.out.println("Hello World".matches("[A-Za-z]+ [A-Za-z]+"));//true
System.out.println("Hi there".matches("[A-Za-z]+ [A-Za-z]+"));//true
System.out.println("Hello_World".matches("[A-Za-z]+ [A-Za-z]+"));//false
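String#matches is convenient, but it is documented to behave like Pattern.matches(regex, input), which means the regex is compiled again on each call. When the same check runs many times, compiling the Pattern once is a common alternative. The sketch below assumes a made-up class TwoWords with a helper isTwoWords; both approaches give the same answers.

```java
import java.util.regex.Pattern;

class TwoWords {
    // Compiled once and reused, instead of being recompiled on every call.
    private static final Pattern TWO_WORDS = Pattern.compile("[A-Za-z]+ [A-Za-z]+");

    // Hypothetical helper: true if the whole string is two letter-only words separated by one space.
    static boolean isTwoWords(String s) {
        return TWO_WORDS.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"Hello World", "Hi there", "Hello_World", "Hello World!"};
        for (String s : samples) {
            // Both columns agree; only the compilation cost differs.
            System.out.println(s + " -> " + isTwoWords(s) + " / " + s.matches("[A-Za-z]+ [A-Za-z]+"));
        }
    }
}
```

The last sample is rejected because matches() must consume the entire string, and the trailing "!" is not covered by the pattern.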
Java also allows finding patterns inside Strings with a regex. For example, the regex <@[0-9]+> could be used to detect Discord (user) mentions, as they start with <@, followed by the user ID and >. The user ID can then be extracted by introducing a group for it, i.e. by surrounding the part matching the user ID with parentheses: <@([0-9]+)>
Pattern pattern = Pattern.compile("<@([0-9]+)>"); // the pattern to match

String text = "Hi, I am <@358291050957111296> and <@743072402702860358> is a bot here"; // text containing two Discord user mentions

Matcher matcher = pattern.matcher(text); // create a Matcher for searching through the String
while (matcher.find()) { // as long as there are matches left
    String mention = matcher.group(); // get the whole matched text
    String userId = matcher.group(1); // get the group specified within () in the regex
    System.out.println("found '" + mention + "' mentioning user with ID " + userId);
}
This code yields the following output:
found '<@358291050957111296>' mentioning user with ID 358291050957111296
found '<@743072402702860358>' mentioning user with ID 743072402702860358
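As a variation on the mention pattern, a group can also be given a name so that it does not have to be addressed by its number. This is only a sketch: the class MentionIds, the helper findUserIds and the group name "id" are choices made for this example, and the sample IDs are shortened placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MentionIds {
    // (?<id>...) is a named capturing group; "id" is a name chosen for this sketch.
    private static final Pattern MENTION = Pattern.compile("<@(?<id>[0-9]+)>");

    // Hypothetical helper: collects the user ID of every mention found in the text.
    static List<String> findUserIds(String text) {
        List<String> ids = new ArrayList<>();
        Matcher m = MENTION.matcher(text);
        while (m.find()) {
            ids.add(m.group("id")); // same text as m.group(1)
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(findUserIds("Hi <@123>, meet <@456> and @nobody")); // [123, 456]
    }
}
```

Named groups tend to pay off once a pattern has several groups, because inserting a new group no longer shifts the numbers of the existing ones.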
Regexes are based on finite automata. As such, regexes are stateless and it is typically not possible to access information about what matched previously in the regex itself.
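One limited exception in Java's regex engine is the backreference: \1 matches the same text that group 1 captured earlier in the same match, which goes beyond what a plain finite automaton can express. A minimal sketch, with a made-up class RepeatedWord and helper firstDoubledWord:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedWord {
    // \b(\w+) captures a whole word; \1 then requires exactly the same text again.
    private static final Pattern DOUBLED = Pattern.compile("\\b(\\w+) \\1\\b");

    // Hypothetical helper: returns the first word that is immediately repeated, or null if there is none.
    static String firstDoubledWord(String text) {
        Matcher m = DOUBLED.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstDoubledWord("this is is a test")); // is
        System.out.println(firstDoubledWord("nothing repeated"));  // null
    }
}
```

Anything that needs more context than this, such as arbitrarily nested brackets, is usually better handled by a real parser than by a regex.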