C# 3w ago
yourFriend

✅ Need help with regular expression pattern

I am trying to match everything between two given tags using a regular expression. For example:
string s = "We are living in a <upcase>yellow submarine</upcase>. We don't have <upcase>anything</upcase> else.";
string startTag = "<upcase>";
string endTag = "</upcase>";

// I want it to match:
// 1. <upcase>yellow submarine</upcase>
// 2. <upcase>anything</upcase>

// This pattern matches nothing
string pattern = $@"(?'startTag'\s*{startTag})" + $@"(?'text'.*(?!{endTag}))" + $@"(?'endTag'{endTag})";

// This pattern matches up to the last endTag instead of the first endTag after the startTag:
// 1. <upcase>yellow submarine</upcase>. We don't have <upcase>anything</upcase>
string pattern = $@"(?'startTag'\s*{startTag})" + $@"(?'text'.*)" + $@"(?'endTag'{endTag})";

MatchCollection matches = Regex.Matches(s, pattern);
9 Replies
MODiX 3w ago
leowest
REPL Result: Success
#nuget htmlagilitypack
using HtmlAgilityPack;

string s = "We are living in a <upcase>yellow submarine</upcase>. We don't have <upcase>anything</upcase> else.";
var doc = new HtmlDocument();
doc.LoadHtml(s);

// Note: SelectNodes returns null (not an empty collection) when nothing matches.
var items = doc.DocumentNode.SelectNodes("//upcase");
foreach (var item in items)
{
    Console.WriteLine(item.InnerText);
}
Console Output
yellow submarine
anything
Compile: 1124.924ms | Execution: 50.575ms
yourFriend (OP) 3w ago
Thank you. This is so much better for parsing HTML-like documents. I was learning regular expressions, but I won't use them if I have more readable options like this.
leowest 3w ago
It's good to learn regex, but sometimes you find tools that have been tested and just work well enough for your needs.
MODiX 3w ago
leowest
REPL Result: Success
string s = "We are living in a <upcase>yellow submarine</upcase>. We don't have <upcase>anything</upcase> else.";
string tag = "upcase";
string pattern = $@"<{tag}>(.*?)</{tag}>";

foreach (Match match in Regex.Matches(s, pattern, RegexOptions.IgnoreCase))
{
    Console.WriteLine("Found '{0}' at position {1}", match.Groups[1], match.Index);
}
Console Output
Found 'yellow submarine' at position 19
Found 'anything' at position 68
Compile: 396.277ms | Execution: 92.956ms
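The difference between the original `.*` attempt and the `.*?` used in the REPL run above comes down to greedy versus lazy matching. A minimal sketch of that comparison, assuming the same sample string; the `Regex.Escape` call is an added precaution, not something from the thread:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string s = "We are living in a <upcase>yellow submarine</upcase>. We don't have <upcase>anything</upcase> else.";
// Escaping is a precaution in case the tag name ever contains regex metacharacters.
string tag = Regex.Escape("upcase");

// Greedy: .* runs to the end of the input, then backtracks to the LAST closing tag.
string greedy = $@"<{tag}>(.*)</{tag}>";
// Lazy: .*? grows one character at a time and stops at the FIRST closing tag.
string lazy = $@"<{tag}>(.*?)</{tag}>";

string[] greedyHits = Regex.Matches(s, greedy).Cast<Match>().Select(m => m.Groups[1].Value).ToArray();
string[] lazyHits = Regex.Matches(s, lazy).Cast<Match>().Select(m => m.Groups[1].Value).ToArray();

Console.WriteLine(greedyHits.Length);            // 1 (one match spanning both elements)
Console.WriteLine(string.Join(" | ", lazyHits)); // yellow submarine | anything
```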
leowest 3w ago
You can use the captured groups with regex. Things get complex when you have multiple levels of tags nested within each other, or malformed HTML. $htmlregex
MODiX 3w ago
Stack Overflow
RegEx match open tags except XHTML self-contained tags
I need to match all of these opening tags: <p> <a href="foo"> But not self-closing tags: <br /> <hr class="foo" /> I came up with this and wanted to make
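To illustrate the nesting problem mentioned above, here is a small sketch. The nested input string is hypothetical, made up for this example; the pattern is the lazy one from the earlier REPL run.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input: one <upcase> element nested inside another.
string nested = "<upcase>outer <upcase>inner</upcase> tail</upcase>";
string pattern = @"<upcase>(.*?)</upcase>";

Match first = Regex.Match(nested, pattern);

// The lazy quantifier stops at the first closing tag, which belongs to the
// inner element, so the capture starts in the outer element and ends in the inner one.
string captured = first.Groups[1].Value;
Console.WriteLine(captured); // outer <upcase>inner
```

The regex has no notion of which closing tag pairs with which opening tag, which is why a parser tends to be the safer choice once nesting is possible.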
yourFriend (OP) 3w ago
Thanks, making named groups was increasing complexity unnecessarily.
Um.. neither. I was just practicing regex with HTML-like tags. I don't know much about HTML or XML. I'll look into it when doing something serious.
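For anyone who does want to keep the named groups from the original question: the first attempt failed because `.*(?!endTag)` checks the lookahead only once, after `.*` has finished, and the very next group then requires the end tag at that same position. One possible fix, sketched here as an assumption rather than something posted in the thread, is to move the lookahead inside the repetition so it is checked before every character. The `\s*` from the original start group is dropped for brevity.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string s = "We are living in a <upcase>yellow submarine</upcase>. We don't have <upcase>anything</upcase> else.";
string startTag = Regex.Escape("<upcase>");
string endTag = Regex.Escape("</upcase>");

// (?:(?!endTag).)* consumes one character at a time, but only while the
// end tag does not begin at the current position.
string pattern = $@"(?'startTag'{startTag})(?'text'(?:(?!{endTag}).)*)(?'endTag'{endTag})";

string[] texts = Regex.Matches(s, pattern)
    .Cast<Match>()
    .Select(m => m.Groups["text"].Value)
    .ToArray();

Console.WriteLine(string.Join(" | ", texts)); // yellow submarine | anything
```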
leowest 3w ago
XML has its own tools for processing it as well.
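The thread does not name a specific XML tool. As one possibility, a minimal sketch using `System.Xml.Linq` from the .NET base class library might look like this. The `<root>` wrapper is an assumption added here, because the sample text has no single root element and would not parse as XML on its own.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

string s = "We are living in a <upcase>yellow submarine</upcase>. We don't have <upcase>anything</upcase> else.";

// Wrap the fragment in a hypothetical <root> element so it is well-formed XML.
XElement root = XElement.Parse($"<root>{s}</root>");

// Descendants finds every <upcase> element at any depth; Value is its text content.
string[] texts = root.Descendants("upcase").Select(e => e.Value).ToArray();

Console.WriteLine(string.Join(" | ", texts)); // yellow submarine | anything
```

This only works when the input is well-formed XML; for loose or malformed HTML, a tolerant parser such as the HtmlAgilityPack example earlier in the thread is the better fit.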