C#•2y ago
malkav

Get multiple substrings from a main string

I feel like this should be basic knowledge, but whatever, imma ask anyways.. Say I have a string "base insert some esab weird strings base here esab" and I wanted to get every bit that is between two other strings, in this case "base" as the start and "esab" as the end. So I would want to get every substring between start and end. Beneath is what I expect to happen with the code.
string input = "base insert some esab weird strings base here esab";
string start = "base";
string end = "esab";

string[] result = { "insert some", "here" };
I feel I should probably be using input.Substring(input.IndexOf(start), input.IndexOf(end) + end.Length) or something, but it's missing something and I can't see it.
57 Replies
Pobiega
Pobiega•2y ago
it needs a loop and some offset from the previous iteration. don't forget that Substring takes (startIndex, length), not (startIndex, stopIndex)
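A minimal sketch of the Substring point, assuming a hypothetical helper named Slice (it is not from the thread): when you have a start index and a stop index, the second argument has to be the difference between them.

```csharp
using System;

public static class SubstringDemo
{
    // Hypothetical helper: returns the text from startIndex up to (not including) stopIndex.
    public static string Slice(string input, int startIndex, int stopIndex)
    {
        // Substring's second argument is a length, so convert (start, stop) into (start, length).
        return input.Substring(startIndex, stopIndex - startIndex);
    }
}
```

For "base insert some esab", the start marker ends at index 4 and "esab" begins at index 17, so SubstringDemo.Slice(input, 4, 17) returns " insert some " (13 characters, surrounding spaces included).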
malkav
malkav•2y ago
and how would I do that? something like:
string[] result = Array.Empty<string>();
do
{
    int starting = input.IndexOf(start) + start.Length;
    result[result.Length] = input.SubString(starting, input.IndexOf(end) - starting);
} while (!input.IsNullOrEmpty)
(I don't think I did the loop right, but you get the gist right?)
Pobiega
Pobiega•2y ago
yeah you're really close. you'll need a list instead of an array though, because you don't know the size
malkav
malkav•2y ago
Ah fair. (this is why I did result.Length but I guess a List would be easier)
Pobiega
Pobiega•2y ago
you also need an exit condition :p
malkav
malkav•2y ago
what would be my exit condition though, because rn I don't think my input will ever reach nullOrEmpty 😅
Pobiega
Pobiega•2y ago
you are not changing your input. hint: what does .IndexOf return if it doesn't find anything?
malkav
malkav•2y ago
-1 right?
Pobiega
Pobiega•2y ago
correct
malkav
malkav•2y ago
so eh, do { stuff } while (input.IndexOf(start) + 1 != 0)? wait, no... hang on 😅 I got to think with this now..
Pobiega
Pobiega•2y ago
I started with something very similar, but refactored it into a while(true)
malkav
malkav•2y ago
because my IndexOf will never return -1 because I don't modify the input at all
Pobiega
Pobiega•2y ago
that's why you need the offset
malkav
malkav•2y ago
yea I have this now:
do
{
    int start = mainString.IndexOf(startString, StringComparison.Ordinal) + startString.Length;
    results.Add(mainString.Substring(start, mainString.IndexOf(endString, StringComparison.Ordinal) - start));
} while (true);
but I am missing the offset. how do I offset?
Pobiega
Pobiega•2y ago
should I post my solution? or do you wanna figure it out
malkav
malkav•2y ago
I wanna figure it out, but I would like help in guiding my thoughts to the solution if you don't mind? I learn more that way than pre-made code 😅
Pobiega
Pobiega•2y ago
ok, so you need an offset for your IndexOf so you don't find the same match forever. this offset needs to live outside the loop, obviously
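A rough sketch of the offset idea on a simpler problem, counting matches instead of extracting substrings. CountOccurrences is a hypothetical name used only for illustration. It relies on the IndexOf overload that takes a start position, and on IndexOf returning -1 when nothing is found.

```csharp
using System;

public static class OffsetSearchDemo
{
    // Hypothetical helper: counts non-overlapping occurrences of value in input.
    public static int CountOccurrences(string input, string value)
    {
        if (string.IsNullOrEmpty(value))
            throw new ArgumentException("Value must be non-empty.", nameof(value));

        int count = 0;
        int offset = 0; // declared outside the loop so it survives between iterations

        while (true)
        {
            // The second argument tells IndexOf where to start searching.
            int index = input.IndexOf(value, offset, StringComparison.Ordinal);
            if (index == -1) return count; // nothing left to find: the exit condition

            count++;
            offset = index + value.Length; // continue after the match just found
        }
    }
}
```

With the string from the question, OffsetSearchDemo.CountOccurrences(input, "base") returns 2, because each search resumes after the previous match instead of finding index 0 again.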
Unknown User
Unknown User•2y ago
Message Not Public
Pobiega
Pobiega•2y ago
not sure how that helps, that part is already figured out
malkav
malkav•2y ago
hrm...
List<string> results = new();
int offset = 0;
do
{
    int start = mainString.IndexOf(startString + offset) + startString.Length
    results.Add(mainString.Substring(start, mainString.IndexOf(endString + offset) - start);
    offset = offset + (prolly the Length of the startString + subString + endString??);
} while (offset != // I'm having a mind-stop here)
Pobiega
Pobiega•2y ago
😄
Unknown User
Unknown User•2y ago
Message Not Public
Pobiega
Pobiega•2y ago
you are SOOO close to my pre-refactor solution. well, let's just not introduce regexes
malkav
malkav•2y ago
I deliberately didn't go for regex, because there are some prerequisites to this that cannot be done with regex. I know regex enough to do some things, but this is off topic 😅
Unknown User
Unknown User•2y ago
Message Not Public
malkav
malkav•2y ago
What am I missing for my offset? let me re-iterate: "I cannot do that with regex" 🤣
Pobiega
Pobiega•2y ago
offset > -1. but this is where I refactored, because I realized I had to set offset to a special value to signal the end, and at that point it was cleaner to use a while(true)
malkav
malkav•2y ago
yea and I need to change it constantly in the loop
Pobiega
Pobiega•2y ago
yeah but that's fine, you just set it to the IndexOf of end
malkav
malkav•2y ago
but doesn't it find the same "iteration" of the end string? example: "<sub>Some string here</sub> <sub>Another string here</sub>". now if I set offset to mainString.IndexOf(endString), does it not set it to the very first </sub> every time?
Pobiega
Pobiega•2y ago
tbh I didn't set up unit tests and try on a whole bunch of strings, I just ran it on your first prompt
malkav
malkav•2y ago
okay, lemme make a unit test for this one. it seems I should turn the do/while around, because it sets itself to -1 and then does another loop??
System.ArgumentOutOfRangeException : Length cannot be less than zero. (Parameter 'length')
at System.String.Substring(Int32 startIndex, Int32 length)
at this code:
[Test]
public void StringHasSingleSubstring()
{
    // Arrange
    const string testString = "<sub>Need this</sub>";
    const string startString = "<sub>";
    const string endString = "</sub>";

    // Act:
    List<string> results = new();

    int offset = 0;
    do
    {
        int start = testString.IndexOf(startString + offset, StringComparison.Ordinal) + startString.Length;
        results.Add(testString.Substring(start, testString.IndexOf(endString + offset, StringComparison.Ordinal) - start));
        offset = testString.IndexOf(endString, StringComparison.Ordinal);
    } while (offset > -1);

    // Assert
    results.Count.Should().Be(1);
}
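A small sketch of why this test throws, using two hypothetical helper methods that are not part of the thread. In C#, startString + offset is string concatenation, so with offset 0 the code searches for "<sub>0" rather than searching for "<sub>" starting at position 0.

```csharp
using System;

public static class OffsetPlacementDemo
{
    // What the failing test effectively does: "<sub>" + 0 becomes the search string "<sub>0".
    public static int SearchWithConcatenatedOffset(string input, string value, int offset)
    {
        return input.IndexOf(value + offset, StringComparison.Ordinal);
    }

    // Passes the offset as IndexOf's startIndex argument instead.
    public static int SearchFromOffset(string input, string value, int offset)
    {
        return input.IndexOf(value, offset, StringComparison.Ordinal);
    }
}
```

For "<sub>Need this</sub>" and offset 0, the concatenated search returns -1 for both markers, so start becomes -1 + 5 = 4 and the length passed to Substring becomes -1 - 4 = -5, which matches the "Length cannot be less than zero" message. SearchFromOffset returns 0 for "<sub>" and 14 for "</sub>".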
Pobiega
Pobiega•2y ago
[Fact]
public void GetBetween_works_2()
{
    var output = "<sub>Some string here</sub> <sub>Another string here</sub>".GetBetween("<sub>", "</sub>");

    output.ShouldBeEquivalentTo(new[] { "Some string here", "Another string here" });
}
passes 🙂 GetBetween is my function (I made it an extension method)
malkav
malkav•2y ago
okay, can I see your extension method then?
Pobiega
Pobiega•2y ago
that would be my entire solution. you sure about it?
malkav
malkav•2y ago
yea, then I can see what I missed
Pobiega
Pobiega•2y ago
public static string[] GetBetween(this string input, string start, string end)
{
    var offset = 0; // where the next search begins
    var results = new List<string>();

    while (true)
    {
        // search for both markers from the current offset
        var startIndex = input.IndexOf(start, offset);
        var endIndex = input.IndexOf(end, offset);

        // no complete pair left: return what was collected
        if (startIndex == -1 || endIndex == -1) return results.ToArray();
        results.Add(input.Substring(startIndex + start.Length, endIndex - startIndex - start.Length).Trim());
        offset = endIndex + 1; // step past the end marker that was just used
    }
}
malkav
malkav•2y ago
Ah, you did while(true) where I did do{}while() 😅
Pobiega
Pobiega•2y ago
yup, and the offset needs to be added to the correct place 😛
malkav
malkav•2y ago
yea I didn't
malkav
malkav•2y ago
Right, I'm dumbdumb 🤣 but I see what I did wrong now. Wait, why did you set the offset to endIndex + 1?
Pobiega
Pobiega•2y ago
because I did 😄 otherwise it errors out on length, because the index will be greater than the string length if the end is the very last thing
malkav
malkav•2y ago
Guess that works
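A note on the + 1, with a hedged variant. If offset were set to endIndex itself, the next IndexOf(end, offset) would find the same end marker again while the start search moves on to the next start marker, so the computed length goes negative on inputs with more than one pair. Stepping past the marker avoids that. The sketch below is an assumption-laden variation, not Pobiega's code: GetBetweenOrdered is a hypothetical name, it searches for the end marker only after the start marker, and it advances by end.Length.

```csharp
using System;
using System.Collections.Generic;

public static class StringBetweenExtensions
{
    // Hypothetical variant: returns the trimmed text between each start/end pair, left to right.
    public static string[] GetBetweenOrdered(this string input, string start, string end)
    {
        if (string.IsNullOrEmpty(start) || string.IsNullOrEmpty(end))
            throw new ArgumentException("Delimiters must be non-empty.");

        var results = new List<string>();
        var offset = 0;

        while (true)
        {
            var startIndex = input.IndexOf(start, offset, StringComparison.Ordinal);
            if (startIndex == -1) break;

            // Look for the end marker only after the start marker, so an end marker
            // that appears earlier in the string cannot produce a negative length.
            var contentStart = startIndex + start.Length;
            var endIndex = input.IndexOf(end, contentStart, StringComparison.Ordinal);
            if (endIndex == -1) break;

            results.Add(input.Substring(contentStart, endIndex - contentStart).Trim());

            // Resume after the whole end marker so it is never matched twice.
            offset = endIndex + end.Length;
        }

        return results.ToArray();
    }
}
```

On the two inputs tested in the thread this gives the same results as GetBetween. It also handles a case like "esab first base here esab", which yields "here", whereas searching for both markers from the same offset would compute a negative length there.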
malkav
malkav•2y ago
// Each of my tests look like this:
public void StringHasSingleSubstring()
{
    // Arrange: Here I set testString to either of the following two values:
    const string testString = "<sub>Need this</sub>";
    // const string testString = "<sub>Need this</sub> <sub>And this</sub> <sub>and maybe this too?</sub>";
    const string startString = "<sub>"; // on the "IsMissing" functions, I set this to <subs> or the endString to </subs>
    const string endString = "</sub>";

    // Act:
    string[] results = testString.GetBetween(startString, endString);

    // Assert:
    results.Should().HaveCount(value);
    // value is 1 for StringHasSingleSubstring(), 3 for StringHasMultipleSubstrings(), and 0 for both EndStringIsMissing_ReturnsEmptyArray() and StartStringIsMissing_ReturnsEmptyArray()
}
Pobiega
Pobiega•2y ago
[Fact]
public void GetBetween_works_1()
{
    var output = "base insert some esab weird strings base here esab".GetBetween("base", "esab");

    output.ShouldBeEquivalentTo(new[] { "insert some", "here" });
}

[Fact]
public void GetBetween_works_2()
{
    var output = "<sub>Some string here</sub> <sub>Another string here</sub>".GetBetween("<sub>", "</sub>");

    output.ShouldBeEquivalentTo(new[] { "Some string here", "Another string here" });
}
these are the tests I used 🙂 using Shouldly for the asserts
malkav
malkav•2y ago
I'm using FluentAssertions 😅 but then again, I use NUnit instead of xUnit
Pobiega
Pobiega•2y ago
yeah I love that one, it's what I use for more serious projects
malkav
malkav•2y ago
Heck yes. Plus I get smacks from the senior here if I don't test for failed cases too 🤣 hence the "Is Missing" tests
Pobiega
Pobiega•2y ago
sure
malkav
malkav•2y ago
I can't however test my Azure Function (or don't know how) the same way
Pobiega
Pobiega•2y ago
let's just say, I didn't bother writing production quality tests for this 😄
malkav
malkav•2y ago
hahaha no, I reckon. though I might have to, so I'll spend the next 45 minutes of the Hackathon here to work out how to unit test Azure Functions. thanks for your help man! much appreciated
Pobiega
Pobiega•2y ago
hm? you just test the actual functionality locally. oh, you code directly ON the azure function? eugh
malkav
malkav•2y ago
No I do it in Rider.
Pobiega
Pobiega•2y ago
riiight, then just run the tests locally in Rider too?
malkav
malkav•2y ago
for some reason, I can't even run the Azure Function in the debugger, and if I call the Azure Function from my NUnit tests, then I need to pass the parameters that Azure Functions require, which I cannot instantiate from nowhere...
public static async Task<IActionResult> RunAsync(
    [HttpTrigger(AuthorizationLevel.Function, "post", Route = null)]
    HttpRequest req, ILogger log)
{
those parameters. anyways, brb. I finally have 5 minutes for a smoke lol