C# · 3y ago
HimmDawg

HttpClient.GetStringAsync() is slow

Hey owo I wanted to download some gifs from a wiki page via a console app. Everything works and all, but when it comes to downloading the images, it's kinda slow. So here's the crucial part:
for (int i = 0; i < names.Count(); i++)
{
    Console.Write($"Download gif ({progress} / {names.Count()})");

    string name = names[i];

    var pageSourceCode = await client.GetStringAsync($"{baseURL}/{name}_idle_animation.gif");
    Regex gifFinderRegex = new Regex(@"<a href=""(/images/[a-zA-Z0-9]/[a-zA-Z0-9]{2}/[a-zA-Z0-9]+_idle_animation\.gif)"">");

    Match match = gifFinderRegex.Match(pageSourceCode);

    byte[] fileBytes = await client.GetByteArrayAsync($"{baseURL}{match.Groups[1]}");
    File.WriteAllBytes(Path.Combine(saveLocation, name + ".gif"), fileBytes);

    progress++;
    Console.Write("\r");
}
Granted, I'm doing some really slow things here like two requests, I/O and Regex on a huge string, but assuming the server is 100% responsive, it shouldn't take 10s for one iteration. Not sure what's happening in the background foxThinking
9 Replies
Angius · 3y ago
Why are you downloading a gif as a string? Use .GetByteArrayAsync().
HimmDawg (OP) · 3y ago
No, I'm getting the source code of the page first because the path doesn't have a predictable pattern, so I'm extracting it from the source code.
Angius · 3y ago
Ah, yeah, I see that. I'd use the profiler, or at least some logging with timestamps, to figure out what takes the most time.
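A minimal sketch of the "logging with time" idea, assuming a small helper built on Stopwatch. The Timing class and MeasureAsync name are made up for illustration and are not part of the original code.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class Timing
{
    // Hypothetical helper: runs one async step and returns its result
    // together with the wall-clock time the step took.
    public static async Task<(T Result, TimeSpan Elapsed)> MeasureAsync<T>(Func<Task<T>> step)
    {
        var stopwatch = Stopwatch.StartNew();
        T result = await step();
        stopwatch.Stop();
        return (result, stopwatch.Elapsed);
    }
}
```

Inside the loop this could be used as `var (pageSourceCode, pageTime) = await Timing.MeasureAsync(() => client.GetStringAsync(url));` followed by a `Console.WriteLine` of `pageTime.TotalMilliseconds`, and likewise for the second request, to see which step dominates an iteration.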
HimmDawg (OP) · 3y ago
Something I noticed while writing the post is that I can replace File.WriteAllBytes with the async version of it.
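A rough sketch of that swap, assuming .NET Core 2.0 or later, where File.WriteAllBytesAsync is available. The GifWriter wrapper is hypothetical; saveLocation, name and fileBytes mirror the variables in the loop.

```csharp
using System.IO;
using System.Threading.Tasks;

public static class GifWriter
{
    // Hypothetical wrapper: writes the downloaded bytes to <saveLocation>/<name>.gif
    // without blocking the calling thread while the disk write is in flight.
    public static Task SaveAsync(string saveLocation, string name, byte[] fileBytes)
    {
        string path = Path.Combine(saveLocation, name + ".gif");
        return File.WriteAllBytesAsync(path, fileBytes);
    }
}
```

In the loop this would be `await GifWriter.SaveAsync(saveLocation, name, fileBytes);`. In a strictly sequential loop it mostly frees the thread rather than shortening an iteration, so it is unlikely to explain a large slowdown on its own.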
Angius · 3y ago
My bet is on the regex. Might be worth a try.
HimmDawg (OP) · 3y ago
The regex is actually one of the faster instructions (2ms). The two main culprits are the two requests, with about 450ms. Maybe the assumption that the server is responsive all the time is wrong, because I cannot find any major flaw in my code.
Angius · 3y ago
It could be rate-limiting you. How long does the request take from the browser? From Postman or something? Could you try spoofing the user-agent to say it's Chrome or Firefox?
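One way the user-agent test could look, as a sketch only. The BrowserLikeClient class is hypothetical and the Firefox-style string is just an example value; whether the wiki treats browser-like clients differently is an assumption to be checked, not something confirmed in this thread.

```csharp
using System.Net.Http;

public static class BrowserLikeClient
{
    // Example value only: any current browser user-agent string would do.
    public const string UserAgent =
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0";

    // Creates an HttpClient that sends the browser-like User-Agent with every request.
    public static HttpClient Create()
    {
        var client = new HttpClient();
        // TryAddWithoutValidation stores the raw string without strict header parsing.
        client.DefaultRequestHeaders.TryAddWithoutValidation("User-Agent", UserAgent);
        return client;
    }
}
```

Comparing the request timings with and without this header would show whether the server responds differently to a browser-like client.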
becquerel · 3y ago
Have you tried using Task.WhenAll to parallelise this?
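A sketch of what that could look like, assuming the body of the loop is moved into a per-name async method. ParallelDownloads, RunAsync and the maxConcurrency cap are illustrative names, not anything from the original code. The SemaphoreSlim limits how many downloads run at once, since firing every request simultaneously could make things worse if the server really is rate-limiting.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelDownloads
{
    // Runs downloadOne for every name, with at most maxConcurrency running at a time,
    // and completes when all of them have finished.
    public static async Task RunAsync(
        IEnumerable<string> names,
        Func<string, Task> downloadOne,
        int maxConcurrency = 4)
    {
        using var gate = new SemaphoreSlim(maxConcurrency);

        // ToList() forces the lazy Select to run, which starts every task.
        var tasks = names.Select(async name =>
        {
            await gate.WaitAsync();
            try
            {
                await downloadOne(name);
            }
            finally
            {
                gate.Release();
            }
        }).ToList();

        await Task.WhenAll(tasks);
    }
}
```

Usage would be along the lines of `await ParallelDownloads.RunAsync(names, name => DownloadGifAsync(name));`, where DownloadGifAsync is a hypothetical method containing the two requests and the file write. The progress counter would then need Interlocked.Increment, because several downloads update it concurrently.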
n8ta · 3y ago
@HimmDawg load the page in your browser and open up the devtools. That can show you how long getting the HTML part of the page takes. If it's about the same as in your code, then there's nothing to be done.
