C# · 3mo ago
Pan!cKk

Download File in Chunks

Is there a way to download a file in chunks (with a specified bufferSize) without writing them to local disk, so that each chunk is only read into memory and then uploaded to S3 storage using a multipart upload?
Pan!cKk · 3mo ago
The file is to be downloaded from a given URL.
Jimmacle · 3mo ago
if you use streams you can read as much or as little of it as you want at a time
Pan!cKk · 3mo ago
I tried using streams, but I get this error: The response ended prematurely, with at least 9823402 additional bytes expected.
Jimmacle · 3mo ago
need to see code
Pan!cKk · 3mo ago
It was something like this:
private async Task<byte[]> DownloadChunk(HttpResponseMessage response, long start, long end)
{
    using (var memoryStream = new MemoryStream())
    using (var responseStream = await response.Content.ReadAsStreamAsync())
    {
        var bufferSize = 10 * 1024 * 1024; // 10MB
        var buffer = new byte[bufferSize];
        int bytesRead;
        int totalBytesRead = 0;

        // Seek to the start position
        responseStream.Seek(start, SeekOrigin.Begin);

        while (totalBytesRead < end - start && (bytesRead = await responseStream.ReadAsync(buffer, 0, bufferSize)) > 0)
        {
            await memoryStream.WriteAsync(buffer, 0, bytesRead);
            totalBytesRead += bytesRead;
        }

        return memoryStream.ToArray();
    }
}
I am trying different methods, so the current state of my code has changed... Is there any article or Stack Overflow thread I can rely on? Or could you send me an example? @Jimmacle
mtreit · 3mo ago
Don't use async methods on MemoryStream for one thing
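To illustrate the point about MemoryStream, here is a minimal sketch of a chunk reader that keeps the source read asynchronous (that is real I/O) but writes into the MemoryStream synchronously, since that is only an in-memory copy. The names `ChunkReader` and `ReadChunkAsync` are hypothetical and not from the thread. The sketch also never asks for more bytes than the chunk still needs, so it cannot read past the chunk boundary.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class ChunkReader
{
    // Reads up to chunkSize bytes from the current position of source.
    // Returns a shorter (possibly empty) array once the source is exhausted.
    // Hypothetical helper for illustration only.
    public static async Task<byte[]> ReadChunkAsync(Stream source, int chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        // Small scratch buffer; a single ReadAsync call may return fewer bytes than requested.
        var buffer = new byte[Math.Min(chunkSize, 81920)];
        using var chunk = new MemoryStream();

        while (chunk.Length < chunkSize)
        {
            // Never request more than the chunk still needs.
            int wanted = (int)Math.Min(buffer.Length, chunkSize - chunk.Length);

            // Reading from the source may hit the network, so it stays async.
            int read = await source.ReadAsync(buffer, 0, wanted);
            if (read == 0) break; // end of stream

            // MemoryStream only copies bytes in memory, so the synchronous Write is enough.
            chunk.Write(buffer, 0, read);
        }

        return chunk.ToArray();
    }
}
```

Calling `ReadChunkAsync` repeatedly on the same stream yields consecutive chunks, and an empty array signals that the source has ended.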
Jimmacle · 3mo ago
"the response ended prematurely" sounds like the other side of the connection closed for some reason
this particular method will dispose the response stream after reading a single chunk, are you sure you want that?
Pan!cKk · 3mo ago
It does not happen when I try to read the whole file 🤷‍♂️
What I want is to implement something that downloads a file part by part and uploads it to storage part by part:
Download Part 1, Upload Part 1
Continue downloading the second part, Upload Part 2
...
The number of parts is decided based on the total fileSize / some chunkSize (in bytes).
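The part arithmetic described here can be sketched as a ceiling division. The names `PartMath`, `PartCount` and `PartLength` are hypothetical, and the sketch assumes the total size is known up front (for example from the response's Content-Length header, which a server is not required to send). For S3 multipart uploads, keep in mind that every part except the last must be at least 5 MiB and an upload can have at most 10,000 parts, so the chunk size has to be chosen with both limits in view.

```csharp
using System;

public static class PartMath
{
    // Number of parts needed to cover fileSize bytes in pieces of chunkSize bytes.
    // Uses integer ceiling division so a trailing partial chunk still counts as a part.
    public static long PartCount(long fileSize, long chunkSize)
    {
        if (fileSize < 0) throw new ArgumentOutOfRangeException(nameof(fileSize));
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        return fileSize / chunkSize + (fileSize % chunkSize == 0 ? 0 : 1);
    }

    // Length in bytes of the part with the given 1-based part number.
    // Every part is chunkSize long except possibly the last one.
    public static long PartLength(long fileSize, long chunkSize, long partNumber)
    {
        long count = PartCount(fileSize, chunkSize);
        if (partNumber < 1 || partNumber > count)
            throw new ArgumentOutOfRangeException(nameof(partNumber));

        long offset = (partNumber - 1) * chunkSize;
        return Math.Min(chunkSize, fileSize - offset);
    }
}
```

For a 25 MiB file and a 10 MiB chunk size this gives three parts: two full 10 MiB parts and a final 5 MiB part.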
Jimmacle · 3mo ago
my point is i'm not sure it's valid to keep calling .ReadAsStreamAsync() and disposing the stream on the same HttpResponseMessage
you'd want to open the stream once and do the chunking within there
Pan!cKk · 3mo ago
So I can download it chunk by chunk and do a multipart upload, all within a single ReadAsStreamAsync() stream...?
mtreit · 3mo ago
It feels like you don't want to be creating and disposing the stream inside this method, but rather pass in the stream on each call.
using var responseStream = await response.Content.ReadAsStreamAsync();
while (true)
{
    var chunk = DownloadChunk(responseStream, size);
    if (chunk.Length == 0)
    {
        break;
    }
    UploadChunk(chunk);
}
The stream has an internal pointer that advances as you read it so you don't need to seek or anything like that. Although even having a separate method here is maybe overkill.
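The "open the stream once and chunk inside it" idea can be sketched without a separate download method and without a MemoryStream: one buffer is filled from the stream, handed to a callback, then reused for the next chunk. This is only a sketch under assumptions. `ChunkPump`, `ForEachChunkAsync` and the `handleChunk` callback are hypothetical names, and the callback stands in for whatever uploads one part (for example an S3 upload-part call, which is not shown here).

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class ChunkPump
{
    // Reads source sequentially and passes each chunk to handleChunk together with
    // a 1-based part number. Returns the number of chunks handled.
    // The same buffer is reused for every chunk, so handleChunk must be done with
    // the data before its task completes.
    public static async Task<int> ForEachChunkAsync(
        Stream source, int chunkSize, Func<int, ReadOnlyMemory<byte>, Task> handleChunk)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var buffer = new byte[chunkSize];
        int partNumber = 0;

        while (true)
        {
            // A single ReadAsync call may return fewer bytes than requested,
            // so keep reading until the buffer is full or the stream ends.
            int filled = 0;
            while (filled < chunkSize)
            {
                int read = await source.ReadAsync(buffer, filled, chunkSize - filled);
                if (read == 0) break; // end of stream
                filled += read;
            }

            if (filled == 0) break; // nothing left to hand over

            partNumber++;
            await handleChunk(partNumber, new ReadOnlyMemory<byte>(buffer, 0, filled));

            if (filled < chunkSize) break; // a short chunk means the source has ended
        }

        return partNumber;
    }
}
```

With HttpClient, `source` would be the stream from `response.Content.ReadAsStreamAsync()`, opened once for the whole download. Requesting the response with `HttpCompletionOption.ResponseHeadersRead` avoids having HttpClient buffer the entire body in memory before the stream is handed out.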
Pan!cKk · 3mo ago
I will take a shot at this approach, thank you :)