C#•9mo ago

Handling reading large files in C#

How do I go about reducing memory usage to handle large files in x86 C#? My current code is as follows:

                FileStream fs = new FileStream(filepath, FileMode.Open, FileAccess.Read);
                using (BinaryReader br = new BinaryReader(fs))
                {
                    byte[] bytes = new byte[0];
                    using (MemoryStream test = new MemoryStream())
                    {
                        fs.CopyTo(test);
                        bytes = test.ToArray();
                    }

                    byte[] searchBytes = Encoding.UTF8.GetBytes("test");
                    List<long> positions = new List<long>();

                    foreach(long pos in Extensions.SearchStringInBytes(bytes, searchBytes))
                    {
                        positions.Add(pos - 4);
                    }
                }


When reading a large file (>500MB) the memory usage skyrockets to 2GB. The result is that it only works in an x64 build, as x86 results in an OutOfMemoryException near 1GB memory usage. I have thought of reading the file in "chunks" but I'm not sure how. Any other suggestions aside from making the program x64 only?

36 Replies

Pixel•9mo ago

So from my understanding: you get a file stream, you copy it to a MemoryStream (2x memory usage), and then copy it to an array (3x usage). This isn't great. You should be able to read a chunk (there's a function for it; I don't dick around with streams that often, but you can choose how many bytes to read) of, say, 1024 bytes, then store that, scan it, and free the memory (by overwriting the byte[] storing the chunk).

FestivalDelGelato•9mo ago

yeah, it depends how long is the thing you are searching

Pixel•9mo ago

In fact, I believe you can just use Read(Span<Byte>): have a byte[] of size 1024 (or whatever chunk size), then call Read(byte_arr). byte_arr will have the bytes; check if it has what you are looking for, and then keep going. The only issue with this is if the text you are looking for sits between 2 chunks, but that's an easy enough fix.
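A minimal sketch of the chunked approach described above, not code from the thread. The class name `ChunkedSearch`, the method `FindAll`, and the default chunk size are hypothetical. It uses `Stream.Read(byte[], int, int)` rather than the `Span` overload, since the array overload also exists on .NET Framework. The chunk-boundary problem is handled by carrying the last `pattern.Length - 1` bytes of each chunk over to the front of the next one.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ChunkedSearch
{
    // Hypothetical helper: returns the file offset of every occurrence of pattern.
    public static List<long> FindAll(string path, byte[] pattern, int chunkSize = 64 * 1024)
    {
        if (pattern == null || pattern.Length == 0)
            throw new ArgumentException("Pattern must not be empty.", nameof(pattern));
        if (chunkSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(chunkSize));

        var positions = new List<long>();
        int overlap = pattern.Length - 1;              // bytes carried between chunks
        byte[] buffer = new byte[chunkSize + overlap];
        int carried = 0;                               // carried bytes at the start of buffer
        long bufferStart = 0;                          // file offset of buffer[0]

        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            int read;
            while ((read = fs.Read(buffer, carried, chunkSize)) > 0)
            {
                int valid = carried + read;

                // Check every start index at which the whole pattern still fits.
                for (int i = 0; i + pattern.Length <= valid; i++)
                {
                    if (MatchesAt(buffer, i, pattern))
                        positions.Add(bufferStart + i);
                }

                // Move the unchecked tail to the front so a match that
                // straddles two chunks is seen on the next pass.
                int keep = Math.Min(overlap, valid);
                Array.Copy(buffer, valid - keep, buffer, 0, keep);
                bufferStart += valid - keep;
                carried = keep;
            }
        }
        return positions;
    }

    private static bool MatchesAt(byte[] buffer, int start, byte[] pattern)
    {
        for (int j = 0; j < pattern.Length; j++)
        {
            if (buffer[start + j] != pattern[j]) return false;
        }
        return true;
    }
}
```

Memory use stays at roughly one buffer of `chunkSize + pattern.Length - 1` bytes plus the list of positions, regardless of file size. A call would look like `ChunkedSearch.FindAll(filepath, Encoding.UTF8.GetBytes("test"))`.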

JesseOP•9mo ago

@Pixel I don't think I'm understanding exactly what you mean. Also, you mentioned a function, but I have no idea which function.

Foxtrek_64•9mo ago

I actually just solved this problem at work - let me grab my code

Foxtrek_64•9mo ago

Ah yes, so the TL;DR is you want to open the file as a MemoryMappedFile

Memory-Mapped Files - .NET

Explore memory-mapped files in .NET, which contain file contents in virtual memory, and allow applications to modify the file by writing directly to the memory.
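A rough sketch of reading a file through `MemoryMappedFile` one window at a time, using only the BCL types in `System.IO.MemoryMappedFiles`. It does not reproduce the API of any package mentioned in this thread; `MappedWindows`, `ForEachWindow`, and the `visit` callback are hypothetical names. The assumption is that only one window of `windowSize` bytes is mapped and copied at any moment.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

public static class MappedWindows
{
    // Hypothetical helper: calls visit(fileOffset, buffer, validByteCount)
    // once per window. The buffer is reused between calls.
    public static void ForEachWindow(string path, int windowSize, Action<long, byte[], int> visit)
    {
        if (windowSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(windowSize));

        long fileLength = new FileInfo(path).Length;
        if (fileLength == 0) return;                   // an empty file cannot be mapped

        byte[] buffer = new byte[windowSize];

        // Capacity 0 means "use the current size of the file".
        using (var mmf = MemoryMappedFile.CreateFromFile(
            path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read))
        {
            for (long offset = 0; offset < fileLength; offset += windowSize)
            {
                // The last window is usually shorter than windowSize.
                int size = (int)Math.Min(windowSize, fileLength - offset);

                using (var view = mmf.CreateViewAccessor(offset, size, MemoryMappedFileAccess.Read))
                {
                    view.ReadArray(0, buffer, 0, size);
                    visit(offset, buffer, size);
                }
            }
        }
    }
}
```

As written, the windows do not overlap, so a search built on top of this would still need to handle matches that cross a window boundary.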

Foxtrek_64•9mo ago

I created a few support files for that here: https://github.com/TA-RPA/Tafs.Activities/tree/main/src/Tafs.Activities.FileChunks

GitHub

Tafs.Activities/src/Tafs.Activities.FileChunks at main · TA-RPA/Taf...

A collection of activities and helpers for UiPath. - TA-RPA/Tafs.Activities

Foxtrek_64•9mo ago

Includes a Chunk, a ChunkIterator, and a ReverseChunkIterator. I'll get this package published to nuget since I realized it's not there yet. It will be available here in a few minutes: https://www.nuget.org/packages/Tafs.Activities.FileChunks/0.1.0 The way that you'll use this is you'll init a new chunk iterator with the path to the file and the chunk length, then you can iterate through using a foreach loop or LINQ. Do note the chunk iterator is disposable, so do wrap it in a using block/statement.

JesseOP•9mo ago

I only need a way to reduce memory usage on a ±500 MB file, that's all 😅 Also, your code is (A)GPL. I can't work with that unfortunately

Foxtrek_64•9mo ago

It's been published. Also, I don't mind relicensing. LGPL is just the default for Remora projects

JesseOP•9mo ago

Doesn't work on .NET Framework 4.7.2, also an issue

Pixel•9mo ago

why are you using such old .NET?

JesseOP•9mo ago

Backwards compatibility for other systems

Pixel•9mo ago

wym, you targeting Windows 2000?

JesseOP•9mo ago

If it were for me to not care about backwards compatibility, I would've long switched to x64-build already

Foxtrek_64•9mo ago

It targets netstandard2.1 so it should in theory be able to use that target. I can add an explicit net461 target if you need (that's the only one I have installed currently)

JesseOP•9mo ago

at least windows 8.1

Foxtrek_64•9mo ago

Actually, let me see when MemoryMappedFile became available before I offer that. Available since 4.0 flat, so that should be fine

JesseOP•9mo ago

yep

Foxtrek_64•9mo ago

Does the MIT license work for you?

JesseOP•9mo ago

Very very yes

Foxtrek_64•9mo ago

Sounds good. I'll push a new version here shortly with MIT + net461 support, which you should be able to use with net472

JesseOP•9mo ago

Cool

MarkPflug•9mo ago

Here is your original code annotated to explain why you are seeing so much memory usage:

C#
FileStream fs = new FileStream(filepath, FileMode.Open, FileAccess.Read);
// not sure why you are creating a binaryReader, as you aren't using it...
using (BinaryReader br = new BinaryReader(fs))
{

    byte[] bytes = new byte[0];
    using (MemoryStream test = new MemoryStream())
    {
        // This will copy the entire file into the memory stream.
        // MemoryStream will dynamically grow, by increasing the internal buffer by 2x
        // every time space is exhausted. This means that you'll ultimately use about 2x the memory
        // when reading the entire file.
        fs.CopyTo(test);
        // This allocates a brand new array of exactly the right size and copies
        // the bytes from the memory stream's internal buffer to the new array.
        bytes = test.ToArray();
    }

    // The rest of this code is opaque to me, but the "positions" array could grow quite large
    // if there are a lot of matches

    byte[] searchBytes = Encoding.UTF8.GetBytes("test");
    List<long> positions = new List<long>();

    foreach (long pos in Extensions.SearchStringInBytes(bytes, searchBytes))
    {
        positions.Add(pos - 4);
    }
}
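To make the two comments about `MemoryStream` concrete, here is a small sketch, not from the thread, that fills a stream and then compares its internal buffer with the result of `ToArray()`. `MemoryStreamCopies` and `ToArrayAllocatesCopy` are hypothetical names. The exact capacity the stream ends up with depends on its growth policy, so the code only reports it.

```csharp
using System;
using System.IO;

public static class MemoryStreamCopies
{
    // Writes byteCount bytes into a MemoryStream, then reports the internal
    // capacity and whether ToArray() returned a separate array.
    public static bool ToArrayAllocatesCopy(int byteCount, out int capacity, out int arrayLength)
    {
        using (var ms = new MemoryStream())
        {
            byte[] block = new byte[4096];
            int written = 0;
            while (written < byteCount)
            {
                int n = Math.Min(block.Length, byteCount - written);
                ms.Write(block, 0, n);   // the stream grows its buffer as needed
                written += n;
            }

            capacity = ms.Capacity;      // size of the internal buffer, at least Length
            byte[] copy = ms.ToArray();  // new array of exactly Length bytes
            arrayLength = copy.Length;

            // GetBuffer() exposes the internal buffer itself, without copying.
            return !object.ReferenceEquals(copy, ms.GetBuffer());
        }
    }
}
```

While both arrays are alive, the process holds the stream's internal buffer and the `ToArray()` copy at the same time, on top of any old buffers the garbage collector has not yet reclaimed.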


Foxtrek_64•9mo ago

With MemoryMappedFile, you get to control the memory usage. By default I have it set to 2 MB, but you can change that to whatever you want using the provided LengthsConstants or any long value representing the number of bytes. Forgot to hit enter

MarkPflug•9mo ago

The API you probably want is:

byte[] bytes = File.ReadAllBytes(filepath);
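For context, a sketch of how that single call could replace the FileStream/MemoryStream/ToArray chain from the question. `Extensions.SearchStringInBytes` was never shown in the thread, so a naive byte-by-byte search stands in for it here; `WholeFileSearch` and `FindAll` are hypothetical names, and the original `pos - 4` adjustment is left out because its purpose was not explained.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class WholeFileSearch
{
    // Loads the file with one allocation, then scans it for the UTF-8 bytes of text.
    public static List<long> FindAll(string path, string text)
    {
        byte[] bytes = File.ReadAllBytes(path);   // one array, sized to the file
        byte[] pattern = Encoding.UTF8.GetBytes(text);
        var positions = new List<long>();
        if (pattern.Length == 0) return positions;

        // Naive search: compare the pattern at every possible start index.
        for (int i = 0; i + pattern.Length <= bytes.Length; i++)
        {
            int j = 0;
            while (j < pattern.Length && bytes[i + j] == pattern[j]) j++;
            if (j == pattern.Length) positions.Add(i);
        }
        return positions;
    }
}
```

This still holds the whole file in memory once, which fits the stated 600 MB ceiling only as long as a contiguous array of that size can be allocated in the x86 process.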


JesseOP•9mo ago

yeah that BinaryReader is because of different code that is hidden for readability here

MarkPflug•9mo ago

I would stay away from the complexity of MemoryMappedFile, unless you expect the files to exceed 2GB. Even then, you'd probably be better off adjusting your algorithm to work in a streaming/buffered approach

JesseOP•9mo ago

Also, could you please add cs after the 3x ` in your message with the code? Edit: thank you. The file is not expected to be larger than 600 MB

MarkPflug•9mo ago

How long will your "search string" typically be? Your example uses "test", is that expected to be representative?

JesseOP•9mo ago

That works? I expected that not to work when a FileStream is already using the file, but surprise surprise: it does. Yes, 4 characters. But it works well now, it can run on x86 again

MarkPflug•9mo ago

FileStream will open with FileShare.Read, so other file handles can be opened to read the file. If you try to write to it however, I'd expect that to fail while the FileStream is open.
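A small sketch of that sharing behaviour, with hypothetical names (`SharedReadDemo`, `ReadWhileOpen`). It holds a read-only `FileStream` open, reads the same file a second time with `File.ReadAllBytes`, and then tries to open it for writing. The write attempt is only reported, not assumed, because how strictly sharing is enforced depends on the operating system; the expectation of an `IOException` applies to Windows.

```csharp
using System;
using System.IO;

public static class SharedReadDemo
{
    // Returns how many bytes a second reader could load while the first
    // stream was still open; writeWasBlocked reports the write attempt.
    public static int ReadWhileOpen(string path, out bool writeWasBlocked)
    {
        // This constructor overload shares the file as FileShare.Read.
        using (var first = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            // A second read-only handle is compatible with FileShare.Read.
            byte[] bytes = File.ReadAllBytes(path);

            try
            {
                // Asking for write access conflicts with the first handle's share mode.
                using (new FileStream(path, FileMode.Open, FileAccess.Write)) { }
                writeWasBlocked = false;
            }
            catch (IOException)
            {
                writeWasBlocked = true;
            }

            return bytes.Length;
        }
    }
}
```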

JesseOP•9mo ago

:thumbsupsmiley: Thank you for the knowledge

Foxtrek_64•9mo ago

I went ahead and pushed those changes, but I don't really have a good test platform for the older versions of .NET, so anyone who uses this package for netfx, do so at your own risk

JesseOP•9mo ago

imo if Microsoft promises it works on Framework >4.0, it should
