Handling large file reads in C#
How do I go about reducing memory usage to handle large files in an x86 C# build? My current code is as follows:
When reading a large file (>500 MB), the memory usage skyrockets to 2GB.
The result is that it only works in an x64 build, as x86 results in an OutOfMemoryException near 1GB of memory usage.
I have thought of reading the file in "chunks" but I'm not sure how. Any other suggestions aside from making the program x64 only?
so from my understanding:
You get a file stream, you copy it to a MemoryStream (2x memory usage), and then copy it to an array (3x usage)
This isn't great
you should be able to read a chunk of, say, 1024 bytes at a time (there's a function for it; i don't dick around with streams that often, but you can choose how many bytes to read), then store that, scan it, and free the memory (by overwriting the byte[] storing the chunk)
yeah, it depends on how long the thing you are searching for is
in fact i believe you can just use Read(Span<Byte>),
have a byte[] of size 1024 (or whatever chunk size)
then Read(byte_arr)
byte_arr will have the bytes; check if it has what you are looking for, and then keep going. the only issue with this is if the text you are looking for is in between 2 chunks, but that's an easy enough fix
@Pixel I don't think I'm understanding exactly what you mean
Also you mentioned a function, but I have no idea what function
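something like this; a quick untested sketch (the 64 KB chunk size and the ASCII assumption are mine, pick whatever fits your data):
```cs
using System;
using System.IO;
using System.Text;

public static class FileSearch
{
    // Scan a large file for a short ASCII string without ever holding
    // more than one chunk of it in memory.
    public static bool ContainsText(string path, string searchText)
    {
        byte[] needle = Encoding.ASCII.GetBytes(searchText);
        byte[] buffer = new byte[64 * 1024]; // reused for every chunk
        int overlap = needle.Length - 1;     // bytes to carry across chunk boundaries
        int carried = 0;

        using (FileStream stream = File.OpenRead(path))
        {
            int read;
            while ((read = stream.Read(buffer, carried, buffer.Length - carried)) > 0)
            {
                int total = carried + read;
                if (IndexOf(buffer, total, needle) >= 0)
                {
                    return true;
                }

                // Keep the last (needle.Length - 1) bytes at the front of the
                // buffer so a match spanning two chunks is found next pass.
                carried = Math.Min(overlap, total);
                Array.Copy(buffer, total - carried, buffer, 0, carried);
            }
        }

        return false;
    }

    // Naive byte search; fine for a 4-character needle.
    private static int IndexOf(byte[] haystack, int length, byte[] needle)
    {
        for (int i = 0; i <= length - needle.Length; i++)
        {
            int j = 0;
            while (j < needle.Length && haystack[i + j] == needle[j])
            {
                j++;
            }

            if (j == needle.Length)
            {
                return i;
            }
        }

        return -1;
    }
}
```
the Array.Copy at the end is the "easy fix" for the boundary case: carry the tail of each chunk into the next pass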
I actually just solved this problem at work - let me grab my code
Ah yes, so the TL;DR is you want to open the file as a MemoryMappedFile
Memory-Mapped Files - .NET
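The bare-bones idea looks something like this (a minimal sketch, not the FileChunks code; the path and the 2 MB window are placeholders):
```cs
using System.IO;
using System.IO.MemoryMappedFiles;

// Map the file read-only and stream it through a view, letting the OS
// page data in and out instead of holding the whole file in RAM.
string path = "big.bin"; // placeholder

using (var mmf = MemoryMappedFile.CreateFromFile(
    path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read))
using (var view = mmf.CreateViewStream(0, 0, MemoryMappedFileAccess.Read))
{
    var buffer = new byte[2 * 1024 * 1024]; // 2 MB window, tune as needed
    int read;
    while ((read = view.Read(buffer, 0, buffer.Length)) > 0)
    {
        // scan buffer[0..read) here
    }
}
```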
I created a few support files for that here: https://github.com/TA-RPA/Tafs.Activities/tree/main/src/Tafs.Activities.FileChunks
Includes a Chunk, a ChunkIterator, and a ReverseChunkIterator
I'll get this package published to nuget since I realized it's not there yet
It will be available here in a few minutes: https://www.nuget.org/packages/Tafs.Activities.FileChunks/0.1.0
The way that you'll use this is you'll init a new chunk iterator with the path to the file and the chunk length, then you can iterate through using a foreach loop or LINQ
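Roughly like this; I'm guessing at the exact constructor and type names from the description above, so check the repo for the real signatures:
```cs
// Hypothetical usage; the actual API in Tafs.Activities.FileChunks may differ.
using (var chunks = new ChunkIterator(@"C:\data\big.bin", 2 * 1024 * 1024))
{
    foreach (var chunk in chunks)
    {
        // inspect each chunk here
    }
}
```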
Do note the chunk iterator is disposable, so do wrap it in a using block/statement
I only need a way to reduce memory usage on a ±500 MB file, that's all 😅
Also, your code is (A)GPL. I can't work with that unfortunately
It's been published
Also I don't mind relicensing. LGPL is just the default for Remora projects
Doesn't work on .NET Framework 4.7.2, which is also an issue
why are you using such old .NET?
Backwards compatibility for other systems
wym
you targeting windows 2000?
If I didn't have to care about backwards compatibility, I would've switched to an x64-only build long ago
It targets netstandard2.1 so it should in theory be able to use that target. I can add an explicit net461 target if you need (that's the only one I have installed currently)
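i.e. adding the extra TFM in the csproj, something like:
```xml
<!-- Sketch: netstandard2.1 for modern runtimes, net461 so .NET
     Framework consumers (e.g. 4.7.2) can reference the package. -->
<PropertyGroup>
  <TargetFrameworks>netstandard2.1;net461</TargetFrameworks>
</PropertyGroup>
```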
at least windows 8.1
Actually let me see when MemoryMappedFile became available before I offer that-
Available since 4.0 flat, so that should be fine
yep
Does the MIT license work for you?
Very very yes
Sounds good. I'll push a new version here shortly with MIT + net461 support, which you should be able to use with net472
Cool
Here is your original code annotated to explain why you are seeing so much memory usage:
With MemoryMappedFile, you get to control the memory usage. By default I have it set to 2 MB, but you can change that to whatever you want using the provided LengthsConstants or any long value representing the number of bytes
Forgot to hit enter
The API you probably want is:
yeah, that BinaryReader is there because of different code that's hidden for readability here
I would stay away from the complexity of MemoryMappedFile, unless you expect the files to exceed 2GB. Even then, you'd probably be better off adjusting your algorithm to work in a streaming/buffered approach
also, could you please add cs after the three backticks in your code message
Edit: thank you
the file is not expected to be larger than 600 MB
How long will your "search string" typically be? Your example uses "test", is that expected to be representative?
that works?
I expected that not to work when a FileStream is already using the file, but surprise surprise: it does
yes, 4 characters
but it works well now, it can run on x86 again
FileStream will open with FileShare.Read, so other file handles can be opened to read the file. If you try to write to it however, I'd expect that to fail while the FileStream is open.
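For illustration (the file name is a placeholder):
```cs
using System.IO;

// File.OpenRead opens with FileAccess.Read and FileShare.Read, so a
// second read-only handle succeeds while the first is still open.
using (var first = File.OpenRead("big.bin"))
using (var second = File.OpenRead("big.bin")) // fine: both handles only read
{
    // Opening for write here would throw an IOException (sharing violation):
    // new FileStream("big.bin", FileMode.Open, FileAccess.Write);
}
```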
:thumbsupsmiley:
Thank you for the knowledge
I went ahead and pushed those changes, but I don't really have a good test platform for the older versions of .NET, so anyone who uses this package for netfx, do so at your own risk
imo if Microsoft promises it works on Framework >4.0, it should