C
C#2y ago
WaffleDevs

❔ What's the fastest way to write this data to a file? (C# Console App)

I need to write an int[][] to disk. Each int[] has 2 values, and the full int[][] has 100 million entries. What would be the fastest way to write this to disk?
35 Replies
Angius
Angius2y ago
StreamWriter for sure; this amount of data should be streamed. Perhaps there are some better, lower-level ways, but I'm unaware of them.
Jimmacle
Jimmacle2y ago
if each int[] has 2 values can it be a tuple instead? then you could at least get it all in one contiguous block of memory, though IO will probably be the bottleneck regardless
WaffleDevs
WaffleDevsOP2y ago
Shall i just combine them like such? sw.WriteLine(xY[0] + ", " + xY[1]);
Jimmacle
Jimmacle2y ago
if you want this to be fast you shouldn't even be touching strings
WaffleDevs
WaffleDevsOP2y ago
I thought so from testing 🤣, I just don't know any other way to combine them with a separator.
Jimmacle
Jimmacle2y ago
the format is simple and consistent so you could just dump the bytes out into a file directly
well, you didn't specify any output format requirements so that will affect what solutions you can use
WaffleDevs
WaffleDevsOP2y ago
Ah my bad 😅. I would prefer that each int[] is written on a separate line, and each value is separated by at least 1 non-number character. Ex:
1,2
3,4
5,6
Jimmacle
Jimmacle2y ago
then just get rid of the string concatenation and write each part of the line individually with streamwriter calls
WaffleDevs
WaffleDevsOP2y ago
so
using (StreamWriter sw = new StreamWriter("CDriveDirs.txt"))
{
    foreach (int[] xY in worlds)
    {
        sw.WriteLine(xY);
    }
}
?
Jimmacle
Jimmacle2y ago
like
f.Write(data[i][0]);
f.Write(',');
f.WriteLine(data[i][1]);
obviously test to see which is actually faster, but this should avoid some string allocations
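using System.Diagnostics;

// 100 million pairs of random ints as a jagged array
int[][] data = Enumerable.Range(0, 100000000).Select(_ => new[] { Random.Shared.Next(), Random.Shared.Next() }).ToArray();

Console.WriteLine("start");
var sw = Stopwatch.StartNew();
using var f = File.CreateText("output.txt");
for (var i = 0; i < data.Length; i++)
{
    // write each part separately instead of building a string per line
    f.Write(data[i][0]);
    f.Write(',');
    f.WriteLine(data[i][1]);
}
// note: f is only disposed (and its last buffer flushed) after this line runs
Console.WriteLine(sw.Elapsed);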
gets me
start
00:00:10.2480180
f.WriteLine($"{data[i][0]},{data[i][1]}"); gets 00:00:12.6447007 (these are very unscientific benchmarks)
Angius
Angius2y ago
Yeah, run it through BenchmarkDotNet to get info about allocations and all that jazz
Jimmacle
Jimmacle2y ago
realistically disk write speed is gonna be more of a bottleneck than any micro-optimization here
WaffleDevs
WaffleDevsOP2y ago
Ah it worked in <1m 🙂 thank you!
WaffleDevs
WaffleDevsOP2y ago
Hahaha
WaffleDevs
WaffleDevsOP2y ago
Oh
Jimmacle
Jimmacle2y ago
those must be small ints, my output file with random numbers is about 2GB
WaffleDevs
WaffleDevsOP2y ago
They are x/y coords. min is -8000 for both, max is +8000
Jimmacle
Jimmacle2y ago
i wonder if parallelization would make it faster
SSD should be able to do 2.5GB/s write
can you even seek and write to the same file in parallel?
i guess it wouldn't even apply here since the line length is variable
Angius
Angius2y ago
https://github.com/RolandPheasant/TailBlazer this is made for browsing logs, but also great for opening any text file that's too big for anything else. Can open tens of gigs of .txt files no problem
WaffleDevs
WaffleDevsOP2y ago
Ah ty, i was just using VSC and it crashed 🤣
Angius
Angius2y ago
I don't think you can write to the same file from multiple threads
Jimmacle
Jimmacle2y ago
2 seconds when
using var f = File.Create("output.bin");
for (var i = 0; i < data.Length; i++)
{
    unsafe
    {
        fixed (int* ptr = data[i])
        {
            f.Write(new ReadOnlySpan<byte>(ptr, 8));
        }
    }
}
abandon text, embrace binary
if you used an array of tuples you could just dump the whole block of memory into a file
assuming you don't have any concerns about endianness
which knocks it down to 0.5 seconds on my machine and pretty closely matches the theoretical max write speed of my SSD
using System.Diagnostics;

(int, int)[] data = Enumerable.Range(0, 100000000).Select(_ => (Random.Shared.Next(), Random.Shared.Next())).ToArray();

Console.WriteLine("start");
var sw = Stopwatch.StartNew();
using var f = File.Create("output.bin");
unsafe
{
    fixed ((int, int)* ptr = data)
    {
        // one write for the whole array: 8 bytes per (int, int) pair
        f.Write(new ReadOnlySpan<byte>(ptr, data.Length * 8));
    }
}
Console.WriteLine(sw.Elapsed);
so the question is, is having it in text format (and a jagged array) worth it taking 5-20 times as long?
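On the endianness caveat: a rough sketch of one way to pin the byte order, assuming a little-endian file format is wanted regardless of the machine. It encodes pairs into a reusable buffer with `BinaryPrimitives` and writes the buffer in chunks. The sample data, file path and chunk size are made up for illustration, and this has not been benchmarked against the raw memory dump above.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;

// Hypothetical sample data standing in for the thread's 100 million pairs.
(int, int)[] data = { (1, 2), (3, 4), (-8000, 8000) };
string path = Path.Combine(Path.GetTempPath(), "coords-le.bin"); // illustrative path

const int PairsPerChunk = 4096; // arbitrary chunk size, tune by measuring
byte[] buffer = new byte[PairsPerChunk * 8];

using (FileStream f = File.Create(path))
{
    for (int start = 0; start < data.Length; start += PairsPerChunk)
    {
        int count = Math.Min(PairsPerChunk, data.Length - start);
        for (int i = 0; i < count; i++)
        {
            (int x, int y) = data[start + i];
            // Always little-endian, whatever the CPU's native byte order is.
            BinaryPrimitives.WriteInt32LittleEndian(buffer.AsSpan(i * 8, 4), x);
            BinaryPrimitives.WriteInt32LittleEndian(buffer.AsSpan(i * 8 + 4, 4), y);
        }
        // One stream call per chunk rather than one per pair.
        f.Write(buffer, 0, count * 8);
    }
}

// Read back the last value to confirm the layout.
byte[] bytes = File.ReadAllBytes(path);
int lastY = BinaryPrimitives.ReadInt32LittleEndian(bytes.AsSpan(bytes.Length - 4));
Console.WriteLine($"{bytes.Length} bytes, last y = {lastY}");
```

A reader on any platform can decode the file with `BinaryPrimitives.ReadInt32LittleEndian`, at the cost of a per-value encode step that the raw dump skips.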
mtreit
mtreit2y ago
Sure you can. However, you need to be very careful not to corrupt the hell out of the file.
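A hedged sketch of what writing to one file from several threads could look like, assuming .NET 6 or later and fixed-size binary records (8 bytes per pair), so every chunk's file offset is known in advance. It uses `File.OpenHandle` and `RandomAccess.Write`, which take an explicit offset instead of relying on a shared stream position. The sample data, path and chunk size are invented for the example; whether this is actually faster than a single sequential write depends on the disk and was not measured here.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Runtime.InteropServices;
using System.Threading.Tasks;
using Microsoft.Win32.SafeHandles;

// Hypothetical sample data standing in for the thread's 100 million pairs.
(int, int)[] data = Enumerable.Range(0, 1000).Select(i => (i, -i)).ToArray();

string path = Path.Combine(Path.GetTempPath(), "coords-parallel.bin"); // illustrative path
const int RecordSize = 8;   // two 4-byte ints per pair
const int ChunkPairs = 128; // arbitrary chunk size for the sketch
int chunkCount = (data.Length + ChunkPairs - 1) / ChunkPairs;

using (SafeFileHandle handle = File.OpenHandle(path, FileMode.Create, FileAccess.Write))
{
    Parallel.For(0, chunkCount, chunk =>
    {
        int start = chunk * ChunkPairs;
        int count = Math.Min(ChunkPairs, data.Length - start);

        // Each chunk owns a disjoint byte range, so no two writes overlap.
        ReadOnlySpan<byte> bytes = MemoryMarshal.AsBytes(data.AsSpan(start, count));
        RandomAccess.Write(handle, bytes, (long)start * RecordSize);
    });
}

// Read everything back to check that the chunks landed in the right places.
(int, int)[] roundTrip = MemoryMarshal.Cast<byte, (int, int)>(File.ReadAllBytes(path).AsSpan()).ToArray();
bool same = roundTrip.SequenceEqual(data);
Console.WriteLine(same);
```

This only works because the offsets are computable up front. With the variable-length text format there is no way to know where line N starts without formatting everything before it.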
Aaron
Aaron2y ago
stop this immediately
Jimmacle
Jimmacle2y ago
no
gotta go fast
actually i made a typo there didn't i
Aaron
Aaron2y ago
using System.Runtime.InteropServices;

using var f = File.Create("output.bin");
for (var i = 0; i < data.Length; i++)
{
    // reinterpret each int[] as bytes without unsafe code
    f.Write(MemoryMarshal.AsBytes(data[i].AsSpan()));
}
at least make it safe
Jimmacle
Jimmacle2y ago
i don't need no new fangled fancy memory apis
mtreit
mtreit2y ago
lol wtf is this
Aaron
Aaron2y ago
the same thing jimmacle did in that first code block without using fixed/unsafe
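The same idea can probably be applied to the tuple-array version too. A minimal sketch, assuming the runtime lays out `(int, int)` as two consecutive 4-byte ints (the thread's `data.Length * 8` makes the same assumption; `ValueTuple` does not formally promise a field order). The sample data and file path are invented for illustration.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// Hypothetical sample data standing in for the thread's 100 million pairs.
(int, int)[] data = { (1, 2), (3, 4), (-8000, 8000) };
string path = Path.Combine(Path.GetTempPath(), "coords-safe.bin"); // illustrative path

// View the whole tuple array as raw bytes without copying and without unsafe code.
using (FileStream f = File.Create(path))
{
    f.Write(MemoryMarshal.AsBytes(data.AsSpan()));
}

// Read the bytes back and reinterpret them as pairs again.
byte[] bytes = File.ReadAllBytes(path);
(int, int)[] roundTrip = MemoryMarshal.Cast<byte, (int, int)>(bytes.AsSpan()).ToArray();

Console.WriteLine($"{bytes.Length} bytes, {roundTrip.Length} pairs");
```

If the on-disk layout needs to be guaranteed, a small struct marked with `StructLayout(LayoutKind.Sequential)` would be a safer element type than a tuple.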
mtreit
mtreit2y ago
Hmm
Joreyk ( IXLLEGACYIXL )
i don't want to say that you should throw a database at it ... but you could throw a database at it 👀
like good luck loading the file or altering data in it
DaVinki
DaVinki2y ago
Auto flush to false, make buffer large enough, ???, flush, profit
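For the text format, that suggestion might look roughly like this. It is a sketch, not a measured result: the 1 MiB buffer size is an arbitrary guess, and the sample data and file path are made up. `AutoFlush` is already `false` by default on `StreamWriter`; it is set here only to make the intent visible.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical sample data; the thread's real array has 100 million pairs.
int[][] data = { new[] { 1, 2 }, new[] { 3, 4 }, new[] { -8000, 8000 } };
string path = Path.Combine(Path.GetTempPath(), "coords-buffered.txt"); // illustrative path

// 1 MiB buffer is an arbitrary starting point; measure before settling on a size.
using (var writer = new StreamWriter(path, append: false, Encoding.ASCII, bufferSize: 1 << 20))
{
    writer.AutoFlush = false; // the default, stated for clarity
    foreach (int[] pair in data)
    {
        writer.Write(pair[0]);
        writer.Write(',');
        writer.WriteLine(pair[1]);
    }
    writer.Flush(); // Dispose would flush as well
}

string[] lines = File.ReadAllLines(path);
Console.WriteLine(string.Join(" | ", lines));
```

ASCII is enough here because the output only contains digits, minus signs, commas and newlines.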
Joreyk ( IXLLEGACYIXL )
batch to thousand queries and pump that in bulkinsert
Accord
Accord2y ago
Was this issue resolved? If so, run /close - otherwise I will mark this as stale and this post will be archived until there is new activity.