Created by sjbs on 7/19/2023 in #help
❔ Trying to speed up CSV import times
Hi, I was wondering if anyone could give me pointers on how I can speed up importing a CSV. Right now I am using CsvHelper's CsvReader.
// Requires: using System.Globalization; using CsvHelper; using CsvHelper.Configuration;
public IDictionary<string, IList<Array>> Import()
{
    using var streamReader = new StreamReader(Filepath);

    var config = new CsvConfiguration(CultureInfo.InvariantCulture)
    {
        Delimiter = ",",
        IgnoreBlankLines = false,
        TrimOptions = TrimOptions.Trim,
        HasHeaderRecord = checkHeader(streamReader), // bool that just checks if the first line is a header
        Mode = CsvMode.NoEscape,
    };

    using var csvReader = new CsvReader(streamReader, config);
    var records = csvReader.GetRecords<DataObj>(); // all the data is kept as strings; DataObj exists for easier assigning/indexing of the columns

    int lineCount = 0;
    foreach (var line in records)
    {
        lineCount++;

        var goodData = CheckData.Validate(line, lineCount); // a bunch of if statements ensuring the data fits within parameters; returns the good data in an array, or null otherwise

        if (goodData != null)
        {
            AddDataToDict(goodData.Factor1, goodData); // adds good data to a dictionary; each value is a list where the arrays returned from Validate are appended
        }
    }

    return _dataDict;
}
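
One thing worth trying: GetRecords<DataObj> builds a DataObj for every row, and if all the fields are strings anyway, you can skip that materialization and pull the raw fields straight off the parser. A minimal sketch, assuming CsvHelper; the file path and column indices are hypothetical:

using System.Globalization;
using CsvHelper;
using CsvHelper.Configuration;

var config = new CsvConfiguration(CultureInfo.InvariantCulture)
{
    Delimiter = ",",
};

using var reader = new StreamReader("data.csv"); // hypothetical path
using var csv = new CsvReader(reader, config);

// if the file has a header row: csv.Read(); csv.ReadHeader();

int lineCount = 0;
while (csv.Read()) // advances the parser one row without building an object
{
    lineCount++;
    var factor1 = csv.GetField(0); // raw string for column 0; indices are hypothetical
    var value = csv.GetField(1);
    // validate and accumulate here, same as before
}

Whether this wins depends on how much time goes into constructing DataObj, so it's worth profiling before committing to it.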
Sorry if it's a little vague; it's a research project, so I can't share the code fully. I'm just looking for general tips on how to speed up the process. I've tried using a parallel foreach loop, but the issue I run into is that I need to keep track of line numbers, since any bad data gets written to a file along with the line number where it was found. Bad data is extremely rare, and writing that file doesn't seem to be a bottleneck. Around 80% of the compute time is spent in this method, with calculations and data output taking up the rest.
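
For the parallel version, one way to keep line numbers without a shared counter is to attach the index to each record while the enumeration is still sequential, then go parallel. A minimal sketch, assuming Validate is safe to call concurrently; LogBadLine is hypothetical, and the other names follow the code above:

using System.Linq;

var results = records
    .Select((record, index) => (record, line: index + 1)) // attach line numbers while still sequential
    .AsParallel()
    .AsOrdered() // keep output in file order so the bad-data file stays readable
    .Select(x => (x.line, data: CheckData.Validate(x.record, x.line)))
    .ToList(); // materialize before touching non-thread-safe state

foreach (var (line, data) in results)
{
    if (data == null)
        LogBadLine(line); // hypothetical: writes the line number to the bad-data file
    else
        AddDataToDict(data.Factor1, data); // dictionary insertion stays single-threaded
}

Note that the parser itself is still sequential (GetRecords is lazy and reads the file as you enumerate), so this only pays off if Validate dominates; if the 80% is mostly raw parsing, parallelizing validation won't move much.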
22 replies