I have the following function, which worked well in the past. However, the company that sends us the CSV has added a new field that sometimes contains newlines (\r\n). The function no longer works for us because (line = readFile.ReadLine()) splits at every newline, including the ones inside that field.
What is the best way to modify the existing function so that it only splits at newlines that aren't enclosed in double quotes? I suppose I could create a StreamReader extension, call it ReadEntry, and basically recreate what ReadLine already does... but that sounds rather tedious and beyond my skill level, to be honest.
Code:
/// <summary>
/// Pulls info from CSV file and stores each entry as list of string arrays
/// </summary>
/// <param name="path"></param>
/// <returns></returns>
public static List<string[]> parseCSV(string path)
{
    List<string[]> parsedData = new List<string[]>();
    try
    {
        using (StreamReader readFile = new StreamReader(path))
        {
            string line;
            string[] row;
            string pattern = ",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))"; //Should be commas that are not encapsulated in quotation marks
            Regex r = new Regex(pattern);
            while ((line = readFile.ReadLine()) != null)
            {
                row = r.Split(line);
                parsedData.Add(row);
            }
        }
    }
    catch (Exception e)
    {
        MessageBox.Show(e.Message);
        CommitSuicide();
    }
    return parsedData;
}
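For reference, the ReadEntry idea might look roughly like the sketch below. It is not a tested or complete CSV parser. It assumes that quotes inside a field are escaped by doubling them (""), so an odd number of double quotes means a quoted field is still open, and it assumes the original line breaks were \r\n, since ReadLine discards them. The class name CsvReaderExtensions and the helper CountQuotes are hypothetical names used only for illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class CsvReaderExtensions
{
    // Hypothetical helper: reads one logical CSV entry, joining physical
    // lines for as long as a double-quoted field is still open.
    public static string ReadEntry(this TextReader reader)
    {
        string line = reader.ReadLine();
        if (line == null)
        {
            return null; // end of input, same contract as ReadLine
        }

        StringBuilder entry = new StringBuilder(line);
        int quotes = CountQuotes(line);

        // An odd quote count means we stopped inside a quoted field.
        // Escaped quotes ("") add two, so they do not change the parity.
        while (quotes % 2 != 0)
        {
            string next = reader.ReadLine();
            if (next == null)
            {
                break; // unbalanced quote at end of file: return what we have
            }
            // Assumption: the original line break was \r\n.
            entry.Append("\r\n").Append(next);
            quotes += CountQuotes(next);
        }
        return entry.ToString();
    }

    // Counts the double-quote characters in a string.
    private static int CountQuotes(string s)
    {
        int count = 0;
        foreach (char c in s)
        {
            if (c == '"')
            {
                count++;
            }
        }
        return count;
    }
}
```

With something like this in place, the loop condition in parseCSV would become (line = readFile.ReadEntry()) != null, and the rest of the function could stay as it is. The comma regex should still apply, because [^\"] also matches newline characters, but that is worth checking against a real sample of the new file.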