This is the process or parse CSV data and place it into a dictionary for storage in C# in .Net 2 and above. The dictionary will have a key of the actual line number, zero based, and a list of the data items with the commas and quotes removed. The regex pattern can handle data that looks like this

“xxx” or “xxx,xxx” or ‘xxx’ or ‘xxx,xxx’ or xxx

It will then be placed into a dictionary where each line is the actual row line. The below regex is designed to do these things

  1. Each match represents one data line or row.
  2. Each of the data items are inserted into the column capture to keep the data consistent with the current match.
  3. Handles both the single and double quote.
  4. The pattern can handle the comma within the quotes.
  5. The pattern uses an if condition see my blog entitled Regular Expressions and the If Conditional.
  6. Use of named capture group Column will hold the data.
Regex rx = new Regex(
@"((?([\x27\x22])         # Regex If single/double quotes 
   (?:[\x27\x22])         # \\x27\\x22 are single/double quotes
   (?<Column>[^\x27\x22]*)# Match this in the quotes
(?:[\\x27\\x22])
|
(?<Column>[^,\r\n]*]*))   # Else Not within quotes
(?:,?))+                  # Either a comma or EOL
(?:$|[\r\n]{0,2})         # Handle EOL or EOB",
                 RegexOptions.IgnorePatternWhitespace);
Dictionary<int, List<string>> data
    = new Dictionary<int,List<string>>();
string text =
@"'1','01000000043','2','4',20061102
'2',333,444,'555'";
int lineNumber = 0;
foreach(Match m in rx.Matches(text))
    if (m.Success)
    {
        List<string> line = new List<string>();
        foreach (Capture cp in m.Groups["Column"].Captures)
            if (string.IsNullOrEmpty(cp.Value) == false)
                line.Add(cp.Value);
        if (line.Count > 0)
            data.Add(lineNumber++, line);
    }
    foreach (KeyValuePair<int, List<string>> kvp in data)
        Console.WriteLine("Line {0} : {1}",
            kvp.Key.ToString(),
            string.Join(" ", kvp.Value.ToArray()));
Console Output

Line 0 : 1 01000000043 2 4 20061102
Line 1 : 2 333 444 555

Share