CSV to .Net Dictionary using Regular Expressions

This is the process or parse CSV data and place it into a dictionary for storage in C# in .Net 2 and above. The dictionary will have a key of the actual line number, zero based, and a list of the data items with the commas and quotes removed. The regex pattern can handle data that looks like this

“xxx” or “xxx,xxx” or ‘xxx’ or ‘xxx,xxx’ or xxx

It will then be placed into a dictionary where each line is the actual row line. The below regex is designed to do these things

  1. Each match represents one data line or row.
  2. Each of the data items are inserted into the column capture to keep the data consistent with the current match.
  3. Handles both the single and double quote.
  4. The pattern can handle the comma within the quotes.
  5. The pattern uses an if condition see my blog entitled Regular Expressions and the If Conditional.
  6. Use of named capture group Column will hold the data.
Regex rx = new Regex(
@"((?([\x27\x22])         # Regex If single/double quotes 
   (?:[\x27\x22])         # \\x27\\x22 are single/double quotes
   (?<Column>[^\x27\x22]*)# Match this in the quotes
(?:[\\x27\\x22])
|
(?<Column>[^,\r\n]*]*))   # Else Not within quotes
(?:,?))+                  # Either a comma or EOL
(?:$|[\r\n]{0,2})         # Handle EOL or EOB",
                 RegexOptions.IgnorePatternWhitespace);
Dictionary<int, List<string>> data
    = new Dictionary<int,List<string>>();
string text =
@"'1','01000000043','2','4',20061102
'2',333,444,'555'";
int lineNumber = 0;
foreach(Match m in rx.Matches(text))
    if (m.Success)
    {
        List<string> line = new List<string>();
        foreach (Capture cp in m.Groups["Column"].Captures)
            if (string.IsNullOrEmpty(cp.Value) == false)
                line.Add(cp.Value);
        if (line.Count > 0)
            data.Add(lineNumber++, line);
    }
    foreach (KeyValuePair<int, List<string>> kvp in data)
        Console.WriteLine("Line {0} : {1}",
            kvp.Key.ToString(),
            string.Join(" ", kvp.Value.ToArray()));
Console Output

Line 0 : 1 01000000043 2 4 20061102
Line 1 : 2 333 444 555

Share

6 Comments

  1. treesprite: Re: Importing Decimal data from CSV truncates values says:

    Thanks for referring me to your site. I would be interested in trying to use your regex if you could supply an example in a C# console application?

    Thanks for your assistance!

  2. omegaman says:

    Put the above code in a static function and call it. Good Luck.

  3. me7486 says:

    Hi,
    Thanks for this very helpful Reg Expression. I would like to modify your expression a little to just parse one line of CSV at a time, I tried to do following but I always get a extra blank match at the end, I cannot figure out how to prevent it. Can you help me have a look?

    ((?(?=[\x22\x27])(?:[\x22\x27]+)(?[^\x22\x27]*)(?:[\x22\x27]+)|(?[^,\r\n]*))(?:,?))

    Thank you very much

  4. me7486 says:

    The previous question I posted is regarding how to handle empty column while maintaining column structure, e.g. you have “abc”,”efg”,,,”hij” the resulting list should be “abc” “efg” “” “” “hij”.

  5. me7486 says:

    Dont worry, I finally got it by using following expression

    (?(?=[\x22\x27])(?:[\x22\x27]+)(?[^\x22\x27]*)(?:[\x22\x27]+)(?:,?)|((?[^,\r\n]+)(?:,?)|(?\W*)(?:,)))

    Thank you

  6. Alfetta159 says:

    Try this:

    “22nd Street Caltrain”,”22nd Street Caltrain”,”1149 22nd Street,San Francisco”,”37.757674″,”-122.392636″,”1″,

Leave a Reply