CSV to .Net Dictionary using Regular Expressions
This is the process or parse CSV data and place it into a dictionary for storage in C# in .Net 2 and above. The dictionary will have a key of the actual line number, zero based, and a list of the data items with the commas and quotes removed. The regex pattern can handle data that looks like this
“xxx” or “xxx,xxx” or ‘xxx’ or ‘xxx,xxx’ or xxx
It will then be placed into a dictionary where each line is the actual row line. The below regex is designed to do these things
- Each match represents one data line or row.
- Each of the data items are inserted into the column capture to keep the data consistent with the current match.
- Handles both the single and double quote.
- The pattern can handle the comma within the quotes.
- The pattern uses an if condition see my blog entitled Regular Expressions and the If Conditional.
- Use of named capture group Column will hold the data.
Regex rx = new Regex(
@"((?([\x27\x22]) # Regex If single/double quotes
(?:[\x27\x22]) # \\x27\\x22 are single/double quotes
(?<Column>[^\x27\x22]*)# Match this in the quotes
(?:[\\x27\\x22])
|
(?<Column>[^,\r\n]*]*)) # Else Not within quotes
(?:,?))+ # Either a comma or EOL
(?:$|[\r\n]{0,2}) # Handle EOL or EOB",
RegexOptions.IgnorePatternWhitespace);
Dictionary<int, List<string>> data
= new Dictionary<int,List<string>>();
string text =
@"'1','01000000043','2','4',20061102
'2',333,444,'555'";
int lineNumber = 0;
foreach(Match m in rx.Matches(text))
if (m.Success)
{
List<string> line = new List<string>();
foreach (Capture cp in m.Groups["Column"].Captures)
if (string.IsNullOrEmpty(cp.Value) == false)
line.Add(cp.Value);
if (line.Count > 0)
data.Add(lineNumber++, line);
}
foreach (KeyValuePair<int, List<string>> kvp in data)
Console.WriteLine("Line {0} : {1}",
kvp.Key.ToString(),
string.Join(" ", kvp.Value.ToArray()));
Thanks for referring me to your site. I would be interested in trying to use your regex if you could supply an example in a C# console application?
Thanks for your assistance!
Put the above code in a static function and call it. Good Luck.
Hi,
Thanks for this very helpful Reg Expression. I would like to modify your expression a little to just parse one line of CSV at a time, I tried to do following but I always get a extra blank match at the end, I cannot figure out how to prevent it. Can you help me have a look?
((?(?=[\x22\x27])(?:[\x22\x27]+)(?[^\x22\x27]*)(?:[\x22\x27]+)|(?[^,\r\n]*))(?:,?))
Thank you very much
The previous question I posted is regarding how to handle empty column while maintaining column structure, e.g. you have “abc”,”efg”,,,”hij” the resulting list should be “abc” “efg” “” “” “hij”.
Dont worry, I finally got it by using following expression
(?(?=[\x22\x27])(?:[\x22\x27]+)(?[^\x22\x27]*)(?:[\x22\x27]+)(?:,?)|((?[^,\r\n]+)(?:,?)|(?\W*)(?:,)))
Thank you
Try this:
“22nd Street Caltrain”,”22nd Street Caltrain”,”1149 22nd Street,San Francisco”,”37.757674″,”-122.392636″,”1″,