Here are some suggestions that can make C# regular expression patterns much easier to read and work with.

Suggestion 1 – Avoid C# Text Escape Pollution

When working in C#, it can become confusing to deal with string literal escapes before we even get to the regular expression escapes. For example, the regex word boundary escape (\b) needs its backslash escaped in a regular C# string, because on its own \b in a C# string is the backspace character:

string pattern = "\\b";

That gets confusing because we want to think in regex, not in C#, and \\b does not look like what it means (even though the compiler does hand \b to the regex parser correctly). Instead, put the C# verbatim string prefix (@) in front of the string:

string pattern = @"\b";

The two C# lines above are functionally equivalent, but with the verbatim form we can concentrate on the regex pattern without any pollution from C# escapes.
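To confirm that the two literals are the same string, and to see why the doubling is needed at all, here is a minimal console sketch. It assumes a .NET console app using C# 9 top-level statements; the variable names and sample inputs are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// A regular C# literal needs the backslash doubled; a verbatim (@) literal does not.
string escaped = "\\b";
string verbatim = @"\b";

// Both hold the same two characters: a backslash followed by 'b'.
Console.WriteLine(escaped == verbatim);   // True
Console.WriteLine(verbatim.Length);       // 2

// Without the doubling, "\b" in a regular literal is one backspace character (U+0008),
// which the regex parser would treat as a literal character, not a word boundary.
string backspace = "\b";
Console.WriteLine(backspace.Length);      // 1

// The verbatim pattern acts as a word boundary: "cat" matches only as a whole word.
string wordPattern = @"\bcat\b";
Console.WriteLine(Regex.IsMatch("the cat sat", wordPattern));   // True
Console.WriteLine(Regex.IsMatch("concatenate", wordPattern));   // False
```

Because the @ prefix only changes how the C# compiler reads the literal, a Regex built from either form behaves identically.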

Suggestion 2 – Use Regex ASCII Escapes for Quotes

Some people will go out of their way to use doubled quotes (@"""") in a C# verbatim string to search for a double quote in a regex pattern. This is confusing; try the regex hexadecimal escape for the ASCII double quote (\x22) instead. The two lines below are equivalent:

string pattern = @""""; // I am only searching for a quote
 
// VS
 
string pattern2 = @"\x22"; // Much better
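Here is a minimal sketch that checks both patterns find the same characters and shows that \x22 also reads cleanly inside a larger pattern. It assumes a .NET console app using C# 9 top-level statements; the sample input and the Text group name are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input containing a quoted word (this regular C# literal uses \" escapes).
string input = "He said \"hello\" twice.";

string doubledQuote = @"""";   // verbatim literal: "" is one quote character
string hexEscape = @"\x22";    // regex escape: \x22 is ASCII 34, the double quote

// Both patterns find the same two quote characters.
Console.WriteLine(Regex.Matches(input, doubledQuote).Count);   // 2
Console.WriteLine(Regex.Matches(input, hexEscape).Count);      // 2

// Inside a larger pattern the escape stays readable, e.g. capturing quoted text.
Match quoted = Regex.Match(input, @"\x22(?<Text>[^\x22]*)\x22");
Console.WriteLine(quoted.Groups["Text"].Value);                // hello
```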
 

Note: if you are using Expresso as your regex editor, it provides a handy way of finding these escapes.

Suggestion 3 – Use the IgnorePatternWhitespace option

This option confuses beginning regex users because they think it changes what the pattern matches, when in fact it is solely an instruction to the regex parser about how to read the pattern! It tells the parser to ignore unescaped whitespace and to treat an unescaped # as the start of a comment that runs to the end of the line, so a pattern can span several lines for easier reading. Here is a sample I created for a forum post where I broke a long pattern out across lines and commented each piece. Without the IgnorePatternWhitespace option, one would have to remove the comments and put it all on one line:

using System;
using System.Text.RegularExpressions;

string text =
@"5:16:04.859 PM:  07:18:12p  2.33   0.45   NH4                      9558    WORK
5:16:06.000 PM:  07:18:13p  2.29   0.31   RIN                     10554    WORK
5:16:07.625 PM:  07:18:15p  2.33   0.44   NH4                      9645    WORK
5:16:09.125 PM:  07:18:16p  2.29   0.32   RIN                     10400    WORK";

 
string pattern =
@"^(?<Time1>[^\s]*)  # Start of line, capture first time and place into Time1
   (?:\s*)           # Match but don't capture (MBDC) the space (Used as an anchor)
   (?<AmPm1>[AP]M)   # Get the AM | PM and put it into AmPm1 capture group.
   (?:\:\s*)         # MBDC : and space
   (?<Time2>[^ap]*)  # Time2: everything up to the a/p marker
   (?<AmPm2>[ap])    # The a or p marker, put into AmPm2
   (?:\s*)
   (?<Col1>[^\s]*)   # Data column 1
   (?:\s*)
   (?<Col2>[^\s]*)   # Data column 2
   (?:\s*)
   (?<Col3>[^\s]*)   # Data column 3
   (?:\s*)
   (?<Col4>[^\s]*)   # Data column 4
   (?:\s*)
   (?<Col5>[^\s]*)   # Data column 5";

 
Regex rgx = new Regex(pattern,
                  RegexOptions.Multiline |               // ^ and $ match at the start and end of each line.
                  RegexOptions.IgnorePatternWhitespace); // Lets the pattern span lines and carry # comments.
 
 
string[] groupNames = rgx.GetGroupNames();
 
Console.WriteLine("Groups: ({0}){1}", string.Join(") (", groupNames), System.Environment.NewLine);
 
MatchCollection mc = rgx.Matches(text);
 
foreach (Match m in mc)
    if (m.Success)
    {
        Console.WriteLine("Match:");
        foreach (string name in groupNames)
            Console.WriteLine("{0,10} : {1}", name, m.Groups[name]);
 
        Console.WriteLine("{0}Time1 ({1}) Time2 ({2}){0}",
            System.Environment.NewLine,
            m.Groups["AmPm1"].Value,
            ( ( m.Groups["AmPm2"].Value == "a" ) ? "AM" : "PM" ));
    }
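To see that the option only changes how the pattern text is read, here is a smaller sketch against a fragment of the same kind of data. It assumes a .NET console app using C# 9 top-level statements; the Name and Count group names are hypothetical. Because unescaped spaces are ignored under this option, a literal space has to be written as an escape (here \x20, as in Suggestion 2), and the inline (?x) option turns on the same behavior from inside the pattern.

```csharp
using System;
using System.Text.RegularExpressions;

string line = "NH4 9558";

// With IgnorePatternWhitespace, unescaped spaces, newlines and "# ..." comments are
// dropped by the parser, so this is read as (?<Name>[A-Z0-9]+)\x20(?<Count>\d+).
string commented = @"
    (?<Name>[A-Z0-9]+)   # chemical name, e.g. NH4
    \x20                 # a literal space must be escaped
    (?<Count>\d+)        # numeric count";

Match m1 = Regex.Match(line, commented, RegexOptions.IgnorePatternWhitespace);
Console.WriteLine("{0} / {1}", m1.Groups["Name"].Value, m1.Groups["Count"].Value);   // NH4 / 9558

// The inline (?x) option enables the same parsing from inside the pattern.
Match m2 = Regex.Match(line, "(?x)" + commented);
Console.WriteLine(m2.Success);   // True

// Without the option, the newline, spaces and comment text are taken literally,
// so the same pattern no longer matches this input.
Console.WriteLine(Regex.IsMatch(line, commented));   // False
```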
 
 
 