(Updated 3/26/2011 works with .Net 1-4)
I recently answered a post from a user who did not understand why one pattern returned a match but a second one did not. What he had missed was that the second pattern did succeed, but the match he was looking for was actually five matches further along than where he expected it!
That was because the regex parser was faithfully reporting the null matches. In this post I will discuss why nulls come up in .Net regular expression parsing, and explain my suggestion to Microsoft for a regular expression flag such as IgnoreNullMatchesAndCaptures.

Example of Null Matches

The original issue can be boiled down to this: he had a regex pattern of

(b*)
Now the * in the pattern says zero or more instances, and that is the crux of the issue. When that pattern is run against data such as
ab
what happens is that there are three matches. If one looks at the matches returned (displayed as in my blog post .Net Regex Capture Groups), this is what is shown. Note that the 0 and 1 under each match are the group (match capture) indexes:
Match (1):
0 :
1 :
Match (2):
0 : b
1 : b
Match (3):
0 :
1 :
Index zero under each match is always the whole match.
This is what the parser is doing. For the first match, the parser is looking at the a. Because a is not b, but the * on b allows zero or more, bingo, there is a null match. The same is true for match 3, which is an empty match at the end of the buffer, after the b. The second match works because b is b; no explanation needed.
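The three matches can be reproduced with a short console program. This is only a minimal sketch; the class name NullMatchDemo is mine and not part of the original question. It prints the position and length of each match, which makes the two empty matches easier to see than the bare listing above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NullMatchDemo
{
    public static void Main()
    {
        // Same pattern and data as in the example above.
        MatchCollection matches = Regex.Matches("ab", "(b*)");

        Console.WriteLine("Match count: " + matches.Count);

        foreach (Match m in matches)
        {
            // Groups[0] is the whole match; Groups[1] is the (b*) capture group.
            Console.WriteLine("Index {0}, Length {1}, Group 0 = '{2}', Group 1 = '{3}'",
                m.Index, m.Length, m.Groups[0].Value, m.Groups[1].Value);
        }
    }
}
```

The empty matches show up with a Length of 0: one at index 0 (in front of the a) and one at index 2 (the end of the data).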
Now I know the proper way to get around this is to change the pattern to (b+), and then no null matches will be returned. But I present this problem in the scope of a larger regex pattern, where grouping issues may inadvertently capture a null. Also, when matching across \r\n boundaries, one sometimes gets a null, or the end-of-buffer null. That is where I think the suggestion below would be useful.
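To make the difference between the two quantifiers concrete, here is a small sketch that runs both patterns over the same data and compares the match counts. The class name StarVersusPlus is just an illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StarVersusPlus
{
    public static void Main()
    {
        string data = "ab";

        // * allows zero b's, so the empty matches are reported as well.
        Console.WriteLine("(b*) count: " + Regex.Matches(data, "(b*)").Count);

        // + requires at least one b, so only the real b is matched.
        Console.WriteLine("(b+) count: " + Regex.Matches(data, "(b+)").Count);
    }
}
```

Against ab this should report three matches for (b*) and a single match for (b+), in line with the results described above.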

Suggestion Made to Microsoft

I suggested to Microsoft, in a Connect issue entitled Regular Expression (Regex) Improvements – Null Value Ignore, that a flag be added to the parser that would ignore all nulls. If there was nothing in the match or captures, then the match would not even be presented. In the above example, if my suggestion was applied, only one match would be returned. I have never used the null in regex parsing (maybe a reader could enlighten me), but for the most part, if there are nulls, don’t report them. Your thoughts?
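In the meantime, a similar effect can be approximated in the calling code. The sketch below is only an illustration under my own assumptions: IgnoreNullMatchesAndCaptures is not a real RegexOptions value, and the RegexFilter class and NonEmptyMatches helper are hypothetical names. It simply skips any match whose Length is zero, and it uses an iterator, so it needs .Net 2.0 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexFilter
{
    // Hypothetical helper: yields only the matches that consumed at least one character.
    public static IEnumerable<Match> NonEmptyMatches(string input, string pattern)
    {
        foreach (Match m in Regex.Matches(input, pattern))
        {
            if (m.Length > 0)
            {
                yield return m;
            }
        }
    }

    public static void Main()
    {
        // With the empty matches filtered out, only the b at index 1 remains.
        foreach (Match m in NonEmptyMatches("ab", "(b*)"))
        {
            Console.WriteLine("Index {0}: '{1}'", m.Index, m.Value);
        }
    }
}
```

This only drops whole matches that are empty. It does not remove empty capture groups inside a match that is otherwise non-empty, which the proposed flag would also have to deal with.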
If you find the suggestion compelling, go to the Connect issue and vote on it! Thanks.