Are C# .Net Regular Expressions Fast Enough for You?
It is generally accepted that regular expression parsing carries an overhead, and there is truth to that statement. But the premise of this article is that the difference is usually negligible, and using it as an excuse not to learn regex pattern processing is just plain foolish. As with any high-level programming construct that gives the developer a quicker development time, the price paid is the extra cycles it takes to run. But are regular expressions really as slow as they are perceived to be? Let me show you by example….
The MSDN forums are littered with vague warnings such as “Don’t use regex, it’s slow”. I have seen that advice given, and yes, it is based on a truth as mentioned before, but those giving it never add in the time it takes to subsequently process the information. They forget that in most cases regular expressions already provide the post-processing needs, such as storage and data extraction, built in.
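To make the "built in" point concrete, here is a minimal sketch of a single match that both finds the text and stores the extracted pieces in named groups. The helper name TryReadAttribute, the class name, and the group names are hypothetical and chosen only for illustration; the sample text is the one used later in this article.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ExtractionSketch
{
    // Hypothetical helper: reads the name and the integer value in one pass.
    public static bool TryReadAttribute(string text, out string name, out int value)
    {
        // Named groups keep the captured pieces addressable by name.
        Match match = Regex.Match(
            text, "name=\"(?<name>[^\"]+)\"\\s+value=\"(?<value>\\d+)\"");

        name = match.Groups["name"].Value; // empty string when nothing matched
        value = 0;
        return match.Success
            && int.TryParse(match.Groups["value"].Value, out value);
    }

    public static void Main()
    {
        string name;
        int value;
        if (TryReadAttribute("name=\"rating_count\" value=\"41\"", out name, out value))
            Console.WriteLine("{0} = {1}", name, value); // rating_count = 41
    }
}
```

A split-based approach would need its own bookkeeping to pair each name with its value; here the Match object already carries both.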
It comes down to…Is it fast enough for you?
If one needs to shave milliseconds off a multi-million-iteration operation, then don’t use regular expressions, or at least run tests first. But for day-to-day use, I believe it’s always the right answer. With that premise, let us test some code.
Premise
The usual alternative to a regular expression is string.Split. Now string.Split is a fast little function and very useful, but one then has to consider the ancillary processing. I have found a real example culled from the forums.
The Test
A user asked what could be used to parse a specific piece of text and whether regular expressions were an option. The example text, changed slightly so that the value is 41 instead of 0, looked like this:
name="rating_count" value="41"
The user was interested in extracting the value 41 as an integer and wondered which approach was better.
The Opponent
Right out of the gate there was an answer saying Regex is slower, along with an example which actually failed. I have modified it to work. The originator had tested with zero and didn’t realize they were getting a default value instead of an extracted value, because the code was only splitting on the ‘=’ character. In my test it is fixed and placed into a static method called Highway:
public static int Highway(string text)
{
    string[] parts = text.Split( new char[] { ' ', '=', '\x22' }, StringSplitOptions.RemoveEmptyEntries );
    int value = 0;
    for (int index = 0; index < parts.Length - 1; index++)
        if (parts[index].ToLower() == "value")
        {
            string tempValue = parts[index + 1];
            int.TryParse(tempValue, out value);
            break;
        }
    return value;
}
Note that \x22 is the hex escape for the double-quote character (").
The Contender
Here is what I wrote to do the same job with regular expressions, which I called MyWay (get it? MyWay or the Highway…bwhahahaha…nevermind):
public static int MyWay( string text )
{
    int value = 0;
    int.TryParse( Regex.Match( text, "(?:value=\x22)([^\x22]+)", RegexOptions.Compiled ).Groups[1].Value, out value );
    return value;
}
Now I knew that this would be run multiple times, so I told .NET to compile the expression for future uses after the first; for a one-off operation one should not do that.
The Cage
Here is the testing arena for the two operations. I throw away the first run of each, which does help regex in the long run due to the compilation, but frankly a one-off test without the compilation flag is not too shabby. If you try this at home, don’t forget the using directive for System.Diagnostics.
string data = string.Format("name={0}rating_count{0} value={0}41{0}", "\x22");
Stopwatch st = new Stopwatch();
int index;
int totalRuns = 100000;

Highway( data ); // Do a test and throw it out
st.Start();
for (index = 0; index < totalRuns; index++)
    Highway( data );
st.Stop();
Console.WriteLine( "Non Regex:\t{0}\tAvg Per Run:\t{1}", st.Elapsed.TotalMilliseconds, st.Elapsed.TotalMilliseconds / totalRuns );

MyWay( data ); // Throw out the first
st.Reset();
st.Start();
for ( index = 0; index < totalRuns; index++ )
    MyWay( data );
st.Stop();
Console.WriteLine( "Regex:\t\t{0}\tAvg Per Run:\t{1}", st.Elapsed.TotalMilliseconds, st.Elapsed.TotalMilliseconds / totalRuns );
Results
So what happens? In Release mode on a dual-core machine, 100,000 runs produce results like these (values are total milliseconds):
Non Regex: 213.9509 Avg Per Run: 0.002139509
Regex: 226.7564 Avg Per Run: 0.002267564
So the difference was not really that great. Though the non-regex times were usually faster overall, the gap between the two was small.
So one has to ask, “Is Regex fast enough for you?”
I believe the answer to be yes! Note, in fairness, that poorly formed regex patterns will slow the parser down, but garbage in, garbage out; so yes, your mileage will vary.
Versatility versus speed. Regex wins hands-down on versatility. And it looks like speed isn’t too shabby. Thanks for the legwork.
Excellent analysis of such a widely ignored and dismissed topic which many consider “esoteric”. Regex is the text pattern standard for text searching and parsing.
You hit the nail on the head, because I was one who ignored it for years until I finally buckled down and began learning it. Now my eyes have been opened to a whole new way of doing things much more simply and elegantly.
Thank you William.
Thanks for the comparison. I completely agree with the sentiment. I wanted to see how much worse uncompiled Regex would perform. I ran the same comparison at 10 different random times throughout the morning and got these averaged results.
Non Regex: 199.9204 Avg Per Run: 0.001999204
Regex Compiled: 242.9104 Avg Per Run: 0.002429104
Regex Uncompiled: 326.9396 Avg Per Run: 0.003269396
Hi William. Great post as usual, I always wondered about benchmarks comparing RegEx solutions vs regular string operations, and the result is amazing. I totally agree that regular expressions are the way to go in almost every common scenario.
Anyways, I’m never satisfied with just evaluating results if I don’t analyze the process by which they are obtained. Well, at least when I understand the process. So after some copying and pasting, I got the same results as you did, but something came to my attention, and that was the way you compared the strings in “Highway”. In my experience, ToLower or ToUpper comparisons are way slower than using StringComparison.InvariantCultureIgnoreCase, so I went ahead and changed it. As expected, there was a slight improvement, but the conclusion I reached after that was more important: Highway is case-insensitive and MyWay isn’t. Therefore I went ahead and changed Highway to be case-sensitive as well, and now the results are pretty different. Highway takes less than half the time of MyWay! I thought it would be fair to the string class to share my results ;)
Well, having said all that, the answer to your question is still: Yes! It definitely is fast enough for me, even with the broader difference in performance. A better use and understanding of regular expressions has been on my to-do list like forever. Unless performance is explicitly required, I always favor simplicity and versatility over performance, and this is exactly that.
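For readers who want to line the two methods up on case handling, here is a rough sketch of what that could look like. It is not the code used for the timings above: the method names HighwayIgnoreCase and MyWayIgnoreCase are made up for illustration, and going the other way (making both case-sensitive) would simply mean comparing with == and leaving the regex as it was.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaseHandlingSketch
{
    private static readonly char[] Separators = { ' ', '=', '"' };

    // Highway with ToLower() swapped for a comparison that ignores case,
    // which avoids allocating a lowered copy of every part.
    public static int HighwayIgnoreCase(string text)
    {
        string[] parts = text.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
        int value = 0;
        for (int index = 0; index < parts.Length - 1; index++)
        {
            if (string.Equals(parts[index], "value",
                              StringComparison.InvariantCultureIgnoreCase))
            {
                int.TryParse(parts[index + 1], out value);
                break;
            }
        }
        return value;
    }

    // MyWay asked to ignore case as well, so both methods accept the same input.
    public static int MyWayIgnoreCase(string text)
    {
        int value = 0;
        Match match = Regex.Match(text, "value=\"([^\"]+)", RegexOptions.IgnoreCase);
        int.TryParse(match.Groups[1].Value, out value);
        return value;
    }

    public static void Main()
    {
        string data = "name=\"rating_count\" VALUE=\"41\"";
        Console.WriteLine(HighwayIgnoreCase(data)); // 41
        Console.WriteLine(MyWayIgnoreCase(data));   // 41
    }
}
```

Either way, the comparison is only fair once both methods accept and reject the same inputs.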
Thanks Fernando! I will re-evaluate the test. The code did come from someone else and I didn’t look at it accordingly; my bad.
But it does make sense, now that I look at the code, that dealing with case adds time in both regex and non-regex operations. So it sounds like it is time for another test!
change highway to
public static int Highway(string text)
{
    int value = 0;
    int indexOf = text.IndexOf("value=\"");
    if (indexOf >= 0)
    {
        indexOf += "value=\"".Length;
        int endIndex = text.IndexOf("\"", indexOf);
        int.TryParse(text.Substring(indexOf, (endIndex > 0 ? endIndex - indexOf : 0)), out value);
    }
    return value;
}
and give it a try now.
On my machine, regex is just over 3x slower
Non Regex: 38.5628 Avg Per Run: 0.000385628
Regex: 122.995 Avg Per Run: 0.00122995
I’m not saying don’t use the regex though; especially for this example. I just don’t like how often people jump to support regex.
Your example here is an invalid use of the RegexOptions.Compiled flag. If you’re going to compile a Regex, you should put it in a static readonly class member. That way it is only compiled once rather than every time it is used. (Use the instance Match method rather than the static Match method.)
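A minimal sketch of that suggestion, assuming the same pattern as MyWay. The class name and the field name ValuePattern are made up for illustration, and no timings are claimed for it here.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CompiledOnceSketch
{
    // Built once, when the class is first used, and shared by every call.
    private static readonly Regex ValuePattern =
        new Regex("value=\"([^\"]+)", RegexOptions.Compiled);

    public static int MyWay(string text)
    {
        int value = 0;
        // Instance Match on the shared Regex instead of the static Regex.Match.
        int.TryParse(ValuePattern.Match(text).Groups[1].Value, out value);
        return value;
    }

    public static void Main()
    {
        Console.WriteLine(MyWay("name=\"rating_count\" value=\"41\"")); // 41
    }
}
```

Regex instances are safe to share across threads for matching, so a single static field is enough.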