C# Regex Performance

Regex performance matters when the same pattern is run many times. It can be improved by storing a Regex instance in a field on a class so that it is reused, by passing RegexOptions.Compiled, and by avoiding repeated calls to the static Regex methods. The benchmark below compares these three approaches.

Benchmark results

Static Regex method:     6895 ms
Instance Regex object:   6583 ms
Instance compiled Regex: 5679 ms [fastest]

Example

First we use the static Regex.Split method in System.Text.RegularExpressions. All three examples use Split, but other methods such as Matches, Match, and Replace have similar performance characteristics.

Here: This code uses the static Regex.Split method. A static call has no Regex object of your own to reuse, so each call must at least look the pattern up in an internal cache. When the same pattern is used repeatedly, keeping a Regex instance saves those CPU cycles.

And: It shows a simple Regex that splits the input string into separate words. The pattern \W+ matches one or more non-word characters.

Program that uses Regex.Split: C#

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string s = "This is a simple /string/ for Regex.";
        // Static call: the pattern is passed on every invocation.
        string[] c = Regex.Split(s, @"\W+");
        foreach (string m in c)
        {
            Console.WriteLine(m);
        }
    }
}

Output

This
is
a
simple
string
for
Regex
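
One detail of the output above is easy to miss: because the input ends with a non-word character, Split also returns a trailing empty string, which prints as a blank line. The sketch below shows one way to drop such entries. The Words helper is a hypothetical name introduced only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

class SplitWords
{
    // Hypothetical helper: splits on runs of non-word characters
    // and removes the empty entries that Split can produce.
    public static string[] Words(string input)
    {
        string[] parts = Regex.Split(input, @"\W+");
        return Array.FindAll(parts, p => p.Length > 0);
    }

    static void Main()
    {
        string s = "This is a simple /string/ for Regex.";
        Console.WriteLine(Regex.Split(s, @"\W+").Length); // 8 (last entry is "")
        Console.WriteLine(Words(s).Length);               // 7
    }
}
```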

Example 2

Here we see a faster approach than the example above. This example creates the expression with new Regex and then calls the instance Split method. It produces the same output but performs better in the benchmark. The Regex is stored in a local variable inside the method.

Program that uses instance Regex: C#

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string s = "This is a simple /string/ for Regex.";
        // Instance Regex: the pattern is parsed once, when the object is created.
        Regex r = new Regex(@"\W+");
        string[] c = r.Split(s);
        foreach (string m in c)
        {
            Console.WriteLine(m);
        }
    }
}

Output

This
is
a
simple
string
for
Regex
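
As noted earlier, Match, Matches, and Replace behave similarly to Split. The following is a minimal sketch of reusing one Regex instance across those three methods. The pattern \w+ (word characters) and the Run helper are assumptions chosen for this illustration and do not come from the benchmark.

```csharp
using System;
using System.Text.RegularExpressions;

class InstanceMethods
{
    // Hypothetical helper: runs Match, Matches and Replace on one instance.
    public static string[] Run(string input)
    {
        Regex words = new Regex(@"\w+"); // created once, reused three times

        string first = words.Match(input).Value;
        string count = words.Matches(input).Count.ToString();
        string masked = words.Replace(input, "x");

        return new[] { first, count, masked };
    }

    static void Main()
    {
        foreach (string line in Run("This is a simple /string/ for Regex."))
        {
            Console.WriteLine(line);
        }
        // Prints:
        // This
        // 7
        // x x x x /x/ x x.
    }
}
```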

Example 3

Next, we use a compiled regular expression and store it at the class level. Two things are new here: the Regex is created with RegexOptions.Compiled, and it is stored as a static field, so it can be reused throughout the application without being recreated.

Program that uses static compiled Regex: C#

using System;
using System.Text.RegularExpressions;

class Program
{
    // Compiled once and shared by every call in the program.
    static readonly Regex _wordRegex = new Regex(@"\W+", RegexOptions.Compiled);

    static void Main()
    {
        string s = "This is a simple /string/ for Regex.";
        string[] c = _wordRegex.Split(s);
        foreach (string m in c)
        {
            Console.WriteLine(m);
        }
    }
}

Output

This
is
a
simple
string
for
Regex

Benchmark

We now compare the performance of the three approaches. The three Split calls shown above were each run for one million iterations, using the same input string and the same objects as in the three examples.

Note: The figures from this experiment are listed under "Benchmark results" near the top. The benchmark code is not available.
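
Since the original benchmark code is not available, here is a sketch of how such a comparison might be set up with Stopwatch. It is not the code that produced the figures above. The Time helper, the warm-up call, and the iteration parameter are assumptions, and timings will vary by machine and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

class RegexBenchmark
{
    const string Input = "This is a simple /string/ for Regex.";
    const string Pattern = @"\W+";

    // Hypothetical helper: runs the delegate 'iterations' times
    // and returns the elapsed time in milliseconds.
    public static long Time(Func<string[]> split, int iterations)
    {
        split(); // warm-up so one-time setup is not included in the timing
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            split();
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        const int iterations = 1000000; // one million, as in the text
        Regex instance = new Regex(Pattern);
        Regex compiled = new Regex(Pattern, RegexOptions.Compiled);

        Console.WriteLine("Static Regex method:     {0} ms",
            Time(() => Regex.Split(Input, Pattern), iterations));
        Console.WriteLine("Instance Regex object:   {0} ms",
            Time(() => instance.Split(Input), iterations));
        Console.WriteLine("Instance compiled Regex: {0} ms",
            Time(() => compiled.Split(Input), iterations));
    }
}
```

Passing the call as a delegate adds the same small overhead to all three measurements, so the comparison between them should remain fair.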

Discussion

Let's review what others have written on this topic, including MSDN's resources. Microsoft's David Gutierrez states that there are three major options for regular expression performance.

First: Interpreted regular expressions. The runtime parses the Regex into opcodes and then runs them with an interpreter. Creation time is low, and runtime performance is low.

Second: Compiled regular expressions. Here you use RegexOptions.Compiled. According to Gutierrez, this takes 10x longer to start up but yields 30% better runtime. Don't use it for dynamically generated Regexes. Creation time is highest, and runtime performance is high.

Finally: Precompiled regular expressions (Regex.CompileToAssembly). This is harder to set up. Creation time is low, and runtime performance is high.
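
The creation-time difference between the first two options can be observed directly. The sketch below times construction plus one first match, with and without RegexOptions.Compiled. The TimeFirstUse helper and the warm-up pattern are assumptions made for illustration, and no specific ratio is implied; results depend on the runtime and machine.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

class ConstructionCost
{
    // Hypothetical helper: time to construct a Regex and run a first match.
    public static TimeSpan TimeFirstUse(string pattern, RegexOptions options)
    {
        Stopwatch sw = Stopwatch.StartNew();
        Regex regex = new Regex(pattern, options);
        regex.IsMatch("This is a simple /string/ for Regex.");
        sw.Stop();
        return sw.Elapsed;
    }

    static void Main()
    {
        // Warm up with an unrelated pattern so that loading the regex
        // engine itself is not charged to the first measurement.
        TimeFirstUse(@"\d+", RegexOptions.None);
        TimeFirstUse(@"\d+", RegexOptions.Compiled);

        TimeSpan interpreted = TimeFirstUse(@"\W+", RegexOptions.None);
        TimeSpan compiled = TimeFirstUse(@"\W+", RegexOptions.Compiled);

        Console.WriteLine("Interpreted: {0:F3} ms", interpreted.TotalMilliseconds);
        Console.WriteLine("Compiled:    {0:F3} ms", compiled.TotalMilliseconds);
    }
}
```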

BCL Team Blog

MSDN

MSDN has little documentation on this topic. It does warn against using RegexOptions.Compiled together with CompileToAssembly, which means the compiled and precompiled approaches cannot be combined.

RegexOptions: MSDN

Summary

We optimized calls to Regex.Split. In the benchmark, the compiled instance was fastest, the plain instance came next, and the static method was slowest. RegexOptions.Compiled is a case where runtime performance is improved by sacrificing startup time.

Therefore: Using an instance Regex that is not compiled is best for most situations. It avoids the overhead of the static methods and doesn't cost much during program startup.

