C# Regex Type

Dot Net Perls

The Regex type uses patterns to search and replace string data. It is built upon a special text-processing language. Regex methods streamline certain programs, but they make other programs more complex. In addition, a Regex in the C# language is often slower than equivalent code that loops over characters directly.

"These expressions are commonly used to describe patterns. Regular expressions are built from single characters, using union, concatenation, and the Kleene closure, or any-number-of, operator." (Aho et al., p. 187)

Match

This program introduces the Regex type, including its constructor and the Match method, as well as the Match type—all of which are found in the System.Text.RegularExpressions namespace. The Regex uses a pattern of one or more digits; the characters "55" match this pattern.

Program that uses Regex type [C#]

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Pattern: one or more digit characters.
        Regex regex = new Regex(@"\d+");
        // Match returns the first occurrence in the input.
        Match match = regex.Match("Dot 55 Perls");
        if (match.Success)
        {
            Console.WriteLine(match.Value);
        }
    }
}

Output

55

Details. The Regex.Match method is one of the most useful ones on the Regex type. We describe its use with some example patterns that were useful in the real world.
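
Beyond a single Match call, the Matches and IsMatch methods cover two common needs: collecting every occurrence, and testing whether any occurrence exists. The following is a minimal sketch; the class and helper names (MatchDemo, AllNumbers, HasDigit) are hypothetical and exist only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class MatchDemo
{
    // Hypothetical helper: collects every run of digits in the input.
    public static List<string> AllNumbers(string input)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, @"\d+"))
        {
            results.Add(m.Value);
        }
        return results;
    }

    // Hypothetical helper: true if the input contains at least one digit.
    public static bool HasDigit(string input)
    {
        return Regex.IsMatch(input, @"\d");
    }

    static void Main()
    {
        Console.WriteLine(string.Join(",", AllNumbers("Dot 55 Perls 100"))); // 55,100
        Console.WriteLine(HasDigit("Dot Perls"));                            // False
    }
}
```

IsMatch is the lighter choice when only a yes-or-no answer is needed, because no Match values have to be read.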

Regex.Match, Regex.Matches, Regex.Matches Quote Example, Regex.IsMatch, Regex Capture

Replace

What if you have to replace a certain pattern of text with some other text? The Regex.Replace method solves this problem well: you can replace strings that match a pattern with a simple string, or with a value that is determined through a computation with MatchEvaluator.
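
As a sketch of both forms of Regex.Replace described above, the first helper substitutes a fixed string and the second computes each replacement with a MatchEvaluator. The names ReplaceDemo, MaskNumbers, and DoubleNumbers are hypothetical. The second helper uses [0-9]+ rather than \d+ so that int.Parse only ever sees ASCII digits, and it assumes the numbers are small enough to fit in an int.

```csharp
using System;
using System.Text.RegularExpressions;

static class ReplaceDemo
{
    // Replaces each run of digits with a fixed string.
    public static string MaskNumbers(string input)
    {
        return Regex.Replace(input, @"\d+", "#");
    }

    // Replaces each run of ASCII digits with a computed value (the number doubled).
    public static string DoubleNumbers(string input)
    {
        MatchEvaluator evaluator = m => (int.Parse(m.Value) * 2).ToString();
        return Regex.Replace(input, "[0-9]+", evaluator);
    }

    static void Main()
    {
        Console.WriteLine(MaskNumbers("Dot 55 Perls 10"));   // Dot # Perls #
        Console.WriteLine(DoubleNumbers("Dot 55 Perls 10")); // Dot 110 Perls 20
    }
}
```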

Regex.Replace, Regex.Replace Spaces, Regex.Replace String End, Regex.Replace Numbers

Split

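Do you need to extract substrings from your text that contain only certain characters, such as certain digits or letters? The Split method breaks the input at each match of a delimiter pattern and returns a string array of the pieces between the matches. To keep only the runs of digits, for example, you split on the non-digit pattern \D+. The array can contain empty strings when a delimiter occurs at the start or end of the input.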
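
A small sketch of this use of Split follows. The names SplitDemo, SplitOnNonDigits, and Numbers are hypothetical. Note that the pattern describes the delimiters, not the pieces to keep, and that Split can return empty strings, which the second helper filters out.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class SplitDemo
{
    // Splits wherever one or more non-digit characters occur.
    public static string[] SplitOnNonDigits(string input)
    {
        return Regex.Split(input, @"\D+");
    }

    // Same split, with empty entries removed.
    public static List<string> Numbers(string input)
    {
        var result = new List<string>();
        foreach (string part in SplitOnNonDigits(input))
        {
            if (part.Length > 0)
            {
                result.Add(part);
            }
        }
        return result;
    }

    static void Main()
    {
        // The input ends with a delimiter, so the raw array ends with an empty string.
        Console.WriteLine(string.Join("|", SplitOnNonDigits("10 cats, 20 dogs"))); // 10|20|
        Console.WriteLine(string.Join(",", Numbers("10 cats, 20 dogs")));          // 10,20
    }
}
```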

Regex.Split, Regex.Split Numbers

Escape

The Escape method on the Regex type converts user input into a pattern that matches that input literally. It places a backslash before metacharacters such as the period, star, plus, and parentheses, on the assumption that none of them were intended as operators.
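
To illustrate why escaping matters, this sketch searches for the user text "1+1" with and without Regex.Escape. The names EscapeDemo and ContainsLiteral are hypothetical. Without escaping, the plus is treated as a quantifier and the pattern looks for two or more consecutive 1 characters instead.

```csharp
using System;
using System.Text.RegularExpressions;

static class EscapeDemo
{
    // Hypothetical helper: searches for userText as literal characters.
    public static bool ContainsLiteral(string input, string userText)
    {
        string pattern = Regex.Escape(userText);
        return Regex.IsMatch(input, pattern);
    }

    static void Main()
    {
        Console.WriteLine(Regex.Escape("1+1=2"));                   // 1\+1=2
        Console.WriteLine(ContainsLiteral("1+1=2 is true", "1+1")); // True
        // Unescaped, "1+1" means "one or more 1s, then a 1", which is not in the input.
        Console.WriteLine(Regex.IsMatch("1+1=2 is true", "1+1"));   // False
    }
}
```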

Regex.Escape, Regex.Unescape

Files

You will often need to process text files from the disk. The Regex type and its methods can definitely be used for this, but you will need to combine a file input method with the Regex code.
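
One way to combine file input with Regex code is sketched below: File.ReadLines supplies the lines, and a single Regex instance is reused for each one. The names FileDemo and NumbersInFile are hypothetical, and the program writes its own temporary file so that it can run without any existing data.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

static class FileDemo
{
    // Hypothetical helper: returns every run of digits found in the file, line by line.
    public static List<string> NumbersInFile(string path)
    {
        Regex regex = new Regex(@"\d+");
        var found = new List<string>();
        foreach (string line in File.ReadLines(path))
        {
            foreach (Match m in regex.Matches(line))
            {
                found.Add(m.Value);
            }
        }
        return found;
    }

    static void Main()
    {
        // Create a small sample file for the demonstration.
        string path = Path.GetTempFileName();
        File.WriteAllLines(path, new[] { "Dot 55 Perls", "no digits here", "7 and 8" });

        Console.WriteLine(string.Join(",", NumbersInFile(path))); // 55,7,8

        File.Delete(path);
    }
}
```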

Regex File Tutorial

HTML

The Regex type can be used to process or extract parts of HTML strings. The examples linked to here show how you can pull the title or the contents of paragraphs in your HTML documents. You can also remove all HTML tags, although a pattern cannot reliably handle comments, scripts, or malformed markup, so the result should be treated as an approximation.
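
The sketch below shows one possible pattern for each task. It is not necessarily the approach used in the linked examples, and the names HtmlDemo, GetTitle, and StripTags are hypothetical. Both patterns assume simple, well-formed markup.

```csharp
using System;
using System.Text.RegularExpressions;

static class HtmlDemo
{
    // Returns the text of the first title element, or an empty string if there is none.
    public static string GetTitle(string html)
    {
        Match m = Regex.Match(html, @"<title>\s*(.+?)\s*</title>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : "";
    }

    // Removes anything that looks like a tag. This is approximate: a '>' inside
    // an attribute value or a comment will confuse it.
    public static string StripTags(string html)
    {
        return Regex.Replace(html, "<.*?>", "");
    }

    static void Main()
    {
        string page = "<html><head><title> Dot Net Perls </title></head></html>";
        Console.WriteLine(GetTitle(page));                         // Dot Net Perls
        Console.WriteLine(StripTags("<p>Hello <b>world</b></p>")); // Hello world
    }
}
```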

Title From HTML, Paragraph HTML Regex, Remove HTML Tags

Case-sensitivity

Lowercase and uppercase letters are distinct in the Regex text language. You can, however, use a RegexOptions enumerated constant to change the machine's behavior so that the letters 'A' and 'a' are treated as equal.
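
A brief sketch of the difference: the same pattern is tested with and without RegexOptions.IgnoreCase. The names CaseDemo and Contains are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class CaseDemo
{
    // Hypothetical helper: tests the pattern with the given options.
    public static bool Contains(string input, string pattern, RegexOptions options)
    {
        return Regex.IsMatch(input, pattern, options);
    }

    static void Main()
    {
        Console.WriteLine(Contains("DOT NET PERLS", "perls", RegexOptions.None));       // False
        Console.WriteLine(Contains("DOT NET PERLS", "perls", RegexOptions.IgnoreCase)); // True
    }
}
```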

RegexOptions.IgnoreCase

Whitespace

Whitespace isn't actually white, but it is often not needed for future processing of data. We demonstrate how you can Trim whitespace using Regex methods; this is an alternative to the string methods.
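
One possible Regex-based trim is sketched here. The pattern removes whitespace anchored at the start or at the end of the string, and the names TrimDemo and TrimBoth are hypothetical. For this input the result is the same as the string Trim method.

```csharp
using System;
using System.Text.RegularExpressions;

static class TrimDemo
{
    // Removes leading whitespace (^\s+) and trailing whitespace (\s+$).
    public static string TrimBoth(string input)
    {
        return Regex.Replace(input, @"^\s+|\s+$", "");
    }

    static void Main()
    {
        string source = "  Dot Net Perls \n";
        Console.WriteLine("[" + TrimBoth(source) + "]"); // [Dot Net Perls]
        Console.WriteLine("[" + source.Trim() + "]");    // [Dot Net Perls]
    }
}
```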

Regex Trim Example

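Newlines. You can change how the Regex type acts upon newlines using the RegexOptions enum. With RegexOptions.Multiline, the ^ and $ anchors match at the start and end of each line, rather than only at the start and end of the whole input. This is one of the most useful options.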
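
The effect on the ^ anchor can be seen by counting matches of the same pattern with and without the option. This is a minimal sketch; the names MultilineDemo and CountLineStarts are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class MultilineDemo
{
    // Counts words that appear at a position where ^ is allowed to match.
    public static int CountLineStarts(string text, RegexOptions options)
    {
        return Regex.Matches(text, @"^\w+", options).Count;
    }

    static void Main()
    {
        string text = "one\ntwo\nthree";
        Console.WriteLine(CountLineStarts(text, RegexOptions.None));      // 1
        Console.WriteLine(CountLineStarts(text, RegexOptions.Multiline)); // 3
    }
}
```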

RegexOptions.Multiline

Star (asterisk) character

What does the star character in Regex patterns do? The star (*) matches zero or more occurrences of the preceding element; in language theory it is known as the Kleene closure. The plus (+) differs in that it requires at least one occurrence, and this difference is important to know.
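
The difference shows up only when the repeated element is absent. In this sketch, the patterns ab*c and ab+c are tested against the same inputs; the names StarDemo, MatchesStar, and MatchesPlus are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class StarDemo
{
    // 'a', then zero or more 'b', then 'c'.
    public static bool MatchesStar(string input)
    {
        return Regex.IsMatch(input, "^ab*c$");
    }

    // 'a', then one or more 'b', then 'c'.
    public static bool MatchesPlus(string input)
    {
        return Regex.IsMatch(input, "^ab+c$");
    }

    static void Main()
    {
        Console.WriteLine(MatchesStar("ac"));   // True
        Console.WriteLine(MatchesPlus("ac"));   // False
        Console.WriteLine(MatchesStar("abbc")); // True
        Console.WriteLine(MatchesPlus("abbc")); // True
    }
}
```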

Star (Regex)

Word counts

Another usage of the Regex type in the C# language is to count words in strings. We show how you can implement a word count method that is very close to that present in Microsoft Word 2007. This works on English text.
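
A simple approximation counts runs of non-whitespace characters. This sketch assumes that definition of a word; it is not claimed to reproduce the Microsoft Word count exactly, and the names WordCountDemo and CountWords are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class WordCountDemo
{
    // Assumption: a "word" is any run of one or more non-whitespace characters.
    public static int CountWords(string text)
    {
        return Regex.Matches(text, @"\S+").Count;
    }

    static void Main()
    {
        Console.WriteLine(CountWords("Dot Net Perls is a site")); // 6
        Console.WriteLine(CountWords("   "));                     // 0
    }
}
```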

Word Count

Performance

Are Regex matches fast? Unfortunately, Regex usage often results in slower code than imperative loops. We reveal ways to optimize Regex performance. Alternatively, we replace Regex with a switch construct.
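
Two of these ideas are sketched side by side below: a static Regex created once with RegexOptions.Compiled, and an imperative loop that performs the same check. The names PerfDemo, IsDigitsRegex, and IsDigitsLoop are hypothetical, and no timings are claimed here; measure both on your own data. The pattern uses \z rather than $ so that, like the loop, it rejects a trailing newline.

```csharp
using System;
using System.Text.RegularExpressions;

static class PerfDemo
{
    // Created once and reused. Compiled trades a slower startup for faster matching.
    static readonly Regex DigitsOnly = new Regex(@"^[0-9]+\z", RegexOptions.Compiled);

    public static bool IsDigitsRegex(string s)
    {
        return DigitsOnly.IsMatch(s);
    }

    // Loop-based equivalent: non-empty and every character is an ASCII digit.
    public static bool IsDigitsLoop(string s)
    {
        if (s.Length == 0)
        {
            return false;
        }
        foreach (char c in s)
        {
            if (c < '0' || c > '9')
            {
                return false;
            }
        }
        return true;
    }

    static void Main()
    {
        Console.WriteLine(IsDigitsRegex("12345")); // True
        Console.WriteLine(IsDigitsLoop("12345"));  // True
        Console.WriteLine(IsDigitsRegex("12a45")); // False
        Console.WriteLine(IsDigitsLoop("12a45"));  // False
    }
}
```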

RegexOptions.Compiled, Regex Performance, Regex Versus Loop, Static Regex

"Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." (Jamie Zawinski)

Summary

In the .NET Framework, regular expressions are a concise way to process text data, but this comes at a cost. In principle, a Regex call can be re-implemented as a low-level method that processes characters directly, and such a method is often more efficient. This is because a Regex is a high-level representation of that same low-level logic.