C# Regex.Split

Regex.Split separates a string wherever a regular expression pattern matches. The delimiter is itself a pattern, such as \D+, which matches one or more non-digit characters. This gives more flexibility than string.Split.

Example

First, we get all the numbers in a string and then parse them into integers for easier use in a C# program. The important part of the example is that it splits on every run of non-digit characters in the string.

Then: The program loops over the result strings with a foreach loop and calls int.TryParse on each one.

Program that uses Regex.Split: C#

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
	//
	// String containing numbers.
	//
	string sentence = "10 cats, 20 dogs, 40 fish and 1 programmer.";
	//
	// Get all digit sequences as strings.
	//
	string[] digits = Regex.Split(sentence, @"\D+");
	//
	// Now we have each number string.
	//
	foreach (string value in digits)
	{
	    //
	    // Parse the value to get the number.
	    //
	    int number;
	    if (int.TryParse(value, out number))
	    {
		Console.WriteLine(value);
	    }
	}
    }
}

Output

10
20
40
1
In this example, the input string contains the numbers 10, 20, 40 and 1, and the static Regex.Split method is called with two arguments: the input string and the pattern. The string @"\D+" is a verbatim string literal, and the pattern it holds matches one or more non-digit characters.
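
One detail is easy to miss: because the sentence ends with non-digit text, the array returned by Regex.Split ends with an empty string, which int.TryParse quietly rejects in the program above. The following is a minimal sketch of that behavior. The helper name SplitDigits is hypothetical and is not part of the original example.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

class Program
{
    // Hypothetical helper: splits on runs of non-digits and drops empty entries.
    public static string[] SplitDigits(string input)
    {
        return Regex.Split(input, @"\D+")
            .Where(part => part.Length > 0)
            .ToArray();
    }

    static void Main()
    {
        string sentence = "10 cats, 20 dogs, 40 fish and 1 programmer.";

        // The raw result has a trailing empty string, because the
        // input ends with a delimiter match (" programmer.").
        string[] raw = Regex.Split(sentence, @"\D+");
        Console.WriteLine(raw.Length); // 5

        // Filtering leaves only the four digit strings.
        string[] digits = SplitDigits(sentence);
        Console.WriteLine(string.Join(",", digits)); // 10,20,40,1
    }
}
```

Filtering by length is one option. Relying on int.TryParse to skip the empty entry, as the example program does, works equally well.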

Tip: For character classes, an escaped uppercase letter such as \D negates the lowercase class. \D matches any character that \d does not.
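
To make the negation concrete, here is a small sketch that tests one character against each lowercase class and its uppercase counterpart. The helper IsWholeMatch is a hypothetical name introduced only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    // Hypothetical helper: true when the entire text matches the given class.
    public static bool IsWholeMatch(string text, string characterClass)
    {
        return Regex.IsMatch(text, "^" + characterClass + "$");
    }

    static void Main()
    {
        // Lowercase classes match a set; uppercase classes match its complement.
        Console.WriteLine(IsWholeMatch("7", @"\d")); // True: a digit
        Console.WriteLine(IsWholeMatch("7", @"\D")); // False
        Console.WriteLine(IsWholeMatch(" ", @"\s")); // True: whitespace
        Console.WriteLine(IsWholeMatch(" ", @"\S")); // False
        Console.WriteLine(IsWholeMatch("_", @"\w")); // True: underscore is a word character
        Console.WriteLine(IsWholeMatch("_", @"\W")); // False
    }
}
```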

Example 2

Here we extract all substrings in a string that are separated by whitespace characters. You could also use string.Split, but this version is simpler and can be extended more easily.

Note: The example gets all operands and operators from an equation string. An operator is a character like * that acts on operands.

Program that tokenizes: C#

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
	//
	// The equation.
	//
	string operation = "3 * 5 = 15";
	//
	// Split it on whitespace sequences.
	//
	string[] operands = Regex.Split(operation, @"\s+");
	//
	// Now we have each token.
	//
	foreach (string operand in operands)
	{
	    Console.WriteLine(operand);
	}
    }
}

Output

3
*
5
=
15

In this program, we implemented a simple tokenizer. Compilers and interpreters begin with lexical analysis, which breaks the source text into tokens such as those shown in the output above.

Info: This is a simple way to parse computer languages or program output, but it is not the fastest way.
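
As one illustration of how the pattern can be extended, the sketch below tokenizes an equation even when the spaces are missing. It relies on the fact that Regex.Split includes text captured by a parenthesized group in the result array. The helper name Tokenize and the operator set [*/+=-] are assumptions made for this sketch.

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    // Hypothetical helper: the operator set is an assumption for this sketch.
    public static string[] Tokenize(string equation)
    {
        // The capturing group keeps each operator in the result array.
        // The surrounding \s* consumes optional whitespace around it.
        return Regex.Split(equation.Trim(), @"\s*([*/+=-])\s*");
    }

    static void Main()
    {
        // Prints 3, *, 5, = and 15 on separate lines.
        foreach (string token in Tokenize("3*5 = 15"))
        {
            Console.WriteLine(token);
        }
    }
}
```

An input that starts or ends with an operator, such as a leading minus sign, would produce an empty string at that position, so a real tokenizer would need to handle that case.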

Example 3

Here we look at a program that collects the words in a string that begin with an uppercase letter. The Regex.Split call only separates the string into words. The loop then checks the case of each word's first letter.

Tip: It is often useful to combine regular expressions with manual loops and string operations. Programs are not art projects.

Program that collects uppercase words: C#

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
	//
	// String containing uppercased words.
	//
	string sentence = "Bob and Michelle are from Indiana.";
	//
	// Get all words. Splitting on single non-word characters can
	// yield empty strings, which the loop below skips.
	//
	string[] words = Regex.Split(sentence, @"\W");
	//
	// Get all uppercased words.
	//
	var list = new List<string>();
	foreach (string value in words)
	{
	    //
	    // Check the word.
	    //
	    if (!string.IsNullOrEmpty(value) &&
		char.IsUpper(value[0]))
	    {
		list.Add(value);
	    }
	}
	//
	// Write all proper nouns.
	//
	foreach (var value in list)
	{
	    Console.WriteLine(value);
	}
    }
}

Output

Bob
Michelle
Indiana
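
The same filtering can be written more compactly with LINQ. This is a sketch of one possible alternative, not a replacement for the loop above. The helper name CapitalizedWords is hypothetical, and the pattern \W+ is used here so that runs of punctuation and spaces count as a single delimiter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

class Program
{
    // Hypothetical helper: returns the words that start with an uppercase letter.
    public static List<string> CapitalizedWords(string text)
    {
        return Regex.Split(text, @"\W+")
            // Skip empty entries (for example, after the final period).
            .Where(word => word.Length > 0 && char.IsUpper(word[0]))
            .ToList();
    }

    static void Main()
    {
        // Prints Bob, Michelle and Indiana on separate lines.
        foreach (string word in CapitalizedWords("Bob and Michelle are from Indiana."))
        {
            Console.WriteLine(word);
        }
    }
}
```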

Discussion

For performance, consider the string.Split instance method instead of regular expressions. That method is better suited to precise and predictable input, such as text separated by a single known character.
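
As a rough sketch of that alternative, the program below tokenizes an equation with string.Split alone. It assumes the only delimiter is the space character. The helper name SplitOnSpaces is hypothetical.

```csharp
using System;

class Program
{
    // Hypothetical helper: assumes tokens are separated only by spaces.
    public static string[] SplitOnSpaces(string input)
    {
        // RemoveEmptyEntries drops the empty strings that repeated
        // spaces would otherwise produce.
        return input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }

    static void Main()
    {
        // Prints 3, *, 5, = and 15 on separate lines.
        foreach (string token in SplitOnSpaces("3 * 5  =  15"))
        {
            Console.WriteLine(token);
        }
    }
}
```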

Also: You can replace the static Regex.Split call with an instance of Regex that is created once and reused. This can improve performance and reduce memory pressure when the same pattern is applied many times.
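
A minimal sketch of the instance approach follows. The field name NonDigits and the helper SumNumbers are hypothetical. Any actual speedup depends on the workload and should be measured.

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    // Constructed once and reused for every call.
    static readonly Regex NonDigits = new Regex(@"\D+");

    // Hypothetical helper: adds up every integer found in the input.
    public static int SumNumbers(string input)
    {
        int sum = 0;
        foreach (string part in NonDigits.Split(input))
        {
            int number;
            // Empty entries fail to parse and are skipped.
            if (int.TryParse(part, out number))
            {
                sum += number;
            }
        }
        return sum;
    }

    static void Main()
    {
        Console.WriteLine(SumNumbers("10 cats, 20 dogs, 40 fish and 1 programmer.")); // 71
    }
}
```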

Further: You can pass the RegexOptions.Compiled enumerated constant to the Regex constructor for faster matching, at the cost of slower construction.
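
The option is passed as a second constructor argument. In this sketch the field name Whitespace and the helper SplitTokens are hypothetical. Compilation mainly pays off for a pattern that is used many times over the life of the program.

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    // Compiled once at startup, then reused for every split.
    static readonly Regex Whitespace = new Regex(@"\s+", RegexOptions.Compiled);

    // Hypothetical helper: splits the input on runs of whitespace.
    public static string[] SplitTokens(string input)
    {
        return Whitespace.Split(input);
    }

    static void Main()
    {
        // Prints 3, *, 5, = and 15 on separate lines.
        foreach (string token in SplitTokens("3 * 5 = 15"))
        {
            Console.WriteLine(token);
        }
    }
}
```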

Summary

We split strings with the Regex.Split method, using patterns for non-digit characters, whitespace characters, and non-word characters. We then processed the resulting string arrays, for example by parsing the integers in a sentence.

Tip: Using loops on the results of Regex.Split is an easy way to further filter your results.
