C# Split


Split separates a string into substrings at delimiters. Strings often contain delimiter characters in their data, such as the "\r\n" newline sequence, commas, and tabs. String.Split handles both character and string delimiters.

Tip:Use Split to separate parts from a string. If your input is "A B C", split on the space to get an array of "A", "B" and "C".


Example


To begin, we examine the simplest Split overload, which splits on a single character. This program splits on the space character, and the returned array has four elements.


Here:The input string, which contains four words, is split on spaces. The result value from Split is a string array.

Then:The foreach-loop iterates over this array and displays each word. The string array can be used like any other array.

C# program that splits on spaces

using System;

class Program
{
    static void Main()
    {
	string s = "there is a cat";
	//
	// Split string on spaces.
	// ... This will separate all the words.
	//
	string[] words = s.Split(' ');
	foreach (string word in words)
	{
	    Console.WriteLine(word);
	}
    }
}

Output

there
is
a
cat

Multiple characters


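Next we use Regex.Split to separate a string on a delimiter made of multiple characters, here the "\r\n" line break. The second argument is a regular expression pattern. String.Split can also handle multi-character delimiters, and its overloads that accept StringSplitOptions can remove empty strings, as the next section shows.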

C# program that splits on lines with Regex

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
	string value = "cat\r\ndog\r\nanimal\r\nperson";
	//
	// Split the string on line breaks.
	// ... The return value from Split is a string array.
	//
	string[] lines = Regex.Split(value, "\r\n");

	foreach (string line in lines)
	{
	    Console.WriteLine(line);
	}
    }
}

Output

cat
dog
animal
person
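If the input may mix Windows ("\r\n") and Unix ("\n") line breaks, a pattern with an optional carriage return handles both. This is a minimal sketch; the pattern \r?\n and the helper name SplitLines are our own choices, not part of the original example. A bare "\r" (old Mac line ending) is not handled by this pattern.

```csharp
using System;
using System.Text.RegularExpressions;

static class LineSplitter
{
    // Split on "\r\n" or a bare "\n".
    // In the verbatim pattern @"\r?\n", \r? means "an optional carriage return".
    public static string[] SplitLines(string value)
    {
        return Regex.Split(value, @"\r?\n");
    }
}

class Program
{
    static void Main()
    {
        // Mixed line endings: Windows after "cat", Unix after "dog".
        string value = "cat\r\ndog\nanimal";
        foreach (string line in LineSplitter.SplitLines(value))
        {
            Console.WriteLine(line);
        }
    }
}
```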

RemoveEmptyEntries

Regex methods split strings effectively, but string.Split is often faster. The next example passes an array as the first argument to string.Split. The first call uses a char array with StringSplitOptions.RemoveEmptyEntries; the second uses a string array with StringSplitOptions.None.

C# program that splits on multiple characters

using System;

class Program
{
    static void Main()
    {
	//
	// This string is also separated by Windows line breaks.
	//
	string value = "shirt\r\ndress\r\npants\r\njacket";

	//
	// Use a char[] array of two characters (\r and \n) to break
	// the lines into separate strings. Each \r\n pair leaves an empty string
	// between the two chars, so "RemoveEmptyEntries" discards those.
	//
	char[] delimiters = new char[] { '\r', '\n' };
	string[] parts = value.Split(delimiters,
				     StringSplitOptions.RemoveEmptyEntries);
	for (int i = 0; i < parts.Length; i++)
	{
	    Console.WriteLine(parts[i]);
	}

	//
	// Same result, but splits on a string array holding the two-char string "\r\n".
	//
	parts = value.Split(new string[] { "\r\n" }, StringSplitOptions.None);
	for (int i = 0; i < parts.Length; i++)
	{
	    Console.WriteLine(parts[i]);
	}
    }
}

Output (repeated two times)

shirt
dress
pants
jacket

Char arrays. One useful overload of Split receives a char array as its first parameter. Each char in the array is a separate delimiter: the string is broken wherever any one of those chars appears.
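As a small illustration of several single-char delimiters at once, this sketch (input string and class name are our own) splits on commas, semicolons, and spaces in one call.

```csharp
using System;

static class CharDelimiters
{
    // Each char in the array is an independent delimiter.
    public static string[] SplitFields(string value)
    {
        return value.Split(new char[] { ',', ';', ' ' });
    }
}

class Program
{
    static void Main()
    {
        // Commas, semicolons, and spaces all separate fields here.
        foreach (string part in CharDelimiters.SplitFields("red,green;blue cyan"))
        {
            Console.WriteLine(part);
        }
    }
}
```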


String arrays. Another overload of Split receives a string array as its first parameter, so a multi-character delimiter such as "\r\n" is matched as a unit. In the example, the string array is created inline in the Split call.
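A string delimiter matters when the separator is more than one character. In this sketch (our own input and helper name), splitting "red, green, blue" on the string ", " gives clean parts, while splitting on the char ',' leaves a leading space on every part after the first.

```csharp
using System;

static class StringDelimiters
{
    // Split on the two-character delimiter ", " as a unit.
    public static string[] SplitOnCommaSpace(string value)
    {
        return value.Split(new string[] { ", " }, StringSplitOptions.None);
    }
}

class Program
{
    static void Main()
    {
        string value = "red, green, blue";

        // String delimiter: parts are "red", "green", "blue".
        foreach (string part in StringDelimiters.SplitOnCommaSpace(value))
        {
            Console.WriteLine("[" + part + "]");
        }

        // Char delimiter: parts are "red", " green", " blue".
        foreach (string part in value.Split(','))
        {
            Console.WriteLine("[" + part + "]");
        }
    }
}
```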


StringSplitOptions. When two delimiters are adjacent, Split produces an empty string between them. Splitting "\r\n" on the chars '\r' and '\n' does exactly this. Passing RemoveEmptyEntries as the second argument removes those empty results.
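To make the adjacent-delimiter behavior concrete, this sketch (class and method names are ours) splits the same two-line string with and without RemoveEmptyEntries and compares the array lengths.

```csharp
using System;

static class AdjacentDelimiters
{
    static readonly char[] LineChars = { '\r', '\n' };

    // Without RemoveEmptyEntries, the empty string between '\r' and '\n' is kept.
    public static string[] SplitKeepEmpty(string value)
    {
        return value.Split(LineChars, StringSplitOptions.None);
    }

    // With RemoveEmptyEntries, those empty strings are dropped.
    public static string[] SplitRemoveEmpty(string value)
    {
        return value.Split(LineChars, StringSplitOptions.RemoveEmptyEntries);
    }
}

class Program
{
    static void Main()
    {
        string value = "shirt\r\ndress";
        Console.WriteLine(AdjacentDelimiters.SplitKeepEmpty(value).Length);   // 3: "shirt", "", "dress"
        Console.WriteLine(AdjacentDelimiters.SplitRemoveEmpty(value).Length); // 2: "shirt", "dress"
    }
}
```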


Separate words


You can separate words with Split, but often the simplest way is Regex.Split with a pattern that matches non-word characters. This example separates the words in a string on runs of non-word characters, which eliminates punctuation and whitespace.

Note:In the example, we show how to separate parts of your string based on any character set or range with Regex.

Warning:This approach is more powerful than the string Split methods, but the code is harder to read.
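One detail to watch: when the input begins or ends with a non-word character, Regex.Split puts an empty string at the start or end of the array. A minimal sketch (our own input and helper names) that filters those out with LINQ:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

static class WordSplitter
{
    // Raw result: includes an empty first (or last) entry when the input
    // begins (or ends) with a match, such as a quote or a period.
    public static string[] SplitWordsRaw(string s)
    {
        return Regex.Split(s, @"\W+");
    }

    // Filtered result: keeps only the non-empty words.
    public static string[] SplitWords(string s)
    {
        return Regex.Split(s, @"\W+").Where(w => w.Length > 0).ToArray();
    }
}

class Program
{
    static void Main()
    {
        string input = "\"Hello,\" said the cat.";
        Console.WriteLine(WordSplitter.SplitWordsRaw(input).Length); // counts the empty first and last entries
        foreach (string w in WordSplitter.SplitWords(input))
        {
            Console.WriteLine(w);
        }
    }
}
```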

C# program that separates on non-word pattern

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
	string[] w = SplitWords("That is a cute cat, man");
	foreach (string s in w)
	{
	    Console.WriteLine(s);
	}
	Console.ReadLine();
    }

    /// <summary>
    /// Take all the words in the input string and separate them.
    /// </summary>
    static string[] SplitWords(string s)
    {
	//
	// Split on all non-word characters.
	// ... Returns an array of all the words.
	//
	return Regex.Split(s, @"\W+");
	// @      special verbatim string syntax
	// \W+    one or more non-word characters together
    }
}

Output

That
is
a
cute
cat
man

Text files


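Here we have a text file containing lines of comma-delimited values, a simple CSV file. We read it with File.ReadAllLines, though StreamReader may be a better choice for large files. The code reads both lines and splits each one on commas. This simple approach does not handle quoted fields that contain commas.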

Then:It displays the values of each line after the line number. The output shows how the file was parsed into the strings.

Contents of input file: TextFile1.txt

Dog,Cat,Mouse,Fish,Cow,Horse,Hyena
Programmer,Wizard,CEO,Rancher,Clerk,Farmer

C# program that splits lines in file

using System;
using System.IO;

class Program
{
    static void Main()
    {
	int i = 0;
	foreach (string line in File.ReadAllLines("TextFile1.txt"))
	{
	    string[] parts = line.Split(',');
	    foreach (string part in parts)
	    {
		Console.WriteLine("{0}:{1}",
		    i,
		    part);
	    }
	    i++; // For demonstration.
	}
    }
}

Output

0:Dog
0:Cat
0:Mouse
0:Fish
0:Cow
0:Horse
0:Hyena
1:Programmer
1:Wizard
1:CEO
1:Rancher
1:Clerk
1:Farmer
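For the StreamReader alternative, here is a sketch that reads one line at a time instead of loading the whole file. The ReadFields helper is hypothetical; it accepts any TextReader so the same code works with a StreamReader over a file or a StringReader over a string. Like the original, it does not handle quoted fields.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class CsvLines
{
    // Read one line at a time and split it on commas.
    // Yields (line number, field) pairs, like the File.ReadAllLines example.
    public static IEnumerable<(int LineNumber, string Field)> ReadFields(TextReader reader)
    {
        int lineNumber = 0;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            foreach (string field in line.Split(','))
            {
                yield return (lineNumber, field);
            }
            lineNumber++;
        }
    }
}

class Program
{
    static void Main()
    {
        if (!File.Exists("TextFile1.txt"))
        {
            Console.WriteLine("TextFile1.txt not found");
            return;
        }
        // StreamReader reads the file incrementally.
        using (StreamReader reader = new StreamReader("TextFile1.txt"))
        {
            foreach (var (lineNumber, field) in CsvLines.ReadFields(reader))
            {
                Console.WriteLine("{0}:{1}", lineNumber, field);
            }
        }
    }
}
```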

Directory paths


You can Split a Windows local directory path into its segments. Directory paths are complex, so this may not handle all cases correctly. It is also platform-specific.

Tip:For more flexibility, you could use Path.DirectorySeparatorChar, a static char field of the System.IO.Path class.

C# program that splits Windows directories

using System;

class Program
{
    static void Main()
    {
	// The directory from Windows
	const string dir = @"C:\Users\Sam\Documents\Perls\Main";
	// Split on directory separator
	string[] parts = dir.Split('\\');
	foreach (string part in parts)
	{
	    Console.WriteLine(part);
	}
    }
}

Output

C:
Users
Sam
Documents
Perls
Main
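Following the Path.DirectorySeparatorChar tip, a less platform-specific sketch might split on both Path.DirectorySeparatorChar and Path.AltDirectorySeparatorChar. The class name is ours, and this still ignores harder cases such as UNC paths.

```csharp
using System;
using System.IO;

static class PathSegments
{
    // Primary and alternate separators for the current platform.
    // On Windows these are '\\' and '/'; on Unix-like systems both are '/'.
    static readonly char[] Separators =
    {
        Path.DirectorySeparatorChar,
        Path.AltDirectorySeparatorChar
    };

    // RemoveEmptyEntries drops empty strings from a leading "/" or doubled separators.
    public static string[] Split(string path)
    {
        return path.Split(Separators, StringSplitOptions.RemoveEmptyEntries);
    }
}

class Program
{
    static void Main()
    {
        // Build a sample path with this platform's separator.
        string path = Path.Combine("Users", "Sam", "Documents");
        foreach (string part in PathSegments.Split(path))
        {
            Console.WriteLine(part);
        }
    }
}
```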

Internals


What is inside Split? In the .NET Framework, Split is implemented in managed code. The simpler overloads call into a method with three parameters: the separators, a maximum count, and the options. The parameters are checked for validity.

Next:It uses unsafe code to build a list of separator positions, then a for-loop with Substring to extract each part.
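The two-phase idea can be sketched in safe code: first record every separator position, then cut the string with Substring. This is a simplified illustration, not the Framework's actual source; SimpleSplit is a hypothetical name, and it mirrors only the StringSplitOptions.None behavior for char separators.

```csharp
using System;
using System.Collections.Generic;

static class SplitSketch
{
    public static string[] SimpleSplit(string value, char[] separators)
    {
        // Phase 1: build a list of the indexes where any separator char occurs.
        var separatorIndexes = new List<int>();
        for (int i = 0; i < value.Length; i++)
        {
            if (Array.IndexOf(separators, value[i]) >= 0)
            {
                separatorIndexes.Add(i);
            }
        }

        // Phase 2: take the substring before each separator.
        var parts = new string[separatorIndexes.Count + 1];
        int start = 0;
        for (int p = 0; p < separatorIndexes.Count; p++)
        {
            int index = separatorIndexes[p];
            parts[p] = value.Substring(start, index - start);
            start = index + 1;
        }
        // The last part runs from after the final separator to the end.
        parts[parts.Length - 1] = value.Substring(start);
        return parts;
    }
}

class Program
{
    static void Main()
    {
        foreach (string part in SplitSketch.SimpleSplit("a,,b,", new[] { ',' }))
        {
            Console.WriteLine("[" + part + "]");
        }
    }
}
```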


Benchmarks


I tested a long string of 1200 chars and a short string of 40 chars. String splitting speed varies with the type of string: the length of the blocks, the number of delimiters, and the total size of the string all factor into performance.

Note:The Regex.Split option generally performed the worst. String.Split was consistently faster.

And:The second and third methods, which use string.Split, were the best choices in both tests.

Strings used in test: C#

//
// Build long string.
//
_test = string.Empty;
for (int i = 0; i < 120; i++)
{
    _test += "01234567\r\n";
}
//
// Build short string.
//
_test = string.Empty;
for (int i = 0; i < 10; i++)
{
    _test += "ab\r\n";
}

Methods tested: 100000 iterations

static void Test1()
{
    string[] arr = Regex.Split(_test, "\r\n", RegexOptions.Compiled);
}

static void Test2()
{
    string[] arr = _test.Split(new char[] { '\r', '\n' },
			       StringSplitOptions.RemoveEmptyEntries);
}

static void Test3()
{
    string[] arr = _test.Split(new string[] { "\r\n" },
			       StringSplitOptions.None);
}
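A minimal harness for reproducing this kind of measurement is sketched below, assuming simple Stopwatch timing is adequate (a dedicated tool such as BenchmarkDotNet gives more reliable numbers). The helper names are ours, and timings will differ by machine. Note that the three methods do not return identical arrays: because each test string ends with "\r\n", Regex.Split and the string[] Split return a trailing empty string, while the char[] Split with RemoveEmptyEntries does not.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

static class SplitBenchmark
{
    // Repeat a block to build a test string (StringBuilder avoids repeated concatenation).
    public static string Build(string block, int count)
    {
        var builder = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            builder.Append(block);
        }
        return builder.ToString();
    }

    public static string[] RegexSplit(string s) =>
        Regex.Split(s, "\r\n", RegexOptions.Compiled);

    public static string[] CharArraySplit(string s) =>
        s.Split(new char[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);

    public static string[] StringArraySplit(string s) =>
        s.Split(new string[] { "\r\n" }, StringSplitOptions.None);

    // Time one method over many iterations; returns elapsed milliseconds.
    public static long Time(Func<string, string[]> method, string input, int iterations)
    {
        method(input); // Warm up (JIT, regex compilation) before timing.
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            method(input);
        }
        stopwatch.Stop();
        return stopwatch.ElapsedMilliseconds;
    }
}

class Program
{
    static void Main()
    {
        const int Iterations = 100000;
        string[] inputs =
        {
            SplitBenchmark.Build("01234567\r\n", 120), // 1200 chars
            SplitBenchmark.Build("ab\r\n", 10)         // 40 chars
        };
        foreach (string input in inputs)
        {
            Console.WriteLine("Length {0}:", input.Length);
            Console.WriteLine("  Regex.Split:    {0} ms", SplitBenchmark.Time(SplitBenchmark.RegexSplit, input, Iterations));
            Console.WriteLine("  char[] Split:   {0} ms", SplitBenchmark.Time(SplitBenchmark.CharArraySplit, input, Iterations));
            Console.WriteLine("  string[] Split: {0} ms", SplitBenchmark.Time(SplitBenchmark.StringArraySplit, input, Iterations));
        }
    }
}
```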

For 1200-char strings, the benchmark results are more even: Regex.Split took about 2.8 times as long as the char[] Split, compared with about 7 times as long on the short strings. It may be that for much longer strings, such as entire files, the gap narrows further, but these tests did not measure that.

Benchmark of Split on long strings

[1] Regex.Split:    3470 ms
[2] char[] Split:   1255 ms [fastest]
[3] string[] Split: 1449 ms

Benchmark of Split on short strings

[1] Regex.Split:     434 ms
[2] char[] Split:     63 ms [fastest]
[3] string[] Split:   83 ms

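For 40-char strings, the Regex method is by far the slowest. The overhead of the regex engine may cause this; the static Regex.Split caches the compiled pattern, so compilation itself should be a one-time cost. Regex may also lack certain optimizations present in string.Split. In these results, smaller times are better.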

Performance optimization

In programs that split short strings, the methods that split on arrays are much faster, in part because they avoid Regex overhead. For longer strings or files with many lines, the difference is smaller, so Regex may be appropriate when you need its pattern matching.

Delimiter arrays

For delimiters, my further research shows that it is worthwhile to create the char array you split on once, store it in a local variable outside the loop, and reuse it. This avoids allocating a new array on every call, which reduces memory pressure and improves runtime performance.

Note:Storing the array of delimiters outside the loop helps. In this test, the version shown second was 10% faster.

Slow version, before: C#

//
// Split on multiple characters using new char[] inline.
//
string t = "string to split, ok";

for (int i = 0; i < 10000000; i++)
{
    string[] s = t.Split(new char[] { ' ', ',' });
}

Fast version, after: C#

//
// Split on multiple characters using new char[] already created.
//
string t = "string to split, ok";
char[] c = new char[]{ ' ', ',' }; // <-- Cache this

for (int i = 0; i < 10000000; i++)
{
    string[] s = t.Split(c);
}
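When the split happens inside a method that is called many times, rather than in one loop, the same idea can be applied with a static readonly field so the delimiter array is allocated only once. This is a sketch under that assumption; FieldParser and Delimiters are hypothetical names. It also shows that the adjacent ',' and ' ' in the sample string produce an empty entry.

```csharp
using System;

static class FieldParser
{
    // Allocated once when the class is initialized, then reused by every call.
    static readonly char[] Delimiters = { ' ', ',' };

    public static string[] Parse(string line)
    {
        return line.Split(Delimiters);
    }
}

class Program
{
    static void Main()
    {
        // ", " is two adjacent delimiters, so one part is empty.
        foreach (string part in FieldParser.Parse("string to split, ok"))
        {
            Console.WriteLine("[" + part + "]");
        }
    }
}
```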

StringSplitOptions


What effect does the StringSplitOptions argument have? It changes how Split treats empty entries. StringSplitOptions is an enum, so its values (None and RemoveEmptyEntries) are just integers that tell Split how to work. None keeps empty strings in the result; RemoveEmptyEntries drops them. .NET 5 and later also add TrimEntries.

C# that uses StringSplitOptions

using System;

class Program
{
    static void Main()
    {
	// Input string contains separators.
	string value1 = "man,woman,child,,,bird";
	char[] delimiter1 = new char[] { ',' };   // <-- Split on these

	// ... Use StringSplitOptions.None.
	string[] array1 = value1.Split(delimiter1,
	    StringSplitOptions.None);

	foreach (string entry in array1)
	{
	    Console.WriteLine(entry);
	}

	// ... Use StringSplitOptions.RemoveEmptyEntries.
	string[] array2 = value1.Split(delimiter1,
	    StringSplitOptions.RemoveEmptyEntries);

	Console.WriteLine();
	foreach (string entry in array2)
	{
	    Console.WriteLine(entry);
	}
    }
}

Output

man  
woman
child
     
     
bird 

man  
woman
child
bird 

In this example, the input string contains five commas. These commas are the delimiters. Two of the fields between commas are 0 characters long, so they are empty. They are treated differently when we use RemoveEmptyEntries.

First call:In the first call to Split, these fields are put into the result array. These elements equal string.Empty.

Second call:We specify StringSplitOptions.RemoveEmptyEntries. The two empty fields are not in the result array.
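On .NET 5 or later, StringSplitOptions also has TrimEntries, and the flags can be combined. This sketch (our own input and helper name) assumes that runtime: each part is trimmed, and with RemoveEmptyEntries, parts that are empty or only whitespace are dropped.

```csharp
using System;

static class TrimDemo
{
    // Requires .NET 5 or later for StringSplitOptions.TrimEntries.
    // Trim each part, then drop parts that are empty after trimming.
    public static string[] SplitClean(string value)
    {
        return value.Split(',', StringSplitOptions.TrimEntries | StringSplitOptions.RemoveEmptyEntries);
    }
}

class Program
{
    static void Main()
    {
        foreach (string entry in TrimDemo.SplitClean(" man , woman,child,, ,bird "))
        {
            Console.WriteLine("[" + entry + "]");
        }
    }
}
```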

Discussion


Split does not handle escaped characters. You can instead use Replace on your input to substitute a placeholder character for each escaped delimiter before splitting, and then restore it in each part. This helps parse computer-generated data.
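A minimal sketch of the Replace approach, assuming the data escapes a literal comma as "\," and that the placeholder char '\u0001' never appears in the input. SplitWithEscapes is a hypothetical helper, and it does not handle an escaped backslash before a comma.

```csharp
using System;

static class EscapedSplit
{
    // Assumption: this placeholder never occurs in real input.
    const char Placeholder = '\u0001';

    public static string[] SplitWithEscapes(string value)
    {
        // 1. Hide each escaped comma behind the placeholder.
        string hidden = value.Replace(@"\,", Placeholder.ToString());

        // 2. Split on the remaining, unescaped commas.
        string[] parts = hidden.Split(',');

        // 3. Restore the literal commas in each part.
        for (int i = 0; i < parts.Length; i++)
        {
            parts[i] = parts[i].Replace(Placeholder, ',');
        }
        return parts;
    }
}

class Program
{
    static void Main()
    {
        // Verbatim string: the input has a backslash before the first comma.
        foreach (string part in EscapedSplit.SplitWithEscapes(@"Smith\, John,42"))
        {
            Console.WriteLine(part);
        }
    }
}
```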


StringReader. We can instead use the StringReader type to separate a string into lines. StringReader can also be faster than Split, because no array of results is allocated. The code required is often more complex.
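Here is one way the StringReader approach might look, as a sketch with a hypothetical Lines helper that yields lines lazily instead of returning an array. StringReader.ReadLine treats "\n", "\r", and "\r\n" as line endings, and a trailing line break does not produce an extra empty line.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class LineReader
{
    // Yield each line of the string without building an array of results.
    public static IEnumerable<string> Lines(string value)
    {
        using (var reader = new StringReader(value))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                yield return line;
            }
        }
    }
}

class Program
{
    static void Main()
    {
        foreach (string line in LineReader.Lines("cat\r\ndog\nanimal\r\nperson"))
        {
            Console.WriteLine(line);
        }
    }
}
```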


Summary


With Split() we separated strings and solved problems. Split divides strings at delimiters, and it keeps your code simple, free of custom and perhaps flawed parsing algorithms.

Tip:Methods can be combined. Using IndexOf and Substring together is another way to split strings. This is sometimes more efficient, for example when only the first delimiter matters.
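For instance, splitting a "key=value" string only at its first '=' needs just one IndexOf and two Substring calls, and the value may itself contain '='. This is a sketch; TrySplitOnce is a hypothetical helper name.

```csharp
using System;

static class KeyValue
{
    // Split at the first occurrence of the delimiter only.
    // Returns false (and the whole string as "before") when the delimiter is missing.
    public static bool TrySplitOnce(string value, char delimiter, out string before, out string after)
    {
        int index = value.IndexOf(delimiter);
        if (index < 0)
        {
            before = value;
            after = string.Empty;
            return false;
        }
        before = value.Substring(0, index);
        after = value.Substring(index + 1);
        return true;
    }
}

class Program
{
    static void Main()
    {
        // Split('=') would give three parts here; this gives two.
        if (KeyValue.TrySplitOnce("key=a=b", '=', out string key, out string value))
        {
            Console.WriteLine(key);
            Console.WriteLine(value);
        }
    }
}
```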

