C# Split



Split. Often strings have delimiter characters in their data. Delimiters include "\r\n" newline sequences and the comma and tab characters.



A string method, Split() separates at string and character delimiters. Even if we want just one part from a string, Split is useful. It returns a string array.



To begin, we examine the simplest Split method. We call Split on a string instance. This program splits on a single character. The array returned has four elements.


Here: The input string, which contains four words, is split on spaces. The result value from Split is a string array.

Foreach: The foreach-loop iterates over this array and displays each word. The string array can be used like any other.

Based on: .NET 4.5

C# program that splits on spaces

using System;

class Program
{
    static void Main()
    {
	string s = "there is a cat";
	// Split string on spaces.
	// ... This will separate all the words.
	string[] words = s.Split(' ');
	foreach (string word in words)
	{
	    Console.WriteLine(word);
	}
    }
}

Output

there
is
a
cat

Multiple characters. Next we use Regex.Split to separate on a multi-character delimiter, the "\r\n" line break. The string Split method also has overloads that accept StringSplitOptions, which can remove empty strings from the result.

C# program that splits on lines with Regex

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
	string value = "cat\r\ndog\r\nanimal\r\nperson";
	// Split the string on line breaks.
	// ... The return value from Split is a string array.
	string[] lines = Regex.Split(value, "\r\n");

	foreach (string line in lines)
	{
	    Console.WriteLine(line);
	}
    }
}

Output

cat
dog
animal
person

RemoveEmptyEntries. Regex methods are used to effectively Split strings. But string Split is often faster. The next example specifies an array as the first argument to string Split.

StringSplitOptions: This is an enum. It does not need to be allocated with a constructor; it is more like a special int value.

C# program that splits on multiple characters

using System;

class Program
{
    static void Main()
    {
	// This string is also separated by Windows line breaks.
	string value = "shirt\r\ndress\r\npants\r\njacket";

	// Use a new char array of two characters (\r and \n).
	// ... Breaks lines into separate strings.
	// ... Use RemoveEmptyEntries to make sure no empty strings are added.
	char[] delimiters = new char[] { '\r', '\n' };
	string[] parts = value.Split(delimiters,
				     StringSplitOptions.RemoveEmptyEntries);
	for (int i = 0; i < parts.Length; i++)
	{
	    Console.WriteLine(parts[i]);
	}

	// Same as the previous example, but uses a string of 2 characters.
	parts = value.Split(new string[] { "\r\n" }, StringSplitOptions.None);
	for (int i = 0; i < parts.Length; i++)
	{
	    Console.WriteLine(parts[i]);
	}
    }
}

Output
    (Repeated two times)

shirt
dress
pants
jacket

Char arrays. The string Split method receives a character array as the first parameter. Each char in the array acts as a delimiter: wherever any of them occurs, a new block of the string data begins.


Using string arrays. A string array can also be passed to the Split method. The new string array is created inline with the Split call.


RemoveEmptyEntries notes. For StringSplitOptions, we specify RemoveEmptyEntries. When two delimiters are adjacent, we can end up with an empty result.

So: We use RemoveEmptyEntries as the second argument to avoid empty results.
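
To make the adjacent-delimiter case concrete, here is a minimal sketch. Each "\r\n" is two delimiter chars in a row, so splitting on '\r' and '\n' leaves an empty string between them unless RemoveEmptyEntries is given. The class EmptyEntryDemo and its CountParts helper are hypothetical names used only for illustration.

```csharp
using System;

static class EmptyEntryDemo
{
    // Hypothetical helper: counts the parts Split returns for the given options.
    public static int CountParts(string value, StringSplitOptions options)
    {
        char[] delimiters = new char[] { '\r', '\n' };
        return value.Split(delimiters, options).Length;
    }
}

class Program
{
    static void Main()
    {
        string value = "shirt\r\ndress";

        // With None, the empty string between '\r' and '\n' is kept.
        Console.WriteLine(EmptyEntryDemo.CountParts(value, StringSplitOptions.None)); // 3

        // With RemoveEmptyEntries, only the two words remain.
        Console.WriteLine(EmptyEntryDemo.CountParts(value, StringSplitOptions.RemoveEmptyEntries)); // 2
    }
}
```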



Separate words. You can separate words with Split. Usually, the best way to separate words is to use a Regex that specifies non-word chars.


Here: This example separates words in a string based on non-word characters. It eliminates punctuation and whitespace.

Note: Here we show how to separate parts of a string based on any character set or range with Regex.Split.

Warning: Regex provides more power and control than the string Split methods. But the code is harder to read.

C# program that separates on non-word pattern

using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
	string[] w = SplitWords("That is a cute cat, man");
	foreach (string s in w)
	{
	    Console.WriteLine(s);
	}
	Console.ReadLine();
    }

    /// <summary>
    /// Take all the words in the input string and separate them.
    /// </summary>
    static string[] SplitWords(string s)
    {
	//
	// Split on all non-word characters.
	// ... Returns an array of all the words.
	//
	return Regex.Split(s, @"\W+");
	// @      special verbatim string syntax
	// \W+    one or more non-word characters together
    }
}

Output

That
is
a
cute
cat
man
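
One edge case is worth a quick sketch: when the input begins or ends with a non-word character, Regex.Split returns an empty string at that end of the array. The version below filters those out. WordSplitter and SplitWordsNoEmpty are hypothetical names, and this is only one possible way to handle the case.

```csharp
using System;
using System.Text.RegularExpressions;

static class WordSplitter
{
    // Hypothetical helper: split on non-word chars, then drop empty entries.
    public static string[] SplitWordsNoEmpty(string s)
    {
        string[] parts = Regex.Split(s, @"\W+");
        return Array.FindAll(parts, p => p.Length > 0);
    }
}

class Program
{
    static void Main()
    {
        // The trailing "!" would otherwise produce an empty final element.
        foreach (string word in WordSplitter.SplitWordsNoEmpty("Hello, world!"))
        {
            Console.WriteLine(word);
        }
    }
}
```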

Text files. Here we have a text file containing comma-delimited lines of values—a CSV file. We use File.ReadAllLines to read lines, but StreamReader can be used instead.


Then: The program displays each value prefixed with the index of its line. The output shows how the file was parsed into strings.

Contents of input file: TextFile1.txt

Dog,Cat,Mouse,Fish,Cow,Horse,Hyena
Programmer,Wizard,CEO,Rancher,Clerk,Farmer

C# program that splits lines in file

using System;
using System.IO;

class Program
{
    static void Main()
    {
	int i = 0;
	foreach (string line in File.ReadAllLines("TextFile1.txt"))
	{
	    string[] parts = line.Split(',');
	    foreach (string part in parts)
	    {
		Console.WriteLine("{0}:{1}",
		    i,
		    part);
	    }
	    i++; // For demonstration.
	}
    }
}

Output

0:Dog
0:Cat
0:Mouse
0:Fish
0:Cow
0:Horse
0:Hyena
1:Programmer
1:Wizard
1:CEO
1:Rancher
1:Clerk
1:Farmer
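
The text above mentions that StreamReader can be used instead of File.ReadAllLines. A minimal sketch of that variant follows; it reads one line at a time rather than loading the whole file. CsvLines and ReadRows are hypothetical names, and the code assumes TextFile1.txt exists with the contents shown earlier. Like the original, it does not handle quoted fields that contain commas.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class CsvLines
{
    // Hypothetical helper: reads a file line by line and splits each line on commas.
    public static List<string[]> ReadRows(string path)
    {
        List<string[]> rows = new List<string[]>();
        using (StreamReader reader = new StreamReader(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                rows.Add(line.Split(','));
            }
        }
        return rows;
    }
}

class Program
{
    static void Main()
    {
        // Assumes TextFile1.txt is in the working directory.
        List<string[]> rows = CsvLines.ReadRows("TextFile1.txt");
        for (int i = 0; i < rows.Count; i++)
        {
            foreach (string part in rows[i])
            {
                Console.WriteLine("{0}:{1}", i, part);
            }
        }
    }
}
```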

Directory paths. We can split the segments in a Windows local directory into separate strings. Please note that directory paths are complex. This code may not correctly handle all cases.

Tip: We could use Path.DirectorySeparatorChar, a char field on the Path class in System.IO, for more flexibility.

C# program that splits Windows directories

using System;

class Program
{
    static void Main()
    {
	// The directory from Windows.
	const string dir = @"C:\Users\Sam\Documents\Perls\Main";
	// Split on directory separator.
	string[] parts = dir.Split('\\');
	foreach (string part in parts)
	{
	    Console.WriteLine(part);
	}
    }
}

Output

C:
Users
Sam
Documents
Perls
Main
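
As a sketch of the tip about Path.DirectorySeparatorChar, the version below splits on whatever separators the current platform uses instead of a hard-coded backslash. PathSplitter and SplitPath are hypothetical names. The same caveat applies: directory paths are complex, and this may not handle all cases.

```csharp
using System;
using System.IO;

static class PathSplitter
{
    // Hypothetical helper: splits on the platform's primary and alternate separators.
    public static string[] SplitPath(string dir)
    {
        char[] separators =
        {
            Path.DirectorySeparatorChar,
            Path.AltDirectorySeparatorChar
        };
        return dir.Split(separators, StringSplitOptions.RemoveEmptyEntries);
    }
}

class Program
{
    static void Main()
    {
        // Path.Combine inserts the separator for the current platform.
        string dir = Path.Combine("Users", "Sam", "Documents");
        foreach (string part in PathSplitter.SplitPath(dir))
        {
            Console.WriteLine(part);
        }
    }
}
```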

Internals. What is inside Split? The logic internal to the .NET Framework for Split is implemented in managed code. Methods call into an overload with three parameters.

Next: The parameters are checked for validity. The method then uses unsafe code to build a list of separator positions, and a for-loop combined with Substring to extract each part.
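
The two-pass idea described above can be sketched in safe code. This is an illustration of the general approach only, not the framework's actual source: it handles a single char separator, uses a List instead of unsafe code, and the names SimpleSplitter and SplitOnChar are hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class SimpleSplitter
{
    // Illustrative approximation of a separator-list-plus-Substring split.
    public static string[] SplitOnChar(string input, char separator)
    {
        // Pass 1: record the index of every separator.
        List<int> positions = new List<int>();
        for (int i = 0; i < input.Length; i++)
        {
            if (input[i] == separator)
            {
                positions.Add(i);
            }
        }

        // Pass 2: cut out the text between separators with Substring.
        string[] result = new string[positions.Count + 1];
        int start = 0;
        for (int i = 0; i < positions.Count; i++)
        {
            result[i] = input.Substring(start, positions[i] - start);
            start = positions[i] + 1;
        }
        // The final part runs from the last separator to the end.
        result[positions.Count] = input.Substring(start);
        return result;
    }
}

class Program
{
    static void Main()
    {
        foreach (string word in SimpleSplitter.SplitOnChar("there is a cat", ' '))
        {
            Console.WriteLine(word);
        }
    }
}
```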


Benchmarks. I tested two strings (with 40 and 1200 chars). Speed varied on the contents of strings. The length of blocks, number of delimiters, and total size factor into performance.

Note: The Regex.Split option generally performed the worst. String.Split was consistently faster.

And: I felt that the second or third methods would be best. Regex also causes performance problems elsewhere.

Strings used in test: C#

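//
// Build long string (120 blocks of 10 chars: 1200 chars).
//
_test = string.Empty;
for (int i = 0; i < 120; i++)
{
    _test += "01234567\r\n";
}
//
// Build short string (10 blocks of 4 chars: 40 chars).
// ... Each benchmark run uses one of these two strings.
//
_test = string.Empty;
for (int i = 0; i < 10; i++)
{
    _test += "ab\r\n";
}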

Methods tested: 100000 iterations

static void Test1()
{
    string[] arr = Regex.Split(_test, "\r\n", RegexOptions.Compiled);
}

static void Test2()
{
    string[] arr = _test.Split(new char[] { '\r', '\n' },
			       StringSplitOptions.RemoveEmptyEntries);
}

static void Test3()
{
    string[] arr = _test.Split(new string[] { "\r\n" },
			       StringSplitOptions.None);
}

Benchmark results. Regex.Split is the slowest method for both string sizes. For the short, 40-char strings it takes nearly 7 times as long as the char[] Split (434 ms versus 63 ms). For the 1200-char strings the gap narrows to under 3 times (3470 ms versus 1255 ms).

Short strings: For short, 40-char strings, the Regex method is by far the slowest. The compilation time may cause this.

And: Regex may also lack certain optimizations present with string.Split. In the results, smaller times are better.

Arrays: In programs that use shorter strings, the methods that split based on arrays are faster. This avoids Regex compilation.

But: For longer strings or files that contain more lines, the relative cost of Regex is lower, so it is a more reasonable choice there.

Benchmark of Split on long strings

[1] Regex.Split:    3470 ms
[2] char[] Split:   1255 ms [fastest]
[3] string[] Split: 1449 ms

Benchmark of Split on short strings

[1] Regex.Split:     434 ms
[2] char[] Split:     63 ms [fastest]
[3] string[] Split:   83 ms
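
For readers who want to repeat the measurement, here is a minimal timing sketch built on Stopwatch. It is not the harness used for the numbers above. SplitBenchmark and TimeMs are hypothetical names, and unlike the tested methods this sketch creates the delimiter arrays once, outside the timed loop. Results will vary by machine and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

static class SplitBenchmark
{
    // Hypothetical helper: runs the action repeatedly and returns elapsed milliseconds.
    public static long TimeMs(Action action, int iterations)
    {
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }
}

class Program
{
    static void Main()
    {
        // Short test string: ten "ab\r\n" blocks, 40 chars in total.
        string test = string.Empty;
        for (int i = 0; i < 10; i++)
        {
            test += "ab\r\n";
        }

        char[] charDelimiters = new char[] { '\r', '\n' };
        string[] stringDelimiters = new string[] { "\r\n" };
        const int iterations = 100000;

        long regexMs = SplitBenchmark.TimeMs(
            () => Regex.Split(test, "\r\n", RegexOptions.Compiled), iterations);
        long charMs = SplitBenchmark.TimeMs(
            () => test.Split(charDelimiters, StringSplitOptions.RemoveEmptyEntries), iterations);
        long stringMs = SplitBenchmark.TimeMs(
            () => test.Split(stringDelimiters, StringSplitOptions.None), iterations);

        Console.WriteLine("[1] Regex.Split:    {0} ms", regexMs);
        Console.WriteLine("[2] char[] Split:   {0} ms", charMs);
        Console.WriteLine("[3] string[] Split: {0} ms", stringMs);
    }
}
```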

Delimiter arrays. My further research shows that it is worthwhile to create the char array of delimiters once and reuse it, rather than allocating a new array for each Split call.

Note: Creating the array of delimiters outside the loop avoids an allocation on every iteration. This version, shown second, is 10% faster.

Slow version, before: C#

//
// Split on multiple characters using new char[] inline.
//
string t = "string to split, ok";

for (int i = 0; i < 10000000; i++)
{
    string[] s = t.Split(new char[] { ' ', ',' });
}

Fast version, after: C#

//
// Split on multiple characters using new char[] already created.
//
string t = "string to split, ok";
char[] c = new char[]{ ' ', ',' }; // <-- Cache this

for (int i = 0; i < 10000000; i++)
{
    string[] s = t.Split(c);
}

StringSplitOptions. This affects the behavior of Split. The two values of StringSplitOptions (None and RemoveEmptyEntries) are actually just integers that tell Split how to work.

Note: In this example, the input string contains five commas. These commas are the delimiters.

And: Two fields between commas are 0 characters long, so they are empty. They are treated differently when we use RemoveEmptyEntries.

First call: In the first call to Split, these fields are put into the result array. These elements equal string.Empty.

Second call: We specify StringSplitOptions.RemoveEmptyEntries. The two empty fields are not in the result array.

C# that uses StringSplitOptions

using System;

class Program
{
    static void Main()
    {
	// Input string contains separators.
	string value1 = "man,woman,child,,,bird";
	char[] delimiter1 = new char[] { ',' };   // <-- Split on these

	// ... Use StringSplitOptions.None.
	string[] array1 = value1.Split(delimiter1,
	    StringSplitOptions.None);

	foreach (string entry in array1)
	{
	    Console.WriteLine(entry);
	}

	// ... Use StringSplitOptions.RemoveEmptyEntries.
	string[] array2 = value1.Split(delimiter1,
	    StringSplitOptions.RemoveEmptyEntries);

	Console.WriteLine();
	foreach (string entry in array2)
	{
	    Console.WriteLine(entry);
	}
    }
}

Output

man
woman
child


bird

man
woman
child
bird

Join. With this method, we can combine separate strings with a separating delimiter. Join() can be used to round-trip data. It is the opposite of Split.
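
A short sketch of the round trip: splitting on a delimiter and joining with the same delimiter gives back the original string, provided empty entries are not removed. RoundTrip and SplitAndJoin are hypothetical names used for illustration.

```csharp
using System;

static class RoundTrip
{
    // Hypothetical helper: splits on the delimiter, then joins with the same delimiter.
    public static string SplitAndJoin(string value, char delimiter)
    {
        string[] parts = value.Split(delimiter);
        return string.Join(delimiter.ToString(), parts);
    }
}

class Program
{
    static void Main()
    {
        string value = "man,woman,child,,,bird";
        string result = RoundTrip.SplitAndJoin(value, ',');

        // Empty fields are kept by Split, so Join restores them too.
        Console.WriteLine(result == value);
    }
}
```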


Replace. Split does not handle escaped characters. We can instead use Replace on a string input to substitute special characters for any escaped characters.
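
One way this could look, as a sketch under stated assumptions: the escape sequence is a backslash followed by a comma, and the placeholder character U+0001 never occurs in the input. EscapedSplitter and SplitEscaped are hypothetical names. The sketch does not handle escaped backslashes.

```csharp
using System;

static class EscapedSplitter
{
    // Assumption: this character never appears in real input.
    const string Placeholder = "\u0001";

    // Hypothetical helper: splits on commas but keeps "\," as a literal comma.
    public static string[] SplitEscaped(string input)
    {
        // Hide escaped commas so Split does not see them.
        string protectedInput = input.Replace("\\,", Placeholder);
        string[] parts = protectedInput.Split(',');

        // Restore the hidden commas in each part.
        for (int i = 0; i < parts.Length; i++)
        {
            parts[i] = parts[i].Replace(Placeholder, ",");
        }
        return parts;
    }
}

class Program
{
    static void Main()
    {
        foreach (string part in EscapedSplitter.SplitEscaped(@"Smith\, John,42,Paris"))
        {
            Console.WriteLine(part);
        }
    }
}
```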


IndexOf, Substring. Methods can be combined. Using IndexOf and Substring together is another way to split strings. This is sometimes more effective.
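
As a minimal sketch of this combination, the helper below returns only the text before the first delimiter, without allocating an array for the remaining parts. FirstField and BeforeFirst are hypothetical names.

```csharp
using System;

static class FirstField
{
    // Hypothetical helper: returns the text before the first delimiter,
    // or the whole string when the delimiter is not found.
    public static string BeforeFirst(string value, char delimiter)
    {
        int index = value.IndexOf(delimiter);
        if (index == -1)
        {
            return value;
        }
        return value.Substring(0, index);
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(FirstField.BeforeFirst("key=value=more", '='));
    }
}
```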


StringReader. This class can separate a string into lines. It can lead to performance improvements over using Split. The code required is often more complex.
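
A minimal sketch of reading lines with StringReader follows. ReadLine treats "\r\n", "\n" and "\r" as line endings, so mixed line breaks need no special handling. LineReader and ReadLines are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class LineReader
{
    // Hypothetical helper: collects each line of the string into a list.
    public static List<string> ReadLines(string value)
    {
        List<string> lines = new List<string>();
        using (StringReader reader = new StringReader(value))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }
}

class Program
{
    static void Main()
    {
        foreach (string line in LineReader.ReadLines("cat\r\ndog\r\nanimal\r\nperson"))
        {
            Console.WriteLine(line);
        }
    }
}
```

In a loop that processes each line once, the list can be skipped and each line handled as it is read.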


A summary. With Split, we separate strings on character or string delimiters and receive the parts in a string array. For most tasks it keeps code as simple as possible.
