
Split. Bamboo grows in sections. Each part is connected, but also separate. In a sense the stem is an array of segments. The forest here is dense.

In a string too we often find parts. These are separated with a delimiter. We can split lines and words from a string based on chars, strings or newlines.

First example. We examine the simplest Split method. We call Split on a string instance. This program splits on a single character. The array returned has 4 elements. The input string (which contains 4 words) is split on spaces. The result value from Split is a string array.
Foreach: The foreach-loop iterates over the array and displays each word. The string array can be used like any other.

Multiple characters. Next we use Regex.Split to separate based on multiple characters. The string Split method has an overload that accepts StringSplitOptions. With RemoveEmptyEntries, it removes empty strings from the result. Regex methods can split strings effectively, but string Split is often faster. This example specifies an array as the first argument to Split().
StringSplitOptions: This is an enum. It does not need to be allocated with a constructor. It is more like a special int value.

Char arrays. The string Split method receives a character array as the first parameter. Each char in the array designates a new block in the string data.

Using string arrays. A string array can also be passed to the Split method. The new string array is created inline with the Split call.

RemoveEmptyEntries notes. For StringSplitOptions, we specify RemoveEmptyEntries. When two delimiters are adjacent, we can end up with an empty result.
So: We use RemoveEmptyEntries as the second parameter to avoid empty results.

Separate words. You can separate words with Split. Usually, the best way to separate words is to use a Regex that specifies non-word chars.
This example separates words in a string based on non-word characters. It eliminates punctuation and whitespace. Here we show how to separate parts of a string based on any character set or range with Regex.Split.
Warning: Regex provides more power and control than the string Split methods. But the code is harder to read.

Text files. Here we have a text file containing comma-delimited lines of values (a CSV file). We use File.ReadAllLines to read lines, but StreamReader can be used instead.
Then: The program displays the values of each line after the line number. The output shows how the file was parsed into strings.

Directory paths. We can split the segments in a Windows local directory into separate strings. Please note that directory paths are complex. This code may not correctly handle all cases. We could use Path.DirectorySeparatorChar, a char field on the Path class in System.IO, for more flexibility.

Internals. What is inside Split? The logic internal to the .NET Framework for Split is implemented in managed code. The public methods call into an overload with three parameters.
Next: The parameters are checked for validity. Split uses unsafe code to create a separator list, and a for-loop combined with Substring.

Benchmarks. I tested two strings (with 40 and 1200 chars). Speed varied with the contents of the strings. The length of blocks, number of delimiters, and total size all factor into performance. The Regex.Split option generally performed the worst. String.Split was consistently faster. I felt that the second or third methods would be best. Regex also causes performance problems elsewhere.

Benchmark results. For 1200-char strings, the speed difference is reduced. For short strings, Regex is slowest by a wide margin. For long strings the gap narrows, although Regex was still the slowest method in these tests.
Short strings: For short, 40-char strings, the Regex method is by far the slowest.
The compilation time may cause this. Regex may also lack certain optimizations present in string.Split. In the timings, smaller is better.
Arrays: In programs that use shorter strings, the methods that split based on arrays are faster. This avoids Regex compilation.
But: For longer strings or files that contain more lines, Regex is appropriate.

Delimiter arrays. Here we examine delimiter performance. My research finds it is worthwhile to declare and allocate the char array argument as a local variable. Storing the array of delimiters outside the loop is faster. This version, shown second, requires 10% less time.

StringSplitOptions. This affects the behavior of Split. The two values of StringSplitOptions used here (None and RemoveEmptyEntries) are enum constants, stored as integers, that tell Split how to work. In this example, the input string contains five commas. These commas are the delimiters. Two fields between commas are 0 characters long, so they are empty. They are treated differently when we use RemoveEmptyEntries.
First call: In the first call to Split, these fields are put into the result array. These elements equal string.Empty.
Second call: We specify StringSplitOptions.RemoveEmptyEntries. The two empty fields are not in the result array.

Join. With this method, we can combine separate strings with a separating delimiter. Join() can be used to round-trip data. It is the opposite of Split.

Replace. Split does not handle escaped characters. We can instead use Replace on a string input to substitute special characters for any escaped characters.

IndexOf, Substring. Methods can be combined. Using IndexOf and Substring together is another way to split strings. This is sometimes more effective.

StringReader. This class can separate a string into lines. It can lead to performance improvements over using Split. The code required is often more complex.
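
The alternatives named above (Join, IndexOf with Substring, and StringReader) can be sketched in one small program. This is only an illustration under stated assumptions: the class name SplitAlternatives and the helpers RoundTrip, SplitOnFirst and ReadLines are hypothetical names invented for this example, not part of the original article or of .NET.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class SplitAlternatives
{
    // Hypothetical helper: Split and then Join with the same delimiter.
    // With the default options, empty fields are kept, so the input
    // string is reproduced exactly.
    public static string RoundTrip(string input, char delimiter)
    {
        string[] parts = input.Split(delimiter);
        return string.Join(delimiter.ToString(), parts);
    }

    // Hypothetical helper: split at the first separator only, using
    // IndexOf and Substring. If the separator is absent, the whole
    // input is returned as Left and Right is empty.
    public static (string Left, string Right) SplitOnFirst(string input, char separator)
    {
        int index = input.IndexOf(separator);
        if (index < 0)
        {
            return (input, string.Empty);
        }
        return (input.Substring(0, index), input.Substring(index + 1));
    }

    // Hypothetical helper: separate a string into lines with StringReader.
    // ReadLine accepts \r\n, \n and \r as line endings.
    public static List<string> ReadLines(string input)
    {
        var lines = new List<string>();
        using (var reader = new StringReader(input))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }

    static void Main()
    {
        Console.WriteLine(RoundTrip("man,woman,child,,,bird", ','));

        var pair = SplitOnFirst("key=a=b", '=');
        Console.WriteLine("{0} | {1}", pair.Left, pair.Right);

        foreach (string line in ReadLines("cat\r\ndog\nbird"))
        {
            Console.WriteLine(line);
        }
    }
}
```

SplitOnFirst shows one case where IndexOf and Substring may be more effective than Split: only the first separator matters, so no array of parts is allocated. Whether StringReader is actually faster than Split for a given input would need to be measured.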
A summary. With Split, we separate strings into parts based on delimiters. It solves many parsing problems, and it keeps code as simple as possible.

    using System;

    class Program
    {
        static void Main()
        {
            string s = "there is a cat";
            // Split string on spaces.
            // ... This will separate all the words.
            string[] words = s.Split(' ');
            foreach (string word in words)
            {
                Console.WriteLine(word);
            }
        }
    }

Output

    there
    is
    a
    cat

    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main()
        {
            string value = "cat\r\ndog\r\nanimal\r\nperson";
            // Split the string on line breaks.
            // ... The return value from Split is a string array.
            string[] lines = Regex.Split(value, "\r\n");

            foreach (string line in lines)
            {
                Console.WriteLine(line);
            }
        }
    }

Output

    cat
    dog
    animal
    person

    using System;

    class Program
    {
        static void Main()
        {
            // Parts are separated by Windows line breaks.
            string value = "shirt\r\ndress\r\npants\r\njacket";

            // Use a char array of 2 characters (\r and \n).
            // ... Break lines into separate strings.
            // ... Use RemoveEmptyEntries so empty strings are not added.
            char[] delimiters = new char[] { '\r', '\n' };
            string[] parts = value.Split(delimiters,
                StringSplitOptions.RemoveEmptyEntries);
            Console.WriteLine(":::SPLIT, CHAR ARRAY:::");
            for (int i = 0; i < parts.Length; i++)
            {
                Console.WriteLine(parts[i]);
            }

            // Same but uses a string of 2 characters.
            string[] partsFromString = value.Split(
                new string[] { "\r\n" }, StringSplitOptions.None);
            Console.WriteLine(":::SPLIT, STRING:::");
            for (int i = 0; i < partsFromString.Length; i++)
            {
                Console.WriteLine(partsFromString[i]);
            }
        }
    }

Output

    :::SPLIT, CHAR ARRAY:::
    shirt
    dress
    pants
    jacket
    :::SPLIT, STRING:::
    shirt
    dress
    pants
    jacket

    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main()
        {
            string[] w = SplitWords("That is a cute cat, man");
            foreach (string s in w)
            {
                Console.WriteLine(s);
            }
            Console.ReadLine();
        }

        /// <summary>
        /// Take all the words in the input string and separate them.
        /// </summary>
        static string[] SplitWords(string s)
        {
            //
            // Split on all non-word characters.
            // ... Returns an array of all the words.
            //
            return Regex.Split(s, @"\W+");
            // @      special verbatim string syntax
            // \W+    one or more non-word characters together
        }
    }

Output

    That
    is
    a
    cute
    cat
    man

Contents of input file: TextFile1.txt

    Dog,Cat,Mouse,Fish,Cow,Horse,Hyena
    Programmer,Wizard,CEO,Rancher,Clerk,Farmer

    using System;
    using System.IO;

    class Program
    {
        static void Main()
        {
            int i = 0;
            foreach (string line in File.ReadAllLines("TextFile1.txt"))
            {
                string[] parts = line.Split(',');
                foreach (string part in parts)
                {
                    Console.WriteLine("{0}:{1}",
                        i,
                        part);
                }
                i++; // For demonstration.
            }
        }
    }

Output

    0:Dog
    0:Cat
    0:Mouse
    0:Fish
    0:Cow
    0:Horse
    0:Hyena
    1:Programmer
    1:Wizard
    1:CEO
    1:Rancher
    1:Clerk
    1:Farmer

    using System;

    class Program
    {
        static void Main()
        {
            // The directory from Windows.
            const string dir = @"C:\Users\Sam\Documents\Perls\Main";
            // Split on directory separator.
            string[] parts = dir.Split('\\');
            foreach (string part in parts)
            {
                Console.WriteLine(part);
            }
        }
    }

Output

    C:
    Users
    Sam
    Documents
    Perls
    Main

Strings used in test: C#

    //
    // Build long string.
    //
    _test = string.Empty;
    for (int i = 0; i < 120; i++)
    {
        _test += "01234567\r\n";
    }
    //
    // Build short string.
    //
    _test = string.Empty;
    for (int i = 0; i < 10; i++)
    {
        _test += "ab\r\n";
    }

Methods tested: 100000 iterations

    static void Test1()
    {
        string[] arr = Regex.Split(_test, "\r\n", RegexOptions.Compiled);
    }

    static void Test2()
    {
        string[] arr = _test.Split(new char[] { '\r', '\n' },
            StringSplitOptions.RemoveEmptyEntries);
    }

    static void Test3()
    {
        string[] arr = _test.Split(new string[] { "\r\n" },
            StringSplitOptions.None);
    }

Benchmark of Split on long strings

    [1] Regex.Split:    3470 ms
    [2] char[] Split:   1255 ms [fastest]
    [3] string[] Split: 1449 ms

Benchmark of Split on short strings

    [1] Regex.Split:     434 ms
    [2] char[] Split:     63 ms [fastest]
    [3] string[] Split:   83 ms

Slow version, before: C#

    //
    // Split on multiple characters using new char[] inline.
    //
    string t = "string to split, ok";

    for (int i = 0; i < 10000000; i++)
    {
        string[] s = t.Split(new char[] { ' ', ',' });
    }

Fast version, after: C#

    //
    // Split on multiple characters using new char[] already created.
    //
    string t = "string to split, ok";
    char[] c = new char[] { ' ', ',' }; // <-- Cache this

    for (int i = 0; i < 10000000; i++)
    {
        string[] s = t.Split(c);
    }

    using System;

    class Program
    {
        static void Main()
        {
            // Input string contains separators.
            string value1 = "man,woman,child,,,bird";
            char[] delimiter1 = new char[] { ',' }; // <-- Split on these

            // ... Use StringSplitOptions.None.
            string[] array1 = value1.Split(delimiter1,
                StringSplitOptions.None);

            foreach (string entry in array1)
            {
                Console.WriteLine(entry);
            }

            // ... Use StringSplitOptions.RemoveEmptyEntries.
            string[] array2 = value1.Split(delimiter1,
                StringSplitOptions.RemoveEmptyEntries);

            Console.WriteLine();
            foreach (string entry in array2)
            {
                Console.WriteLine(entry);
            }
        }
    }

Output

    man
    woman
    child


    bird

    man
    woman
    child
    bird
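
The directory-path discussion mentions Path.DirectorySeparatorChar as a more flexible choice than a hard-coded backslash. A minimal sketch of that idea follows; the class name PathSplitSketch and the helper SplitPath are hypothetical names for illustration. It also splits on Path.AltDirectorySeparatorChar, which is an addition of this sketch and not something the article describes. Like the original example, it may not handle all path forms correctly.

```csharp
using System;
using System.IO;

static class PathSplitSketch
{
    // Hypothetical helper: split a path on the platform's separator
    // characters instead of a hard-coded '\\'.
    public static string[] SplitPath(string path)
    {
        char[] separators = new char[]
        {
            Path.DirectorySeparatorChar,
            Path.AltDirectorySeparatorChar
        };
        // RemoveEmptyEntries drops the empty strings produced by
        // leading, trailing or doubled separators.
        return path.Split(separators, StringSplitOptions.RemoveEmptyEntries);
    }

    static void Main()
    {
        // Path.Combine builds the path with the current platform's separator.
        string dir = Path.Combine("Users", "Sam", "Documents");
        foreach (string part in SplitPath(dir))
        {
            Console.WriteLine(part);
        }
    }
}
```

Because the separators come from the Path class, the same code yields Users, Sam and Documents on both Windows and Unix-like systems.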

Example programs, in order: splits on spaces; splits on lines with Regex; splits on multiple characters; separates on non-word pattern; splits lines in file; splits Windows directories; StringSplitOptions.