
Split.

Bamboo grows in sections. Each part is connected, but also separate. In a sense the stem is an array of segments. The forest here is dense.

In a string too, we often find parts. These are separated with a delimiter. We can split lines and words from a string based on chars, strings or newlines.

First example. We examine the simplest Split method. We call Split on a string instance. This program splits on a single character, and the array it returns has 4 elements.

The input string (which contains 4 words) is split on spaces. The result value from Split is a string array.

Foreach: The foreach-loop loops over the array and displays each word. The string array can be used like any other array.

Multiple characters. Next we use Regex.Split to separate based on multiple characters (here, the two-char sequence "\r\n"). The string Split method also has overloads that accept StringSplitOptions, which can remove empty strings.

RemoveEmptyEntries. Regex methods split strings effectively, but string Split is often faster. This example passes an array as the first argument to Split().

StringSplitOptions: This is an enum. It does not need to be allocated with a constructor; it is more like a special int value.

Char arrays. The string Split method can receive a character array as the first parameter. Each char in the array is a delimiter: wherever any of them appears, a new block in the string data begins.

Using string arrays. A string array can also be passed to the Split method. The new string array is created inline with the Split call.

RemoveEmptyEntries notes. For StringSplitOptions, we specify RemoveEmptyEntries. When two delimiters are adjacent, we can end up with an empty result. For example, splitting "shirt\r\ndress" on the chars '\r' and '\n' yields "shirt", an empty string, and "dress".

So: We use RemoveEmptyEntries as the second parameter to avoid empty results.

[Image: the Visual Studio debugger.]

Separate words. You can separate words with Split. Usually, the best way to separate words is to use a Regex that specifies non-word chars.
This example separates words in a string based on non-word characters. It eliminates punctuation and whitespace.

Regex.Split can separate parts of a string based on any character set or range in the same way.

Warning: Regex provides more power and control than the string Split methods, but the code is harder to read.

Text files. Here we have a text file containing comma-delimited lines of values (a CSV file). We use File.ReadAllLines to read lines, but StreamReader can be used instead.

Then: The program displays the values of each line after the line number. The output shows how the file was parsed into strings. Note that a plain Split(',') does not handle quoted CSV fields that themselves contain commas.

Directory paths. We can split the segments of a Windows local directory path into separate strings. Please note that directory paths are complex, and this code may not correctly handle all cases.

We could use Path.DirectorySeparatorChar, a char field in System.IO, for more flexibility.

Internals. What is inside Split? In the .NET Framework, the logic for Split is implemented in managed code. The public overloads call into an overload with three parameters.

Next: The parameters are checked for validity. Split then uses unsafe code to build a list of separator positions, and a for-loop combined with Substring to extract the parts.

Benchmarks. I tested two strings (with 40 and 1200 chars). Speed varied with the contents of the strings: the length of blocks, the number of delimiters, and the total size all factor into performance.

The Regex.Split option generally performed the worst. String.Split was consistently faster.

I expected the second or third methods to be best. Regex also causes performance problems elsewhere.

Benchmark results. For 1200-char strings, the speed difference shrinks. Regex.Split is still the slowest, but it is about 2.8 times slower than char[] Split (3470 ms vs. 1255 ms), compared with about 7 times slower (434 ms vs. 63 ms) on short strings.

Short strings: For short, 40-char strings, the Regex method is by far the slowest.
The compilation time may cause this. Regex may also lack certain optimizations present in string.Split. (In these results, smaller times are better.)

Arrays: In programs that use shorter strings, the methods that split based on arrays are faster. This avoids Regex compilation.

But: For longer strings or files that contain more lines, Regex is appropriate, since its relative cost is lower.

Delimiter arrays. Here we examine delimiter performance. My research finds it is worthwhile to declare, and allocate, the char array argument as a local variable outside the loop.

Storing the array of delimiters outside the loop is faster. The version shown second requires 10% less time, because it does not allocate a new array on every iteration.

StringSplitOptions. This affects the behavior of Split. The values of StringSplitOptions are enum constants that tell Split how to work: None and RemoveEmptyEntries (.NET 5 and later also add TrimEntries).

In this example, the input string contains five commas. These commas are the delimiters.

Two fields between commas are 0 characters long; they are empty. They are treated differently when we use RemoveEmptyEntries.

First call: In the first call to Split, these fields are put into the result array. These elements equal string.Empty.

Second call: We specify StringSplitOptions.RemoveEmptyEntries. The two empty fields are not in the result array.

Join. With this method, we can combine separate strings with a separating delimiter. Join() is the opposite of Split, so it can be used to round-trip data, as long as the delimiter never appears inside the parts.

Replace. Split does not handle escaped characters. We can instead use Replace on the input string to substitute a placeholder for each escaped delimiter before splitting, then restore it afterward.

IndexOf, Substring. Methods can be combined. Using IndexOf and Substring together is another way to split strings. This is sometimes more effective, for example when only the first part is needed and no array has to be allocated.

StringReader. This class can separate a string into lines. It can lead to performance improvements over using Split, but the code required is often more complex.
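The Join note can be made concrete with a short sketch. It uses only string.Join and string.Split from the base library; the sample values are made up for illustration. The program round-trips a small array through a comma-joined string. It then shows the case where a field contains the delimiter, so the round trip no longer gives back the original parts.

```csharp
using System;

// Join and Split act as inverses only while the delimiter
// never appears inside a field.
string[] animals = { "cat", "dog", "bird" };

// Combine the parts with a comma delimiter.
string joined = string.Join(",", animals);
Console.WriteLine(joined);

// Split the joined string back into its parts.
string[] roundTrip = joined.Split(',');
Console.WriteLine(roundTrip.Length);

// A field that contains the delimiter breaks the round trip:
// two input fields come back as three.
string[] risky = { "a,b", "c" };
string[] broken = string.Join(",", risky).Split(',');
Console.WriteLine(broken.Length);
```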
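Here is a sketch of the Replace approach for escaped delimiters. Two things are assumptions chosen only for illustration: the escape convention (a comma inside a field written as \,) and the placeholder char \u0001. Any placeholder works only if it never occurs in the real data.

```csharp
using System;

// Assumption: a literal comma inside a field is written as "\,".
// Assumption: '\u0001' never occurs in the data, so it can serve as a placeholder.
const char Placeholder = '\u0001';

string input = @"red,green\,blue,yellow";

// 1. Hide each escaped comma behind the placeholder.
string hidden = input.Replace(@"\,", Placeholder.ToString());

// 2. Split on the remaining, real delimiters.
string[] fields = hidden.Split(',');

// 3. Turn the placeholder back into a plain comma in each field.
for (int i = 0; i < fields.Length; i++)
{
    fields[i] = fields[i].Replace(Placeholder, ',');
}

foreach (string field in fields)
{
    Console.WriteLine(field);
}
```

This prints red, green,blue and yellow on separate lines. It does not handle an escaped backslash in front of a comma. A complete escape format needs a small parser rather than Replace.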
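The IndexOf and Substring combination can be sketched as below. SplitManual is a hypothetical helper name, not a .NET method. It keeps the empty fields between adjacent delimiters, like Split with StringSplitOptions.None. The second part covers the case where only one delimiter matters, so no array or list is allocated.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: splits on one char with IndexOf and Substring.
static List<string> SplitManual(string value, char delimiter)
{
    var parts = new List<string>();
    int start = 0;
    int index;
    // Each time a delimiter is found, take the text before it.
    while ((index = value.IndexOf(delimiter, start)) != -1)
    {
        parts.Add(value.Substring(start, index - start));
        start = index + 1;
    }
    // The text after the last delimiter is the final part.
    parts.Add(value.Substring(start));
    return parts;
}

List<string> words = SplitManual("there is a cat", ' ');
Console.WriteLine(string.Join("|", words));

// When only the first delimiter matters, two Substring calls are enough.
string setting = "timeout=30";
int eqIndex = setting.IndexOf('=');
string key = setting.Substring(0, eqIndex);
string val = setting.Substring(eqIndex + 1);
Console.WriteLine(key + " -> " + val);
```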
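Here is a minimal sketch of reading lines with StringReader. StringReader.ReadLine treats "\n", "\r" and "\r\n" all as line endings, so the made-up sample string mixes them on purpose.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

string text = "cat\r\ndog\nanimal\r\nperson";

var lines = new List<string>();
using (var reader = new StringReader(text))
{
    string line;
    // ReadLine returns null once the end of the string is reached.
    while ((line = reader.ReadLine()) != null)
    {
        lines.Add(line);
    }
}

foreach (string item in lines)
{
    Console.WriteLine(item);
}
```

Split on "\r\n" with StringSplitOptions.None adds an empty final entry when the string ends with a line break. ReadLine does not, because it returns null once nothing is left.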

A summary. With Split, we separate strings into parts based on chars or strings. We solve problems, and we keep code as simple as possible.

Program that splits on spaces: C#

```csharp
using System;

class Program
{
    static void Main()
    {
        string s = "there is a cat";
        // Split string on spaces.
        // ... This will separate all the words.
        string[] words = s.Split(' ');
        foreach (string word in words)
        {
            Console.WriteLine(word);
        }
    }
}
```

Output

```
there
is
a
cat
```

Program that splits on lines with Regex: C#

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string value = "cat\r\ndog\r\nanimal\r\nperson";
        // Split the string on line breaks.
        // ... The return value from Regex.Split is a string array.
        string[] lines = Regex.Split(value, "\r\n");

        foreach (string line in lines)
        {
            Console.WriteLine(line);
        }
    }
}
```

Output

```
cat
dog
animal
person
```

Program that splits on multiple characters: C#

```csharp
using System;

class Program
{
    static void Main()
    {
        // Parts are separated by Windows line breaks.
        string value = "shirt\r\ndress\r\npants\r\njacket";

        // Use a char array of 2 characters (\r and \n).
        // ... Break lines into separate strings.
        // ... Use RemoveEmptyEntries so empty strings are not added.
        char[] delimiters = new char[] { '\r', '\n' };
        string[] parts = value.Split(delimiters,
            StringSplitOptions.RemoveEmptyEntries);
        Console.WriteLine(":::SPLIT, CHAR ARRAY:::");
        for (int i = 0; i < parts.Length; i++)
        {
            Console.WriteLine(parts[i]);
        }

        // Same but uses a string of 2 characters.
        string[] partsFromString = value.Split(
            new string[] { "\r\n" }, StringSplitOptions.None);
        Console.WriteLine(":::SPLIT, STRING:::");
        for (int i = 0; i < partsFromString.Length; i++)
        {
            Console.WriteLine(partsFromString[i]);
        }
    }
}
```

Output

```
:::SPLIT, CHAR ARRAY:::
shirt
dress
pants
jacket
:::SPLIT, STRING:::
shirt
dress
pants
jacket
```

Program that separates on non-word pattern: C#

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        string[] w = SplitWords("That is a cute cat, man");
        foreach (string s in w)
        {
            Console.WriteLine(s);
        }
    }

    /// <summary>
    /// Take all the words in the input string and separate them.
    /// </summary>
    static string[] SplitWords(string s)
    {
        // Split on all non-word characters.
        // ... Returns an array of all the words.
        // @      special verbatim string syntax
        // \W+    one or more non-word characters together
        return Regex.Split(s, @"\W+");
    }
}
```

Output

```
That
is
a
cute
cat
man
```

Contents of input file: TextFile1.txt

```
Dog,Cat,Mouse,Fish,Cow,Horse,Hyena
Programmer,Wizard,CEO,Rancher,Clerk,Farmer
```

Program that splits lines in file: C#

```csharp
using System;
using System.IO;

class Program
{
    static void Main()
    {
        int i = 0;
        foreach (string line in File.ReadAllLines("TextFile1.txt"))
        {
            string[] parts = line.Split(',');
            foreach (string part in parts)
            {
                Console.WriteLine("{0}:{1}", i, part);
            }
            i++; // For demonstration.
        }
    }
}
```

Output

```
0:Dog
0:Cat
0:Mouse
0:Fish
0:Cow
0:Horse
0:Hyena
1:Programmer
1:Wizard
1:CEO
1:Rancher
1:Clerk
1:Farmer
```

Program that splits Windows directories: C#

```csharp
using System;

class Program
{
    static void Main()
    {
        // The directory from Windows.
        const string dir = @"C:\Users\Sam\Documents\Perls\Main";
        // Split on directory separator.
        string[] parts = dir.Split('\\');
        foreach (string part in parts)
        {
            Console.WriteLine(part);
        }
    }
}
```

Output

```
C:
Users
Sam
Documents
Perls
Main
```

Strings used in test: C#

```csharp
// Build long string (120 lines, 1200 chars).
_test = string.Empty;
for (int i = 0; i < 120; i++)
{
    _test += "01234567\r\n";
}

// Build short string (10 lines, 40 chars).
_test = string.Empty;
for (int i = 0; i < 10; i++)
{
    _test += "ab\r\n";
}
```

Methods tested: 100000 iterations

```csharp
static void Test1()
{
    string[] arr = Regex.Split(_test, "\r\n", RegexOptions.Compiled);
}

static void Test2()
{
    string[] arr = _test.Split(new char[] { '\r', '\n' },
        StringSplitOptions.RemoveEmptyEntries);
}

static void Test3()
{
    string[] arr = _test.Split(new string[] { "\r\n" },
        StringSplitOptions.None);
}
```

Benchmark of Split on long strings

```
[1] Regex.Split:    3470 ms
[2] char[] Split:   1255 ms [fastest]
[3] string[] Split: 1449 ms
```

Benchmark of Split on short strings

```
[1] Regex.Split:     434 ms
[2] char[] Split:     63 ms [fastest]
[3] string[] Split:   83 ms
```

Slow version, before: C#

```csharp
// Split on multiple characters using new char[] inline.
string t = "string to split, ok";
for (int i = 0; i < 10000000; i++)
{
    string[] s = t.Split(new char[] { ' ', ',' });
}
```

Fast version, after: C#

```csharp
// Split on multiple characters using new char[] already created.
string t = "string to split, ok";
char[] c = new char[] { ' ', ',' }; // <-- Cache this
for (int i = 0; i < 10000000; i++)
{
    string[] s = t.Split(c);
}
```

Program that uses StringSplitOptions: C#

```csharp
using System;

class Program
{
    static void Main()
    {
        // Input string contains separators.
        string value1 = "man,woman,child,,,bird";
        char[] delimiter1 = new char[] { ',' }; // <-- Split on these

        // ... Use StringSplitOptions.None.
        string[] array1 = value1.Split(delimiter1,
            StringSplitOptions.None);
        foreach (string entry in array1)
        {
            Console.WriteLine(entry);
        }

        // ... Use StringSplitOptions.RemoveEmptyEntries.
        string[] array2 = value1.Split(delimiter1,
            StringSplitOptions.RemoveEmptyEntries);
        Console.WriteLine();
        foreach (string entry in array2)
        {
            Console.WriteLine(entry);
        }
    }
}
```

Output

```
man
woman
child


bird

man
woman
child
bird
```
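The word-splitting example uses \W+, but Regex.Split accepts any character class. This sketch uses made-up inputs. It splits once on a range, [0-9]+, and once on a set of delimiters, [,;|]+. The + makes a run of delimiters count as a single separator, so no empty strings appear between them.

```csharp
using System;
using System.Text.RegularExpressions;

// A range: any run of digits acts as the delimiter.
string[] byDigits = Regex.Split("cat1dog22bird333fish", "[0-9]+");

// A set: commas, semicolons and pipes are interchangeable delimiters.
// The doubled comma is consumed as one run by the + quantifier.
string[] bySet = Regex.Split("red;green|blue,,yellow", "[,;|]+");

foreach (string part in byDigits)
{
    Console.WriteLine(part);
}
foreach (string part in bySet)
{
    Console.WriteLine(part);
}
```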
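The text-file example reads every line at once with File.ReadAllLines. The sketch below uses StreamReader instead, which the Text files section mentions as an alternative, so only one line is held in memory at a time. To stay self-contained, it first writes the same sample data to a temporary file. That temp-file handling is an addition for illustration and is not part of the original program.

```csharp
using System;
using System.IO;

// Write the sample data to a temporary file (illustration only).
string path = Path.GetTempFileName();
File.WriteAllText(path,
    "Dog,Cat,Mouse,Fish,Cow,Horse,Hyena\nProgrammer,Wizard,CEO,Rancher,Clerk,Farmer\n");

int lineNumber = 0;
int fieldCount = 0;
using (var reader = new StreamReader(path))
{
    string line;
    // Read one line at a time instead of loading the whole file.
    while ((line = reader.ReadLine()) != null)
    {
        foreach (string part in line.Split(','))
        {
            Console.WriteLine("{0}:{1}", lineNumber, part);
            fieldCount++;
        }
        lineNumber++;
    }
}

File.Delete(path);
```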
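For the directory example, the Path.DirectorySeparatorChar suggestion can be sketched like this. The code splits on both DirectorySeparatorChar and AltDirectorySeparatorChar from System.IO. So that it behaves the same on every OS, it builds the path with Path.Combine rather than using the hard-coded Windows string; that substitution is an assumption made for portability. On Unix-like systems both fields are '/', so a Windows path such as C:\Users\Sam would not be split there.

```csharp
using System;
using System.IO;

// Build a relative path with the current platform's separator.
string dir = Path.Combine("Users", "Sam", "Documents", "Perls");

// Split on both separator chars that System.IO defines for this platform.
char[] separators = { Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar };
string[] parts = dir.Split(separators, StringSplitOptions.RemoveEmptyEntries);

foreach (string part in parts)
{
    Console.WriteLine(part);
}
```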
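The benchmark listings show which methods were tested, but not the timing loop around them. The following is one possible harness, assuming a Stopwatch-based loop; TimeIt is a hypothetical helper name. It reuses the 1200-char input and the 100000 iterations from the article. Timings depend on the machine and runtime, so no numbers are claimed here.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

const int Iterations = 100000;

// The long input from the article: 120 copies of "01234567\r\n" (1200 chars).
string input = string.Concat(Enumerable.Repeat("01234567\r\n", 120));

char[] charDelimiters = { '\r', '\n' };
string[] stringDelimiters = { "\r\n" };

// Hypothetical helper: runs an action many times, returns elapsed milliseconds.
static long TimeIt(Action action, int iterations)
{
    var watch = Stopwatch.StartNew();
    for (int i = 0; i < iterations; i++)
    {
        action();
    }
    watch.Stop();
    return watch.ElapsedMilliseconds;
}

long regexMs = TimeIt(() => Regex.Split(input, "\r\n", RegexOptions.Compiled), Iterations);
long charMs = TimeIt(() => input.Split(charDelimiters, StringSplitOptions.RemoveEmptyEntries), Iterations);
long stringMs = TimeIt(() => input.Split(stringDelimiters, StringSplitOptions.None), Iterations);

Console.WriteLine($"[1] Regex.Split:    {regexMs} ms");
Console.WriteLine($"[2] char[] Split:   {charMs} ms");
Console.WriteLine($"[3] string[] Split: {stringMs} ms");
```

The three methods do not return identical arrays. Regex.Split and the string[] overload with StringSplitOptions.None both end with an empty string after the final "\r\n". The char[] overload with RemoveEmptyEntries does not.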
