
Split.` Bamboo grows in sections. Each part is connected, but also separate. In a sense the stem is an array of segments. The forest here is dense.`In a string too` we often find parts. These are separated with a delimiter. We can split lines and words from a string based on chars, strings or newlines.`First example.` We examine the simplest Split method. We call Split on a string instance. This program splits on a single character. The array returned has 4 elements. `Char `char`The input string (which contains 4 words) is split on spaces. The result value from Split is a string array.`Foreach: `The foreach-loop iterates over the array and displays each word. The string array can be used like any other.`Foreach `foreach`Multiple characters.` Next we use Regex.Split to separate based on multiple characters. The string Split method has an overload that accepts StringSplitOptions. With RemoveEmptyEntries, it removes empty strings from the result. `RemoveEmptyEntries.` Regex methods can split strings effectively. But string Split is often faster. This example specifies an array as the first argument to Split(). `StringSplitOptions: `This is an enum. It does not need to be allocated with a constructor; it is more like a special int value.`Enum `enum`Char arrays.` The string Split method receives a character array as the first parameter. Each char in the array designates a new block in the string data. `Char Array `char-array`Using string arrays.` A string array can also be passed to the Split method. The new string array is created inline with the Split call. `String Array `array`RemoveEmptyEntries notes.` For StringSplitOptions, we specify RemoveEmptyEntries. When two delimiters are adjacent, we can end up with an empty result. `So: `We use RemoveEmptyEntries as the second parameter to avoid empty results.`Separate words.` You can separate words with Split. Usually, the best way to separate words is to use a Regex that specifies non-word chars. 
`Regex.Split `regex-split`This example separates words in a string based on non-word characters. It eliminates punctuation and whitespace.`Here we show how to separate parts of a string based on any character set or range with Regex.Split.`Warning: `Regex provides more power and control than the string Split methods. But the code is harder to read.`Text files.` Here we have a text file containing comma-delimited lines of values (a CSV file). We use File.ReadAllLines to read lines, but StreamReader can be used instead. `StreamReader `streamreader`Then: `The program displays the values of each line after the line number. The output shows how the file was parsed into the strings.`Directory paths.` We can split the segments in a Windows local directory into separate strings. Please note that directory paths are complex. This code may not correctly handle all cases. `We could use Path.DirectorySeparatorChar, a char field in System.IO, for more flexibility.`Path `path`Internals.` What is inside Split? The logic internal to the .NET Framework for Split is implemented in managed code. The public overloads call into an overload with three parameters. `Next: `The parameters are checked for validity. It uses unsafe code to create a separator list, and a for-loop combined with Substring.`For `for`Benchmarks.` I tested two strings (with 40 and 1200 chars). Speed varied with the contents of the strings. The length of blocks, number of delimiters, and total size factor into performance. `The Regex.Split option generally performed the worst. String.Split was consistently faster.`I felt that the second or third methods would be best. Regex also causes performance problems elsewhere.`Benchmark results.` For 1200-char strings, the speed difference is reduced. For short strings, Regex is slowest. For long strings it remains the slowest, but by a smaller factor. `Short strings: `For short, 40-char strings, the Regex method is by far the slowest. 
The compilation time may cause this.`Regex may also lack certain optimizations present with string.Split. In the timings, smaller is better.`Arrays: `In programs that use shorter strings, the methods that split based on arrays are faster. This avoids Regex compilation.`But: `For longer strings or files that contain more lines, Regex is a more reasonable choice, because its relative cost is lower.`Delimiter arrays.` Here we examine delimiter performance. My research finds it is worthwhile to declare and allocate the char array argument once, as a local variable. `Storing the array of delimiters outside the loop is faster. This version, shown second, requires 10% less time.`StringSplitOptions.` This affects the behavior of Split. The two values of StringSplitOptions (None and RemoveEmptyEntries) are enum values, backed by integers, that tell Split how to work. `In this example, the input string contains five commas. These commas are the delimiters.`Two fields between commas are 0 characters long; they are empty. They are treated differently when we use RemoveEmptyEntries.`First call: `In the first call to Split, these fields are put into the result array. These elements equal string.Empty.`Second call: `We specify StringSplitOptions.RemoveEmptyEntries. The two empty fields are not in the result array.`Join.` With this method, we can combine separate strings with a separating delimiter. Join() can be used to round-trip data. It is the opposite of Split. `Join `string-join`Replace.` Split does not handle escaped characters. We can instead use Replace on a string input to substitute special characters for any escaped characters. `Replace `replace`IndexOf, Substring.` Methods can be combined. Using IndexOf and Substring together is another way to split strings. This is sometimes more effective. `IndexOf `indexof`Substring `substring`StringReader.` This class can separate a string into lines. It can lead to performance improvements over using Split. The code required is often more complex. 
`StringReader `stringreader`A summary.` With Split, we divide a string into parts based on delimiters. It solves many parsing problems and keeps code as simple as possible.
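The related methods named above (Join, Replace, IndexOf with Substring, StringReader) are described without an example program. The following is a minimal sketch of each, not the original author's code. The class name SplitAlternatives, its method names, and the "\u0001" placeholder are hypothetical choices made for illustration. The escaped-delimiter approach assumes the placeholder character never appears in the input.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class SplitAlternatives
{
    // Join reverses Split when no field contains the delimiter.
    public static string RoundTrip(string input, char delimiter)
    {
        string[] parts = input.Split(delimiter);
        return string.Join(delimiter.ToString(), parts);
    }

    // Replace each escaped comma (\,) with a placeholder, split on commas,
    // then restore the commas. Assumes "\u0001" never occurs in the input.
    public static string[] SplitEscaped(string input)
    {
        const string placeholder = "\u0001";
        string[] parts = input.Replace("\\,", placeholder).Split(',');
        for (int i = 0; i < parts.Length; i++)
        {
            parts[i] = parts[i].Replace(placeholder, ",");
        }
        return parts;
    }

    // Split only at the first delimiter by combining IndexOf and Substring.
    public static string[] SplitFirst(string input, char delimiter)
    {
        int index = input.IndexOf(delimiter);
        if (index < 0)
        {
            return new string[] { input };
        }
        return new string[]
        {
            input.Substring(0, index),
            input.Substring(index + 1)
        };
    }

    // Read lines one at a time with StringReader instead of splitting on newlines.
    public static List<string> ReadLines(string input)
    {
        List<string> lines = new List<string>();
        using (StringReader reader = new StringReader(input))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }
}

class Program
{
    static void Main()
    {
        // Prints the input unchanged: Dog,Cat,Mouse
        Console.WriteLine(SplitAlternatives.RoundTrip("Dog,Cat,Mouse", ','));

        foreach (string part in SplitAlternatives.SplitEscaped(@"red\,green,blue"))
        {
            Console.WriteLine(part);
        }
        foreach (string part in SplitAlternatives.SplitFirst("key=value=more", '='))
        {
            Console.WriteLine(part);
        }
        foreach (string line in SplitAlternatives.ReadLines("cat\r\ndog\r\nanimal"))
        {
            Console.WriteLine(line);
        }
    }
}
```

SplitFirst leaves later delimiters in the second part, which Split alone does not do without a count argument. Whether StringReader is actually faster than Split depends on the input, so it is worth timing on real data.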

    using System;

    class Program
    {
        static void Main()
        {
            string s = "there is a cat";
            // Split string on spaces.
            // ... This will separate all the words.
            string[] words = s.Split(' ');
            foreach (string word in words)
            {
                Console.WriteLine(word);
            }
        }
    }

Output

    there
    is
    a
    cat

    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main()
        {
            string value = "cat\r\ndog\r\nanimal\r\nperson";
            // Split the string on line breaks.
            // ... The return value from Split is a string array.
            string[] lines = Regex.Split(value, "\r\n");

            foreach (string line in lines)
            {
                Console.WriteLine(line);
            }
        }
    }

Output

    cat
    dog
    animal
    person

    using System;

    class Program
    {
        static void Main()
        {
            // ... Parts are separated by Windows line breaks.
            string value = "shirt\r\ndress\r\npants\r\njacket";

            // Use a char array of 2 characters (\r and \n).
            // ... Break lines into separate strings.
            // ... Use RemoveEmptyEntries so empty strings are not added.
            char[] delimiters = new char[] { '\r', '\n' };
            string[] parts = value.Split(delimiters,
                StringSplitOptions.RemoveEmptyEntries);
            Console.WriteLine(":::SPLIT, CHAR ARRAY:::");
            for (int i = 0; i < parts.Length; i++)
            {
                Console.WriteLine(parts[i]);
            }

            // ... Same but uses a string of 2 characters.
            string[] partsFromString = value.Split(
                new string[] { "\r\n" }, StringSplitOptions.None);
            Console.WriteLine(":::SPLIT, STRING:::");
            for (int i = 0; i < partsFromString.Length; i++)
            {
                Console.WriteLine(partsFromString[i]);
            }
        }
    }

Output

    :::SPLIT, CHAR ARRAY:::
    shirt
    dress
    pants
    jacket
    :::SPLIT, STRING:::
    shirt
    dress
    pants
    jacket

    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main()
        {
            string[] w = SplitWords("That is a cute cat, man");
            foreach (string s in w)
            {
                Console.WriteLine(s);
            }
            Console.ReadLine();
        }

        /// <summary>
        /// Take all the words in the input string and separate them.
        /// </summary>
        static string[] SplitWords(string s)
        {
            //
            // Split on all non-word characters.
            // ... Returns an array of all the words.
            //
            return Regex.Split(s, @"\W+");
            // @      special verbatim string syntax
            // \W+    one or more non-word characters together
        }
    }

Output

    That
    is
    a
    cute
    cat
    man

Contents of input file: TextFile1.txt

    Dog,Cat,Mouse,Fish,Cow,Horse,Hyena
    Programmer,Wizard,CEO,Rancher,Clerk,Farmer

    using System;
    using System.IO;

    class Program
    {
        static void Main()
        {
            int i = 0;
            foreach (string line in File.ReadAllLines("TextFile1.txt"))
            {
                string[] parts = line.Split(',');
                foreach (string part in parts)
                {
                    Console.WriteLine("{0}:{1}",
                        i,
                        part);
                }
                i++; // For demonstration.
            }
        }
    }

Output

    0:Dog
    0:Cat
    0:Mouse
    0:Fish
    0:Cow
    0:Horse
    0:Hyena
    1:Programmer
    1:Wizard
    1:CEO
    1:Rancher
    1:Clerk
    1:Farmer

    using System;

    class Program
    {
        static void Main()
        {
            // The directory from Windows.
            const string dir = @"C:\Users\Sam\Documents\Perls\Main";
            // Split on directory separator.
            string[] parts = dir.Split('\\');
            foreach (string part in parts)
            {
                Console.WriteLine(part);
            }
        }
    }

Output

    C:
    Users
    Sam
    Documents
    Perls
    Main

Strings used in test: C#

    //
    // Build long string.
    //
    _test = string.Empty;
    for (int i = 0; i < 120; i++)
    {
        _test += "01234567\r\n";
    }
    //
    // Build short string.
    //
    _test = string.Empty;
    for (int i = 0; i < 10; i++)
    {
        _test += "ab\r\n";
    }

Methods tested: 100000 iterations

    static void Test1()
    {
        string[] arr = Regex.Split(_test, "\r\n", RegexOptions.Compiled);
    }

    static void Test2()
    {
        string[] arr = _test.Split(new char[] { '\r', '\n' },
            StringSplitOptions.RemoveEmptyEntries);
    }

    static void Test3()
    {
        string[] arr = _test.Split(new string[] { "\r\n" },
            StringSplitOptions.None);
    }

Benchmark of Split on long strings

    [1] Regex.Split:    3470 ms
    [2] char[] Split:   1255 ms [fastest]
    [3] string[] Split: 1449 ms

Benchmark of Split on short strings

    [1] Regex.Split:     434 ms
    [2] char[] Split:     63 ms [fastest]
    [3] string[] Split:   83 ms

Slow version, before: C#

    //
    // Split on multiple characters using new char[] inline.
    //
    string t = "string to split, ok";

    for (int i = 0; i < 10000000; i++)
    {
        string[] s = t.Split(new char[] { ' ', ',' });
    }

Fast version, after: C#

    //
    // Split on multiple characters using new char[] already created.
    //
    string t = "string to split, ok";
    char[] c = new char[]{ ' ', ',' }; // <-- Cache this

    for (int i = 0; i < 10000000; i++)
    {
        string[] s = t.Split(c);
    }

    using System;

    class Program
    {
        static void Main()
        {
            // Input string contain separators.
            string value1 = "man,woman,child,,,bird";
            char[] delimiter1 = new char[] { ',' }; // <-- Split on these

            // ... Use StringSplitOptions.None.
            string[] array1 = value1.Split(delimiter1,
                StringSplitOptions.None);

            foreach (string entry in array1)
            {
                Console.WriteLine(entry);
            }

            // ... Use StringSplitOptions.RemoveEmptyEntries.
            string[] array2 = value1.Split(delimiter1,
                StringSplitOptions.RemoveEmptyEntries);

            Console.WriteLine();
            foreach (string entry in array2)
            {
                Console.WriteLine(entry);
            }
        }
    }

Output (the two empty fields from the first call, and the separator written by Console.WriteLine(), appear as blank lines)

    man
    woman
    child


    bird

    man
    woman
    child
    bird

Example captions, in order: splits on spaces; splits on lines with Regex; splits on multiple characters; separates on non-word pattern; splits lines in file; splits Windows directories; StringSplitOptions.