
Regex. We live in a universe of great complexity. An acorn falls to the ground. A tree grows in its place. From small things big effects come.

Now consider a regular expression. This is a tiny program. Much like an acorn, it contains a processing instruction. It processes text: it replaces and matches text.

Match. This program introduces the Regex class. We use its constructor and the Match method, and then handle the returned Match object.

Namespace: All these types are found in the System.Text.RegularExpressions namespace.
Pattern: The Regex uses a pattern that indicates one or more digits. The characters "55" match this pattern.
Success: The returned Match object has a bool property called Success. If it equals true, we found a match.

Static method. Here we match part of a string (a file name in a directory path). We only accept ranges of characters and some punctuation. On Success, we access the group.

Static: We use the Regex.Match static method. It is also possible to call Match upon a Regex object.
Success: We test the result of Match with the Success property. When true, a match occurred and we can access its Value or Groups.
Groups: Index 0 of the Groups collection holds the entire match, so the first captured group is found at index 1, not zero. This is important to remember.

NextMatch. More than one match may be found. We can call the NextMatch method to search for a match that comes after the current one in the text. NextMatch can be used in a loop. Here we match all the digits in the input string (4 and 5). Two matches occur, so we use NextMatch to get the second one.

Return: NextMatch returns another Match object. It does not modify the current one, so we assign the result to a variable.

Preprocess. Sometimes we can preprocess strings before using Match() on them. This can be faster and clearer. Experiment. I found using ToLower to normalize chars was a good choice.

Static. Often a Regex instance object is faster than the static Regex.Match.
For performance, we should usually use an instance object. It can be shared throughout an entire project.

Sometimes: We only need to call Match once in a program's execution. A Regex object does not help here.
Class: Here a static class stores an instance Regex that can be used project-wide. We initialize it inline.

Numbers. A common requirement is extracting a number from a string. We can do this with Regex.Match. To get further numbers, consider Matches() or NextMatch.

Digits: We extract a group of digit characters and access the Value string representation of that number.
Parse: To parse the number, use int.Parse or int.TryParse on the Value. This converts it to an int.

Value, length, index. A Match object, returned by Regex.Match, has a Value, a Length and an Index. These describe the matched text (a substring of the input).

Value: This is the matched text, represented as a separate string. It is a substring of the original input.
Length: This is the length of the Value string. Here, the Length of "Axxxxy" is 6.
Index: This is the index where the matched text begins within the input string. The character "A" starts at index 4 here.

IsMatch. This method tests for a matching pattern. It does not capture groups from this pattern. It just tests whether the pattern occurs in the input string.

Bool: IsMatch returns a bool value. Both overloads receive an input string that is searched for matches.
Internals: When we use the static Regex.IsMatch method, a Regex is created internally, in the same way as any instance Regex. We do not keep a reference to this instance after the method returns, so it can be cleaned up by the garbage collector.

Matches. Sometimes one match is not enough. Here we use Matches instead of Match: it returns multiple Match objects at once. These are returned in a MatchCollection.
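The Matches method described above can be sketched as follows. This is a minimal illustration, not code from the original article: the MatchesSketch class and its AllNumbers helper are hypothetical names, and the input string is borrowed from the style of the other examples.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class MatchesSketch
{
    // Hypothetical helper: returns every run of digits found in the input.
    public static List<string> AllNumbers(string input)
    {
        var results = new List<string>();
        // Regex.Matches returns a MatchCollection holding every
        // non-overlapping match, in the order they occur in the input.
        foreach (Match m in Regex.Matches(input, @"\d+"))
        {
            results.Add(m.Value);
        }
        return results;
    }
}

class Program
{
    static void Main()
    {
        // Prints 55 and then 100, each on its own line.
        foreach (string number in MatchesSketch.AllNumbers("Dot 55 Net 100 Perls"))
        {
            Console.WriteLine(number);
        }
    }
}
```

Compared with calling NextMatch in a loop, Matches gives us the whole collection in one call, which tends to read more clearly when every match is needed.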
Replace. Sometimes we need to replace a pattern of text with some other text. Regex.Replace helps. We can replace patterns with a string, or with a value determined by a MatchEvaluator.

Split. Do you need to extract substrings that contain only certain characters (certain digits, letters)? Split() returns a string array that will contain the matching substrings.

Numbers: We can handle certain character types, such as numbers, with the Split method. This is powerful. It handles many variations.
Caution: The Split method in Regex is more powerful than the one on the string type. But it may be slower in common cases.

Escape. This method can change a user input to a valid Regex pattern. It assumes no metacharacters were intended. The input string should be only literal characters. With Escape, we don't get out of jail free, but we do change the representation of certain characters in a string.

Star. The star is also known as the Kleene closure in language theory. It is important to know the difference between the star and the plus. A star means zero or more; a plus means one or more.

Word count. With Regex we can count words in strings. We compare this method with Microsoft Word's implementation. We come close to Word's algorithm.

Files. We often need to process text files. The Regex type, and its methods, are used for this. But we need to combine a file input type, like StreamReader, with the Regex code.

HTML. Regex can be used to process or extract parts of HTML strings. There are problems with this approach. But it works in many situations.
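Returning to the Replace, Split and Escape methods described above, here is a short sketch of each. The TextOps class, its method names and the sample inputs are assumptions made for illustration; only the Regex.Replace, Regex.Split and Regex.Escape calls come from the .NET library.

```csharp
using System;
using System.Text.RegularExpressions;

static class TextOps
{
    // Replace with a string: each run of whitespace becomes one space.
    public static string CollapseSpaces(string input) =>
        Regex.Replace(input, @"\s+", " ");

    // Replace with a MatchEvaluator: the lambda computes each replacement.
    public static string DoubleNumbers(string input) =>
        Regex.Replace(input, @"\d+", m => (int.Parse(m.Value) * 2).ToString());

    // Split on runs of non-digit characters. Split can yield empty strings
    // at the edges of the input, so we filter those out.
    public static string[] DigitRuns(string input) =>
        Array.FindAll(Regex.Split(input, @"\D+"), s => s.Length > 0);

    // Escape turns metacharacters such as + and ? into literals, so the
    // needle is searched for exactly as typed.
    public static bool ContainsLiteral(string input, string literal) =>
        Regex.IsMatch(input, Regex.Escape(literal));
}

class Program
{
    static void Main()
    {
        Console.WriteLine(TextOps.CollapseSpaces("a   b \t c"));            // a b c
        Console.WriteLine(TextOps.DoubleNumbers("3 apples, 10 pears"));     // 6 apples, 20 pears
        Console.WriteLine(string.Join("|", TextOps.DigitRuns("10 cats, 20 dogs"))); // 10|20
        Console.WriteLine(TextOps.ContainsLiteral("what is 1+1=2?", "1+1=2?"));     // True
    }
}
```

Without Escape, the pattern "1+1=2?" would treat the plus and question mark as quantifiers and would not find the literal text in that last input.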
RegexOptions. With the Regex type, the RegexOptions enum is used to modify method behavior. Often I find the IgnoreCase value helpful.

IgnoreCase: Lowercase and uppercase letters are distinct in the Regex text language. IgnoreCase changes this.
Multiline: We can change how the Regex type acts upon newlines with the RegexOptions enum. This is often useful.

Is Regex fast? This question is a topic of great worldwide concern. Sadly Regex often results in slower code than imperative loops. But we can optimize Regex usage.

1. Compile. Passing the RegexOptions.Compiled argument when constructing a Regex instance will make it execute faster. This, however, has a startup penalty.
2. Replace with loop. Some Regex method calls can be replaced with a loop. The loop is often much faster.
3. Use static fields. You can cache a Regex instance as a static field, as in the static Regex example in this article.

Research. A regular expression can describe any "regular" language. These languages are ones where complexity is finite: there is a limited number of possibilities.

A warning. Some languages, like HTML, are not regular languages. This means you cannot fully parse them with traditional regular expressions.

Automaton. A regular expression is based on finite state machines. These automata encode states and possible transitions to new states.

Operators. Regular expressions use compiler theory. With a compiler, we transform regular languages (like Regex) into tiny programs that mess with text.

Quote: These expressions are commonly used to describe patterns.
Regular expressions are built from single characters, using union, concatenation, and the Kleene closure, or any-number-of, operator (Compilers: Principles, Techniques and Tools).

A summary. Regular expressions are a concise way to process text data. This comes at a cost. For performance, we can rewrite Regex calls with low-level char methods.

Representations. Regex is a high-level representation of the same logic expressed with loops and char arrays. This logic is represented in a simple, clear way.
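To make the performance advice concrete, here is a sketch that expresses the same alphanumeric check twice: once as a cached, compiled Regex stored in a static field, and once as a plain char loop. The Validators class and its method names are hypothetical, and no timings are claimed here; measure on your own inputs before choosing one.

```csharp
using System;
using System.Text.RegularExpressions;

static class Validators
{
    // Created once and shared by every caller. RegexOptions.Compiled trades
    // a startup cost for faster matching afterwards.
    static readonly Regex _alphanumeric =
        new Regex(@"^[a-zA-Z0-9]*$", RegexOptions.Compiled);

    public static bool IsValidRegex(string value) => _alphanumeric.IsMatch(value);

    // The same rule written with low-level char tests.
    // Note: unlike the loop, the $ anchor also accepts one trailing newline,
    // so the two methods differ on inputs such as "abc\n".
    public static bool IsValidLoop(string value)
    {
        foreach (char c in value)
        {
            bool ok = (c >= 'a' && c <= 'z')
                   || (c >= 'A' && c <= 'Z')
                   || (c >= '0' && c <= '9');
            if (!ok)
            {
                return false;
            }
        }
        return true;
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(Validators.IsValidRegex("dotnetperls0123")); // True
        Console.WriteLine(Validators.IsValidLoop("dotnetperls0123"));  // True
        Console.WriteLine(Validators.IsValidRegex(":-)"));             // False
        Console.WriteLine(Validators.IsValidLoop(":-)"));              // False
    }
}
```

The loop is longer and less flexible, which is the cost mentioned in the summary: the Regex states the rule in one line, while the loop spells out every character range by hand.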

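Example: Match, Regex

    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main()
        {
            Regex regex = new Regex(@"\d+");
            Match match = regex.Match("Dot 55 Perls");
            if (match.Success)
            {
                Console.WriteLine(match.Value);
            }
        }
    }

Output

    55

Example: Regex.Match

    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main()
        {
            // First we see the input string.
            string input = "/content/alternate-1.aspx";

            // Here we call Regex.Match.
            Match match = Regex.Match(input, @"content/([A-Za-z0-9\-]+)\.aspx$",
                RegexOptions.IgnoreCase);

            // Here we check the Match instance.
            if (match.Success)
            {
                // Finally, we get the Group value and display it.
                string key = match.Groups[1].Value;
                Console.WriteLine(key);
            }
        }
    }

Output

    alternate-1

Pattern details

    @"               This starts a verbatim string literal.
    content/         The group must follow this string.
    [A-Za-z0-9\-]+   One or more alphanumeric characters or hyphens.
    (...)            A separate group.
    \.aspx           This must come after the group.
    $                Matches the end of the string.

Example: NextMatch

    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main()
        {
            string value = "4 AND 5";

            // Get first match.
            Match match = Regex.Match(value, @"\d");
            if (match.Success)
            {
                Console.WriteLine(match.Value);
            }

            // Get second match.
            match = match.NextMatch();
            if (match.Success)
            {
                Console.WriteLine(match.Value);
            }
        }
    }

Output

    4
    5

Example: ToLower, Match

    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main()
        {
            // This is the input string.
            string input = "/content/alternate-1.aspx";

            // Here we lowercase our input first.
            input = input.ToLower();
            Match match = Regex.Match(input, @"content/([A-Za-z0-9\-]+)\.aspx$");
        }
    }

Example: static Regex

    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main()
        {
            // The input string again.
            string input = "/content/alternate-1.aspx";

            // This calls the static method specified.
            Console.WriteLine(RegexUtil.MatchKey(input));
        }
    }

    static class RegexUtil
    {
        static Regex _regex = new Regex(@"/content/([a-z0-9\-]+)\.aspx$");

        /// <summary>
        /// This returns the key that is matched within the input.
        /// </summary>
        static public string MatchKey(string input)
        {
            Match match = _regex.Match(input.ToLower());
            if (match.Success)
            {
                return match.Groups[1].Value;
            }
            else
            {
                return null;
            }
        }
    }

Output

    alternate-1

Example: matches numbers

    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main()
        {
            // Input string.
            string input = "Dot Net 100 Perls";

            // One or more digits.
            Match m = Regex.Match(input, @"\d+");

            // Write value.
            Console.WriteLine(m.Value);
        }
    }

Output

    100

Example: shows value, length, index

    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main()
        {
            Match m = Regex.Match("123 Axxxxy", @"A.*y");
            if (m.Success)
            {
                Console.WriteLine("Value  = " + m.Value);
                Console.WriteLine("Length = " + m.Length);
                Console.WriteLine("Index  = " + m.Index);
            }
        }
    }

Output

    Value  = Axxxxy
    Length = 6
    Index  = 4

Example: Regex.IsMatch method

    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        /// <summary>
        /// Test string using Regex.IsMatch static method.
        /// </summary>
        static bool IsValid(string value)
        {
            return Regex.IsMatch(value, @"^[a-zA-Z0-9]*$");
        }

        static void Main()
        {
            // Test the strings with the IsValid method.
            Console.WriteLine(IsValid("dotnetperls0123"));
            Console.WriteLine(IsValid("DotNetPerls"));
            Console.WriteLine(IsValid(":-)"));

            // Console.WriteLine(IsValid(null)); // Throws an exception
        }
    }

Output

    True
    True
    False

Example: RegexOptions.IgnoreCase

    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main()
        {
            const string value = "TEST";

            // This ignores the case of the "TE" characters.
            if (Regex.IsMatch(value, "te..", RegexOptions.IgnoreCase))
            {
                Console.WriteLine(true);
            }
        }
    }

Output

    True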
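The Numbers section suggests int.Parse or int.TryParse to convert a matched digit string into an int. A minimal sketch of that step follows; the NumberSketch class and FirstNumber helper are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class NumberSketch
{
    // Hypothetical helper: returns the first integer found in the input,
    // or null when there is no match or the digits do not fit in an int.
    public static int? FirstNumber(string input)
    {
        Match m = Regex.Match(input, @"\d+");
        // TryParse reports failure with a bool instead of throwing, which
        // covers very long digit runs that would overflow an int.
        if (m.Success && int.TryParse(m.Value, out int result))
        {
            return result;
        }
        return null;
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(NumberSketch.FirstNumber("Dot Net 100 Perls"));        // 100
        Console.WriteLine(NumberSketch.FirstNumber("no digits here").HasValue);  // False
    }
}
```

Using int.TryParse rather than int.Parse is a design choice for this sketch: it keeps the helper from throwing on input it cannot convert.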
