
Regex. We live in a universe of great complexity. An acorn falls to the ground. A tree grows in its place. From small things big effects come.

Now consider a regular expression. This is a tiny program. Much like an acorn, it contains a processing instruction. It processes text: it matches and replaces text.

Match. The first example program introduces the Regex class. We use its constructor and the Match method, and then handle the returned Match object.

Namespace: All these types are found in the System.Text.RegularExpressions namespace.

Pattern: The Regex uses a pattern (\d+) that indicates one or more digits. The characters "55" match this pattern.

Success: The returned Match object has a bool property called Success. If it equals true, we found a match.

Static method. In the Regex.Match example we match part of a string (a file name in a directory path). We only accept ranges of characters and some punctuation. On Success, we access the group.

Static: We use the Regex.Match static method. It is also possible to call Match upon a Regex object.

Success: We test the result of Match with the Success property. When true, a match occurred and we can access its Value or Groups.

Groups: The first captured group is found at index 1, not zero. Index 0 of this collection holds the entire match. This is important to remember.

NextMatch. More than one match may be found. We can call the NextMatch method to search for a match that comes after the current one in the text. NextMatch can be used in a loop.

In the NextMatch example we match all the digits in the input string (4 and 5). Two matches occur, so we use NextMatch to get the second one.

Return: NextMatch returns another Match object. It does not modify the current one, so we assign the result to a variable.

Preprocess. Sometimes we can preprocess strings before using Match() on them. This can be faster and clearer. Experiment. I found that using ToLower to normalize chars was a good choice.

Static. Often a Regex instance object is faster than the static Regex.Match.
For performance, we should usually use an instance object. It can be shared throughout an entire project.

Sometimes: We only need to call Match once in a program's execution. A Regex object does not help here.

Class: In the static Regex example, a static class stores an instance Regex that can be used project-wide. We initialize it inline.

Numbers. A common requirement is extracting a number from a string. We can do this with Regex.Match. To get further numbers, consider Matches() or NextMatch.

Digits: We extract a group of digit characters and access the Value string representation of that number.

Parse: To parse the number, use int.Parse or int.TryParse on the Value. This will convert it to an int.

Value, length, index. A Match object, returned by Regex.Match, has a Value, Length and Index. These describe the matched text (a substring of the input).

Value: This is the matched text, represented as a separate string. This is a substring of the original input.

Length: This is the length of the Value string. In the example, the Length of "Axxxxy" is 6.

Index: The index where the matched text begins within the input string. The character "A" starts at index 4 in the example.

IsMatch. This method tests for a matching pattern. It does not capture groups from this pattern. It just sees if the pattern exists in a valid form in the input string.

Bool: IsMatch returns a bool value. Both overloads receive an input string that is searched for matches.

Internals: When we use the static Regex.IsMatch method, a Regex is constructed internally, in the same way as any instance Regex. The runtime keeps a small cache of recently used static patterns (see Regex.CacheSize). Beyond that, the instance is discarded at the end of the method and cleaned up by the garbage collector.

Matches. Sometimes one match is not enough. We can use Matches instead of Match: it returns multiple Match objects at once. These are returned in a MatchCollection.
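
The Matches description above can be illustrated with a minimal sketch. The class name MatchesSketch and the helper AllNumbers are hypothetical names chosen for this illustration; only Regex.Matches, MatchCollection and Match.Value come from the framework.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class MatchesSketch
{
    // Hypothetical helper: returns every run of digits found in the input.
    public static List<string> AllNumbers(string input)
    {
        var results = new List<string>();
        // Matches returns a MatchCollection holding every non-overlapping match.
        MatchCollection matches = Regex.Matches(input, @"\d+");
        foreach (Match match in matches)
        {
            results.Add(match.Value);
        }
        return results;
    }
}

class Program
{
    static void Main()
    {
        // Prints 55, then 100, each on its own line.
        foreach (string number in MatchesSketch.AllNumbers("Dot 55 Net 100 Perls"))
        {
            Console.WriteLine(number);
        }
    }
}
```

Compared with calling NextMatch in a loop, Matches gathers the results in one collection, which tends to read more clearly when every match is needed.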

Replace. Sometimes we need to replace a pattern of text with some other text. Regex.Replace helps. We can replace patterns with a string, or with a value determined by a MatchEvaluator.

Split. Do you need to extract substrings that contain only certain characters (certain digits, letters)? Split() returns a string array holding the parts of the input that lie between matches of the pattern, so the pattern describes the separators.

Numbers: We can handle certain character types, such as numbers, with the Split method. This is powerful. It handles many variations.

Caution: The Split method in Regex is more powerful than the one on the string type. But it may be slower in common cases.

Escape. This method can change a user input to a valid Regex pattern. It assumes no metacharacters were intended. The input string should be only literal characters.

With Escape, we don't get out of jail free, but we do change the representation of certain characters in a string.

Star. The star is also known as the Kleene closure in language theory. It is important to know the difference between the star and the plus. A star means zero or more. A plus means one or more.

Word count. With Regex we can count words in strings. We compare this method with Microsoft Word's implementation. We come close to Word's algorithm.

Files. We often need to process text files. The Regex type, and its methods, are used for this. But we need to combine a file input type, like StreamReader, with the Regex code.

HTML. Regex can be used to process or extract parts of HTML strings. There are problems with this approach. But it works in many situations.
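
The Replace, Split and Escape methods described above can be sketched together as follows. This is an illustration under assumptions, not the original article's code: the class TextSketch, its method names and the sample strings are all hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class TextSketch
{
    // Replace with a string: collapse each run of whitespace into one space.
    public static string CollapseSpaces(string input)
    {
        return Regex.Replace(input, @"\s+", " ");
    }

    // Replace with a MatchEvaluator: double every number found in the input.
    public static string DoubleNumbers(string input)
    {
        return Regex.Replace(input, @"\d+",
            m => (int.Parse(m.Value) * 2).ToString());
    }

    // Split: the pattern describes the separators (here, runs of non-digits).
    public static string[] SplitOnNonDigits(string input)
    {
        return Regex.Split(input, @"\D+");
    }

    // Escape: treat the literal as plain text, so "$" and "." lose their
    // special meaning inside the pattern.
    public static bool ContainsLiteral(string input, string literal)
    {
        return Regex.IsMatch(input, Regex.Escape(literal));
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(TextSketch.CollapseSpaces("Dot   Net \t Perls")); // Dot Net Perls
        Console.WriteLine(TextSketch.DoubleNumbers("4 and 5"));             // 8 and 10
        Console.WriteLine(string.Join("|",
            TextSketch.SplitOnNonDigits("10, 20 and 30")));                 // 10|20|30
        Console.WriteLine(TextSketch.ContainsLiteral("price: $5.00", "$5.00")); // True
    }
}
```

When the input starts or ends with a separator, Regex.Split returns an empty string at that end of the array, so callers may want to skip empty entries.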

RegexOptions. With the Regex type, the RegexOptions enum is used to modify method behavior. Often I find the IgnoreCase value helpful.

IgnoreCase: Lowercase and uppercase letters are distinct in the Regex text language. IgnoreCase changes this.

Multiline: We can change how the Regex type acts upon newlines with the RegexOptions enum. With Multiline, the ^ and $ anchors match at the start and end of each line, not only of the whole input. This is often useful.

Is Regex fast? This question is a topic of great worldwide concern. Sadly, Regex often results in slower code than imperative loops. But we can optimize Regex usage.

1. Compile. Passing RegexOptions.Compiled when constructing a Regex instance will make it execute faster. This, however, has a startup penalty.

2. Replace with loop. Some Regex method calls can be replaced with a loop. The loop is often much faster.

3. Use static fields. You can cache a Regex instance as a static field. The static Regex example program shows this.

Research. A regular expression can describe any "regular" language. These languages are ones where complexity is finite: there is a limited number of possibilities.

A warning. Some languages, like HTML, are not regular languages. This means you cannot fully parse them with traditional regular expressions.

Automaton. A regular expression is based on finite state machines. These automata encode states and possible transitions to new states.

Operators. Regular expressions use compiler theory. With a compiler, we transform pattern languages (like Regex) into tiny programs that process text.

Quote: These expressions are commonly used to describe patterns.
Regular expressions are built from single characters, using union, concatenation, and the Kleene closure, or any-number-of, operator (Compilers: Principles, Techniques and Tools).

A summary. Regular expressions are a concise way to process text data. This comes at a cost. For performance, we can rewrite Regex calls with low-level char methods.

Representations. Regex is a high-level representation of the same logic expressed with loops and char arrays. This logic is represented in a simple, clear way.
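
To make the point about rewriting Regex calls with char methods concrete, here is a minimal sketch that expresses one validation both ways. The class ValidSketch and both method names are hypothetical. One known difference is assumed away: in .NET, $ also matches before a final newline, so the two versions disagree on input such as "abc\n".

```csharp
using System;
using System.Text.RegularExpressions;

static class ValidSketch
{
    // Regex version: only ASCII letters and digits (an empty string passes).
    public static bool IsValidRegex(string value)
    {
        return Regex.IsMatch(value, @"^[a-zA-Z0-9]*$");
    }

    // Loop version: the same check written with plain char comparisons.
    public static bool IsValidLoop(string value)
    {
        foreach (char c in value)
        {
            bool ok = (c >= 'a' && c <= 'z')
                || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9');
            if (!ok)
            {
                return false;
            }
        }
        return true;
    }
}

class Program
{
    static void Main()
    {
        // Both methods agree on these inputs: True, True, False.
        foreach (string s in new[] { "dotnetperls0123", "DotNetPerls", ":-)" })
        {
            Console.WriteLine(ValidSketch.IsValidRegex(s) == ValidSketch.IsValidLoop(s)
                ? ValidSketch.IsValidLoop(s).ToString()
                : "mismatch");
        }
    }
}
```

The loop avoids constructing and running a pattern at all, which is where a speed-up would come from. Whether it matters should be measured for the program at hand.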

C# program that uses Match, Regex

    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main()
        {
            Regex regex = new Regex(@"\d+");
            Match match = regex.Match("Dot 55 Perls");
            if (match.Success)
            {
                Console.WriteLine(match.Value);
            }
        }
    }

Output

    55

C# program that uses Regex.Match

    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main()
        {
            // First we see the input string.
            string input = "/content/alternate-1.aspx";

            // Here we call Regex.Match.
            Match match = Regex.Match(input, @"content/([A-Za-z0-9\-]+)\.aspx$",
                RegexOptions.IgnoreCase);

            // Here we check the Match instance.
            if (match.Success)
            {
                // Finally, we get the Group value and display it.
                string key = match.Groups[1].Value;
                Console.WriteLine(key);
            }
        }
    }

Output

    alternate-1

Pattern details

    @"              This starts a verbatim string literal.
    content/        The group must follow this string.
    [A-Za-z0-9\-]+  One or more alphanumeric characters.
    (...)           A separate group.
    \.aspx          This must come after the group.
    $               Matches the end of the string.

C# program that uses NextMatch

    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main()
        {
            string value = "4 AND 5";

            // Get first match.
            Match match = Regex.Match(value, @"\d");
            if (match.Success)
            {
                Console.WriteLine(match.Value);
            }

            // Get second match.
            match = match.NextMatch();
            if (match.Success)
            {
                Console.WriteLine(match.Value);
            }
        }
    }

Output

    4
    5

C# program that uses ToLower, Match

    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main()
        {
            // This is the input string.
            string input = "/content/alternate-1.aspx";

            // Here we lowercase our input first.
            input = input.ToLower();
            Match match = Regex.Match(input, @"content/([A-Za-z0-9\-]+)\.aspx$");
        }
    }

C# program that uses static Regex

    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main()
        {
            // The input string again.
            string input = "/content/alternate-1.aspx";

            // This calls the static method specified.
            Console.WriteLine(RegexUtil.MatchKey(input));
        }
    }

    static class RegexUtil
    {
        static Regex _regex = new Regex(@"/content/([a-z0-9\-]+)\.aspx$");
        /// <summary>
        /// This returns the key that is matched within the input.
        /// </summary>
        static public string MatchKey(string input)
        {
            Match match = _regex.Match(input.ToLower());
            if (match.Success)
            {
                return match.Groups[1].Value;
            }
            else
            {
                return null;
            }
        }
    }

Output

    alternate-1

C# program that matches numbers

    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main()
        {
            // ... Input string.
            string input = "Dot Net 100 Perls";

            // ... One or more digits.
            Match m = Regex.Match(input, @"\d+");

            // ... Write value.
            Console.WriteLine(m.Value);
        }
    }

Output

    100

C# program that shows value, length, index

    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main()
        {
            Match m = Regex.Match("123 Axxxxy", @"A.*y");
            if (m.Success)
            {
                Console.WriteLine("Value  = " + m.Value);
                Console.WriteLine("Length = " + m.Length);
                Console.WriteLine("Index  = " + m.Index);
            }
        }
    }

Output

    Value  = Axxxxy
    Length = 6
    Index  = 4

C# program that uses Regex.IsMatch method

    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        /// <summary>
        /// Test string using Regex.IsMatch static method.
        /// </summary>
        static bool IsValid(string value)
        {
            return Regex.IsMatch(value, @"^[a-zA-Z0-9]*$");
        }

        static void Main()
        {
            // Test the strings with the IsValid method.
            Console.WriteLine(IsValid("dotnetperls0123"));
            Console.WriteLine(IsValid("DotNetPerls"));
            Console.WriteLine(IsValid(":-)"));
            // Console.WriteLine(IsValid(null)); // Throws an exception
        }
    }

Output

    True
    True
    False

C# program that uses RegexOptions.IgnoreCase

    using System;
    using System.Text.RegularExpressions;

    class Program
    {
        static void Main()
        {
            const string value = "TEST";
            // ... This ignores the case of the "TE" characters.
            if (Regex.IsMatch(value, "te..", RegexOptions.IgnoreCase))
            {
                Console.WriteLine(true);
            }
        }
    }

Output

    True
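
As a companion to the IgnoreCase program, the effect of RegexOptions.Multiline can be sketched in the same way. The class MultilineSketch and the helper CountDigitLines are hypothetical names, and the sample input assumes "\n" line endings.

```csharp
using System;
using System.Text.RegularExpressions;

static class MultilineSketch
{
    // Hypothetical helper: counts the lines that consist only of digits.
    // With Multiline, ^ and $ anchor at each line, not just the whole input.
    public static int CountDigitLines(string input)
    {
        return Regex.Matches(input, @"^\d+$", RegexOptions.Multiline).Count;
    }
}

class Program
{
    static void Main()
    {
        string input = "10\nabc\n20";

        // Two lines ("10" and "20") contain only digits.
        Console.WriteLine(MultilineSketch.CountDigitLines(input)); // 2

        // Without Multiline, the pattern must span the entire input, so it fails.
        Console.WriteLine(Regex.IsMatch(input, @"^\d+$")); // False
    }
}
```

With "\r\n" line endings the $ anchor does not match before the "\r", so input read from Windows text files may need the pattern adjusted (for example with an optional \r before the $).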