
A file can be parsed with Regex. The Regex is applied to each line to find the matching parts, which is useful for log files or the output of other programs. This tutorial shows how to process a file with regular expressions in the C# programming language.
First, to apply a regular expression to a file, you must read the file's contents. Here's a console program that opens a StreamReader on the file and reads it one line at a time. Note how the ReadLine() method returns each line separately, or null when there are no more lines.
Program that uses StreamReader [C#]
using System.IO;

class Program
{
    static void Main()
    {
        // 1.
        // Open file for reading.
        using (StreamReader r = new StreamReader("ex081016.log"))
        {
            // 2.
            // Read each line until EOF.
            string line;
            while ((line = r.ReadLine()) != null)
            {
                // 3.
                // Do stuff with line.
            }
        }
    }
}
Here we create the regular expression object. In my tests, creating a single Regex instance and reusing it was around 30% faster than calling the static Regex.Match method for every line. This makes it worthwhile to use a single Regex when you need to apply it to thousands of lines.
Program that declares regular expression [C#]
using System.IO;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // A.
        Regex g = new Regex(@"\s/Content/([a-zA-Z0-9\-]+?)\.aspx");
        // "\s/Content/"         : whitespace and then the Content directory
        // "([a-zA-Z0-9\-]+?)"   : group of alphanumeric characters and hyphens
        // "?"                   : don't be greedy, match lazily
        // "\.aspx"              : file extension required for match
        // B.
        using (StreamReader r = new StreamReader("ex081016.log"))
        {
            string line;
            while ((line = r.ReadLine()) != null)
            {
                // Matching logic is added in the next program.
            }
        }
    }
}
Explanation. Part A creates the Regex. The pattern is dense, so the comments explain each of its parts. Part B contains the same file-handling code as the first program.
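The benefit of building the Regex once, as part A does, depends on the runtime and the input, so it is worth measuring rather than assuming. The following is a rough timing sketch, not a rigorous benchmark. The names RegexReuseTiming, CountWithInstance, CountWithStatic, and Compare are hypothetical and not part of the original program. Note that the static Regex.Match method keeps an internal cache of recently used patterns, so the gap you observe may differ from the figure quoted earlier.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

static class RegexReuseTiming
{
    // Same pattern as the tutorial's program.
    const string Pattern = @"\s/Content/([a-zA-Z0-9\-]+?)\.aspx";

    // Counts matching lines with one Regex instance reused for every line.
    public static int CountWithInstance(string[] lines)
    {
        Regex regex = new Regex(Pattern);
        int count = 0;
        foreach (string line in lines)
        {
            if (regex.Match(line).Success)
            {
                count++;
            }
        }
        return count;
    }

    // Counts matching lines with the static Regex.Match method,
    // passing the pattern string on every call.
    public static int CountWithStatic(string[] lines)
    {
        int count = 0;
        foreach (string line in lines)
        {
            if (Regex.Match(line, Pattern).Success)
            {
                count++;
            }
        }
        return count;
    }

    // Times both approaches over the same lines and prints elapsed milliseconds.
    public static void Compare(string[] lines)
    {
        Stopwatch watch = Stopwatch.StartNew();
        int a = CountWithInstance(lines);
        watch.Stop();
        Console.WriteLine("Instance: " + a + " matches, " + watch.ElapsedMilliseconds + " ms");

        watch.Restart();
        int b = CountWithStatic(lines);
        watch.Stop();
        Console.WriteLine("Static:   " + b + " matches, " + watch.ElapsedMilliseconds + " ms");
    }
}
```

Calling RegexReuseTiming.Compare(File.ReadAllLines("ex081016.log")) from Main would run both loops over the same log file. Repeat the run a few times, since the first pass includes one-time startup costs.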

Here we put the regular expression logic into the StreamReader loop to parse an entire file. We take the Regex we created and match it against each line. We only look for one Match per line here, but you can use the Matches method to find more than one.
Program that matches lines [C#]
using System;
using System.IO;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        Regex g = new Regex(@"\s/Content/([a-zA-Z0-9\-]+?)\.aspx");
        using (StreamReader r = new StreamReader("ex081016.log"))
        {
            string line;
            while ((line = r.ReadLine()) != null)
            {
                // X.
                // Try to match each line against the Regex.
                Match m = g.Match(line);
                if (m.Success)
                {
                    // Y.
                    // Write original line and the captured value.
                    string v = m.Groups[1].Value;
                    Console.WriteLine(line);
                    Console.WriteLine("\t" + v);
                }
            }
        }
    }
}
Modifications to the program. Parts X and Y above were added. X applies the Regex to each line and records any captured groups. Y then gets the captured value from the Groups collection. Captures are numbered starting at 1: Groups[1] holds the text matched by the first set of parentheses, while Groups[0] holds the entire match, including the leading whitespace, "/Content/", and ".aspx". Using Groups[0] here would therefore return more text than intended.
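The program above stops at the first match on each line. If a line can contain several matching paths, the Matches method returns all of them. A minimal sketch, assuming a hypothetical helper named AllCaptures that is not part of the original program:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class MultiMatch
{
    // Returns the first capture group of every match found in a single line.
    public static List<string> AllCaptures(Regex regex, string line)
    {
        List<string> values = new List<string>();
        foreach (Match m in regex.Matches(line))
        {
            // Groups[1] is the parenthesized part of the pattern.
            values.Add(m.Groups[1].Value);
        }
        return values;
    }
}
```

Inside the while loop, a call such as MultiMatch.AllCaptures(g, line) could replace the single g.Match(line) call. An empty list then means the line had no match.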
Here we look at an example line from the log file. The text below is a single line. The regular expression captured the text between "Content/" and ".aspx", here Trim-String-Regex, which is what it was supposed to do. You will want to change the directory, the extension, and other parts of the Regex to suit your own files.
2008-10-16 23:59:50 W3SVC2915713 GET /Content/Trim-String-Regex.aspx - 80 66.249.70.241 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) - 200 3753 309
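To make the relationship between the sample line and the capture concrete, here is a small sketch that returns both the whole match and the first capture group. The class GroupDemo, the method Describe, and the constant SampleLineStart are hypothetical names used for illustration. The constant holds only the start of the sample line, which is enough for the pattern to match.

```csharp
using System.Text.RegularExpressions;

static class GroupDemo
{
    // The beginning of the sample log line shown above (remaining fields omitted).
    public const string SampleLineStart =
        "2008-10-16 23:59:50 W3SVC2915713 GET /Content/Trim-String-Regex.aspx - 80";

    // Returns { whole match, first capture }, or null if the line does not match.
    public static string[] Describe(string line)
    {
        Regex g = new Regex(@"\s/Content/([a-zA-Z0-9\-]+?)\.aspx");
        Match m = g.Match(line);
        if (!m.Success)
        {
            return null;
        }
        // Groups[0] is the entire matched text; Groups[1] is the parenthesized part.
        return new string[] { m.Groups[0].Value, m.Groups[1].Value };
    }
}
```

For SampleLineStart, the first element is " /Content/Trim-String-Regex.aspx" (with the leading space matched by \s) and the second is "Trim-String-Regex".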

This kind of code has many uses. Examples include matching lines in logs, trace files, scientific data output, CSV files, or really any line-oriented text file. Processing each line separately generally uses less memory than loading the whole file at once, because only one line is held in memory at a time, and each match attempt scans only a single line. This often makes it faster as well.
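As one illustration of line-at-a-time processing, the sketch below tallies how often each captured page name appears while holding only the current line in memory. The class PageTally and the method CountPages are hypothetical. The method accepts any TextReader as an assumption made for convenience, so the same code works with a StreamReader on a file or a StringReader in a test.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

static class PageTally
{
    // Reads lines one at a time and counts occurrences of each captured name.
    public static Dictionary<string, int> CountPages(TextReader reader)
    {
        Regex g = new Regex(@"\s/Content/([a-zA-Z0-9\-]+?)\.aspx");
        Dictionary<string, int> counts = new Dictionary<string, int>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            Match m = g.Match(line);
            if (!m.Success)
            {
                continue; // Skip lines that do not match the pattern.
            }
            string name = m.Groups[1].Value;
            int current;
            counts.TryGetValue(name, out current); // current is 0 when the key is absent
            counts[name] = current + 1;
        }
        return counts;
    }
}
```

With a file, pass new StreamReader("ex081016.log") inside a using block, as in the programs above.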

We looked at how to apply a regular expression in the C# language to every line in a text file. The code reads each line in turn and looks for a match. Entire languages like Perl were designed around this kind of problem, but C# handles it well too. We saw how to combine the StreamReader class with the Regex class in the base class library to parse large text files.