Get Title From HTML
Get the HTML title from strings with Regex. Invoke the Regex.Match method.
This page was last reviewed on Jan 25, 2022.
Title from HTML. HTML documents have title elements, and the text in them is important: it is used for search engine optimization and in RSS feeds.
C# method info. This simple method extracts the text of the TITLE element from an HTML document. It calls the Regex.Match method with a pattern that looks for the title start and end tags.
Example. We can extract the contents of the TITLE element from HTML. This is useful for checking that your HTML is correct. After the code, we look at each part of the Regex in detail, along with some limitations.
Detail This console application reads in an HTML file and then gets the text of its first TITLE element.
Then The program prints the title to the console. The HTML file (Problem.html) must be present in the current directory.
Detail GetTitle looks for a title start tag and a title end tag. It ignores any whitespace between the tags and the title text.
```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Read in an HTML file.
        string html = File.ReadAllText("Problem.html");
        // Get the title of the HTML.
        Console.WriteLine(GetTitle(html));
    }

    /// <summary>
    /// Get title from an HTML string.
    /// </summary>
    static string GetTitle(string html)
    {
        // Group 1 captures the text between the tags, without surrounding whitespace.
        Match m = Regex.Match(html, @"<title>\s*(.+?)\s*</title>");
        if (m.Success)
        {
            return m.Groups[1].Value;
        }
        else
        {
            // No TITLE element was found.
            return "";
        }
    }
}
```
Title of the Page
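@ Verbatim string literal: the backslash in \s is passed to Regex instead of being treated as a C# escape.
<title> Matches the start tag exactly.
\s* Matches zero or more whitespace characters (spaces, tabs, newlines).
(.+?) Captures one or more characters (not newlines), lazily: it stops as soon as the rest of the pattern can match.
\s* Matches zero or more whitespace characters.
</title> Matches the end tag exactly.
Match The C# object that holds the result of a regular expression match.
Groups[1] The first capture group. Groups[0] is the entire match, so captured groups start at 1.
Value The string value of the Group.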
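The difference between the lazy (.+?) and greedy (.+) quantifiers is easiest to see on input that contains two TITLE elements. This is a minimal sketch: FirstGroup is a hypothetical helper, not part of the program above, and the input is deliberately invalid HTML chosen only to show the behavior.

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    // Hypothetical helper: returns the text captured by group 1, or "" when there is no match.
    public static string FirstGroup(string input, string pattern)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups[1].Value : "";
    }

    static void Main()
    {
        // Two TITLE elements on one line (invalid HTML, used only for the demo).
        string html = "<title>First</title><title>Second</title>";

        // Greedy: .+ grabs as much as it can, then backs off to the LAST </title>.
        Console.WriteLine(FirstGroup(html, @"<title>(.+)</title>"));
        // Prints: First</title><title>Second

        // Lazy: .+? grabs as little as it can, so it stops at the FIRST </title>.
        Console.WriteLine(FirstGroup(html, @"<title>(.+?)</title>"));
        // Prints: First

        // Groups[0] is always the entire match; numbered capture groups start at 1.
        Console.WriteLine(Regex.Match(html, @"<title>(.+?)</title>").Groups[0].Value);
        // Prints: <title>First</title>
    }
}
```

This is why GetTitle uses the lazy form: with the greedy form, any later </title> in the document would pull extra markup into the result.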
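Errors. This code is not flexible enough for some HTML documents. A start tag with attributes, such as <title id="main">, will not match, because the pattern requires the exact text <title>. And because the dot does not match newlines by default, a title whose text contains a line break is not found.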
Also The pattern is case-sensitive, so it assumes lowercase tags. Passing RegexOptions.IgnoreCase to Regex.Match would let it accept tags like <TITLE> as well.
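One way to relax these restrictions is sketched below, assuming a still fairly simple document. GetTitleTolerant is a hypothetical name: it allows attributes in the start tag, ignores case, and uses RegexOptions.Singleline so the dot also matches newlines. It is still a regular expression rather than an HTML parser, so a comment or script that contains the text "<title>" can still fool it.

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    // Hypothetical variant of GetTitle; a sketch, not a full HTML parser.
    public static string GetTitleTolerant(string html)
    {
        // <title[^>]*>  allows attributes in the start tag, such as <title id="t">.
        // IgnoreCase    also matches <TITLE> and </Title>.
        // Singleline    lets . match newlines, so the title text may span lines.
        Match m = Regex.Match(html,
            @"<title[^>]*>\s*(.+?)\s*</title>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : "";
    }

    static void Main()
    {
        string html = "<TITLE lang=\"en\">\n  Title of the Page\n</TITLE>";
        Console.WriteLine(GetTitleTolerant(html));
        // Prints: Title of the Page
    }
}
```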
Detail You can use regular expressions like these for reading important elements from your HTML.
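For elements that appear many times, such as paragraphs, Regex.Matches returns every match instead of only the first. The sketch below assumes simple lowercase <p> tags with no attributes, the same restriction GetTitle has; GetParagraphs is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class Program
{
    // Hypothetical helper: collects the trimmed text of every simple <p> element.
    public static List<string> GetParagraphs(string html)
    {
        var result = new List<string>();
        // Regex.Matches returns every non-overlapping match, scanning left to right.
        foreach (Match m in Regex.Matches(html, @"<p>\s*(.+?)\s*</p>"))
        {
            result.Add(m.Groups[1].Value);
        }
        return result;
    }

    static void Main()
    {
        string html = "<p>One</p>\n<p> Two </p>";
        foreach (string text in GetParagraphs(html))
        {
            Console.WriteLine(text);
        }
        // Prints:
        // One
        // Two
    }
}
```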
A summary. We can capture the contents of the TITLE element from HTML documents with Regex.Match in C#, and a similar pattern works for paragraph elements. Every webmaster should know that the TITLE is important.
Sam Allen is passionate about computer languages. In the past, his work has been recommended by Apple and Microsoft and he has studied computers at a selective university in the United States.
This page was last updated on Jan 25, 2022.
© 2007-2023 Sam Allen.