The title tag in an HTML file usually provides a good label for the contents of the page. And with VB.NET, we can extract the title for further processing.
With the Regex.Match function, we can extract the text content within the title element. This approach is not always successful, but it often works.
To begin, it is important to include the System.Text.RegularExpressions namespace with an Imports statement. Otherwise the unqualified Regex and Match names will not resolve, and the program will not compile.
The program first declares a String with some HTML contents. In a real-world program, we might read in this data from a file with File.ReadAllText.
It then calls GetTitle() and passes it the html string. The Regex uses Kleene closures (the * and + quantifiers) to process the text inside matching title tags.

    Imports System.Text.RegularExpressions

    Module Module1
        Sub Main()
            Dim html As String = "<html><title>Example.</title><body><p>...</p></body></html>"
            Console.WriteLine(GetTitle(html))
        End Sub

        Function GetTitle(value As String) As String
            ' Use regular expression to match title tags.
            ' Group 1 captures the title text without surrounding whitespace.
            Dim match As Match = Regex.Match(value, "<title>\s*(.+?)\s*</title>")
            If match.Success Then
                Return match.Groups(1).Value
            Else
                Return ""
            End If
        End Function
    End Module

Output:

    Example.
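The text mentions reading the HTML from a file with File.ReadAllText. A minimal sketch of that variation follows. The temporary file and the variable name fileName are assumptions made only so the example is self-contained; a real program would point at an existing HTML file instead.

```vbnet
Imports System
Imports System.IO
Imports System.Text.RegularExpressions

Module Module1
    Sub Main()
        ' Assumption for illustration: create a small HTML file in the temp folder.
        Dim fileName As String = Path.GetTempFileName()
        File.WriteAllText(fileName, "<html><title>Example.</title><body><p>...</p></body></html>")

        ' Read the entire file into a String, then extract the title.
        Dim html As String = File.ReadAllText(fileName)
        Console.WriteLine(GetTitle(html))

        ' Remove the temporary file again.
        File.Delete(fileName)
    End Sub

    Function GetTitle(value As String) As String
        ' Same pattern as in the article: capture the text between the title tags.
        Dim match As Match = Regex.Match(value, "<title>\s*(.+?)\s*</title>")
        If match.Success Then
            Return match.Groups(1).Value
        End If
        Return ""
    End Function
End Module
```

With the sample contents written above, this should print Example. just as the in-memory version does.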
This code may not work correctly if the HTML page contains commented-out HTML, because a title element inside a comment can be matched as well. And it won't match uppercase TITLE tags. For case-insensitive code, consider the RegexOptions enum (RegexOptions.IgnoreCase).
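As a rough sketch of the case-insensitive variant, RegexOptions.IgnoreCase can be passed as the third argument to Regex.Match. The example below also combines it with RegexOptions.Singleline, which is an addition not discussed in the article: by default "." does not match a newline, so a title that spans several lines would otherwise be missed. The sample string is made up for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module Module1
    Sub Main()
        ' Hypothetical input: uppercase tags and a title broken across lines.
        Dim html As String = "<HTML><TITLE>" & Environment.NewLine &
            "  Example." & Environment.NewLine &
            "</TITLE><BODY></BODY></HTML>"
        Console.WriteLine(GetTitle(html))
    End Sub

    Function GetTitle(value As String) As String
        ' IgnoreCase lets the pattern match <TITLE> as well as <title>.
        ' Singleline lets "." match newline characters too.
        Dim options As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.Singleline
        Dim match As Match = Regex.Match(value, "<title>\s*(.+?)\s*</title>", options)
        If match.Success Then
            Return match.Groups(1).Value
        End If
        Return ""
    End Function
End Module
```

This should print Example. for the sample input. It does not address commented-out HTML; a title element inside a comment would still be matched.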
It is possible in many cases to match the title element from an HTML page (or a String containing HTML). And this is sufficient for further processing of the data in VB.NET.