Title, HTML

The title tag in an HTML file usually provides a good label for the contents of the page. And with VB.NET, we can extract the title for further processing.

With the Regex.Match function, we can extract the text content within the title element. This approach is not always successful, but it often works.

Example

To begin, it is important to include the RegularExpressions namespace with an Imports statement. Otherwise the program will not compile correctly.

Start We specify a String with some HTML contents. In a real-world program, we might read in this data from a file with File.ReadAllText.

Next We call GetTitle() and pass it the html string. The Regex uses Kleene closures to process the text inside matching title tags.

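Note The star means "zero or more," and the plus means "one or more." The "\s" indicates whitespace characters. The question mark after the plus makes it lazy, so the match stops at the first closing title tag, and the parentheses form the capture group that Groups(1) reads back.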

Imports System.Text.RegularExpressions

Module Module1

    Sub Main()
        Dim html As String = "<html><title>Example.</title><body><p>...</p></body></html>"
        Console.WriteLine(GetTitle(html))
    End Sub

    Function GetTitle(value As String) As String
        ' Use regular expression to match title tags.
        ' Group 1 captures the text between the tags, without surrounding whitespace.
        Dim match As Match = Regex.Match(value, "<title>\s*(.+?)\s*</title>")
        If match.Success Then
            Return match.Groups(1).Value
        Else
            ' No title element was found.
            Return ""
        End If
    End Function

End Module
Example.
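
File The HTML in the example above is a literal String. As a minimal sketch of the file-based case mentioned earlier, the program below reads the page with File.ReadAllText before calling GetTitle. The module name FileTitleSketch, the file name "title-sketch.html" and the sample page contents are assumptions made for illustration. The sketch writes its own sample file first so that it can run on its own.

```vbnet
Imports System
Imports System.IO
Imports System.Text.RegularExpressions

Module FileTitleSketch

    Sub Main()
        ' Hypothetical file name: write a small sample page so the sketch is self-contained.
        Dim filePath As String = Path.Combine(Path.GetTempPath(), "title-sketch.html")
        File.WriteAllText(filePath, "<html><title> From file </title><body></body></html>")

        ' Read the whole file into a String, then extract the title.
        Dim html As String = File.ReadAllText(filePath)
        Console.WriteLine(GetTitle(html))
    End Sub

    Function GetTitle(value As String) As String
        ' Same pattern as the main example.
        Dim match As Match = Regex.Match(value, "<title>\s*(.+?)\s*</title>")
        If match.Success Then
            Return match.Groups(1).Value
        End If
        Return ""
    End Function

End Module
```

Because the "\s*" parts sit outside the capture group, the spaces around the title text in the sample file are not included in the returned value.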

Some notes

This code may not work correctly if the HTML page contains commented-out HTML, because a title element inside a comment can be matched first. And it won't match uppercase TITLE tags. For case-insensitive code, consider the RegexOptions enum.
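
Options One possible way to handle these cases is sketched below. It is an illustration, not the only approach. The function name GetTitleLoose and the sample HTML are hypothetical. The sketch removes HTML comments with Regex.Replace and then matches with RegexOptions.IgnoreCase. It also adds RegexOptions.Singleline, which lets the dot match newlines. This is an extra assumption, for titles that span several lines.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module TitleOptionsSketch

    Sub Main()
        ' Sample page: a commented-out title, then an uppercase title on its own line.
        Dim html As String = "<HTML><!-- <title>Old</title> --><TITLE>" & vbLf &
                             "Example." & vbLf & "</TITLE></HTML>"
        Console.WriteLine(GetTitleLoose(html))
    End Sub

    ' Hypothetical helper: a more tolerant variant of GetTitle.
    Function GetTitleLoose(value As String) As String
        ' Drop HTML comments first so a commented-out title cannot match.
        Dim withoutComments As String =
            Regex.Replace(value, "<!--.*?-->", "", RegexOptions.Singleline)

        ' IgnoreCase accepts TITLE or Title; Singleline lets "." match newlines.
        Dim options As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.Singleline
        Dim match As Match =
            Regex.Match(withoutComments, "<title>\s*(.+?)\s*</title>", options)

        If match.Success Then
            Return match.Groups(1).Value
        End If
        Return ""
    End Function

End Module
```

For the sample string, this prints "Example." and skips the commented-out "Old" title. A title tag that carries attributes would still not match this pattern.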

It is possible in many cases to match the title element from an HTML page (or a String containing HTML). And this is sufficient for further processing of the data in VB.NET.