Remove HTML. Often we encounter Strings that contain HTML markup. It is possible to remove this markup with a custom VB.NET Function.
A Function. We develop a custom Function based on the Regex type. It uses a regular expression to strip HTML markup tags. This works on many source strings, though not all of them.
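An example. To begin, this program introduces the StripTags Function, which performs the HTML removal. It calls the Regex.Replace function with the pattern "<.*?>", which matches a "<" character, then as few characters as possible, up to the next ">" character. Each match is replaced with an empty string.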
Imports System.Text.RegularExpressions

Module Module1
    Sub Main()
        ' Input.
        Dim html As String = "<p>There was a <b>.NET</b> programmer " &
            "and he stripped the <i>HTML</i> tags.</p>"
        ' Call Function.
        Dim res As String = StripTags(html)
        ' Write.
        Console.WriteLine(res)
    End Sub

    ''' <summary>
    ''' Strip HTML tags.
    ''' </summary>
    Function StripTags(ByVal html As String) As String
        ' Remove HTML tags.
        Return Regex.Replace(html, "<.*?>", "")
    End Function
End Module

There was a .NET programmer and he stripped the HTML tags.
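A warning. If the HTML markup is malformed, or has comments that contain a ">" character, this method can return incorrect results. Tags that span more than one line are also missed, because "." does not match a newline by default. You may wish to first validate the markup.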
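The cases in the warning can be reproduced with a short program. This is a minimal sketch: the module name LimitationsDemo and the sample strings are made up for illustration, and RegexOptions.Singleline is shown only as one possible adjustment, not as a complete fix.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module LimitationsDemo
    Sub Main()
        ' A comment that contains ">" ends the lazy match too early,
        ' so part of the comment is left in the result.
        Dim withComment As String = "<p>Hi<!-- a > b --> there</p>"
        Console.WriteLine(Regex.Replace(withComment, "<.*?>", ""))
        ' Prints: Hi b --> there

        ' By default "." does not match a newline, so a tag that is
        ' split across two lines is not removed.
        Dim multiLine As String = "<a" & vbLf & "href=""x"">link</a>"
        Console.WriteLine(Regex.Replace(multiLine, "<.*?>", ""))

        ' RegexOptions.Singleline lets "." match newlines as well.
        Console.WriteLine(Regex.Replace(multiLine, "<.*?>", "", RegexOptions.Singleline))
        ' Prints: link
    End Sub
End Module
```

The Singleline option helps with tags that wrap across lines, but it does nothing for the comment case, which still needs validation or a real parser.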
Tip: You can validate HTML markup using a simple parser that matches tag characters, checking that each "<" has a closing ">".
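One way to read this tip is a loop that checks whether angle brackets pair up. The sketch below assumes that reading. HasBalancedBrackets is a hypothetical helper name, and the check is deliberately loose: it does not verify that tags are nested or closed correctly.

```vbnet
Imports System

Module ValidateDemo
    ' Hypothetical helper: returns True when every "<" is closed by a ">"
    ' before the next "<" appears, and no ">" occurs outside a tag.
    Function HasBalancedBrackets(ByVal html As String) As Boolean
        Dim inTag As Boolean = False
        For Each c As Char In html
            If c = "<"c Then
                ' A second "<" before a ">" means a tag was never closed.
                If inTag Then Return False
                inTag = True
            ElseIf c = ">"c Then
                ' A ">" with no open "<" is a stray bracket.
                If Not inTag Then Return False
                inTag = False
            End If
        Next
        ' The input must not end in the middle of a tag.
        Return Not inTag
    End Function

    Sub Main()
        Console.WriteLine(HasBalancedBrackets("<p>ok</p>"))  ' True
        Console.WriteLine(HasBalancedBrackets("<p>bad</p"))  ' False
        Console.WriteLine(HasBalancedBrackets("a > b"))      ' False
    End Sub
End Module
```

If the check returns False, the input could be rejected or handled separately before StripTags is called.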
Summary. The easiest way to strip HTML tags is to use the Regex type. Other methods that scan the String and use Char arrays are more efficient, but will also be more complicated.
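The Char array approach mentioned in the summary might look like the following. This is a sketch under assumptions, not a measured benchmark: the name StripTagsCharArray is hypothetical, and the loop simply skips everything between "<" and ">".

```vbnet
Imports System

Module CharArrayDemo
    ' Hypothetical alternative to StripTags that avoids Regex.
    Function StripTagsCharArray(ByVal source As String) As String
        ' The result can never be longer than the input.
        Dim buffer(source.Length - 1) As Char
        Dim count As Integer = 0
        Dim inside As Boolean = False

        For Each c As Char In source
            If c = "<"c Then
                inside = True
            ElseIf c = ">"c Then
                ' Note: a stray ">" outside any tag is also dropped.
                inside = False
            ElseIf Not inside Then
                ' Keep characters that are outside of tags.
                buffer(count) = c
                count += 1
            End If
        Next

        Return New String(buffer, 0, count)
    End Function

    Sub Main()
        Dim html As String = "<p>There was a <b>.NET</b> programmer " &
            "and he stripped the <i>HTML</i> tags.</p>"
        Console.WriteLine(StripTagsCharArray(html))
        ' Prints: There was a .NET programmer and he stripped the HTML tags.
    End Sub
End Module
```

Unlike the Regex version, this loop also removes tags that span several lines, since it does not treat newlines specially. It shares the same weakness with comments that contain a ">" character.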
Sam Allen is passionate about computer languages, and he maintains 100% of the material available on this website. He hopes it makes the world a nicer place.
This page was last updated on Mar 20, 2023.