VB.NET Remove HTML Tags
Remove HTML markup from Strings using a Regex-based method that does not always work.
Remove HTML. A String contains HTML markup. It is possible to remove this markup with a VB.NET Function. We develop a custom Function based on the Regex type. It uses a regular expression to strip HTML markup tags.
To begin, this program imports the System.Text.RegularExpressions namespace. Next it introduces the StripTags Function, which performs the HTML removal. This calls the Regex.Replace function.
StripTags: Here all text matching the pattern <.*?> is replaced with an empty string. The pattern matches a <, then as few characters as possible, then the next >.
Main: We declare a String literal that contains HTML markup. Next, the StripTags function is invoked with that String as the argument.
Finally: We demonstrate that the resulting string has no HTML markup remaining by printing it to the Console.
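VB.NET program that removes HTML markup from String

Imports System
Imports System.Text.RegularExpressions

Module Module1
    Sub Main()
        ' Input string containing HTML markup.
        Dim html As String = "<p>There was a <b>.NET</b> programmer " +
            "and he stripped the <i>HTML</i> tags.</p>"
        ' Call Function.
        Dim tagless As String = StripTags(html)
        ' Write the result.
        Console.WriteLine(tagless)
    End Sub

    ''' Strip HTML tags.
    Function StripTags(ByVal html As String) As String
        ' Remove HTML tags.
        Return Regex.Replace(html, "<.*?>", "")
    End Function
End Module

Output

There was a .NET programmer and he stripped the HTML tags.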
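The ? after .* is what keeps each match confined to a single tag. The following is a minimal sketch comparing the lazy pattern used by StripTags with a greedy variant; the module name and sample string are illustrative assumptions, not part of the original program.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LazyVersusGreedy
    Sub Main()
        ' Illustrative input (an assumption, not from the original program).
        Dim html As String = "<b>bold</b> and <i>italic</i>"

        ' Lazy: each match stops at the first closing bracket,
        ' so only the tags themselves are removed.
        Console.WriteLine("[" & Regex.Replace(html, "<.*?>", "") & "]")

        ' Greedy: one match runs from the first "<" to the last ">",
        ' so the text between the tags is removed as well.
        Console.WriteLine("[" & Regex.Replace(html, "<.*>", "") & "]")
    End Sub
End Module
```

The first line printed is [bold and italic]. The second is [], because the greedy pattern consumes the whole string in one match.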
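If your HTML markup is malformed in any way, or contains comments, this method will cause you grief. For example, a comment that contains a > character ends the match early and leaves part of the comment in the output. And because . does not match a newline by default, a tag that spans two lines is not removed. You may wish to first validate the markup. You can validate HTML markup using a simple parser that checks that each < is paired with a >.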
Alternatively: You can build a more advanced parser that handles the incorrect markup you encounter.
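One possible shape for such a validation step is sketched below. HasBalancedBrackets is a hypothetical helper name, and the rule it applies (every < must be closed by a > before another < appears, and no > may appear outside a tag) is an assumption about what "valid" should mean here. It is deliberately strict and is not a full HTML parser.

```vbnet
Imports System

Module TagValidation
    ''' Hypothetical helper: returns True when every "<" is closed by a ">"
    ''' before another "<" appears, and no ">" appears outside a tag.
    Function HasBalancedBrackets(ByVal html As String) As Boolean
        Dim inside As Boolean = False
        For Each c As Char In html
            If c = "<"c Then
                ' A second "<" before the tag is closed is malformed.
                If inside Then Return False
                inside = True
            ElseIf c = ">"c Then
                ' A ">" with no open tag is malformed.
                If Not inside Then Return False
                inside = False
            End If
        Next
        ' A tag left open at the end of the string is also invalid.
        Return Not inside
    End Function

    Sub Main()
        Console.WriteLine(HasBalancedBrackets("<p>Valid <b>markup</b></p>")) ' True
        Console.WriteLine(HasBalancedBrackets("<p>Unclosed <b tag</p>"))     ' False
        Console.WriteLine(HasBalancedBrackets("<!-- a > b -->text"))         ' False
    End Sub
End Module
```

Input that fails this check could be rejected or routed to a more tolerant parser instead of being passed to StripTags. Note that the check also rejects plain text that contains a literal > character, which may or may not be what you want.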
Summary. The easiest way to strip HTML tags from your String data is to use the Regex type. Other methods that scan the String and use Char arrays can be more efficient, but they will also make your program more complicated.
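As a rough illustration of the Char array approach mentioned above, here is a minimal sketch. StripTagsCharArray is a hypothetical name, and this is one possible way to write such a scanner, not a method taken from the original article. No performance measurements are claimed for it.

```vbnet
Imports System

Module CharArrayStrip
    ''' Hypothetical alternative to StripTags that avoids Regex.
    ''' Copies every character that is outside a <...> pair into a buffer.
    Function StripTagsCharArray(ByVal html As String) As String
        ' The result can never be longer than the input.
        Dim buffer(html.Length - 1) As Char
        Dim count As Integer = 0
        Dim inside As Boolean = False
        For Each c As Char In html
            If c = "<"c Then
                inside = True
            ElseIf c = ">"c Then
                inside = False
            ElseIf Not inside Then
                buffer(count) = c
                count += 1
            End If
        Next
        Return New String(buffer, 0, count)
    End Function

    Sub Main()
        Dim html As String = "<p>There was a <b>.NET</b> programmer " +
            "and he stripped the <i>HTML</i> tags.</p>"
        Console.WriteLine(StripTagsCharArray(html))
    End Sub
End Module
```

For the sample string this prints the same text as the Regex version. The two differ on malformed input: this sketch drops a stray > and discards everything after an unclosed <, and it does not accept Nothing as an argument.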
© 2007-2020 Sam Allen.