In 1965 Vladmir Levenshtein introduced a distance algorithm. This tells us the number of changes (edits) one string
must go through to become another string
.
In VB.NET we compute Levenshtein distance with a Function. This function returns the number of character edits that must occur to get from String
A to String
B.
The internals of the Levenshtein distance Function are complex. Internally it allocates a 2D array. This 2D array has dimensions equal to the lengths of the 2 argument strings.
For
-loop is executed. In the inner block of the For
-loops, the cost of the change of two elements is computed.Module Module1 Sub Main() Console.WriteLine(LevenshteinDistance("aunt", "ant")) Console.WriteLine(LevenshteinDistance("Sam", "Samantha")) Console.WriteLine(LevenshteinDistance("flomax", "volmax")) End Sub ''' <summary> ''' Compute LevenshteinDistance. ''' </summary> Public Function LevenshteinDistance(ByVal s As String, ByVal t As String) As Integer Dim n As Integer = s.Length Dim m As Integer = t.Length Dim d(n + 1, m + 1) As Integer If n = 0 Then Return m End If If m = 0 Then Return n End If Dim i As Integer Dim j As Integer For i = 0 To n d(i, 0) = i Next For j = 0 To m d(0, j) = j Next For i = 1 To n For j = 1 To m Dim cost As Integer If t(j - 1) = s(i - 1) Then cost = 0 Else cost = 1 End If d(i, j) = Math.Min(Math.Min(d(i - 1, j) + 1, d(i, j - 1) + 1), d(i - 1, j - 1) + cost) Next Next Return d(n, m) End Function End Module1 5 3
The number returned by Levenshtein for the arguments "aunt" and "ant" is 1. This means only one edit is required to get from aunt to ant—the letter U is removed.
The theory of the Levenshtein distance algorithm is not covered here. But the implementation here is correct on the example Strings tested. It is not fast.
Certain optimizations could be used to enhance performance, such as memoization. A data structure such as a Dictionary
could store the two strings and also a cached distance.