Levenshtein Distance Algorithm
This page was last reviewed on Jan 24, 2022.
Dot Net Perls
Levenshtein. In 1965 Vladmir Levenshtein introduced a distance algorithm. This tells us the number of changes (edits) one string must go through to become another string.
Function notes. In VB.NET we compute Levenshtein distance with a Function. This function returns the number of character edits that must occur to get from String A to String B.
Example. The internals of the Levenshtein distance Function are complex. Internally it allocates a 2D array. This 2D array has dimensions equal to the lengths of the 2 argument strings.
2D Array
Detail A nested For-loop is executed. In the inner block of the For-loops, the cost of the change of two elements is computed.
Note The cost is one for a different character and zero for the same character.
Then Array elements are assigned, considering adjacent array elements and the cost. The ending array element is returned as the distance.
Module Module1 Sub Main() Console.WriteLine(LevenshteinDistance("aunt", "ant")) Console.WriteLine(LevenshteinDistance("Sam", "Samantha")) Console.WriteLine(LevenshteinDistance("flomax", "volmax")) End Sub ''' <summary> ''' Compute LevenshteinDistance. ''' </summary> Public Function LevenshteinDistance(ByVal s As String, ByVal t As String) As Integer Dim n As Integer = s.Length Dim m As Integer = t.Length Dim d(n + 1, m + 1) As Integer If n = 0 Then Return m End If If m = 0 Then Return n End If Dim i As Integer Dim j As Integer For i = 0 To n d(i, 0) = i Next For j = 0 To m d(0, j) = j Next For i = 1 To n For j = 1 To m Dim cost As Integer If t(j - 1) = s(i - 1) Then cost = 0 Else cost = 1 End If d(i, j) = Math.Min(Math.Min(d(i - 1, j) + 1, d(i, j - 1) + 1), d(i - 1, j - 1) + cost) Next Next Return d(n, m) End Function End Module
1 5 3
Function output. The number returned by Levenshtein for the arguments "aunt" and "ant" is 1. This means only one edit is required to get from aunt to ant—the letter U is removed.
And For the second result, we have 5 edits. The name "Sam" must be edited 5 times to get "Samantha"—5 letters are added.
Finally And to get to "volmax" from "flomax" 3 edits must occur. Swapping 2 characters is considered 2 separate edits.
The theory of the Levenshtein distance algorithm is not covered here. But the implementation here is correct on the example Strings tested. It is not fast.
Certain optimizations could be used to enhance performance, such as memoization. A data structure such as a Dictionary could store the two strings and also a cached distance.
Dot Net Perls is a collection of tested code examples. Pages are continually updated to stay current, with code correctness a top priority.
Sam Allen is passionate about computer languages. In the past, his work has been recommended by Apple and Microsoft and he has studied computers at a selective university in the United States.
This page was last updated on Jan 24, 2022 (edit).
© 2007-2024 Sam Allen.