C# Levenshtein

Collection abstraction: squares

In 1965 Vladmir Levenshtein created a distance algorithm. This tells us the number of edits needed to turn one string into another. With Levenshtein distance, we measure similarity and match approximate strings with fuzzy logic.

Info:Returns the number of character edits (removals, inserts, replacements) that must occur to get from string A to string B.

Levenshtein distance computations

Words:                ant, aunt
Levenshtein distance: 1
Note:                 Only 1 edit is needed.
		      The 'u' must be added at index 2.

Words:                Samantha, Sam
Levenshtein distance: 5
Note:                 The final 5 letters must be removed.

Words:                Flomax, Volmax
Levenshtein distance: 3
Note:                 The first 3 letters must be changed
		      Drug names are commonly confused.

Example

Note

First, credit at the conceptual level goes to Vladimir Levenshtein, a Russian scientist. This code uses a two-dimensional array instead of a jagged array because the space required will only have one width and one height.

Tip:The two-dimensional array requires fewer allocations upon the managed heap and may be faster in this context.

2D Array
C# program that implements the algorithm

using System;

/// <summary>
/// Contains approximate string matching
/// </summary>
static class LevenshteinDistance
{
    /// <summary>
    /// Compute the distance between two strings.
    /// </summary>
    public static int Compute(string s, string t)
    {
	int n = s.Length;
	int m = t.Length;
	int[,] d = new int[n + 1, m + 1];

	// Step 1
	if (n == 0)
	{
	    return m;
	}

	if (m == 0)
	{
	    return n;
	}

	// Step 2
	for (int i = 0; i <= n; d[i, 0] = i++)
	{
	}

	for (int j = 0; j <= m; d[0, j] = j++)
	{
	}

	// Step 3
	for (int i = 1; i <= n; i++)
	{
	    //Step 4
	    for (int j = 1; j <= m; j++)
	    {
		// Step 5
		int cost = (t[j - 1] == s[i - 1]) ? 0 : 1;

		// Step 6
		d[i, j] = Math.Min(
		    Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
		    d[i - 1, j - 1] + cost);
	    }
	}
	// Step 7
	return d[n, m];
    }
}

class Program
{
    static void Main()
    {
	Console.WriteLine(LevenshteinDistance.Compute("aunt", "ant"));
	Console.WriteLine(LevenshteinDistance.Compute("Sam", "Samantha"));
	Console.WriteLine(LevenshteinDistance.Compute("flomax", "volmax"));
    }
}

Output

1
5
3
Squares

The Levenshtein method is static—this Compute method doesn't need to store state or instance data, which means you can declare it as static. This can also improve performance, avoiding callvirt instructions.

Tip:You can verify the implementation is the standard version of Levenshtein by looking at a computer science textbook.

This algorithm is stateless, which means it doesn't store instance data. It therefore can be put in a static class. Static classes are easier to add to new projects than separate methods.

Static Class

Example 2

Method call

Continuing on, we call the method. You will often want to compare multiple strings with the Levenshtein algorithm. The example here shows how to compare strings in a loop. We use a List of string arrays.

ListString Array
C# program that calls Levenshtein in loop

static void Main()
{
    List<string[]> l = new List<string[]>
    {
	new string[]{"ant", "aunt"},
	new string[]{"Sam", "Samantha"},
	new string[]{"clozapine", "olanzapine"},
	new string[]{"flomax", "volmax"},
	new string[]{"toradol", "tramadol"},
	new string[]{"kitten", "sitting"}
    };

    foreach (string[] a in l)
    {
	int cost = Compute(a[0], a[1]);
	Console.WriteLine("{0} -> {1} = {2}",
	    a[0],
	    a[1],
	    cost);
    }
}

Output

ant -> aunt = 1
Sam -> Samantha = 5
clozapine -> olanzapine = 3
flomax -> volmax = 3
toradol -> tramadol = 3
kitten -> sitting = 3

Summary

The C# programming language

We saw the Levenshtein Distance algorithm—this implements approximate string matching. The difference between two strings is not represented as true or false, but as the number of steps needed to get from one to the other.

Finally:As a reminder, the brilliance of the algorithm comes from Dr. Levenshtein.


C#: Method