C# Anagram Method

Spot/tops: example of anagrams

Anagrams can be rearranged to form different words. We can use Dictionary and hash lookups to compute anagram lists quickly. This is useful for learning or making word games. Here we see how you can implement an anagram algorithm in the C# language.

This C# algorithm article implements an anagram solving method. It computes anagrams of any string.

Introduction

First, when you have two words that are anagrams, their alphabetized forms will be equal. This is because their letter frequencies are equal for each letter. This relationship forms the base of our solution.

Input string/alphabetized form

cat       act
tca       act
senators  aeonrsst
treasons  aeonrsst

Find anagrams

Here we see a console C# program that first reads in a local word list, which you will need to download separately, and then sorts each word. It then stores all the original words separately at that alphabetized key.

Program that finds anagrams in file [C#]

using System;
using System.Collections.Generic;
using System.IO;

class Anagrams
{
    static void Main()
    {
	// 1
	// Read and sort dictionary
	var d = Read();

	// 2
	// Read in user input and show anagrams
	string line;
	while ((line = Console.ReadLine()) != null)
	{
	    Show(d, line);
	}
    }

    static Dictionary<string, string> Read()
    {
	var d = new Dictionary<string, string>();
	// Read each line
	using (StreamReader r = new StreamReader("enable1.txt"))
	{
	    string line;
	    while ((line = r.ReadLine()) != null)
	    {
		// Alphabetize the line for the key
		// Then add to the value string
		string a = Alphabetize(line);
		string v;
		if (d.TryGetValue(a, out v))
		{
		    d[a] = v + "," + line;
		}
		else
		{
		    d.Add(a, line);
		}
	    }
	}
	return d;
    }

    static string Alphabetize(string s)
    {
	// Convert to char array, then sort and return
	char[] a = s.ToCharArray();
	Array.Sort(a);
	return new string(a);
    }

    static void Show(Dictionary<string, string> d, string w)
    {
	// Write value for alphabetized word
	string v;
	if (d.TryGetValue(Alphabetize(w), out v))
	{
	    Console.WriteLine(v);
	}
	else
	{
	    Console.WriteLine("-");
	}
    }
}

Output

martian,tamarin
opts,post,pots,spot,stop,tops
assentor,senators,starnose,treasons
Note

Overview. First, the program reads in the word list that must be stored locally on the disk. The word list used here has the name "enable1.txt", which is a version of the ENABLE word list available online. For each line, it alphabetizes the letters. This forms the alphabetized key for that word.

Alphabetize String

Alphabetized key. It then looks up that key in the Dictionary. If the alphabetized key is found, the current word we are processing will be appended to its value. In this way, we hash all real words at keys of their alphabetized forms. Finally, it accepts user input and prints each list that corresponds to the input after it is alphabetized.

String values

String type

For the Dictionary structure, we can use a simple string value type and append new strings to it. This means the data is stored as a single string. Alternatively, we could store the values as List<string>, but this degrades performance needlessly. To test, I altered the code so that it uses Dictionary<string, List<string>> instead of Dictionary<string, string>. This benchmark shows that it is faster to use the simpler string value than a List<string> type.

Anagram method benchmark
    Anagrams can be stored efficiently in a Dictionary.

Dictionary<string, string>       : 4556 ms
Dictionary<string, List<string>> : 5647 ms

Disk

Floppy disk illustration

For better performance, you can persist the alphabetized key/value pairs to the disk. This could reduce startup time considerably. Using a binary file would be even faster.

Convert Dictionary to String

Blanks in Scrabble

Here we mention that in Scrabble, blanks substitute for any letter. This can be dealt with in different ways in an anagram program: using a tree or with approximate string matching. You can use Levenshtein distance to find similar strings.

Levenshtein Distance

Word lists

Let's address the need to acquire a word list on the Internet. The dotnetperls-controls Google Code project has a nice list of words, which is a subset of the ENABLE word list. This is a public domain list of English words. It isn't exhaustive but is very thorough.

Download link

Summary

The C# programming language

We saw how to alphabetize words with Array.Sort and the new string constructor, and then hash those keys in a Dictionary. We implemented an anagram algorithm in the C# programming language. Finally, we compared performance of two alternatives, and selected the faster one.

Algorithms
.NET