C# IEqualityComparer

Interface

IEqualityComparer introduces a custom GetHashCode method. We implement this interface in the C# language. We also measure performance. We compare GetHashCode methods on custom string keys. We use it with the Dictionary type.

Note:You can implement the IEqualityComparer interface for the Dictionary collection.

Note 2:The built-in C# hash code on string types is not ideal. Most GetHashCode implementations will be specific to your data.

Benchmark: tested on 499 string keys

Default string IEqualityComparer: 4043 ms
Custom IEqualityComparer:         2736 ms

Example

First we see the custom IEqualityComparer I defined. It implements the IEqualityComparer interface, and its name is StringIndexKeyComparer. Your comparer can have any name, but it must use the interface inheritance syntax as shown.

Interface
Class that implements IEqualityComparer: C#

public class StringIndexKeyComparer : IEqualityComparer<string>
{
    /// <summary>
    /// Has a good distribution.
    /// </summary>
    const int _multiplier = 89;

    /// <summary>
    /// Whether the two strings are equal
    /// </summary>
    public bool Equals(string x, string y)
    {
	return x == y;
    }

    /// <summary>
    /// Return the hash code for this string.
    /// </summary>
    public int GetHashCode(string obj)
    {
	// Stores the result.
	int result = 0;

	// Don't compute hash code on null object.
	if (obj == null)
	{
	    return 0;
	}

	// Get length.
	int length = obj.Length;

	// Return default code for zero-length strings [valid, nothing to hash with].
	if (length > 0)
	{
	    // Compute hash for strings with length greater than 1
	    char let1 = obj[0];          // First char of string we use
	    char let2 = obj[length - 1]; // Final char

	    // Compute hash code from two characters
	    int part1 = let1 + length;
	    result = (_multiplier * part1) + let2 + length;
	}
	return result;
    }
}
Squares

This class implements IEqualityComparer.
To implement that interface,
it must define Equals
and GetHashCode. The class uses the default Equals method, but uses a new GetHashCode implementation.

IEqualityComparer Interface: MSDN

Tip:The GetHashCode method above is optimal on the string keys I had. You will need to write your own GetHashCode method.

Tip 2:You should compute the result from the most significant parts of your keys, which provide the best distribution of the return value.

The GetHashCode shown here computes its result from the key[1] character, and from the final character in key. It applies a multiplier to the first character, and also adds the length of the key.

Multiply

Keys

Question

You can declare a generic Dictionary with any type keys and values. But the hashing method used for those types is always the default, unless you pass in an IEqualityComparer instance in the Dictionary constructor.

Note:For my project, I have a set of about 650 string keys. The string keys are in a certain format, with a pattern to their characters.

String keys

22D-Array-IEnumerable
22D-Array-Use
27-Zip
27-Zip-DEFLATE-Ratio
27-Zip-Examples
2About
2Action-Dictionary
2Adobe-CS3-Rounded
2Adobe-Fireworks-CS3-Resampling
String type

The string keys I have all have the same first character, which is a digit. Because this first character is monotonous, it isn't useful for hashing because its distribution is poor.

In the above strings, I decided that the best distribution of hash codes would use the second character, the length, and another character. To reduce collisions, one character would need to be multiplied by a constant.

Example 2

Program icon

To use the custom IEqualityComparer with the Dictionary, you must pass an instance of the IEqualityComparer to the Dictionary constructor. It is a good idea to use a static instance of your IEqualityComparer. This is not shown.

Next:The code uses the same custom IEqualityComparer shown earlier in this document. It is passed to the Dictionary in the constructor.

Constructor
C# program that uses custom class

using System;
using System.Collections.Generic;

/// <summary>
/// [see above]
/// </summary>
public class StringIndexKeyComparer : IEqualityComparer<string>
{
    // [see above]
}

class Program
{
    static void Main()
    {
	var custom = new Dictionary<string, bool>(new StringIndexKeyComparer());
	custom.Add("22D-Array-IEnumerable", true);
	custom.Add("22D-Array-Use", true);
	custom.Add("27-Zip", true);
	custom.Add("27-Zip-DEFLATE-Ratio", true);
	custom.Add("27-Zip-Examples", true);
	custom.Add("2About", true);
	custom.Add("2Action-Dictionary", true);
	custom.Add("2Adobe-CS3-Rounded", true);
	custom.Add("2Adobe-Fireworks-CS3-Resampling", true);
    }
}

Discussion

Hashing is a complicated topic but an excellent hash code can really improve your Dictionary's performance. There are excellent books on this topic, such as Robert Sedgewick's Algorithms in C++, which I used.

To better distribute the hash code computations in your program, using a multiplier is useful. A hashtable has an internal array of buckets, and when you use a multiplier, you can spread out the indexes to those buckets.

And:This reduces the number of hash collisions and improves lookup performance.

Warning: exclamation mark

The constant 89 is one I found to provide a good hashing distribution in the data set I am using. There is a custom constructor I defined on the StringIndexKeyComparer. It is not important to the code style here.

Tip:Any GetHashCode method will have to be custom to your keys. Therefore, just delete the contents of GetHashCode and start over.

But:It is not worth implementing on most projects, unless there is a bottleneck on your Dictionary, which rarely occurs.

Summary

We saw some example code, which won't be useful for you, but the explanations may help you understand how to use IEqualityComparer in your project. The most important part of IEqualityComparer here is its implementation of GetHashCode.


C#: Dictionary