C# IEqualityComparer

Use a custom IEqualityComparer to implement specialized lookup behavior for Dictionary.
IEqualityComparer. This interface introduces a custom GetHashCode method. We implement this interface in the C# language. We also measure performance.
String keys. We compare GetHashCode methods on custom string keys. We use it with the Dictionary type. You can implement the IEqualityComparer interface for the Dictionary collection.Dictionary
First example. We see the custom IEqualityComparer. StringIndexKeyComparer implements the IEqualityComparer interface. A comparer must use the interface inheritance syntax as shown.Interface

Class: This class implements IEqualityComparer. To implement that interface, it must define Equals and GetHashCode.

IEqualityComparer Interface: Microsoft Docs

GetHashCode: This method is effective on the string keys in the program. You will need to write your own GetHashCode method.

Details: GetHashCode computes its result from 2 chars. It applies a multiplier to the first char, and adds the length of the key.


Tip: You should compute the result from the most significant parts of your keys, which provide the best distribution of the return value.

Class that implements IEqualityComparer: C# public class StringIndexKeyComparer : IEqualityComparer<string> { /// <summary> /// Has a good distribution. /// </summary> const int _multiplier = 89; /// <summary> /// Whether the two strings are equal /// </summary> public bool Equals(string x, string y) { return x == y; } /// <summary> /// Return the hash code for this string. /// </summary> public int GetHashCode(string obj) { // Stores the result. int result = 0; // Don't compute hash code on null object. if (obj == null) { return 0; } // Get length. int length = obj.Length; // Return default code for zero-length strings [valid, nothing to hash with]. if (length > 0) { // Compute hash for strings with length greater than 1 char let1 = obj[0]; // First char of string we use char let2 = obj[length - 1]; // Final char // Compute hash code from two characters int part1 = let1 + length; result = (_multiplier * part1) + let2 + length; } return result; } }
Keys. A Dictionary can have any type keys and values. But the hashing method used for those types is the default, unless you pass in an IEqualityComparer in the Dictionary constructor.

Note: For my project, I have a set of about 650 string keys. The string keys are in a certain format, with a pattern to their characters.

String keys: 22D-Array-IEnumerable 22D-Array-Use 27-Zip 27-Zip-DEFLATE-Ratio 27-Zip-Examples 2About 2Action-Dictionary 2Adobe-CS3-Rounded 2Adobe-Fireworks-CS3-Resampling
Keys, analysis. The string keys have the same first character, which is a digit. Because this first character is monotonous, it isn't useful for hashing because its distribution is poor.

And: I decided that the best distribution of hash codes would use the second character, the length, and another character.

So: To reduce collisions, one character would need to be multiplied by a constant.

Benchmark: tested on 499 string keys Default string IEqualityComparer: 4043 ms Custom IEqualityComparer: 2736 ms
Example, use IEqualityComparer. We pass an instance of the IEqualityComparer to the Dictionary constructor. It is a good idea to use a static instance of your IEqualityComparer.

Next: The code uses the same custom IEqualityComparer shown earlier in this document. It is passed to the Dictionary in the constructor.

C# program that uses custom class using System; using System.Collections.Generic; /// <summary> /// [see above] /// </summary> public class StringIndexKeyComparer : IEqualityComparer<string> { // [see above] } class Program { static void Main() { var custom = new Dictionary<string, bool>(new StringIndexKeyComparer()); custom.Add("22D-Array-IEnumerable", true); custom.Add("22D-Array-Use", true); custom.Add("27-Zip", true); custom.Add("27-Zip-DEFLATE-Ratio", true); custom.Add("27-Zip-Examples", true); custom.Add("2About", true); custom.Add("2Action-Dictionary", true); custom.Add("2Adobe-CS3-Rounded", true); custom.Add("2Adobe-Fireworks-CS3-Resampling", true); } }
A discussion. An excellent hash code can improve your Dictionary's performance. There are excellent books on this topic, such as Robert Sedgewick's Algorithms in C++, which I used.

Tip: To better distribute the hash code computations in your program, using a multiplier is useful.

Tip 2: A hashtable has an internal array of buckets, and when you use a multiplier, you can spread out the indexes to those buckets.

And: This reduces the number of hash collisions and improves lookup performance.

Multiplier. The constant 89 provides a good hashing distribution in many string data sets. There is a custom constructor I defined on the StringIndexKeyComparer.

Tip: Any GetHashCode method will have to be custom to your keys. Therefore, just delete the contents of GetHashCode and start over.

But: It is not worth implementing on most projects, unless there is a bottleneck on your Dictionary, which rarely occurs.

Fuzzy lookups. A Dictionary by default uses exact lookups. It computes a hash code based on a key and if another value matches it exactly, the two keys match. But this can be changed.

Next: We implement a custom, fuzzy lookup that ignores characters. This is the custom IEqualityComparer.

Tip: It computes hashes (in GetHashCode) and compares keys (in Equals) ignoring all hyphen characters.

So: The string "cat-1" is considered equal to the string "cat1." The hyphen (dash) character is simply ignored.

Example custom IEqualityComparer: C# using System.Collections.Generic; class CustomComparer : IEqualityComparer<string> { public bool Equals(string x, string y) { int xPos = 0; int yPos = 0; while (true) { // ... Fail if past end. if (xPos >= x.Length) { return false; } if (yPos >= y.Length) { return false; } // ... Skip past hyphens. if (x[xPos] == '-') { xPos++; continue; } if (y[yPos] == '-') { yPos++; continue; } // ... Fail if different. if (x[xPos] != y[yPos]) { return false; } // ... If we have traversed both strings with no error, we match. if (xPos == x.Length - 1 && yPos == y.Length - 1) { return true; } // ... Increment both places. xPos++; yPos++; } } public int GetHashCode(string obj) { int code = 0; // ... Add together all chars. for (int i = 0; i < obj.Length; i++) { if (obj[i] != '-') { code += obj[i]; } } return code; } }
A warning. The code has some problems. Its implementation of GetHashCode is slow and would become slower on many keys. But the code shows the concept.

And: It avoids string allocations. I have found this is often a key to good performance.

Fuzzy, continued. Next, we use the CustomComparer we created. This Main method shows how hyphens are ignored in the keys of a Dictionary that uses this comparer.MainVar

Here: All three lookups return the correct value. Some unusual cases, not shown, may not work correctly (such as trailing hyphens).

C# program that uses Dictionary, CustomComparer using System; using System.Collections.Generic; class Program { static void Main() { // ... Add data to dictionary. var dictionary = new Dictionary<string, int>(new CustomComparer()); dictionary["cat-1"] = 1; dictionary["cat-2"] = 2; dictionary["dog-bark"] = 10; dictionary["dog-woof"] = 20; // ... Lookup values, ignoring hyphens. Console.WriteLine(dictionary["cat1"]); Console.WriteLine(dictionary["cat-1"]); Console.WriteLine(dictionary["dog--bark"]); } } Output 1 1 10
Fuzzy, uses. This code has many possible uses. It can potentially eliminate many Substring, Replace or ToLower calls. You can simply use a hashing function, with no new strings.SubstringReplaceToLower

Tip: You could even implement a simple language using this hashing mechanism. It could process strings without modifying them.

In my experiments, avoiding string allocations (caused by Substring or similar methods) is usually a performance win. And this style of IEqualityComparer tends to benefit performance.Optimization
GetHashCode. This method can be optimized. Try multiplying its result by an important character in the string. Also, there is an easy way to test how selective your hash numbers are.GetHashCode

Test: Increment a static int field each time Equals is called. With a better hash code, Equals is called fewer times.

Tip: With a more selective hash, the internal Dictionary methods need to search fewer values with Equals, so it is called less.

With IEqualityComparer, we look up strings based on the results of a method. We apply fuzzy matching, or any other transformation before comparing keys. We need not modify those keys.
© 2007-2019 Sam Allen. Every person is special and unique. Send bug reports to
Dot Net Perls