HomeSearch

C# IEqualityComparer

Use a custom IEqualityComparer to implement specialized lookup behavior for Dictionary.
IEqualityComparer. This interface introduces a custom GetHashCode method. We implement this interface in the C# language. We also measure performance.Interface
String keys. We compare GetHashCode methods on custom string keys. We use it with the Dictionary type. You can implement the IEqualityComparer interface for the Dictionary collection.Dictionary
First example. We see the custom IEqualityComparer. StringIndexKeyComparer implements the IEqualityComparer interface. A comparer must use the interface inheritance syntax as shown.

Class: This class implements IEqualityComparer. To implement that interface, it must define Equals and GetHashCode.

IEqualityComparer Interface: Microsoft Docs

GetHashCode: This method is effective on the string keys in the program. You will need to write your own GetHashCode method.

Details: GetHashCode computes its result from 2 chars. It applies a multiplier to the first char, and adds the length of the key.

Multiply

Tip: You should compute the result from the most significant parts of your keys, which provide the best distribution of the return value.

Class that implements IEqualityComparer: C# public class StringIndexKeyComparer : IEqualityComparer<string> { /// <summary> /// Has a good distribution. /// </summary> const int _multiplier = 89; /// <summary> /// Whether the two strings are equal /// </summary> public bool Equals(string x, string y) { return x == y; } /// <summary> /// Return the hash code for this string. /// </summary> public int GetHashCode(string obj) { // Stores the result. int result = 0; // Don't compute hash code on null object. if (obj == null) { return 0; } // Get length. int length = obj.Length; // Return default code for zero-length strings [valid, nothing to hash with]. if (length > 0) { // Compute hash for strings with length greater than 1 char let1 = obj[0]; // First char of string we use char let2 = obj[length - 1]; // Final char // Compute hash code from two characters int part1 = let1 + length; result = (_multiplier * part1) + let2 + length; } return result; } }
Keys, analysis. A Dictionary can have any type keys and values. But the hashing method is the default, unless you pass in an IEqualityComparer in the Dictionary constructor.

Note: For my project, I have a set of about 650 string keys. The string keys are in a certain format, with a pattern to their characters.

And: I decided that the best distribution of hash codes would use the second character, the length, and another character.

String keys: 22D-Array-IEnumerable 22D-Array-Use 27-Zip 27-Zip-DEFLATE-Ratio 27-Zip-Examples 2About 2Action-Dictionary 2Adobe-CS3-Rounded 2Adobe-Fireworks-CS3-Resampling
Benchmark, IEqualityComparer. Often IEqualityComparer can be used as a performance optimization, or to reduce string allocations. We can add logic to GetHashCode without creating a string.

Version 1: The code uses the same custom IEqualityComparer shown earlier in this document. The ContainsKey method is called many times.

Constructor

Version 2: We use a Dictionary without a custom IEqualityComparer. So each string uses the default GetHashCode method.

Result: In 2020, using the custom IEqualityComparer is faster. We avoid some expensive hashing—but this code depends on the key format.

C# program that uses custom IEqualityComparer using System; using System.Diagnostics; using System.Collections.Generic; public class StringIndexKeyComparer : IEqualityComparer<string> { const int _multiplier = 89; public bool Equals(string x, string y) { return x == y; } public int GetHashCode(string obj) { int result = 0; if (obj == null) { return 0; } int length = obj.Length; if (length > 0) { char let1 = obj[0]; char let2 = obj[length - 1]; int part1 = let1 + length; result = (_multiplier * part1) + let2 + length; } return result; } } class Program { const int _max = 100000000; static void Main() { var test = new Dictionary<string, bool>(new StringIndexKeyComparer()); var test2 = new Dictionary<string, bool>(); // Elements. string[] elements = { "22D-Array-IEnumerable", "22D-Array-Use", "27-Zip", "27-Zip-DEFLATE-Ratio", "27-Zip-Examples", "2About", "2Action-Dictionary", "2Adobe-CS3-Rounded", "2Adobe-Fireworks-CS3-Resampling" }; // Add elements. foreach (string element in elements) { test.Add(element, true); test2.Add(element, true); } var s1 = Stopwatch.StartNew(); // Version 1: test custom IEqualityComparer. for (int i = 0; i < _max; i++) { if (!test.ContainsKey("2About")) { return; } } s1.Stop(); var s2 = Stopwatch.StartNew(); // Version 2: use default dictionary. for (int i = 0; i < _max; i++) { if (!test2.ContainsKey("2About")) { return; } } s2.Stop(); Console.WriteLine(((double)(s1.Elapsed.TotalMilliseconds * 1000000) / _max).ToString("0.00 ns")); Console.WriteLine(((double)(s2.Elapsed.TotalMilliseconds * 1000000) / _max).ToString("0.00 ns")); } } Output 15.54 ns StringIndexKeyComparer 23.34 ns Default
Fuzzy lookups. A Dictionary by default uses exact lookups. It computes a hash code based on a key and if another value matches it exactly, the two keys match. But this can be changed.

Next: We implement a custom, fuzzy lookup that ignores characters. This is the custom IEqualityComparer.

Tip: It computes hashes (in GetHashCode) and compares keys (in Equals) ignoring all hyphen characters.

So: The string "cat-1" is considered equal to the string "cat1." The hyphen (dash) character is simply ignored.

Example custom IEqualityComparer: C# using System.Collections.Generic; class CustomComparer : IEqualityComparer<string> { public bool Equals(string x, string y) { int xPos = 0; int yPos = 0; while (true) { // ... Fail if past end. if (xPos >= x.Length) { return false; } if (yPos >= y.Length) { return false; } // ... Skip past hyphens. if (x[xPos] == '-') { xPos++; continue; } if (y[yPos] == '-') { yPos++; continue; } // ... Fail if different. if (x[xPos] != y[yPos]) { return false; } // ... If we have traversed both strings with no error, we match. if (xPos == x.Length - 1 && yPos == y.Length - 1) { return true; } // ... Increment both places. xPos++; yPos++; } } public int GetHashCode(string obj) { int code = 0; // ... Add together all chars. for (int i = 0; i < obj.Length; i++) { if (obj[i] != '-') { code += obj[i]; } } return code; } }
Fuzzy, continued. Next, we use the CustomComparer we created. This Main method shows how hyphens are ignored in the keys of a Dictionary that uses this comparer.MainVar

Here: All three lookups return the correct value. Some unusual cases, not shown, may not work correctly (such as trailing hyphens).

C# program that uses Dictionary, CustomComparer using System; using System.Collections.Generic; class Program { static void Main() { // ... Add data to dictionary. var dictionary = new Dictionary<string, int>(new CustomComparer()); dictionary["cat-1"] = 1; dictionary["cat-2"] = 2; dictionary["dog-bark"] = 10; dictionary["dog-woof"] = 20; // ... Lookup values, ignoring hyphens. Console.WriteLine(dictionary["cat1"]); Console.WriteLine(dictionary["cat-1"]); Console.WriteLine(dictionary["dog--bark"]); } } Output 1 1 10
A discussion. An excellent hash code can improve your Dictionary's performance. There are excellent books on this topic, such as Robert Sedgewick's Algorithms in C++, which I used.

Tip: To better distribute the hash code computations in your program, using a multiplier is useful.

Tip 2: A hashtable has an internal array of buckets, and when you use a multiplier, you can spread out the indexes to those buckets.

And: This reduces the number of hash collisions and improves lookup performance.

Multiplier. The constant 89 provides a good hashing distribution in many string data sets. There is a custom constructor I defined on the StringIndexKeyComparer.

Tip: Any GetHashCode method will have to be custom to your keys. Therefore, just delete the contents of GetHashCode and start over.

But: It is not worth implementing on most projects, unless there is a bottleneck on your Dictionary, which rarely occurs.

GetHashCode. This method can be optimized. Try multiplying its result by an important character in the string. Also, there is an easy way to test how selective your hash numbers are.GetHashCode

Test: Increment a static int field each time Equals is called. With a better hash code, Equals is called fewer times.

Tip: With a more selective hash, the internal Dictionary methods need to search fewer values with Equals, so it is called less.

A summary. With IEqualityComparer, we look up strings based on the results of a method. We can apply fuzzy matching, or any other transformation before comparing keys.
Home
Dot Net Perls
© 2007-2020 Sam Allen. Every person is special and unique. Send bug reports to info@dotnetperls.com.