C# HashSet Examples

Use HashSet, an optimized set collection. Test Overlaps, SymmetricExceptWith and benchmark HashSet.
HashSet. This is an optimized set collection. It helps eliminate duplicate strings or elements in an array. It is a set that hashes its contents.
With HashSet, we have a simple syntax for taking the union of elements in a set. This is performed in its constructor. More complex methods can be used on the HashSet.
This program contains a source array with several duplicated strings. The program calls the HashSet constructor, which eliminates the duplicate strings.

Note: This internally calls the UnionWith method to eliminate the duplications. ToArray transforms the HashSet into a new array.


Array: The input array contains six strings (four unique). The string "cat" is repeated three times.

Tip: The HashSet constructor eliminates the duplicate elements. The two extra "cat" strings are removed, and one "cat" remains.

C# program that uses HashSet on duplicates

    using System;
    using System.Collections.Generic;
    using System.Linq;

    class Program
    {
        static void Main()
        {
            // Input array that contains three duplicate strings.
            string[] array1 = { "cat", "dog", "cat", "leopard", "tiger", "cat" };

            // Display the array.
            Console.WriteLine(string.Join(",", array1));

            // Use HashSet constructor to ensure unique strings.
            var hash = new HashSet<string>(array1);

            // Convert to array of strings again.
            string[] array2 = hash.ToArray();

            // Display the resulting array.
            Console.WriteLine(string.Join(",", array2));
        }
    }

Output

    cat,dog,cat,leopard,tiger,cat
    cat,dog,leopard,tiger
Notes, continued. The HashSet constructor receives a single parameter, which must implement the IEnumerable<string> generic interface. The constructor takes the union of elements.
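The union step can be written out by hand. This is a minimal sketch, not the library's actual source: it starts from an empty set and calls UnionWith with a List, which also implements IEnumerable<string>. The class name UnionSketch and the helper Dedupe are hypothetical names used only for illustration. The last two lines show that Add returns false when the element is already present.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UnionSketch
{
    // Hypothetical helper: builds the set in two explicit steps
    // instead of passing the source to the constructor.
    public static string[] Dedupe(IEnumerable<string> source)
    {
        var set = new HashSet<string>();
        set.UnionWith(source); // Adds each element; duplicates are ignored.
        return set.ToArray();
    }
}

class Program
{
    static void Main()
    {
        // A List works as well as an array, since both implement IEnumerable<string>.
        var list = new List<string> { "cat", "dog", "cat" };
        string[] unique = UnionSketch.Dedupe(list);
        Console.WriteLine(unique.Length); // 2

        var set = new HashSet<string>(list);
        Console.WriteLine(set.Add("cat"));  // False: already in the set.
        Console.WriteLine(set.Add("bird")); // True: newly added.
    }
}
```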

Also: The program uses the string.Join static method to display each string array on the console as a single string.


Tip: Join receives the result of the ToArray extension method, which was invoked on the HashSet instance.

Overlaps. This method returns true or false. It tests whether any of the HashSet's elements are contained in the IEnumerable argument. Only one equal element is required.

Next: The element 3 is in the HashSet. This means Overlaps returns true for array2, but false for array3.

C# program that uses Overlaps

    using System;
    using System.Collections.Generic;

    class Program
    {
        static void Main()
        {
            int[] array1 = { 1, 2, 3 };
            int[] array2 = { 3, 4, 5 };
            int[] array3 = { 9, 10, 11 };

            HashSet<int> set = new HashSet<int>(array1);
            bool a = set.Overlaps(array2);
            bool b = set.Overlaps(array3);

            // Display results.
            Console.WriteLine(a);
            Console.WriteLine(b);
        }
    }

Output

    True
    False
SymmetricExceptWith. HashSet has advanced set logic. SymmetricExceptWith changes HashSet so that it contains only the elements in one or the other collection—not both.

Tip: This example shows the use of the var keyword. This simplifies the syntax of the HashSet declaration statement.

C# program that uses SymmetricExceptWith

    using System;
    using System.Collections.Generic;
    using System.Linq;

    class Program
    {
        static void Main()
        {
            char[] array1 = { 'a', 'b', 'c' };
            char[] array2 = { 'b', 'c', 'd' };

            var hash = new HashSet<char>(array1);
            hash.SymmetricExceptWith(array2);

            // Write char array.
            Console.WriteLine(hash.ToArray());
        }
    }

Output

    ad
Dictionary. Set logic can also be implemented by using a Dictionary instead of a HashSet. With a Dictionary you must specify a value type. This may lead to more confusing code.

Also: The Dictionary code will have more lines, but performance would be similar. The hash lookup loops are equivalent.
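To illustrate the extra lines, here is a rough sketch of the Overlaps test from earlier rewritten with a Dictionary. The class name DictionarySetSketch and its Overlaps helper are hypothetical. The bool values carry no meaning, since only the keys are used.

```csharp
using System;
using System.Collections.Generic;

public static class DictionarySetSketch
{
    // Hypothetical helper: a Dictionary-based version of the Overlaps test.
    public static bool Overlaps(int[] first, int[] second)
    {
        // The bool value is a placeholder; only the keys matter.
        var seen = new Dictionary<int, bool>();
        foreach (int value in first)
        {
            seen[value] = true;
        }

        // One matching key is enough to report an overlap.
        foreach (int value in second)
        {
            if (seen.ContainsKey(value))
            {
                return true;
            }
        }
        return false;
    }
}

class Program
{
    static void Main()
    {
        int[] array1 = { 1, 2, 3 };
        int[] array2 = { 3, 4, 5 };
        int[] array3 = { 9, 10, 11 };

        Console.WriteLine(DictionarySetSketch.Overlaps(array1, array2)); // True
        Console.WriteLine(DictionarySetSketch.Overlaps(array1, array3)); // False
    }
}
```

With a HashSet, the body of the helper reduces to a constructor call and one call to Overlaps.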

Allocations. Using Dictionary and HashSet results in allocations on the managed heap. For small source inputs, the HashSet and Dictionary will be slower than simple nested loops.

But: When the source input becomes large with thousands of elements, hashed collections are faster.
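For comparison, a nested-loop version of the same overlap test might look like the sketch below. The class name NestedLoopSketch and its Overlaps helper are hypothetical. It allocates no collection, but it compares every pair of elements in the worst case, which is why it only suits small inputs. No timings are claimed here.

```csharp
using System;

public static class NestedLoopSketch
{
    // Hypothetical helper: overlap test with no hashing and no allocations.
    public static bool Overlaps(int[] first, int[] second)
    {
        foreach (int a in first)
        {
            foreach (int b in second)
            {
                if (a == b)
                {
                    return true;
                }
            }
        }
        return false;
    }
}

class Program
{
    static void Main()
    {
        int[] array1 = { 1, 2, 3 };
        int[] array2 = { 3, 4, 5 };
        int[] array3 = { 9, 10, 11 };

        Console.WriteLine(NestedLoopSketch.Overlaps(array1, array2)); // True
        Console.WriteLine(NestedLoopSketch.Overlaps(array1, array3)); // False
    }
}
```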

Benchmark. Is there any performance benefit to using HashSet instead of Dictionary? In the C# language, a Dictionary with bool values can work as a set.

Here: We test a HashSet<string> against a Dictionary<string, bool>. We add strings as keys and see if those keys exist.

Result: The Dictionary had slightly better performance in this test than did the HashSet. In most tests the Dictionary was faster.

My guideline: Dictionary should be used instead of HashSet in places where advanced HashSet functionality is not needed.

C# program that tests HashSet performance

    using System;
    using System.Collections.Generic;
    using System.Diagnostics;

    class Program
    {
        const int _max = 10000000;

        static void Main()
        {
            var h = new HashSet<string>(StringComparer.Ordinal);
            var d = new Dictionary<string, bool>(StringComparer.Ordinal);
            var a = new string[] { "a", "b", "c", "d", "longer", "words", "also" };

            // Version 1: add to and look up in the HashSet.
            var s1 = Stopwatch.StartNew();
            for (int i = 0; i < _max; i++)
            {
                foreach (string s in a)
                {
                    h.Add(s);
                    h.Contains(s);
                }
            }
            s1.Stop();

            // Version 2: set and look up keys in the Dictionary.
            var s2 = Stopwatch.StartNew();
            for (int i = 0; i < _max; i++)
            {
                foreach (string s in a)
                {
                    d[s] = true;
                    d.ContainsKey(s);
                }
            }
            s2.Stop();

            Console.WriteLine(h.Count);
            Console.WriteLine(d.Count);

            // Average nanoseconds per outer-loop iteration (all seven strings).
            Console.WriteLine(((double)(s1.Elapsed.TotalMilliseconds * 1000000) / _max).ToString("0.00 ns"));
            Console.WriteLine(((double)(s2.Elapsed.TotalMilliseconds * 1000000) / _max).ToString("0.00 ns"));
            Console.Read();
        }
    }

Output

    7
    7
    529.99 ns    HashSet
    517.05 ns    Dictionary
A summary. HashSet can be applied to elegantly eliminate duplicates in an array. Its constructor takes a union of a collection that implements the IEnumerable generic interface.
© 2007-2019 Sam Allen.
Dot Net Perls