C# HashSet Programs

Set collection

HashSet is an optimized set collection. It can be used to eliminate duplicate strings or elements in an array. The HashSet collection in System.Collections.Generic in the C# language provides a simple syntax for taking the union of elements in a set. This is performed in the constructor.

This C# example uses HashSet, which is an optimized set collection.

Example

Array type

This program contains a source array that contains several duplicated strings. It eliminates duplicate strings in the array. The program calls the HashSet constructor to transform the array elements into a set data structure. This internally calls the UnionWith method to eliminate the duplications. ToArray then transforms the HashSet elements into a new array.

ToArray
Program that uses HashSet on duplicates [C#]

using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main()
    {
	// Input array that contains three duplicate strings.
	string[] array1 = { "cat", "dog", "cat", "leopard", "tiger", "cat" };

	// Display the array.
	Console.WriteLine(string.Join(",", array1));

	// Use HashSet constructor to ensure unique strings.
	var hash = new HashSet<string>(array1);

	// Convert to array of strings again.
	string[] array2 = hash.ToArray();

	// Display the resulting array.
	Console.WriteLine(string.Join(",", array2));
    }
}

Output

cat,dog,cat,leopard,tiger,cat
cat,dog,leopard,tiger
String array illustration

Input array. In the program the input array contains six strings and only four unique strings. The string "cat" is repeated three times. This may not be desirable to some programs and algorithms. The HashSet constructor is used to eliminate the non-unique elements in the set.

String Array

Constructor. The HashSet constructor shown here receives a single parameter, which must implement the IEnumerable<string> generic interface. The constructor then uses logic to take the union of elements, which removes non-unique strings such as the string literal "cat".

Constructor Tips String LiteralJoin objects together

Joining strings. The program also demonstrates an easy way to display string[] arrays onto the console or as single strings using the string.Join static method. This method receives the result of the ToArray extension method, which was invoked on the HashSet instance.

string.Join

Dictionary

Programming tip

The logic performed in this example can also be implemented by using a Dictionary type instead of the HashSet type itself. The main problem caused by using Dictionary is that you must specify a value type. This may lead to more confusing code. The Dictionary code will also have more lines, but performance would be similar. The hash lookup loops are equivalent.

Dictionary

Performance

Performance optimization

Using Dictionary and HashSet results in allocations on the managed heap. This means that for small source inputs, the HashSet and Dictionary will be slower than simple nested loops. But when the source input becomes large with thousands of elements, the hash collections are faster. They use hash computations to search.

Dictionary Lookup Performance

Benchmark

Question and answer

Is there any performance benefit to using HashSet instead of Dictionary when you just need the simplest set functionality? In the C# language, a Dictionary with bool values can work as a set. If you do not need any advanced set logic, a Dictionary can replace the HashSet.

Question: Does HashSet have a performance advantage over Dictionary?

Comparison. Here, we test a HashSet(string) against a Dictionary(string, bool). We add several strings as keys and see if those keys exist (with Contains or ContainsKey). After the first loop iteration, no new elements are added, so this benchmark mostly tests search performance.

Program that tests HashSet performance [C#]

using System;
using System.Collections.Generic;
using System.Diagnostics;

class Program
{
    const int _max = 10000000;
    static void Main()
    {
	var h = new HashSet<string>(StringComparer.Ordinal);
	var d = new Dictionary<string, bool>(StringComparer.Ordinal);
	var a = new string[] { "a", "b", "c", "d", "longer", "words", "also" };

	var s1 = Stopwatch.StartNew();
	for (int i = 0; i < _max; i++)
	{
	    foreach (string s in a)
	    {
		h.Add(s);
		h.Contains(s);
	    }
	}
	s1.Stop();
	var s2 = Stopwatch.StartNew();
	for (int i = 0; i < _max; i++)
	{
	    foreach (string s in a)
	    {
		d[s] = true;
		d.ContainsKey(s);
	    }
	}
	s2.Stop();
	Console.WriteLine(h.Count);
	Console.WriteLine(d.Count);

	Console.WriteLine(((double)(s1.Elapsed.TotalMilliseconds * 1000000) /
	    _max).ToString("0.00 ns"));
	Console.WriteLine(((double)(s2.Elapsed.TotalMilliseconds * 1000000) /
	    _max).ToString("0.00 ns"));
	Console.Read();
    }
}

Output

7
7
529.99 ns
517.05 ns
.NET Framework information

Results. The Dictionary(string, bool) had slightly better performance in this test than did the HashSet(string). In fact, in most every test where the two collections offered similar functionality, the Dictionary was faster. This was tested on 32-bit Windows operating system in .NET Framework 4.0.

Dictionary StringComparer OptimizationThis section provides information

Recommendation. My guideline here is that the Dictionary should be used instead of the HashSet in places where the advanced HashSet functionality is not needed. It is probably worthwhile documenting why the value of the Dictionary is a bool (or other value). This has led to some confusion because it is not used.

Summary

The C# programming language

We saw the HashSet collection. HashSet can be applied to elegantly eliminate duplicates in an array. The HashSet class has many other methods and uses. But you can use its constructor for taking a union of a collection that implements the IEnumerable generic interface.

IEnumerable Type Collections
.NET