C# Dictionary Optimization Tip

You have a Dictionary generic collection in your C# program that does frequent lookups and may be in a hot path of your program during runtime. Sometimes you can optimize hashtables like Dictionary simply by changing the capacity of the collection so that it is higher than the default. This trades space for speed.

This C# performance article shows how to optimize Dictionary lookups with a capacity.

Dictionary optimization graph

Benchmark

First, the internal implementation of the Dictionary is rather opaque and not many tutorials will tell you the tricks necessary to squeeze out nanoseconds from lookups. The example here shows how you can pass a parameter to the Dictionary constructor to indicate a minimum capacity for the Dictionary. This forces the Dictionary to allocate at least that many internal "buckets" and entries. By increasing this capacity, you can reduce hash collisions and improve performance, with a multiplier of 4 yielding a speedup of 7% in the example.

Program that optimizes Dictionary [C#]

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

class Program
{
    static void Main()
    {
	// Loop through full capacity multipliers.
	for (int multiplier = 1; multiplier <= 10; multiplier++)
	{
	    const int len = 500;
	    var dict = new Dictionary<string, bool>(len * multiplier); // Allocate with multiplied capacity
	    var arr = GetStrings(len); // Get random keys

	    foreach (string val in arr)
	    {
		dict[val] = true; // Set keys
	    }

	    const int m = 5000 * 10;
	    Stopwatch s1 = Stopwatch.StartNew();
	    for (int i = 0; i < m; i++)
	    {
		for (int j = 0; j < arr.Length; j++)
		{
		    bool b = dict[arr[j]]; // Lookup element
		    b = dict[arr[0]];      // Lookup first element
		}
	    }
	    s1.Stop();

	    // Write timings
	    Console.Write(multiplier.ToString("00"));
	    Console.Write(", ");
	    Console.Write(s1.ElapsedMilliseconds);
	    Console.WriteLine(" ms");
	}
	Console.Read();
    }

    static string[] GetStrings(int len)
    {
	// Allocate and return an array of random strings.
	var arr = new string[len];
	for (int i = 0; i < arr.Length; i++)
	{
	    arr[i] = Path.GetRandomFileName();
	}
	return arr;
    }
}

Output

01, 2744 ms   (Exact capacity)
02, 2665 ms
03, 2553 ms
04, 2546 ms   (7.2% faster than exact capacity with multipler 1)
05, 2569 ms
06, 2562 ms
07, 2532 ms
08, 2552 ms
09, 2531 ms
10, 2573 ms
Main method

Overview. The program defines two methods in the Program.cs file. The Main entry point runs the simulations that time the Dictionary lookup speed on the Dictionary with different capacities. The program puts 500 strings as keys into the Dictionary, so the capacity of the Dictionary in the loop will be 500, 1000, 1500, 2000, 2500, and so on.

GetStrings method usage. The program defines the GetStrings method, which uses the Path.GetRandomFileName method to generate 500 random file names of 12 characters. This array is returned and used to populate the Dictionary.

Output. The program writes ten lines to the screen, with each indicating the current multiplier and the time in milliseconds for all the lookups to complete. There will be 50000 loops over the entire collection of 500 strings, which is 25 million lookups. The program shows that multiplying the full capacity by 4 can improve lookup performance by 7.2% over multiplying it by 1.

Theory

Note

Here we note some of the internal data structures in the Dictionary collection in the C# language that allows this optimization to yield results. Whenever you add or lookup an entry in the Dictionary, a buckets array is accessed (written or read).

The buckets array contains integers that point to the actual data structs in an Entry[] array. When you have more buckets, you can more closely map the bucket integers to the accurate entry; with fewer buckets, you will have to read in the next entries in the chains more.

Summary

The C# programming language

We looked at a way you can optimize your Dictionary in the C# language by applying a small multiplier such as 4 to the initial and final capacity of the Dictionary. This optimization allows the Dictionary to have more space to store buckets that point to entries.

Collections
.NET