C# Dictionary Order

Performance optimization

You are interested in how the order you add keys to a Dictionary instance in the C# language affects the performance of accessing those keys through the lookup methods. Because the Dictionary uses a chaining algorithm, the keys that were added last can be faster to search for in very large Dictionary instances.

Key points: You have a key to add to your Dictionary. This key may collide with an non-equal key in the Dictionary. If you add it first, it will be slower to search for it. If you add it last, it will be faster to search for it.

Benchmark

Note

First, this program is a complete benchmark harness that adds ten million and two random path strings to a Dictionary collection in the C# language. A reference to the first and last keys added are stored as variables. Because each key is unique, no exceptions are thrown. The two tight loops then show the performance difference between looking up entries in the Dictionary that were added first, and last.

Program that tests dictionary performance and order [C#]

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

class Program
{
    const int _max = 10000000;
    static void Main()
    {
	double msA = 0, msB = 0;
	//
	// Ten tests of dictionary order.
	//
	for (int y = 0; y < 10; y++)
	{
	    var dictionary = new Dictionary<string, int>(StringComparer.Ordinal);
	    string first = Path.GetRandomFileName();
	    dictionary.Add(first, 1);
	    for (int i = 0; i < 1000000; i++)
	    {
		dictionary.Add(Path.GetRandomFileName(), 1);
	    }
	    string last = Path.GetRandomFileName();
	    dictionary.Add(last, 1);
	    //
	    // Test the dictionaries.
	    //
	    var s1 = Stopwatch.StartNew();
	    for (int i = 0; i < _max; i++)
	    {
		int v;
		dictionary.TryGetValue(first, out v);
	    }
	    s1.Stop();
	    var s2 = Stopwatch.StartNew();
	    for (int i = 0; i < _max; i++)
	    {
		int v;
		dictionary.TryGetValue(last, out v);
	    }
	    s2.Stop();
	    msA += s1.Elapsed.TotalMilliseconds;
	    msB += s2.Elapsed.TotalMilliseconds;
	    Console.Write(".");
	    //
	    // Collect the garbage.
	    //
	    GC.Collect();
	}
	Console.WriteLine();
	Console.WriteLine(msA.ToString("0.00") + " ms");
	Console.WriteLine(msB.ToString("0.00") + " ms");
	Console.Read();
    }
}

Output
    (Second number is the last key search time elapsed.)

..........
3539.57 ms
3203.58 ms [Faster]
Main method

Overview. This program serves as a benchmarking harness for testing the performance of looking up a key that was added first to the Dictionary, and another key that was added last to the Dictionary. All the keys added to the Dictionary instance are random strings. A reference is retained to the first and last keys added, and two loops two the lookup of these keys.

Considerations tested. In testing this program, I varied the order of the two tests themselves. This did not affect the performance of the lookups in a substantial way. Also, I added the forced garbage collection call because otherwise the runtime might start a garbage collection during one of the two tests, making that look artificially slower. The program uses completely random path strings, so results will vary each pass through the loop.

Results. The results of this experiment were fairly consistent and sometimes pronounced. The last key added to the Dictionary was faster to access than the first key that was added. Usually the performance gain was around 10% faster for a Dictionary with ten million separate keys.

Note

Why? This section explains why the last key can be accessed faster than the first key. The Dictionary type internally uses a chaining algorithm. Essentially, each key you look up is converted to an index through the hash code computation. Each hash code is not unique, and each index is even less likely to be unique. Collisions are stored in a logical linked list.

When you add a key to the Dictionary and another key was added at this index, the new key is put in the first place in this logical linked list, and the old key is demoted to the next slot. Then, when you look up that key, the search algorithm loops through those entries in order, with the last key checked first.

Performance impact

The performance impact of changing the ordering of the keys you add to a Dictionary in the C# language will likely not be large. For this reason, this article emphasizes the theory and implementation issues rather than offering a recommendation or suggestion about performance.

Generally, the Dictionary lookup performance is good regardless of ordering if the hash code computation is well-distributed. However, for certain in-memory database programs, adding keys in order of ascending hit frequency could improve performance.

Summary

The C# programming language

We saw a performance test for Dictionary that showed that the very first key added to a Dictionary instance is slower to access than the last key. We noted that this is because of the internal chaining algorithm used in the Dictionary. This program is non-deterministic because it uses random file names, but the general result from executing it many times is that having an entry in the Dictionary that is at the start of the chain results in faster lookups.

Collections
.NET