C# Capacity Property

Dot Net Perls
Generic type

You are able to specify capacity for C# collections, but wonder whether it is worth the bother. This optional Capacity property adjusts the amount of memory allocated. Here we experiment with Dictionary and List capacity values, benchmarking their effect on performance, using the C# programming language.

Capacity benchmark results

Method A: Dictionary, no capacity
Time:     1350 ms

Method B: Dictionary, has capacity
Time:     700 ms

Method C: Dictionary, const capacity
Time:     760 ms

Method D: Dictionary, over-large capacity
Time:     1005 ms

Method E: List, no capacity
Time:     1010 ms

Method F: List, accurate capacity
Time:     575 ms

Capacity property

I found that when you don't specify a capacity, your collection will be reallocated several times if it even grows to have 100 elements. This makes populating your Dictionary or List twice as slow.

Note (please read)

What Microsoft says: "The capacity of a Dictionary(TKey, TValue) is the number of elements that can be added to the Dictionary(TKey, TValue) before resizing is necessary. As elements are added to a Dictionary(TKey, TValue), the capacity is automatically increased as required by reallocating the internal array."

MSDN reference

So why set capacity? "If the size of the collection can be estimated, specifying the initial capacity eliminates the need to perform a number of resizing operations while adding elements to the Dictionary(TKey, TValue)."

Is it important? Should developers bother with Dictionary or List capacity? What I find might be surprising. Having .NET "resize" the underlying arrays is really expensive and can double the time even in small collections.

.NET Framework information

Memory pressure. Another final thing to consider is memory pressure in .NET. When the collections' underlying arrays are resized, this results in more memory pressure due to the reallocation. Therefore many aspects of your program will become slower and less efficient.

Benchmark capacity

Here we see the benchmark that tests the capacity settings. I tried to simulate somewhat realistic situations in my programs. Most Dictionaries I have used have a string key and around 100-1000 elements. So I used a 100-element size of the collections.

Program that tests capacity [C#]

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Text;

class CapacityTest
{
    /// <summary>
    /// The number of iterations.
    /// </summary>
    static int _m = 100000;

    /// <summary>
    /// List of values we use for the test.
    /// </summary>
    List<string> _values = new List<string>();

    public CapacityTest()
    {
	// 100 random strings for benchmarking.
	for (int i = 0; i < 100; i++)
	{
	    _values.Add(System.IO.Path.GetRandomFileName());
	}

	long t1 = Environment.TickCount;

	A_Dictionary();

	long t2 = Environment.TickCount;

	B_DictionaryCapacity();

	long t3 = Environment.TickCount;

	C_DictionaryCapacity();

	long t4 = Environment.TickCount;

	D_DictionaryCapacity();

	long t5 = Environment.TickCount;

	E_List();

	long t6 = Environment.TickCount;

	F_ListCapacity();

	long t7 = Environment.TickCount;

	// Write dictionary times
	Console.WriteLine(t2 - t1);
	Console.WriteLine(t3 - t2);
	Console.WriteLine(t4 - t3);
	Console.WriteLine(t5 - t4);

	// Write list times
	Console.WriteLine(t6 - t5);
	Console.WriteLine(t7 - t6);
	Console.ReadLine();
    }

    void A_Dictionary()
    {
	// No capacity
	for (int i = 0; i < _m; i++)
	{
	    Dictionary<string, int> d = new Dictionary<string, int>();
	    foreach (string k in _values)
	    {
		d.Add(k, 0);
	    }
	}
    }

    void B_DictionaryCapacity()
    {
	// Capacity from collection Count
	for (int i = 0; i < _m; i++)
	{
	    Dictionary<string, int> d = new Dictionary<string, int>(_values.Count);
	    foreach (string k in _values)
	    {
		d.Add(k, 0);
	    }
	}
    }

    void C_DictionaryCapacity()
    {
	// Const capacity
	for (int i = 0; i < _m; i++)
	{
	    Dictionary<string, int> d = new Dictionary<string, int>(100);
	    foreach (string k in _values)
	    {
		d.Add(k, 0);
	    }
	}
    }

    void D_DictionaryCapacity()
    {
	// Huge capacity (1 order of magnitude too large)
	for (int i = 0; i < _m; i++)
	{
	    Dictionary<string, int> d = new Dictionary<string, int>(1000);
	    foreach (string k in _values)
	    {
		d.Add(k, 0);
	    }
	}
    }

    void E_List()
    {
	// No capacity
	for (int i = 0; i < _m * 5; i++)
	{
	    List<string> l = new List<string>();
	    foreach (string k in _values)
	    {
		l.Add(k);
	    }
	}
    }

    void F_ListCapacity()
    {
	// Exact capacity
	for (int i = 0; i < _m * 5; i++)
	{
	    List<string> l = new List<string>(100);
	    foreach (string k in _values)
	    {
		l.Add(k);
	    }
	}
    }
}

Interpretation. It is very advantageous to specify capacity in your C# programs. What I found to be the best method was using the List's already-known capacity as the capacity for the new collection. That would be 100.

Programming tip

Predict capacities

You can modify your program to print out what the Count property of your collections is after they are used. If you are working with 150 elements, you can simply use the constant 200 for better performance (saving several array resizes).

Public const ints. For my latest programs, I decided to simply define some const global variables and use them through my classes. Make a static class and use consts called ElementCountEstimate or other fields with the word estimate.

Class that stores capacities [C#]

static class GlobalClass
{
    public const int ElementCountEstimate = 200;
    public const int OtherCountEstimate = 100;
}

Summary

Here we saw that the memory model of .NET programs is complex and takes care of most of the hard stuff for you, but giving it hints about the sizes of the collections is a substantial optimization. I could not test the impact of the capacity property in my real-world programs. This is because .NET is doing so many things at once that it gets lost in the noise.

Collections