.Net Dictionary<> speed and memory

By | 2011.09.26

I love writing and optimizing stuff that puts strain on memory and CPU. Lately I’ve been wondering how well the .Net Dictionary object performs, and after looking at the source code for it, Googling and SO’ing I thought it would be good to complete the project with some real-world data that could be shared.

So I made a small test app to measure the speed and memory consumption of a System.Collections.Generic.Dictionary<Int32, Int32> in .Net.

I wanted to test with random data, so a set of random data (Int32) is generated up front. I also wanted to test lookup time with random access to simulate real-world usage where L2 cache/read-ahead might not be applicable.

Test machine specs

As of mid 2011 this computer is new and can be considered a fast desktop computer. I designed it myself for high performance computations, I bought “the best” within reasonable price parameters.

  • AMD Phenom II X6 1100T (Socket-AM3, Six Core, 3,3GHZ/3,7GHz Turbo, 9MB) set at conservative overclocking settings (bad cooling at the moment): Asus AISuite reports 3,410 GHz
  • ASUS Crosshair IV Extreme
  • 16 GB of Corsair Vengeance DDR3 1600MHz 16GB CL9
  • 2xSATA disks@550MB/s.
  • Windows 7 Ultimate 64-bit
  • Almost clean install, just Visual Studio and required dev tools.

The test

The test runs on a single thread, x64 Console Application, .Net 4.0 Client Profile. My interest is purely in the Dictionary index size and speed.

The receipt

  • Create a list of 100.000.000 random unique Int32 numbers.
  • Warm up by running once with 10k samples.
  • For every size of 10, 100, 1.000, 10.000, 100.000, 1.000.000, 10.000.000
    • Add this amount of random numbers to the dictionary. The random numbers were generated during startup so we are slowed by any overhead in random routine. It is a simple int copy operation from an int[] array.
    • If the dictionary is big enough, search it for 100.000 random keys. The random keys to search for are also generated on startup, and they may or may not exist in the dictionary.

The results

The short story

  • Dictionaries in .Net are very fast in almost any case and perform extremely well with large volumes.
    100 000 random lookups in a table consisting of 10 000 000 random entries takes 0,02 seconds, the table uses 200MB of RAM.
    (Note on that: Many of these lookups would not yield matches.)
  • Memory used by a dictionary seems to be around 20-40 times for Int32 index. The ratio goes down as the dictionary gets larger.
  • Pre-allocating memory seems to almost cut insert time in half.

Hits

These tests have a lot of misses in the lookups (random range is between Int32.MinValue and Int32.MaxValue). If I recall correctly then misses are a worst-case scenario for hashtables. I re-did the tests with hits only and got about the same results, up to 30% faster in some cases. Though I have not tested this in-depth.

Details

image

The source code

static void Main(string[] args)
{
    DictionaryTest dictionaryTest;
    dictionaryTest = new DictionaryTest(100000000);
    
    // Warmup
    dictionaryTest.RunTest(new int[] { 100000 }, false);

    // Run tests
    dictionaryTest.RunTest(new int[] { 10, 100, 1000, 10000, 100000, 1000000, 10000000 }, false);
    dictionaryTest.RunTest(new int[] { 10, 100, 1000, 10000, 100000, 1000000, 10000000 }, true);
    Console.ReadKey();
}
public class DictionaryTest
{
    private int[] TestData;
    private int[] RandomPos;
    public int DummyToPreventOptimization = 0;

    public DictionaryTest(int maxSamples)
    {
        FillRandom(maxSamples);
    }

    private void FillRandom(int size)
    {
        Console.WriteLine("Generating " + size + " random samples...");
        Random rnd = new Random();
        TestData = new int[size];
        RandomPos = new int[10000];

        // We need to add only unique data
        Dictionary<int, bool> used = new Dictionary<int, bool>(size);

        for (int i = 0; i < size; i++)
        {
            int n;
            for (; ; )
            {
                n = rnd.Next(int.MinValue, int.MaxValue);
                if (!used.ContainsKey(n))
                    break;
            }
            used.Add(n, true);
            TestData[i] = n;
        }

        Console.WriteLine("Generating " + RandomPos.Length + " random lookup points...");
        for (int i = 0; i < RandomPos.Length; i++)
        {
            RandomPos[i] = rnd.Next(0, RandomPos.Length);
        }
    }

    public void RunTest(int[] sizes, bool setDictionarySize)
    {
        // Force GC to finish up.
        Stopwatch stopwatch = new Stopwatch();

        foreach (int size in sizes)
        {

            Console.WriteLine("*** Testing " + size + " samples:");
            // Fill dictionary with random data
            long memBefore = GC.GetTotalMemory(true);
            Dictionary<int, int> dictionary;
            if (setDictionarySize)
                dictionary = new Dictionary<int, int>(size);
            else
                dictionary = new Dictionary<int, int>();
            stopwatch.Reset();
            stopwatch.Start();
            for (int i = 0; i < size; i++)
            {
                dictionary.Add(TestData[i], i);
            }
            stopwatch.Stop();
            GC.Collect();
            long memAfter = GC.GetTotalMemory(true);
            long mem = memAfter - memBefore;

            Console.WriteLine(string.Format("Preset: {0}: {2} seconds", setDictionarySize, size, Math.Round((double)stopwatch.ElapsedMilliseconds / 1000d, 5)));
            Console.WriteLine(string.Format("{0} bytes memory, Ratio: {1}", mem, (double)mem / (double)size));


            if (size >= RandomPos.Length)
            {
                // Ensure GC won't trouble us
                GC.Collect();
                int maxLen = Math.Min(RandomPos.Length, size);
                stopwatch.Reset();
                stopwatch.Start();


                for (int i = 0; i < maxLen; i++)
                {
                    DummyToPreventOptimization = dictionary[TestData[RandomPos[i]]];
                }
                stopwatch.Stop();

                double avg = Math.Round(((double)stopwatch.ElapsedMilliseconds / (double)maxLen) / 1000f, 10);
                Console.WriteLine(string.Format("{0} random access: {1} seconds (avg of {2} seconds per lookup)", maxLen, Math.Round((double)stopwatch.ElapsedMilliseconds / 1000d, 5), avg));
            }
        }
    }

}
FacebookLinkedInTwitterOrkutDiggShare/Bookmark

Leave a Reply