C# File Read Benchmarks

Filesystems are slow. This has led many developers to implement private caches for frequently used files, and ASP.NET itself uses in-memory output caching. Should you cache static files in memory in the C# programming language? You may be wondering whether you should use caching everywhere.

This C# performance article tests the effects of caching files in memory. It benchmarks reading files through a custom in-memory cache against reading them directly from the filesystem each time.

Time required for file #0

Uses    C# cache    No cache
1         327 ms      297 ms
2         343 ms      577 ms
3         374 ms      858 ms
5         452 ms     1467 ms
10        561 ms     2855 ms

Time required for file #1

Uses    C# cache    No cache
1        1372 ms      780 ms
2        1872 ms     1544 ms
3        2777 ms     2418 ms
5        3557 ms     3900 ms
10       4493 ms     7831 ms

Time required for file #2

Uses    C# cache    No cache
1        4602 ms     2325 ms
2        4634 ms     4634 ms
3        8222 ms     7425 ms
5        9454 ms    11856 ms
10      19781 ms    23712 ms

Time required for file #3

Uses    C# cache    No cache
1        9392 ms     4399 ms
2       11576 ms     8689 ms
3       14102 ms    13525 ms
5       22058 ms    22402 ms

File cache setup

I tested two classes in the C# code. The first reads a file once and stores its bytes in memory. The second simply asks Windows for the entire file each time. Both classes expose a property that returns the file contents as a byte[] array. The first copies from memory; the second goes straight to the filesystem.

Copy Read Hits %. My interpretation of the Windows file cache is that a read "hits" the cache when a frequently requested file is opened. When I monitored physical disk activity during the test, the disk was not in continual use, which suggests that most reads were served from the cache.

Copy reads

What the graph shows. The green line is the Copy Read Hits %, which should indicate how the Copy Interface is used in Windows. When the green line is at the top of the chart, it means that the file cache is hit. This occurred when my program was accessing files.
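
The counter shown in the graph can also be sampled from code. The following is a minimal sketch, assuming Windows, the System.Diagnostics.PerformanceCounter class (built into the .NET Framework; on newer .NET versions it ships in a separate package), and the English counter names "Cache" and "Copy Read Hits %" as they appear in Performance Monitor. The class name CopyReadHitsSampler, the sample count, and the one-second interval are illustrative choices, not part of the original test.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class CopyReadHitsSampler
{
    // Takes `count` samples of the counter, `delayMs` milliseconds apart.
    public static float[] Sample(int count, int delayMs)
    {
        float[] samples = new float[count];
        using (PerformanceCounter counter =
            new PerformanceCounter("Cache", "Copy Read Hits %"))
        {
            // The first call only primes the counter; its value is discarded.
            counter.NextValue();
            for (int i = 0; i < count; i++)
            {
                Thread.Sleep(delayMs);
                samples[i] = counter.NextValue();
            }
        }
        return samples;
    }

    static void Main()
    {
        // Run the file-reading benchmark in another process while this samples.
        foreach (float value in Sample(5, 1000))
        {
            Console.WriteLine("Copy Read Hits %: {0:F1}", value);
        }
    }
}
```

Values near 100 while the benchmark runs would be consistent with the reads being served from the file cache rather than the disk.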

Benchmark details

I tested four text files of 16 KB, 132 KB, 526 KB, and 1005 KB. The files are read in as bytes, not strings. My code accessed these files in two ways. The first class stores the file data internally in the C# program.

Class that stores file data [C#]

/// <summary>
/// Uses C# file cache.
/// </summary>
class PerlFileA
{
    /// <summary>
    /// Cache of the bytes.
    /// </summary>
    byte[] _cache;

    /// <summary>
    /// Get bytes from cache.
    /// </summary>
    public byte[] Contents
    {
        get
        {
            // Copy the cached bytes into an array and return.
            int length = _cache.Length;
            byte[] ret = new byte[length];
            Array.Copy(_cache, ret, length);
            return ret;
        }
    }

    /// <summary>
    /// Read in cache.
    /// </summary>
    public PerlFileA(string name)
    {
        _cache = File.ReadAllBytes(name);
    }
}
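
The Contents getter above returns a fresh copy on every call, so callers cannot corrupt the cached bytes, at the cost of one allocation and one copy per use. A minimal sketch of that behavior follows. The class name CopyDemo, the temporary file, and its three-byte contents are assumptions for illustration, and PerlFileA is repeated in condensed form only so the example compiles on its own.

```csharp
using System;
using System.IO;

// Condensed copy of PerlFileA from the listing above.
class PerlFileA
{
    byte[] _cache;

    public byte[] Contents
    {
        get
        {
            byte[] ret = new byte[_cache.Length];
            Array.Copy(_cache, ret, _cache.Length);
            return ret;
        }
    }

    public PerlFileA(string name)
    {
        _cache = File.ReadAllBytes(name);
    }
}

class CopyDemo
{
    static void Main()
    {
        // Hypothetical temp file used only for this demonstration.
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllBytes(path, new byte[] { 1, 2, 3 });
            PerlFileA file = new PerlFileA(path);

            byte[] first = file.Contents;
            first[0] = 99; // Changes the caller's copy only.
            byte[] second = file.Contents;

            Console.WriteLine(second[0]);                      // 1
            Console.WriteLine(ReferenceEquals(first, second)); // False
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```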

Next class. The second class takes the simple approach and asks Windows for the file data each time. In theory, this approach benefits from the Windows file cache and should also be fast.

Class that uses File.ReadAllBytes [C#]

/// <summary>
/// Doesn't use C# caching.
/// </summary>
class PerlFileB
{
    /// <summary>
    /// Get byte contents.
    /// </summary>
    public byte[] Contents
    {
        get
        {
            // Return the file's bytes directly.
            return File.ReadAllBytes(_name);
        }
    }

    /// <summary>
    /// Stores name of the file.
    /// </summary>
    string _name;

    /// <summary>
    /// Create new file non-cache.
    /// </summary>
    public PerlFileB(string name)
    {
        _name = name;
    }
}

Results

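The benchmark results show that caching files in a byte[] in C# pays off clearly only for the smallest file, where the cached version was faster from 2 uses onward. For the three larger files, the cache was slower at 3 or fewer uses (with one tie at 2 uses) and faster only at 5 or more uses, by a much smaller margin than for the smallest file. For larger files, and for files needed 3 or fewer times, caching in C# is therefore a premature optimization: it yields no benefit or a slowdown. See the figures at the top of the document for the benchmark results.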
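
To make the comparison concrete, the ratio of uncached to cached time can be computed from the figures at the top of the document. The sketch below uses the rows for file #0 and file #3. The class name RatioDemo and the array layout are illustrative and not part of the original benchmark. A ratio above 1.0 means the cached version was faster.

```csharp
using System;

class RatioDemo
{
    // Ratio of uncached to cached time; above 1.0 means the cache won.
    public static double Speedup(int cachedMs, int uncachedMs)
    {
        return (double)uncachedMs / cachedMs;
    }

    static void Print(string label, int[] uses, int[] cached, int[] uncached)
    {
        for (int i = 0; i < uses.Length; i++)
        {
            Console.WriteLine("{0}, {1} use(s): {2:F2}x",
                label, uses[i], Speedup(cached[i], uncached[i]));
        }
    }

    static void Main()
    {
        // Timings in milliseconds, copied from the results above.
        Print("file #0",
            new int[] { 1, 2, 3, 5, 10 },
            new int[] { 327, 343, 374, 452, 561 },
            new int[] { 297, 577, 858, 1467, 2855 });

        // File #3 was only measured up to 5 uses.
        Print("file #3",
            new int[] { 1, 2, 3, 5 },
            new int[] { 9392, 11576, 14102, 22058 },
            new int[] { 4399, 8689, 13525, 22402 });
    }
}
```

For file #0 the ratio goes from about 0.91 at 1 use to about 5.09 at 10 uses. For file #3 it goes from about 0.47 at 1 use to about 1.02 at 5 uses, so the cache barely breaks even.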

Benchmark

Here is the code I used to benchmark the two classes above. For each of the four text files named in the foreach loop, it runs 5000 iterations per class. Each iteration constructs the class and then reads its Contents property x times, where x is the number of uses (10 in this listing).

Program that benchmarks file reads [C#]

using System;
using System.IO;

class Program
{
    static void Main()
    {
        foreach (string name in new string[]
            { "TextFile0.txt", "TextFile1.txt", "TextFile2.txt", "TextFile3.txt" })
        {
            int m = 5000; // Iterations per class.
            int x = 10;   // Uses of Contents per iteration.
            long t1 = Environment.TickCount;
            // Class A
            for (int a = 0; a < m; a++)
            {
                PerlFileA p1 = new PerlFileA(name);
                // How many times the cache is copied.
                for (int i = 0; i < x; i++)
                {
                    byte[] c = p1.Contents;
                }
            }
            long t2 = Environment.TickCount;
            // Class B
            for (int a = 0; a < m; a++)
            {
                PerlFileB p2 = new PerlFileB(name);
                // How many times the file is opened.
                for (int i = 0; i < x; i++)
                {
                    byte[] c = p2.Contents;
                }
            }
            long t3 = Environment.TickCount;
            // Elapsed milliseconds: class A first, then class B.
            Console.WriteLine((t2 - t1));
            Console.WriteLine((t3 - t2));
            Console.ReadLine();
        }
    }
}
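
Environment.TickCount has a coarse resolution (commonly on the order of 10 to 16 ms on Windows), which is acceptable for the multi-second totals here but noisy for shorter runs. A hedged alternative is to time each loop with System.Diagnostics.Stopwatch. The sketch below is not the code that produced the figures above. The class name StopwatchBenchmark, the helper Time, the 16 KB temporary file, and the round count are assumptions for illustration.

```csharp
using System;
using System.Diagnostics;
using System.IO;

class StopwatchBenchmark
{
    // Runs `round` the given number of times and returns elapsed milliseconds.
    public static long Time(Action round, int rounds)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int a = 0; a < rounds; a++)
        {
            round();
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        // Hypothetical 16 KB test file created just for this run.
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllBytes(path, new byte[16 * 1024]);
            int rounds = 1000;
            int uses = 10;

            // Cached: read once per round, then copy from memory.
            long cachedMs = Time(() =>
            {
                byte[] cache = File.ReadAllBytes(path);
                for (int i = 0; i < uses; i++)
                {
                    byte[] c = (byte[])cache.Clone();
                }
            }, rounds);

            // Uncached: ask the filesystem on every use.
            long uncachedMs = Time(() =>
            {
                for (int i = 0; i < uses; i++)
                {
                    byte[] c = File.ReadAllBytes(path);
                }
            }, rounds);

            Console.WriteLine("C# cache: {0} ms", cachedMs);
            Console.WriteLine("no cache: {0} ms", uncachedMs);
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Timings from this sketch will differ from the tables above, since the hardware, file contents, and iteration count are not the same.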

File cache

The Windows file cache works well here, but each read from it still carries some overhead, which shows up most for small files. Programmers such as myself could be misled by micro-benchmarks here and end up hurting overall performance.

Kernels and caches. The Windows file cache doesn't have many "knobs" to turn, and it has been extensively tuned. It is also written in heavily optimized, low-level code. In the C# cache tested here, every use allocates a new array and copies the bytes into it, and that overhead greatly diminishes the benefit of a custom cache.

ASP.NET output cache

My results suggest that output caching of web pages in ASP.NET is not useful unless the pages involve expensive operations. For less CPU-intensive pages, it could be more efficient not to cache them at all and to rely on the Windows file cache alone.

Discussion

So what is a developer to do? My research here indicates that some custom caching code can slow down your app. If a file is used only 1 to 3 times, it is usually best not to cache it in C# at all. Custom caches can help with small files that are used often.
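
One way to act on this guidance is to cache only files that are both small and requested repeatedly, and to pass everything else straight to File.ReadAllBytes. The sketch below is an illustration, not the code used in the benchmark. The class name SelectiveFileCache and both thresholds (64 KB and 4 reads) are assumptions loosely based on the figures above and should be tuned by measurement. It is not thread-safe and does not notice when a file changes on disk.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class SelectiveFileCache
{
    // Assumed thresholds; tune them by measuring your own workload.
    const int MaxCachedBytes = 64 * 1024;
    const int MinReadsBeforeCaching = 4;

    readonly Dictionary<string, byte[]> _cache = new Dictionary<string, byte[]>();
    readonly Dictionary<string, int> _reads = new Dictionary<string, int>();

    public bool IsCached(string path)
    {
        return _cache.ContainsKey(path);
    }

    public byte[] Read(string path)
    {
        byte[] cached;
        if (_cache.TryGetValue(path, out cached))
        {
            // Return a copy so callers cannot modify the cached bytes.
            return (byte[])cached.Clone();
        }

        // Not cached yet: let Windows (and its file cache) serve the read.
        byte[] data = File.ReadAllBytes(path);

        int count;
        _reads.TryGetValue(path, out count);
        count++;
        _reads[path] = count;

        // Keep a private copy only for small files that are read often.
        if (count >= MinReadsBeforeCaching && data.Length <= MaxCachedBytes)
        {
            _cache[path] = (byte[])data.Clone();
        }
        return data;
    }
}
```

With these thresholds, the first three reads of a file always go to the filesystem, the fourth read stores a copy if the file is small enough, and later reads are served from memory.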

Windows is smart. Windows has been tuned to cache files when doing so benefits the system as a whole. A custom cache in your app holds a second copy of data that Windows may already have in memory, so it can work against that tuning.

Trying to be clever? I am guilty of this, but my research here shows that in the bigger picture of the system and the IIS 7 server, it is best not to cache aggressively with elaborate code. The Windows file cache works well.

Summary

Here we looked at an experiment that tested the file caching layer of the Windows Vista operating system. We saw that the file cache improved performance to the point where manually caching files in C# code was not always beneficial. The operating system uses caching at all levels to improve performance, and duplicating these caches may not be useful.
