C# ASCII Strings


In C#, a string stores each character as a two-byte UTF-16 code unit. If your strings contain only ASCII characters, you can store them as byte arrays instead, with one byte per character. This reduces memory usage by one byte per character. Here we measure how much memory this smaller representation saves.
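To make the two-bytes-versus-one claim concrete, here is a minimal sketch that compares the encoded size of one string under UTF-16 and ASCII. The sample value is made up for illustration, and the counts cover only the character data, not the object overhead of a string or array.

```csharp
using System;
using System.Text;

class ByteCountDemo
{
    static void Main()
    {
        // Hypothetical sample value: 11 characters, all ASCII.
        string value = "example.txt";

        // Encoding.Unicode is UTF-16: two bytes per ASCII character.
        int utf16Bytes = Encoding.Unicode.GetByteCount(value);

        // Encoding.ASCII uses one byte per character.
        int asciiBytes = Encoding.ASCII.GetByteCount(value);

        Console.WriteLine(utf16Bytes); // 22
        Console.WriteLine(asciiBytes); // 11
    }
}
```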

Example


The concept behind this benchmark is simple. It allocates an array of 10,000 strings and measures the memory this requires. Then another method (Compress) changes each string into a byte array, and the memory of the resulting array is measured.

Program that changes string representation: C#

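using System;
using System.IO;
using System.Text;

class Program
{
    static void Main()
    {
        // Measure the managed heap before and after allocating the strings.
        long a = GC.GetTotalMemory(true);
        string[] array = Get();
        long b = GC.GetTotalMemory(true);

        // Use the array after the measurement so it is still reachable when b is taken.
        array[0] = null;

        // Measure again around the byte array version.
        // The temporary strings from Get() are unreachable by the time d is taken.
        long c = GC.GetTotalMemory(true);
        byte[][] array2 = Compress(Get());
        long d = GC.GetTotalMemory(true);

        // Use the jagged array after the measurement for the same reason.
        array2[0] = null;

        Console.WriteLine(a);
        Console.WriteLine(b);

        Console.WriteLine(c);
        Console.WriteLine(d);
    }

    // Returns 10,000 random file names, which contain only ASCII characters.
    static string[] Get()
    {
        string[] output = new string[10000];
        for (int i = 0; i < 10000; i++)
        {
            output[i] = Path.GetRandomFileName();
        }
        return output;
    }

    // Converts each string to a byte array with one byte per character.
    static byte[][] Compress(string[] array)
    {
        byte[][] output = new byte[array.Length][];
        for (int i = 0; i < array.Length; i++)
        {
            output[i] = ASCIIEncoding.ASCII.GetBytes(array[i]);
        }
        return output;
    }
}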

Output

39128
479800
39784
320056

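In this program, the managed heap held about 480,000 bytes after the string[] was allocated, and about 320,000 bytes after the byte[][] (a jagged array of byte arrays) was allocated. Subtracting the baseline of roughly 39,000 bytes measured before each allocation, the strings account for about 440,000 bytes and the byte arrays for about 280,000 bytes, a saving of roughly 16 bytes per string. There was no data loss in these strings because the strings were ASCII-only.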
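The no-data-loss result depends on the input being ASCII-only. A minimal sketch of a check you could run before converting is shown here; the AsciiCheck.IsAscii helper and the sample values are hypothetical and not part of the benchmark program. It also shows what the default ASCII encoding does with a character it cannot represent.

```csharp
using System;
using System.Text;

static class AsciiCheck
{
    // Hypothetical helper: true when every character is in the ASCII range (0 to 127).
    public static bool IsAscii(string value)
    {
        foreach (char c in value)
        {
            if (c > 127)
            {
                return false;
            }
        }
        return true;
    }
}

class Program
{
    static void Main()
    {
        string plain = "report.txt";
        string accented = "caf\u00E9.txt"; // contains U+00E9, which is outside ASCII

        Console.WriteLine(AsciiCheck.IsAscii(plain));    // True
        Console.WriteLine(AsciiCheck.IsAscii(accented)); // False

        // The default Encoding.ASCII replaces a non-ASCII character with '?'.
        byte[] bytes = Encoding.ASCII.GetBytes(accented);
        Console.WriteLine(Encoding.ASCII.GetString(bytes)); // caf?.txt
    }
}
```

Strings that fail the check could be left as ordinary strings rather than converted.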


Converting back to strings. You can convert the byte arrays back into strings by calling ASCIIEncoding.ASCII.GetString. Note that each call allocates a new string, so the conversion has a performance and memory cost.
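A minimal round-trip sketch, using a made-up sample value. It uses Encoding.ASCII, which is the same static property the program reaches through ASCIIEncoding.ASCII. The last line shows that the restored string is a separate object, which is where the extra allocation comes from.

```csharp
using System;
using System.Text;

class RoundTripDemo
{
    static void Main()
    {
        // Hypothetical sample value: 10 characters, all ASCII.
        string original = "abc123.tmp";

        // One byte per character in the compact form.
        byte[] compact = Encoding.ASCII.GetBytes(original);

        // Decoding allocates a new string with the same contents.
        string restored = Encoding.ASCII.GetString(compact);

        Console.WriteLine(compact.Length);                             // 10
        Console.WriteLine(restored == original);                       // True
        Console.WriteLine(object.ReferenceEquals(original, restored)); // False
    }
}
```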

Discussion


Is this useful? Probably not. However, if you have a program that stores a huge number of ASCII strings that are rarely needed, but must be stored in memory, this could be a useful optimization.

However: There is an additional cost when you need to convert back into strings.
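For strings that are stored in bulk but rarely read, one way to contain that cost is to keep the bytes and decode only on demand. The sketch below is an assumption about how that might look, not something from the benchmark: CompactAsciiString is a hypothetical type, and it assumes its input has already been checked to be ASCII-only.

```csharp
using System;
using System.Text;

// Hypothetical wrapper: holds the ASCII bytes and decodes only when asked.
// A struct with one reference field avoids a second heap object per string.
struct CompactAsciiString
{
    private readonly byte[] _bytes;

    // Assumes value contains only ASCII characters.
    public CompactAsciiString(string value)
    {
        _bytes = Encoding.ASCII.GetBytes(value);
    }

    public int Length
    {
        get { return _bytes == null ? 0 : _bytes.Length; }
    }

    // Allocates a new string on every call; this is the conversion cost.
    public override string ToString()
    {
        return _bytes == null ? string.Empty : Encoding.ASCII.GetString(_bytes);
    }
}

class WrapperDemo
{
    static void Main()
    {
        CompactAsciiString name = new CompactAsciiString("data.bin");
        Console.WriteLine(name.Length); // 8
        Console.WriteLine(name);        // data.bin
    }
}
```

Because ToString allocates each time, this only pays off when reads are rare compared with the number of strings held in memory.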

Summary

We looked at an optimization that can compress ASCII strings to use only one byte per character instead of two bytes. In some cases, this alternate representation could save a significant amount of memory.

