C# Mask Optimization

Use a bit mask to optimize checking for duplicate lowercase letters.

Mask optimization. An int has 32 bits. Each bit can be set separately. Because of this, it can be used as a simple array of boolean values. In certain string processing tasks, this mask optimization yields good results.
Note: With this optimization, we can avoid handling duplicate letters by keeping track of ones already encountered.

Example. The goal of this program is to detect duplicate letters in a string of lowercase chars. To do this, it uses a mask integer. The lowercase letters are converted to integers 0 to 25 by subtracting 97 from their ASCII representation.
Then: As each letter is encountered, its bit is tested. If it was already encountered, its bit will be set to 1.
Otherwise: We record the letter in the mask. The next time it is encountered we will recognize it.
C# program that checks duplicate letters using System; class Program { static void Main() { Console.WriteLine("perls"); ReportDuplicateCharacters("perls"); Console.WriteLine("massive"); ReportDuplicateCharacters("massive"); Console.WriteLine("mississippi"); ReportDuplicateCharacters("mississippi"); } static void ReportDuplicateCharacters(string value) { int mask = 0; for (int i = 0; i < value.Length; i++) { int index = value[i] - 97; if ((mask & (1 << index)) != 0) { Console.WriteLine("Duplicate: {0}", value[i]); } else { mask |= (1 << index); } } // To zero a bit: mask &= ~(1 << index); } } Output perls massive Duplicate: s mississippi Duplicate: s Duplicate: i Duplicate: s Duplicate: s Duplicate: i Duplicate: p Duplicate: i

Discussion. My algorithm only requires one pass through the string. In testing I found that an array of bools is about as fast. It is possible to re-use the memory of an array of bools and get similar performance to this bitmask approach.
But: The mask approach here is more memory-efficient should you need to store this data somehow.

Summary. This style of bit-testing algorithm has a limited range of use in programming. Sometimes it is effective. Usually it is not worthwhile. I have rewritten this particular algorithm repeatedly—which means it's probably worth documenting.

© 2007-2020 Sam Allen. Send bug reports to info@dotnetperls.com.