Syntax Highlighter for Python
Updated Aug 24, 2025
Dot Net Perls

Syntax highlighter

Recently I became interested in syntax highlighting, and discovered that many examples use regular expressions. Unfortunately, the Regex approach can be slow and hard to maintain.

Instead, a Python syntax highlighter written in C# can be built as a simple tokenizer that looks up each word in an array (or Dictionary). Char type methods, like char.IsDigit, can also be used to classify each character.
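As a minimal sketch of the lookup idea, the following classifies a single word with a hash-based collection and char type methods. The TokenKinds class, its Classify method, and the returned labels are hypothetical names used only for illustration, and a HashSet is used here in place of the array or Dictionary mentioned above.

```csharp
using System;
using System.Collections.Generic;

public static class TokenKinds
{
    // Hypothetical keyword set for illustration; a real highlighter would list more.
    static readonly HashSet<string> Keywords = new HashSet<string> { "if", "else", "class", "def" };

    // Classify one word with a hash lookup and char type methods.
    public static string Classify(string word)
    {
        if (string.IsNullOrEmpty(word)) return "empty";
        if (Keywords.Contains(word)) return "keyword";
        if (char.IsDigit(word[0])) return "number";
        if (char.IsLetter(word[0]) || word[0] == '_') return "identifier";
        return "other";
    }

    public static void Main()
    {
        foreach (var word in new[] { "def", "42", "my_function", "+" })
        {
            Console.WriteLine($"{word} -> {Classify(word)}");
        }
    }
}
```

For a handful of keywords a plain array search is likely fine. A hash-based collection mainly helps once the keyword list grows.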

Example

Here is a class called PythonSyntaxHighlighter that includes a static array containing keywords. It uses color codes that will render in the terminal.

Info In HighlightPythonSyntax we iterate over every character in the Python source code.
Next We test the character and the previous character. If we have a letter, for example, we see if we are on a keyword.
And If we successfully match a keyword, we call Append() on our StringBuilder with the color codes necessary to render the keyword as blue.
using System;
using System.IO;
using System.Linq; // Needed for Keywords.Contains on an array.
using System.Text;

class PythonSyntaxHighlighter
{
    // Support any number of keywords.
    static string[] Keywords = ["if", "else", "class", "def"];

    // Use colors based on type of data.
    const string _keywordStart = "\u001b[34m"; // Blue
    const string _stringStart = "\u001b[32m"; // Green
    const string _commentStart = "\u001b[33m"; // Yellow
    const string _numberStart = "\u001b[31m"; // Red
    const string _reset = "\u001b[0m";

    static string HighlightPythonSyntax(string pythonCode)
    {
        // Build up the formatted code.
        var builder = new StringBuilder(pythonCode.Length * 2);
        var temp = new StringBuilder();
        for (int i = 0; i < pythonCode.Length; i++) 
        {
            var previousByte = ' ';
            if (i >= 1)
            {
                previousByte = pythonCode[i - 1];
            }
            var byteHere = pythonCode[i];
            if (char.IsWhiteSpace(previousByte) && char.IsLetter(byteHere))
            {
                // Handle keywords.
                temp.Clear();
                for ( ; i < pythonCode.Length; i++)
                {
                    if (!char.IsLetter(pythonCode[i]))
                    {
                        break;
                    }
                    temp.Append(pythonCode[i]);
                }
                var keywordHere = temp.ToString();
                if (Keywords.Contains(keywordHere))
                {
                    builder.Append(_keywordStart);
                    builder.Append(keywordHere);
                    builder.Append(_reset);
                }
                else
                {
                    builder.Append(keywordHere);
                }
            }
            else if (byteHere == '"')
            {
                // Handle string literals.
                temp.Clear();
                temp.Append('"');
                i += 1;
                for ( ; i < pythonCode.Length; i++)
                {
                    if (pythonCode[i] == '"')
                    {
                        break;
                    }
                    temp.Append(pythonCode[i]);
                }
                builder.Append(_stringStart);
                builder.Append(temp);
                builder.Append(_reset);
            }
            else if (byteHere == '#' && char.IsWhiteSpace(previousByte))
            {
                // Handle comments.
                temp.Clear();
                for ( ; i < pythonCode.Length; i++)
                {
                    if (pythonCode[i] == '\n')
                    {
                        break;
                    }
                    temp.Append(pythonCode[i]);
                }
                builder.Append(_commentStart);
                builder.Append(temp);
                builder.Append(_reset);
            }
            else if (char.IsDigit(byteHere))
            {
                // Handle numbers.
                temp.Clear();
                for ( ; i < pythonCode.Length; i++)
                {
                    if (!char.IsDigit(pythonCode[i]) && pythonCode[i] != '.')
                    {
                        break;
                    }
                    temp.Append(pythonCode[i]);
                }
                builder.Append(_numberStart);
                builder.Append(temp);
                builder.Append(_reset);
            }
            // Append the current char, unless a token ran to the end of the input.
            if (i < pythonCode.Length)
            {
                builder.Append(pythonCode[i]);
            }
        }
        return builder.ToString();
    }

    static void Main()
    {
        string pythonCode = File.ReadAllText("program.py");
        string highlightedCode = HighlightPythonSyntax(pythonCode);
        Console.WriteLine(highlightedCode);
    }
}
def my_function(x):
    if x > 2.5:
        print("x is greater than 2.5")
    else:
        print("x is not greater than 2.5")

# This is a comment
y = 10 + 5

class MyClass:
    def __init__(self, name):
        self.name = name

Notes, continued

For string literals, numbers, and comments, we use logic similar to that for keywords. We determine how long our token is, and then render it with surrounding color codes.
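The "measure the token, then wrap it" step can be sketched as a pair of small helpers. This is only an illustration under stated assumptions: the TokenScanner class and the ScanWhile, AppendColored, and HighlightNumbers names are hypothetical and do not appear in the listing above. It handles only numbers, and it also colors digits inside identifiers, which a fuller highlighter would avoid.

```csharp
using System;
using System.Text;

public static class TokenScanner
{
    const string NumberStart = "\u001b[31m"; // Red, the same code used for numbers above
    const string Reset = "\u001b[0m";

    // Return the index just past the run of chars, beginning at start, that satisfy isPart.
    public static int ScanWhile(string text, int start, Func<char, bool> isPart)
    {
        int end = start;
        while (end < text.Length && isPart(text[end]))
        {
            end++;
        }
        return end;
    }

    // Append the chars from start up to (not including) end, wrapped in a color code and a reset.
    public static void AppendColored(StringBuilder builder, string text, int start, int end, string color)
    {
        builder.Append(color);
        builder.Append(text, start, end - start);
        builder.Append(Reset);
    }

    // Color every run of digits (and dots) red; copy everything else unchanged.
    public static string HighlightNumbers(string text)
    {
        var builder = new StringBuilder(text.Length * 2);
        int i = 0;
        while (i < text.Length)
        {
            if (char.IsDigit(text[i]))
            {
                int end = ScanWhile(text, i, c => char.IsDigit(c) || c == '.');
                AppendColored(builder, text, i, end, NumberStart);
                i = end; // Continue right after the token, so no char is skipped.
            }
            else
            {
                builder.Append(text[i]);
                i++;
            }
        }
        return builder.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(HighlightNumbers("y = 10 + 5"));
    }
}
```

Because ScanWhile checks the length of the text before reading each char, a token that runs to the end of the input does not cause an out-of-range access.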

Important We avoid Regex operations, which can be slow. A single pass over the chars should generally run faster than a method that applies many Regex tests to the same text.

While it is possible to manipulate text with Regex calls, the result can end up being slow and hard to maintain. A simple loop that tests each char as it proceeds is a better long-term choice.
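To check the performance difference on your own machine, a rough timing harness like the one below could be used. It is a sketch, not a rigorous benchmark: the NumberCountTiming class and its method names are hypothetical, the task (counting runs of ASCII digits) is a stand-in for real highlighting, and timings will vary with the runtime and the input.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

public static class NumberCountTiming
{
    // Precompiled pattern that matches one run of ASCII digits.
    static readonly Regex DigitRun = new Regex("[0-9]+", RegexOptions.Compiled);

    // Count runs of digits with Regex.
    public static int CountWithRegex(string text)
    {
        return DigitRun.Matches(text).Count;
    }

    // Count runs of digits with a plain loop that tests each char.
    public static int CountWithLoop(string text)
    {
        int count = 0;
        bool inRun = false;
        foreach (char c in text)
        {
            bool isDigit = c >= '0' && c <= '9';
            if (isDigit && !inRun)
            {
                count++; // A new run starts here.
            }
            inRun = isDigit;
        }
        return count;
    }

    public static void Main()
    {
        // Build a larger input by repeating one line of Python.
        string text = string.Concat(Enumerable.Repeat("y = 10 + 5 # note 42\n", 10000));

        var stopwatch = Stopwatch.StartNew();
        int regexCount = CountWithRegex(text);
        stopwatch.Stop();
        Console.WriteLine($"Regex: {regexCount} runs, {stopwatch.Elapsed.TotalMilliseconds} ms");

        stopwatch.Restart();
        int loopCount = CountWithLoop(text);
        stopwatch.Stop();
        Console.WriteLine($"Loop: {loopCount} runs, {stopwatch.Elapsed.TotalMilliseconds} ms");
    }
}
```

Both methods should report the same count. For more reliable numbers, repeat each measurement several times and discard the first run, which includes JIT and Regex compilation cost.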

Sam Allen is passionate about computer languages, and he maintains 100% of the material available on this website. He hopes it makes the world a nicer place.
© 2007-2025 Sam Allen