C# Compiler

Programs are among the most complicated engineering artifacts known. A compiler is a special type of program. It validates. It optimizes. It transforms. Compilers teach us how to solve complex problems.

Here: We review compiler theory. And we look at some compiler-related features of the C# language.

Errors

Compile-time errors are special. With a runtime error, your program may be causing trouble in the world. But with a compile-time error, the problem never progresses to that point. These errors improve program quality.

Books

A dragon is a formidable beast. It breathes fire and might even eat you. Compiler theory is so complex it is represented as a dragon. But we can fight this dragon with syntax-directed translation.

Note: Quotes from Aho on this site are taken from the dragon book, Compilers: Principles, Techniques, and Tools.

Note 2: Quotes from Abelson and Sussman are taken from Structure and Interpretation of Computer Programs.

Note 3: Quotes from Lidin are taken from Expert .NET 2.0 IL Assembler. This book describes low-level details.

Intro

Compiler theory divides the compilation of programs into separate phases. First, the program text is read from the source file. Then sequences of characters are grouped into lexemes.

Lexeme: The term lexeme refers to the textual representation of a token.

A token is a structure that combines a lexeme and information about that lexeme. After the tokens are determined, the compiler builds internal data structures (intermediate representations) that it uses to analyze and improve the form of programs.

Note: Lexical refers to the text representation of programs. Lexeme refers to the text representation of keywords, identifiers, literals and operators.

And: Tokens combine lexemes and symbolic information about lexemes. The symbol table stores information about tokens.
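Example: The lexeme and token ideas above can be sketched with a toy lexer. This is not how the C# compiler is implemented. The names TokenKind, Token and ToyLexer are hypothetical, and the three token kinds are an assumption made to keep the sketch short.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical token kinds for a toy language (not the C# compiler's own).
enum TokenKind { Identifier, Number, Symbol }

// A token pairs a lexeme (the raw text) with information about it.
class Token
{
    public TokenKind Kind;
    public string Lexeme;
    public int Position;
}

static class ToyLexer
{
    public static List<Token> Tokenize(string text)
    {
        var tokens = new List<Token>();
        int i = 0;
        while (i < text.Length)
        {
            char c = text[i];
            if (char.IsWhiteSpace(c)) { i++; continue; }

            int start = i;
            TokenKind kind;
            if (char.IsLetter(c) || c == '_')
            {
                // Identifier: a letter or underscore, then letters, digits, underscores.
                while (i < text.Length && (char.IsLetterOrDigit(text[i]) || text[i] == '_')) i++;
                kind = TokenKind.Identifier;
            }
            else if (char.IsDigit(c))
            {
                // Number: a run of digits.
                while (i < text.Length && char.IsDigit(text[i])) i++;
                kind = TokenKind.Number;
            }
            else
            {
                // Anything else is treated as a one-character symbol.
                i++;
                kind = TokenKind.Symbol;
            }
            tokens.Add(new Token
            {
                Kind = kind,
                Lexeme = text.Substring(start, i - start),
                Position = start
            });
        }
        return tokens;
    }
}

class Program
{
    static void Main()
    {
        foreach (Token t in ToyLexer.Tokenize("count = count + 10;"))
        {
            Console.WriteLine("{0,-10} '{1}' at {2}", t.Kind, t.Lexeme, t.Position);
        }
    }
}
```

Each Token keeps its lexeme alongside its kind and position. A real compiler would record more, such as a link into the symbol table.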

Phases

Let us walk through some compiler phases. These are used in the C# compiler system and the .NET Framework. When you compile a C# program in Visual Studio, the csc.exe program is invoked on the program text.

Next: All the compilation units are combined in a preliminary step. The C# compiler then reports any errors it finds in your program.

Definite assignment

The C# compiler uses definite assignment analysis. Here it proves that variables are not used before they are initialized. This step reduces the number of security problems and bugs in C# programs.

Tip: Definite assignment analysis ensures higher program quality because the programs are tested more at compile-time.
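Example: A small sketch of definite assignment. The class and method names (Demo, Assigned) are hypothetical. The rejected method is left in comments so that the listing still compiles.

```csharp
using System;

static class Demo
{
    // Rejected by the compiler if uncommented: 'value' is assigned only
    // when flag is true, so reading it fails with error CS0165
    // (use of unassigned local variable).
    //
    // static int Broken(bool flag)
    // {
    //     int value;
    //     if (flag) { value = 1; }
    //     return value;
    // }

    // Accepted: every path that reaches the return statement assigns 'value'.
    public static int Assigned(bool flag)
    {
        int value;
        if (flag) { value = 1; }
        else { value = 2; }
        return value;
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(Demo.Assigned(true));
        Console.WriteLine(Demo.Assigned(false));
    }
}
```

The check is made on control flow paths, not on runtime values. So the error appears before the program ever runs.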

Overloads

The C# compiler applies inferential logic at compile-time. This has no penalty at execution. It finds the best overloaded method based on the number and types of the arguments at each call site.

Tip: Overloaded methods can be used as a performance optimization. The overload is chosen at compile-time, so no runtime penalty is caused by using them.
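Example: This sketch shows the compiler picking an overload from the argument types. The Printer class and its Describe methods are hypothetical names used only for illustration.

```csharp
using System;

static class Printer
{
    public static string Describe(int value) { return "int"; }
    public static string Describe(long value) { return "long"; }
    public static string Describe(double value) { return "double"; }
    public static string Describe(string value) { return "string"; }
    public static string Describe(int a, int b) { return "int, int"; }
}

class Program
{
    static void Main()
    {
        short small = 5;

        Console.WriteLine(Printer.Describe(5));      // int literal
        Console.WriteLine(Printer.Describe(5L));     // long literal
        Console.WriteLine(Printer.Describe(5.0));    // double literal
        Console.WriteLine(Printer.Describe("5"));    // string literal
        Console.WriteLine(Printer.Describe(5, 6));   // two arguments
        // No short overload exists, so the closest implicit conversion (int) wins.
        Console.WriteLine(Printer.Describe(small));
    }
}
```

Each call is bound to one method when the program is compiled. The compiled call site names that method directly.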

Numbers

At the C# compilation stage, number transformations are applied. Numbers are "promoted" to larger representations to enable compilation with certain operators. And some casts not present in the program text are added.

Note: This is done to enable shorter and clearer high-level source code, and to ensure an accurate lower-level implementation.
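Example: A short sketch of numeric promotion. The Promotion class and its method names are hypothetical. The unchecked keyword is used so that the narrowing cast behaves the same regardless of compiler settings.

```csharp
using System;

static class Promotion
{
    // Adding two shorts yields an int: both operands are promoted first.
    public static Type SumOfShorts()
    {
        short a = 10, b = 20;
        var sum = a + b;
        return sum.GetType();
    }

    // Mixing int and double promotes the int operand to double.
    public static Type IntTimesDouble()
    {
        int i = 3;
        double d = 0.5;
        var product = i * d;
        return product.GetType();
    }

    // Byte addition is done in int, so 200 + 100 does not wrap around.
    public static int WideSum()
    {
        byte x = 200, y = 100;
        return x + y;
    }

    // Storing the result in a byte again needs an explicit cast,
    // and the value is truncated to its low 8 bits.
    public static byte NarrowSum()
    {
        byte x = 200, y = 100;
        return unchecked((byte)(x + y));
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(Promotion.SumOfShorts());
        Console.WriteLine(Promotion.IntTimesDouble());
        Console.WriteLine(Promotion.WideSum());
        Console.WriteLine(Promotion.NarrowSum());
    }
}
```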

If

The compiler uses node-based logic to rearrange conditional statements and loops, which both use jump instructions. Code often will be compiled to use branch instructions that do not reflect exactly the source text.

For example, the C# compiler will change while-loops into the same thing as certain for-loops. It has sophisticated logic, presumably based on graph theory, to transform your loops and nested expressions into efficient representations.
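Example: The two methods below are a sketch of a while-loop and a for-loop that express the same logic. The Loops class and its method names are hypothetical. Whether the compiled branch instructions match exactly depends on the compiler version and build settings, so inspect the output with a tool such as ildasm.exe before relying on it.

```csharp
using System;

static class Loops
{
    // While-loop form: initializer, condition and increment are written separately.
    public static int SumWithWhile(int n)
    {
        int sum = 0;
        int i = 0;
        while (i < n)
        {
            sum += i;
            i++;
        }
        return sum;
    }

    // For-loop form: the same three parts appear in the loop header.
    public static int SumWithFor(int n)
    {
        int sum = 0;
        for (int i = 0; i < n; i++)
        {
            sum += i;
        }
        return sum;
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(Loops.SumWithWhile(5));
        Console.WriteLine(Loops.SumWithFor(5));
    }
}
```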

Constants

In compiler theory, some levels of indirection can be eliminated by injecting constant values directly into the representation of the program. This substitution is termed constant propagation. A related step, constant folding, evaluates constant expressions at compile time.

Note: My benchmarks have shown that constant values do provide performance benefits over variables.

And: If you look at your compiled program, all constants will be directly inside the parts of methods where they were referenced.
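Example: A sketch of const values whose expressions are evaluated at compile time. The Limits class and its field names are hypothetical. Reflection is used only to show that the finished values are recorded as literals.

```csharp
using System;
using System.Reflection;

static class Limits
{
    public const int SecondsPerMinute = 60;
    // A constant expression: the compiler computes 3600 and stores the result.
    public const int SecondsPerHour = SecondsPerMinute * 60;
    // String constants are combined at compile time too.
    public const string Greeting = "Hello, " + "world";
}

class Program
{
    static void Main()
    {
        // At this call site the compiler substitutes the literal value.
        Console.WriteLine(Limits.SecondsPerHour);
        Console.WriteLine(Limits.Greeting);

        // The field is marked as a literal, with its computed value attached.
        FieldInfo field = typeof(Limits).GetField("SecondsPerHour");
        Console.WriteLine(field.IsLiteral);
        Console.WriteLine(field.GetRawConstantValue());
    }
}
```

Because the value is copied into each use, changing a public const in one assembly does not update other assemblies until they are recompiled.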

Strings

String literals are pooled together. And constant references to the stream of string data in the compiled program are placed where you used the literals. So the literals themselves are not located where you use them in methods.

Instead: The string literal in your program is transformed into a pointer to pooled data.

Next: This program shows that the two string literals, declared separately, are actually the same string reference.

Program that shows string pool: C#

using System;

class Program
{
    static void Main()
    {
        string value = "Python";
        string value2 = "Python";
        // ... These are the same string!
        Console.WriteLine(string.ReferenceEquals(value, value2));
    }
}

Output

True

Metadata

A C# program is compiled into an assembly that contains intermediate language code and metadata. The metadata is organized like a relational database. It is an abstract binary representation and an efficient encoding of the program. But it is not easy for humans to read.

Also: The metadata is stored on the disk. It contains no comments from your source code.

The metadata is divided into tables. These tables contain records that point to different tables and different records. It is not typically important to study the metadata format unless you are writing a compiler.

Methods

Structured programming represents logic as procedure calls. It uses methods. In the metadata, method bodies omit the names of their local variables. This information is lost at compile time (it survives only in separate debug symbol files). But parameter names are retained.

Note: The goal was to improve the level of optimization on method bodies and eliminate unneeded information, reducing disk usage.
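Example: Reflection reads the metadata, so it can illustrate the point about names. This is a sketch. The Sample and Inspector classes are hypothetical. The number of locals reported depends on the build configuration, so no output is claimed here.

```csharp
using System;
using System.Reflection;

static class Sample
{
    public static int Add(int left, int right)
    {
        int total = left + right;
        return total;
    }
}

static class Inspector
{
    // Parameter names are stored in the metadata and can be read back.
    public static string[] ParameterNames(MethodInfo method)
    {
        ParameterInfo[] parameters = method.GetParameters();
        var names = new string[parameters.Length];
        for (int i = 0; i < parameters.Length; i++)
        {
            names[i] = parameters[i].Name;
        }
        return names;
    }
}

class Program
{
    static void Main()
    {
        MethodInfo method = typeof(Sample).GetMethod("Add");
        foreach (string name in Inspector.ParameterNames(method))
        {
            Console.WriteLine("parameter: " + name);
        }

        // Locals expose only an index and a type. LocalVariableInfo has no Name.
        MethodBody body = method.GetMethodBody();
        if (body != null)
        {
            foreach (LocalVariableInfo local in body.LocalVariables)
            {
                Console.WriteLine("local {0}: {1}", local.LocalIndex, local.LocalType);
            }
        }
    }
}
```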

Runtime

A high-level C# program is translated into intermediate language code and a relational database called metadata. The Common Language Runtime (CLR) executes the intermediate language, guided by the metadata. This incurs some overhead. Startup time is affected.

Then: As you run the program, each method is read from the metadata. Intermediate language code is translated into machine-level code.

Just-in-time compiler: JIT

In just-in-time compilation, the CLR applies many optimizations to the methods. It sometimes (based on heuristics) inserts the methods at their call sites. This optimization is called function inlining.

It rewrites instruction layouts in memory to eliminate unnecessary indirections. Each pointer dereference costs time. By removing this dereference, fewer instructions (and clock cycles) are needed.

Note: The JIT system causes a slowdown when first used. It is most beneficial on long-running programs.
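Example: The JIT decides on inlining by itself, but a method can carry a hint. This sketch uses the MethodImpl attribute from System.Runtime.CompilerServices. The MathHelpers class is a hypothetical name. AggressiveInlining is only a request and the runtime may ignore it. NoInlining is assumed here to keep the second method as a real call.

```csharp
using System;
using System.Runtime.CompilerServices;

static class MathHelpers
{
    // Asks the JIT to insert this body at call sites when it can.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static int Square(int x)
    {
        return x * x;
    }

    // Asks the JIT to keep this as an ordinary method call.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int SquareNoInline(int x)
    {
        return x * x;
    }
}

class Program
{
    static void Main()
    {
        int total = 0;
        for (int i = 0; i < 10; i++)
        {
            // Both calls return the same value. Only the generated
            // machine code at the call site may differ.
            total += MathHelpers.Square(i) - MathHelpers.SquareNoInline(i);
        }
        Console.WriteLine(total);
    }
}
```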

Optimizations

Compilers can apply many optimizations. But sometimes applying them manually is more effective. Code motion moves code outside of a loop. And induction variables are used to analyze data dependencies.
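Example: Code motion applied by hand, as a sketch. The Hoisting class and its method names are hypothetical. A compiler or JIT may or may not hoist the square root on its own, so the second method does it explicitly.

```csharp
using System;

static class Hoisting
{
    // The square root does not depend on i, yet it is written inside the loop.
    public static double[] ScaleNaive(double[] values, double a, double b)
    {
        var result = new double[values.Length];
        for (int i = 0; i < values.Length; i++)
        {
            result[i] = values[i] * Math.Sqrt(a * a + b * b);
        }
        return result;
    }

    // Code motion: the loop-invariant expression is computed once, before the loop.
    public static double[] ScaleHoisted(double[] values, double a, double b)
    {
        var result = new double[values.Length];
        double factor = Math.Sqrt(a * a + b * b);
        for (int i = 0; i < values.Length; i++)
        {
            result[i] = values[i] * factor;
        }
        return result;
    }
}

class Program
{
    static void Main()
    {
        double[] values = { 1.0, 2.0 };
        foreach (double v in Hoisting.ScaleHoisted(values, 3.0, 4.0))
        {
            Console.WriteLine(v);
        }
    }
}
```

In both methods the loop counter i is the induction variable: it changes by a fixed step on each iteration.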

Info: The best way to compile a program is "undecidable." So no program can truly be considered optimal.

Summary

Compilers are complicated. They use an elaborate series of phases to transform program source. Modern computers, and all computer software, rely on compiler theory. It is at the core of all software.

