C#: .NET

.NET Array Collections File String Async Cast DataTable DateTime Dictionary Enum Exception For Foreach IEnumerable If IndexOf Lambda LINQ List Parse Path Process Property Regex Replace Sort Split StringBuilder Substring Switch Tuple

Note

Intermediate language. Code exists on many levels. Abstraction is a ladder. At the highest levels of abstraction, we have C# code. At a lower level, we have an intermediate language.


Go copyright

Instructions. The IL has instructions (opcodes) that manipulate the stack. These instructions, called intermediate language opcodes, reside in a relational data based called the metadata.

Compiler
Intermediate language: IL

Intro. Compiler theory has two parts.
In analysis,
source code is parsed
and a symbol table is derived. Synthesis is where the object code is generated and written to an executable.

Analysis:Source code is parsed and a symbol table is derived. Analysis comes before synthesis.

Synthesis:Object code is generated and written to an executable. Analysis and synthesis result in an intermediate form.

Info:On execution of the intermediate form, another compiler, the JIT compiler, will be invoked and perform even more optimizations.

Example program: C#

using System;

class Program
{
    static void Main()
    {
	int i = 0;
	while (i < 10)
	{
	    Console.WriteLine(i);
	    i++;
	}
    }
}

Intermediate language for Main method: IL

.method private hidebysig static void Main() cil managed
{
    .entrypoint
    .maxstack 2
    .locals init (
	[0] int32 num)
    L_0000: ldc.i4.0
    L_0001: stloc.0
    L_0002: br.s L_000e
    L_0004: ldloc.0
    L_0005: call void [mscorlib]System.Console::WriteLine(int32)
    L_000a: ldloc.0
    L_000b: ldc.i4.1
    L_000c: add
    L_000d: stloc.0
    L_000e: ldloc.0
    L_000f: ldc.i4.s 10
    L_0011: blt.s L_0004
    L_0013: ret
}
Main method

IL for Main method. In the second part of the text, you can see the intermediate form of the Main method. This is generated from the C# code.


Const-keyword

Instructions. At L_0000, you can see that the constant number zero is pushed onto the evaluation stack through the instruction ldc.

Const
Squares: abstract

Branch. The instruction "br" indicates a branch. This is a conditional instruction that will change what instruction is executed next based on the condition.

Tip:In the C# language, things such as goto, while, for, break, and return may be implemented with variants of the branch instruction.

GotoWhileForBreakReturn
Call

Call method. Before Console.WriteLine is called, the location of the local variable we are writing is pushed to the evaluation stack. This is used as an argument.

Console.WriteLine
Steps

Increment. In IL, an increment is implemented by pushing a constant one to the evaluation stack, and then using the add arithmetic instruction to increment.


Loop

Loop instructions. The line L_0011 implements another instruction that returns control to the beginning of the loop if the condition is met. This is another branch.

Note:Branch instructions use labels in the intermediate language. Branches are basically goto statements at the lower level.


Programming tip

IL Disassembler. To gain an understanding of the C# language, I recommend that you inspect the intermediate language for all the code you write.

IL Disassembler

Tip:IL Disassembler is provided with Visual Studio. We use it to look at the intermediate language.


Not equal

Bne. If-statements are translated into branch instructions. One such intermediate language instruction, bne, stands for branch if not equal.

First:The important method here is called A(). If its argument is equal to 1, it writes something to the console.

Here:The constant 1 is pushed with ldc. And the bne instruction is used. It branches based on the constant.

Finally:If the two values on the top of the stack (the constant 1 and the argument int) are not equal, we jump forward to line IL_000f.

C# program that uses bne instruction

using System;

class Program
{
    static void Main()
    {
	A(0);
	A(1);
	A(2);
    }

    static void A(int value)
    {
	if (value == 1)
	{
	    Console.WriteLine("Hi");
	}
	else
	{
	    Console.WriteLine("Bye");
	}
    }
}

Output

Bye
Hi
Bye

A method: IL

.method private hidebysig static void  A(int32 'value') cil managed
{
  // Code size       26 (0x1a)
  .maxstack  8
  IL_0000:  ldarg.0
  IL_0001:  ldc.i4.1
  IL_0002:  bne.un.s   IL_000f
  IL_0004:  ldstr      "Hi"
  IL_0009:  call       void [mscorlib]System.Console::WriteLine(string)
  IL_000e:  ret
  IL_000f:  ldstr      "Bye"
  IL_0014:  call       void [mscorlib]System.Console::WriteLine(string)
  IL_0019:  ret
} // end of method Program::A

Call. A call instruction invokes a static or Shared method. We learn about the call instruction in the .NET Framework's intermediate language.

Example:In the Main method, we create an instance of Program and call its instance method A. Finally we call the static method B.

C# program that uses instance and static methods

class Program
{
    void A()
    {
    }

    static void B()
    {
    }

    static void Main()
    {
	Program p = new Program();
	p.A();

	Program.B();
    }
}

IL

.method private hidebysig static void  Main() cil managed
{
  .entrypoint
  // Code size       18 (0x12)
  .maxstack  1
  .locals init ([0] class Program p)
  IL_0000:  newobj     instance void Program::.ctor()
  IL_0005:  stloc.0
  IL_0006:  ldloc.0
  IL_0007:  callvirt   instance void Program::A()
  IL_000c:  call       void Program::B()
  IL_0011:  ret
} // end of method Program::Main

Callvirt. This is an intermediate language instruction. It is used on instance methods. The call instruction meanwhile is used on static methods.

Code that has an instance method: C#

class Test
{
    public int M()
    {
	return 0;
    }
}

class Program
{
    static void Main()
    {
	Test t = new Test();
	int i = t.M();
    }
}

IL of Main method

.method private hidebysig static void Main() cil managed
{
    .entrypoint
    .maxstack 1
    .locals init (
	[0] class Test t)
    L_0000: newobj instance void Test::.ctor()
    L_0005: stloc.0
    L_0006: ldloc.0
    L_0007: callvirt instance int32 Test::M()
    L_000c: pop
    L_000d: ret
}

Code that has a static method: C#

class Test
{
    static public int M()
    {
	return 0;
    }
}

class Program
{
    static void Main()
    {
	int i = Test.M();
    }
}

IL of Main method

.method private hidebysig static void Main() cil managed
{
    .entrypoint
    .maxstack 8
    L_0000: call int32 Test::M()
    L_0005: pop
    L_0006: ret
}

Callvirt benchmark. I was working on a program where performance is essential and decided to replace two instance method calls with static method calls.

Result:I found a 1.23% slowdown caused by the callvirt instruction being used.

Callvirt instruction performance data

Instance method and data: 44554 ms
Static method and data:   44008 ms [faster]
Note

Ldarg. This instruction is common. It places an argument onto the evaluation stack. There are several common variants of ldarg in the .NET Framework.

Note:In the IL section of the following code example, we see the ldarg.0 and ldarg.1 instructions.

Positions:The 0 and 1 refer to the position of the argument list (formal parameter list).

And:This means ldarg.0 pushes an int32 onto the evaluation stack, and ldarg.1 pushes a string onto the evaluation stack.

Method that has two parameters: C#

static int Something(int a, string b)
{
    a += 1;
    int y = b.Length + a;
    return y;
}

Intermediate Language: IL

.method private hidebysig static int32  Something(int32 a,
						  string b) cil managed
{
  // Code size       16 (0x10)
  .maxstack  2
  .locals init ([0] int32 y)
  IL_0000:  ldarg.0
  IL_0001:  ldc.i4.1
  IL_0002:  add
  IL_0003:  starg.s    a
  IL_0005:  ldarg.1
  IL_0006:  callvirt   instance int32 [mscorlib]System.String::get_Length()
  IL_000b:  ldarg.0
  IL_000c:  add
  IL_000d:  stloc.0
  IL_000e:  ldloc.0
  IL_000f:  ret
} // end of method Program::Something
Const

Ldc. With the ldc instruction, the .NET Framework loads constant values (such as integers) onto the evaluation stack, making many program constructs possible.

Tip:This program uses six constant values. We pass each constant to Console.WriteLine.

Tip 2:For the values 0 and 8, we find the instructions ldc.i4.0 and ldc.i4.8 are used.

Result:These load a 4-byte integer with value 0 and 8 to the top of the stack.

C# program that uses ldc instructions

using System;

class Program
{
    static void Main()
    {
	Console.WriteLine(0);
	Console.WriteLine(8);
	Console.WriteLine(9);
	Console.WriteLine(-1);
	Console.WriteLine('a');
	Console.WriteLine(1.23);
    }
}

IL

.method private hidebysig static void  Main() cil managed
{
  .entrypoint
  // Code size       47 (0x2f)
  .maxstack  8
  IL_0000:  ldc.i4.0
  IL_0001:  call       void [mscorlib]System.Console::WriteLine(int32)
  IL_0006:  ldc.i4.8
  IL_0007:  call       void [mscorlib]System.Console::WriteLine(int32)
  IL_000c:  ldc.i4.s   9
  IL_000e:  call       void [mscorlib]System.Console::WriteLine(int32)
  IL_0013:  ldc.i4.m1
  IL_0014:  call       void [mscorlib]System.Console::WriteLine(int32)
  IL_0019:  ldc.i4.s   97
  IL_001b:  call       void [mscorlib]System.Console::WriteLine(char)
  IL_0020:  ldc.r8     1.23
  IL_0029:  call       void [mscorlib]System.Console::WriteLine(float64)
  IL_002e:  ret
} // end of method Program::Main
Framework: NET

Ldloc. How are local variables used in the intermediate language? In IL locals are allocated separately and accessed by indexes with the ldloc instruction.

Note:In the intermediate language, the ".locals" section indicates that four locals are allocated each time the method is called.

Info:The four local variables are indexed by the numbers 0, 1, 2 and 3. These indexes are used later in the IL.

Method that introduces local variables: C#

static int Example()
{
    int a = 1;
    string y = "cat";
    bool b = false;
    int result = a + y.Length + (b ? 1 : 0);
    return result;
}

Intermediate language: IL

.method private hidebysig static int32  Example() cil managed
{
  // Code size       29 (0x1d)
  .maxstack  3
  .locals init ([0] int32 a,
	   [1] string y,
	   [2] bool b,
	   [3] int32 result)
  IL_0000:  ldc.i4.1
  IL_0001:  stloc.0
  IL_0002:  ldstr      "cat"
  IL_0007:  stloc.1
  IL_0008:  ldc.i4.0
  IL_0009:  stloc.2
  IL_000a:  ldloc.0
  IL_000b:  ldloc.1
  IL_000c:  callvirt   instance int32 [mscorlib]System.String::get_Length()
  IL_0011:  add
  IL_0012:  ldloc.2
  IL_0013:  brtrue.s   IL_0018
  IL_0015:  ldc.i4.0
  IL_0016:  br.s       IL_0019
  IL_0018:  ldc.i4.1
  IL_0019:  add
  IL_001a:  stloc.3
  IL_001b:  ldloc.3
  IL_001c:  ret
} // end of method Program::Example
Abstract squares

Ldsfld. In addition to storing values in static fields, there exists an intermediate language instruction to load values from them.

Program:This program assigns the static field _a to 3. This is done with stsfld.

Next:The ldsfld pushes the value stored in the location of _a onto the top of the evaluation stack.

Then:We load the constant 4 onto the top of the evaluation stack with the ldc.i4.4 instruction.

C# program that has ldsfld instruction

using System;

class Program
{
    static int _a;
    static void Main()
    {
	Program._a = 3;
	if (Program._a == 4)
	{
	    Console.WriteLine(true);
	}
    }
}

IL

.method private hidebysig static void  Main() cil managed
{
  .entrypoint
  // Code size       21 (0x15)
  .maxstack  8
  IL_0000:  ldc.i4.3
  IL_0001:  stsfld     int32 Program::_a
  IL_0006:  ldsfld     int32 Program::_a
  IL_000b:  ldc.i4.4
  IL_000c:  bne.un.s   IL_0014
  IL_000e:  ldc.i4.1
  IL_000f:  call       void [mscorlib]System.Console::WriteLine(bool)
  IL_0014:  ret
} // end of method Program::Main
Array

Newarr. The newarr instruction creates a vector. This is a one-dimensional array. Vectors have special support in the .NET Framework.

One-dimensional arrays that aren't zero-based, and multi-dimensional arrays, are created using newobj rather than newarr.

The CLI Annotated Standard

Next:This program creates an int array of ten elements. The first element is initialized so the compiler does not eliminate any code.

IL:The constant 10 is loaded onto the evaluation stack before newarr is encountered. This causes the array to have ten elements.

C# program that creates new array

using System;

class Program
{
    static void Main()
    {
	int[] array = new int[10];
	array[0] = 1;
    }
}

Intermediate representation: IL

.method private hidebysig static void  Main() cil managed
{
  .entrypoint
  // Code size       13 (0xd)
  .maxstack  3
  .locals init ([0] int32[] 'array')
  IL_0000:  ldc.i4.s   10
  IL_0002:  newarr     [mscorlib]System.Int32
  IL_0007:  stloc.0
  IL_0008:  ldloc.0
  IL_0009:  ldc.i4.0
  IL_000a:  ldc.i4.1
  IL_000b:  stelem.i4
  IL_000c:  ret
} // end of method Program::Main

Ret. Every method executed has a ret instruction. This returns control from the method to the calling method. The instruction sometimes returns a value.

Next:Let's look at a simple C# method that returns an integer value. This method adds two to the argument.

Tip:You can see the argument is accessed with ldarg.0. Finally, the ret instruction appears at the end of the method.

Stack:The evaluation stack at the time ret is executed has the result of the add instruction on it. This is returned.

Method that returns integer: C#

static int Method(int value)
{
    return 2 + value;
}

Intermediate language

.method private hidebysig static int32  Method(int32 'value') cil managed
{
  // Code size       4 (0x4)
  .maxstack  8
  IL_0000:  ldc.i4.2
  IL_0001:  ldarg.0
  IL_0002:  add
  IL_0003:  ret
} // end of method Program::Method
Copyright

Stelem. This stores a value inside an array element. Its performance depends on the type of the element and array. We test stelem in some C# programs.

Newarr:When you assign elements in an array, the array itself is pushed onto the stack (newarr).

Next:The index you are using in the array is pushed onto the stack (ldc), and the value you want to put into the array is pushed (ldloc).

Program 1 in C#

class Program
{
    static void Main()
    {
	char[] c = new char[100];
	c[0] = 'f';
    }
}

Program 1 in IL

.method private hidebysig static void Main() cil managed
{
    .entrypoint
    .maxstack 3
    .locals init (
	[0] char[] c)
    L_0000: ldc.i4.s 100
    L_0002: newarr char
    L_0007: stloc.0
    L_0008: ldloc.0
    L_0009: ldc.i4.0
    L_000a: ldc.i4.s 0x66
    L_000c: stelem.i2
    L_000d: ret
}

Program 2 in C#

class Program
{
    static void Main()
    {
	int[] i = new int[100];
	i[0] = 5;
    }
}

Program 2 in IL

.method private hidebysig static void Main() cil managed
{
    .entrypoint
    .maxstack 3
    .locals init (
	[0] int32[] i)
    L_0000: ldc.i4.s 100
    L_0002: newarr int32
    L_0007: stloc.0
    L_0008: ldloc.0
    L_0009: ldc.i4.0
    L_000a: ldc.i4.5
    L_000b: stelem.i4
    L_000c: ret
}

Program 3 in C#

class Program
{
    static void Main()
    {
	uint[] u = new uint[100];
	u[0] = 2;
    }
}

Program 3 in IL

.method private hidebysig static void Main() cil managed
{
    .entrypoint
    .maxstack 3
    .locals init (
	[0] uint32[] u)
    L_0000: ldc.i4.s 100
    L_0002: newarr uint32
    L_0007: stloc.0
    L_0008: ldloc.0
    L_0009: ldc.i4.0
    L_000a: ldc.i4.s 0x66
    L_000c: stelem.i4
    L_000d: ret
}
Performance fast-forward

Stelem benchmark. Could certain value type elements be faster than others? My previous research in C indicated that the unsigned integer data type is most efficient.

Result:Uint32 was the fastest element type to set in an array. Char was slowest of the three tests.

Benchmark of setting array elements
    100 million iterations repeated 100 times.

  char array: 5208 ms
 int32 array: 4321 ms
uint32 array: 4125 ms
Static

Stsfld. What is the stsfld instruction? This stores the value on top of the evaluation stack into a static field. We demonstrate this instruction.

However:In the intermediate language, we load a constant value 3 (ldc.i4.3) onto the top of the evaluation stack.

Then:We have a stsfld with the argument being the address of _a. So the top value (3) is stored into the location pointed to by _a.

C# program that has stsfld instruction

using System;

class Program
{
    static int _a;
    static void Main()
    {
	Program._a = 3;
	if (Program._a == 4)
	{
	    Console.WriteLine(true);
	}
    }
}

IL

.method private hidebysig static void  Main() cil managed
{
  .entrypoint
  // Code size       21 (0x15)
  .maxstack  8
  IL_0000:  ldc.i4.3
  IL_0001:  stsfld     int32 Program::_a
  IL_0006:  ldsfld     int32 Program::_a
  IL_000b:  ldc.i4.4
  IL_000c:  bne.un.s   IL_0014
  IL_000e:  ldc.i4.1
  IL_000f:  call       void [mscorlib]System.Console::WriteLine(bool)
  IL_0014:  ret
} // end of method Program::Main
Switch

Switch. The switch instruction is implemented with a jump table. The switch keyword—in the C# language—is typically compiled to a switch instruction.

IL:In the intermediate language, we see the instruction switch (IL_0020, IL_0027, IL_002e).

Note:Those three identifiers point to lines, which are underlined. So the switch jumps to locations.

C# that uses switch statement

using System;

class Program
{
    static void Main()
    {
	int i = int.Parse("1");
	switch (i)
	{
	    case 0:
		Console.WriteLine(0);
		break;
	    case 1:
		Console.WriteLine(1);
		break;
	    case 2:
		Console.WriteLine(2);
		break;
	}
    }
}

Intermediate language

.method private hidebysig static void  Main() cil managed
{
  .entrypoint
  // Code size       53 (0x35)
  .maxstack  1
  .locals init ([0] int32 i,
	   [1] int32 CS$0$0000)
  IL_0000:  ldstr      "1"
  IL_0005:  call       int32 [mscorlib]System.Int32::Parse(string)
  IL_000a:  stloc.0
  IL_000b:  ldloc.0
  IL_000c:  stloc.1
  IL_000d:  ldloc.1
  IL_000e:  switch     (
			IL_0020,
			IL_0027,
			IL_002e)
  IL_001f:  ret
  IL_0020:  ldc.i4.0
  IL_0021:  call       void [mscorlib]System.Console::WriteLine(int32)
  IL_0026:  ret
  IL_0027:  ldc.i4.1
  IL_0028:  call       void [mscorlib]System.Console::WriteLine(int32)
  IL_002d:  ret
  IL_002e:  ldc.i4.2
  IL_002f:  call       void [mscorlib]System.Console::WriteLine(int32)
  IL_0034:  ret
} // end of method Program::Main

A summary. The intermediate language is not needed every day. It usually goes without notice. But it opens up a new way of understanding what C# code does.