Tuesday, September 30, 2014

.NET Just in Time Compilation and Warming up Your System

IMG_7823.jpg

One of the primary obstacle we face while scaling our system is Just In Time (JIT) compilation of the .NET stack. We run a .NET managed application server at a huge scale with many thousands of managed assemblies on each server (and many many thousands of those servers distributed globally).We deploy code daily and since those are managed code they are JITed at each deployment. Our system is very sensitive to latency and we do not want those servers getting new code to cost us execution latency. So we use various mechanisms to warm the servers before they start serving queries. Common techniques we use are

  1. NGEN
  2. Force JITing (a specialized multi-core JIT technique see bottom of post)
  3. Sending warmer queries that warm up the system

In this effort I frequently handle questions regarding how the system JITs managed methods. Even after so many years of the CLR JIT existing there seems to be confusion around when JIT happens, what is the unit of compilation. So I thought I’d make a quick post on this topic.

Consider the following simple code I have in One.cs

using System;

namespace Foo
{
class MyClass
{
public void a()
{
int i = 0;
while(true)
{
b(i++);
}
}

public void b(int i)
{
Console.WriteLine(i);
}
}

class Program
{
static void Main()
{
MyClass mc = new MyClass();
mc.a();
}
}
}


Of interest is the function Foo.MyClass.a and Foo.MyClass.b. We will debug to find out exactly when the later is JITed.



First I compile and then launch the debugger. I will use the windbg debugger and the sos extensions extensively in this post. Also read https://support2.microsoft.com/kb/319037?wa=wsignin1.0 to see how to setup symbol servers to debug into the CLR.

csc /debug+ One.cs
windbg one.exe


After that in windbg I run the following command to setup

.sympath+ d:\MySymbols          ;$$ Use the Microsoft symbol server (see link above)
sxe ld:clr ;$$ break on CLR loaded
g ;$$ continue the program until you break on CLR.dll being loaded
.loadby sos clr ;$$ load the sos debugger extension

!bpmd One.exe Foo.MyClass.a ;$$ Set a managed break point in a()
g ;$$ continue until break point in a() is hit



When this break point is hit, we have obviously already JITed MyClass.a() and executing it. The question we now have is that whether all the functions a() calls like MyClass.b() already JITed. If not when/how will that be JITed. Lets debug it!!



**Color coding indicates how I take output of one command to give inputs to the next one.



First lets find the this pointer for the MyClass instance. This can be obtained from the current managed call stack

0:000> !clrstack -a
PARAMETERS:
this (0x0000000000d9eea0) = 0x0000000002922c58



The details of the this object shows the MethodTable for it. The MethodTable has pointer to EEClass (cold data).

0:000> !do 0x0000000002922c58
Name: Foo.MyClass
MethodTable: 00007ffab2f640d8
EEClass: 00007ffab3072340
Size: 24(0x18) bytes
File: d:\Skydrive\Code\C#\_JITPresentation\One.exe
Fields:


Now we can see more details of the MethodTable, which will show the individual methods descriptors.

0:000> !dumpmt -md 00007ffab2f640d8
EEClass: 00007ffab3072340
Module: 00007ffab2f62fc8
Name: Foo.MyClass
mdToken: 0000000002000002
File: d:\Skydrive\Code\C#\_JITPresentation\One.exe
BaseSize: 0x18
ComponentSize: 0x0
Slots in VTable: 7
Number of IFaces in IFaceMap: 0
--------------------------------------
MethodDesc Table
Entry MethodDesc JIT Name
00007ffb07c16300 00007ffb077c80e8 PreJIT System.Object.ToString()
00007ffb07c5e760 00007ffb077c80f0 PreJIT System.Object.Equals(System.Object)
00007ffb07c61ad0 00007ffb077c8118 PreJIT System.Object.GetHashCode()
00007ffb07c5eb50 00007ffb077c8130 PreJIT System.Object.Finalize()
00007ffab3080120 00007ffab2f640d0 JIT Foo.MyClass..ctor()
00007ffab3080170 00007ffab2f640b0 JIT Foo.MyClass.a()
00007ffab2f6c050 00007ffab2f640c0 NONE Foo.MyClass.b(Int32)


The type has 7 methods. Also the out clearly indicates that Foo.MyClass.a() is JITed and Foo.MyClass.b() is NONE (or not JITed). We can get more details about these methods

0:000> !dumpmd 00007ffab2f640b0
Method Name: Foo.MyClass.a()
Class: 00007ffab3072340
MethodTable: 00007ffab2f640d8
mdToken: 0000000006000001
Module: 00007ffab2f62fc8
IsJitted: yes
CodeAddr: 00007ffab3080170 <----- JITed
Transparency: Critical
i. 0:000> !dumpmd 00007ffab2f640c0
Method Name: Foo.MyClass.b(Int32)
Class: 00007ffab3072340
MethodTable: 00007ffab2f640d8
mdToken: 0000000006000002
Module: 00007ffab2f62fc8
IsJitted: no
CodeAddr: ffffffffffffffff <----- Not yet JITed


So at this point we know that a() is JITed but the method b() it calls is not. In that the question arises that if it is not what is the content of the native instructions for a() and what does that code call into for b(). The disassembly will clearly show that the entire method a() is JITed and that for outward managed calls there are calls to stubs

0:000> u 00007ffab3080170 L24
One!Foo.MyClass.a() [d:\Skydrive\Code\C#\_JITPresentation\One.cs @ 8]:
00007ffa`b3080170 48894c2408 mov qword ptr [rsp+8],rcx
00007ffa`b3080175 4883ec38 sub rsp,38h
00007ffa`b3080179 c744242000000000 mov dword ptr [rsp+20h],0
00007ffa`b3080181 c644242400 mov byte ptr [rsp+24h],0
00007ffa`b3080186 48b83834f6b2fa7f0000 mov rax,7FFAB2F63438h
00007ffa`b3080190 8b00 mov eax,dword ptr [rax]
00007ffa`b3080192 85c0 test eax,eax
00007ffa`b3080194 7405 je One!Foo.MyClass.a()+0x2b (00007ffa`b308019b)
00007ffa`b3080196 e82574b25f call clr!JIT_DbgIsJustMyCode (00007ffb`12ba75c0)
00007ffa`b308019b 90 nop
00007ffa`b308019c c744242000000000 mov dword ptr [rsp+20h],0
00007ffa`b30801a4 eb23 jmp One!Foo.MyClass.a()+0x59 (00007ffa`b30801c9)
00007ffa`b30801a6 90 nop
00007ffa`b30801a7 8b4c2420 mov ecx,dword ptr [rsp+20h]
00007ffa`b30801ab ffc1 inc ecx
00007ffa`b30801ad 8b442420 mov eax,dword ptr [rsp+20h]
00007ffa`b30801b1 89442428 mov dword ptr [rsp+28h],eax
00007ffa`b30801b5 894c2420 mov dword ptr [rsp+20h],ecx
00007ffa`b30801b9 8b542428 mov edx,dword ptr [rsp+28h]
00007ffa`b30801bd 488b4c2440 mov rcx,qword ptr [rsp+40h]
00007ffa`b30801c2 e889beeeff call Foo.MyClass.b(Int32) (00007ffa`b2f6c050)
00007ffa`b30801c7 90 nop
00007ffa`b30801c8 90 nop

00007ffa`b30801c9 c644242401 mov byte ptr [rsp+24h],1
00007ffa`b30801ce ebd6 jmp One!Foo.MyClass.a()+0x36 (00007ffa`b30801a6)
00007ffa`b30801d0 90 nop
00007ffa`b30801d1 4883c438 add rsp,38h
00007ffa`b30801d5 c3 ret
00007ffa`b30801d6 0000 add byte ptr [rax],al
00007ffa`b30801d8 1909 sbb dword ptr [rcx],ecx
00007ffa`b30801da 0100 add dword ptr [rax],eax
00007ffa`b30801dc 096200 or dword ptr [rdx],esp


So we see that for b a call is made to the memory location 00007ffa`b2f6c050. We can see what is there now by disassembling that address.

0:000> u 00007ffa`b2f6c050
Foo.MyClass.b(Int32):
00007ffa`b2f6c050 e87b5e755f call clr!PrecodeFixupThunk (00007ffb`126c1ed0)
00007ffa`b2f6c055 5e pop rsi
00007ffa`b2f6c056 0201 add al,byte ptr [rcx]


So basically instead of real native JITed code existing for b() there is actually a stub or thunk in it’s place. So we clearly establish that when a function is called it’s entire code is JITed and other method it calls is not yet JITed (however, there are caveats like inline methods etc). Now we can now go and set a breakpoint inside JIT to break when it tries it JIT the b() method. This is what we do

0:000> bp clr!UnsafeJitFunction ;$$ entry point for JITing a method
0:000> g ;$$ continue executing until we hit the UnsafeJITFunction
0:000> k ;$$ dump the stack for JITing
clr!UnsafeJitFunction
clr!MethodDesc::MakeJitWorker
clr!MethodDesc::DoPrestub
clr!PreStubWorker+0x3d6
clr!ThePreStub+0x5a [f:\dd\ndp\clr\src\vm\amd64\ThePreStubAMD64.asm @ 92]
One!Foo.MyClass.a()+0x57 [d:\Skydrive\Code\C#\_JITPresentation\One.cs @ 12]


As we can see that the JITing actually happened in the same call thread that is executing a() and exactly when b was called. ThePreStub finally calls the JITer. The JITer will actually JIT the method b() and backtrack the stack and patch up the call, so that it will actually now be a call straight to the JITed copy of b(). We hit g couple of times and now see what happens for the MethodDescriptor for b()

0:000> !dumpmd 00007ffab2f640c0
Method Name: Foo.MyClass.b(Int32)
Class: 00007ffab3072340
MethodTable: 00007ffab2f640d8
mdToken: 0000000006000002
Module: 00007ffab2f62fc8
IsJitted: yes
CodeAddr: 00007ffab30801f0 <-- Now it is JITed
Transparency: Critical


As we see b() is now JITed and we can see it’s disassembly as well. However, more interesting, lets go back and see what the disassembly of a() now contains

0:000> u 00007ffab3080170 L24
One!Foo.MyClass.a() [d:\Skydrive\Code\C#\_JITPresentation\One.cs @ 8]:
00007ffa`b3080170 48894c2408 mov qword ptr [rsp+8],rcx
00007ffa`b3080175 4883ec38 sub rsp,38h
00007ffa`b3080179 c744242000000000 mov dword ptr [rsp+20h],0
00007ffa`b3080181 c644242400 mov byte ptr [rsp+24h],0
00007ffa`b3080186 48b83834f6b2fa7f0000 mov rax,7FFAB2F63438h
00007ffa`b3080190 8b00 mov eax,dword ptr [rax]
00007ffa`b3080192 85c0 test eax,eax
00007ffa`b3080194 7405 je One!Foo.MyClass.a()+0x2b (00007ffa`b308019b)
00007ffa`b3080196 e82574b25f call clr!JIT_DbgIsJustMyCode (00007ffb`12ba75c0)
00007ffa`b308019b 90 nop
00007ffa`b308019c c744242000000000 mov dword ptr [rsp+20h],0
00007ffa`b30801a4 eb23 jmp One!Foo.MyClass.a()+0x59 (00007ffa`b30801c9)
00007ffa`b30801a6 90 nop
00007ffa`b30801a7 8b4c2420 mov ecx,dword ptr [rsp+20h]
00007ffa`b30801ab ffc1 inc ecx
00007ffa`b30801ad 8b442420 mov eax,dword ptr [rsp+20h]
00007ffa`b30801b1 89442428 mov dword ptr [rsp+28h],eax
00007ffa`b30801b5 894c2420 mov dword ptr [rsp+20h],ecx
00007ffa`b30801b9 8b542428 mov edx,dword ptr [rsp+28h]
00007ffa`b30801bd 488b4c2440 mov rcx,qword ptr [rsp+40h]
00007ffa`b30801c2 e889beeeff call Foo.MyClass.b(Int32) (00007ffa`b2f6c050)
00007ffa`b30801c7 90 nop
00007ffa`b30801c8 90 nop
00007ffa`b30801c9 c644242401 mov byte ptr [rsp+24h],1
00007ffa`b30801ce ebd6 jmp One!Foo.MyClass.a()+0x36 (00007ffa`b30801a6)
00007ffa`b30801d0 90 nop
00007ffa`b30801d1 4883c438 add rsp,38h
00007ffa`b30801d5 c3 ret
00007ffa`b30801d6 0000 add byte ptr [rax],al
00007ffa`b30801d8 1909 sbb dword ptr [rcx],ecx
00007ffa`b30801da 0100 add dword ptr [rax],eax
00007ffa`b30801dc 096200 or dword ptr [rdx],esp
00007ffa`b30801df 005600 add byte ptr [rsi],dl


Now if we re-disassemble the target of this call at 00007ffa`b2f6c050

0:000> u 00007ffa`b2f6c050
Foo.MyClass.b(Int32):
00007ffa`b2f6c050 e99b411100 jmp One!Foo.MyClass.b(Int32) (00007ffa`b30801f0)
00007ffa`b2f6c055 5f pop rdi
00007ffa`b2f6c056 0201 add al,byte ptr [rcx]


As you can see the address has been patched up and now b() is JITed and a() calls into b() without going through any stubs.



Obviously in this example I took a bunch of assumptions, but hopefully you now have the understanding to go debug your own scenarios and see what is at play. Some of the takeaways if you have JIT issues at startup



JIT happens at method granularity



Use modular code. Especially if you have error handling and other code which is rarely or almost never used, instead of having them in the main function, move them out. This will ensure that they are never JITed or at best not JITed at startup

void Foo()
{
try{
//...
}
catch(Exception ex)
{
ComplexErrorHandling(ex);
}
}


is better than

void Foo()
{
try{
//...
}
catch(Exception ex)
{
LogLocal();
UploadToSomeServer();
// MoreCode;
// EvenMoreCode;
}
}


For a given function running it once, JITs the whole function. However, do note that if it has difference code branches and each calls other functions you will need to execute all branches. In the case below Foo has to be called with both true and false to ensure downstream methods are JITed

void Foo(bool flag)
{
if(flag)
YesFlag();
else
NoFlag();
}


JITing happens in the same thread as the calls. The JIT engine does takes lock to ensure there is no races while JITing the same method from multiple threads.



Consider using NGEN, MultiCoreJIT or ForceJIT all methods you care about or even build your own mechanism based on the following code



Here’s some PseudoCode to force JIT which accomplishes that using RuntimeHelpers.PrepareMethod API (note this code does no error handling whatsoever). You can craft code around this to ForceJIT only assemblies and/or types in them that is causing JIT bottlenecks. Also this can be parallelized across cores. The .NET Multicore JIT is based on similar principle but automatically does it by generating a profile of what executes at your application startup and then JITing it for you in the next go.

using System;
using System.Reflection;
namespace ConsoleApplication5
{
class Program
{
static private void ForceJit(Assembly assembly)
{
var types = assembly.GetTypes();

foreach (Type type in types)
{
var ctors = type.GetConstructors(BindingFlags.NonPublic
| BindingFlags.Public
| BindingFlags.Instance
| BindingFlags.Static);

foreach (var ctor in ctors)
{
JitMethod(assembly, ctor);
}

var methods = type.GetMethods(BindingFlags.DeclaredOnly
| BindingFlags.NonPublic
| BindingFlags.Public
| BindingFlags.Instance
| BindingFlags.Static);

foreach (var method in methods)
{
JitMethod(assembly, method);
}
}
}

static private void JitMethod(Assembly assembly, MethodBase method)
{
if (method.IsAbstract || method.ContainsGenericParameters)
{
return;
}

System.Runtime.CompilerServices.RuntimeHelpers.PrepareMethod(method.MethodHandle);
}

static void Main(string[] args)
{
ForceJit(Assembly.LoadFile(@"d:\Scratch\asm.dll"));
}
}
}

No comments: