Monday, July 21, 2008

String equality


akutz has one of the most detailed posts on string interning and equality-comparison performance metrics I have ever seen. Head over to the post here

I loved his conclusion which is the crux of the whole story.

"In conclusion, the String class’s static Equals method is the most efficient way to compare two string literals and the String class’s instance Equals method is the most efficient way to compare two runtime strings. The kicker is that there must be 10^5 (100,000) string comparisons taking place before one method starts becoming more efficient than the next. Until then, use whichever method floats your boat."
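As a quick illustration of the two forms the quote refers to, here is a minimal sketch (my own example, not from akutz's post) comparing a pair of runtime strings; note that the static overload also tolerates null arguments:

```csharp
using System;

class StringEqualityDemo
{
    static void Main()
    {
        // Build two equal strings at runtime so they are distinct
        // objects (literals would be interned into one object).
        string a = "hello world".Substring(0, 5);
        string b = new string(new[] { 'h', 'e', 'l', 'l', 'o' });

        Console.WriteLine(String.Equals(a, b));   // static: True, null-safe
        Console.WriteLine(a.Equals(b));           // instance: True, but throws if a is null
        Console.WriteLine(ReferenceEquals(a, b)); // False: different objects
    }
}
```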

Kismat Konnection


<sorry for the total off-topic />

I had a real hard time giving a title to this post.

The post was originally supposed to be about the fire I saw yesterday in a movie theatre, Talkie Town (map). However, it is not really about the fire; it is more about the people and how they reacted to it.

First things first, the fire.

Folks who know me know that I go to a movie theatre once in a couple of years; I prefer movies visiting me rather than the other way around. In the 4 years I've been in Hyderabad, this was my 3rd visit to a movie. So I can say it was a momentous occasion when my wife and I impromptu decided to see a movie in the 2-screen theatre in Miyapur called Talkie Town. The decision was heavily biased by the fact that my father-in-law was at home looking after our daughter.

Batman (or rather The Dark Knight) lost to my wife, and I was forced to see the Bollywood flick Kismat Konnection.

This is where all the fun started. Just after the interval (1.5 hours in, and yes, Bollywood movies are that long) people saw some light on the theatre ceiling. Soon the light spread, a small hole formed in the ceiling, and we could actually see flames. Whatever was above the soundproofed ceiling had caught fire, and the heat caused the ceiling material itself to burn.

Then the most amazing thing happened: 50% of the people didn't care. They were gleefully looking up, not even caring to leave the theatre. I'm not talking about some false fire alarm or a smell of smoke; it was a real fire, with flames and a burnt hole in the ceiling!! The fire alarm didn't sound and no one from the theatre authorities seemed to be around. I left and called some folks, and they simply went to the terrace. All the noise now prompted some more people to leave, but even then there were others happily watching, as if the whole thing were part of the movie.

Then an even more amazing thing happened: the folks from the theatre came, said that the fire had been doused, and re-started the show. The hall was smoky and the AC was off. Another guy and I (yes, there were some more sane people around) caught hold of an official, and he simply said he had checked and could guarantee that everything was safe. I asked him how, since he couldn't ensure there was no fire in the first place, he could guarantee against a recurrence; he simply said I could get a refund of my ticket. Which I did, and left the hall.

On the way back I had a revelation. From childhood I had seen that people around me cared little about safety in general, but this was the first time I figured out that people didn't even care about their own safety. How a movie can be worth the risk of sitting in a fire-hazard zone, and that too with small children on their laps, is something I will never figure out.

A lot of things, including the traffic situation and the weird jaywalking I see around, suddenly make more sense to me.

Thursday, July 10, 2008

Writing exception handlers as separate methods may prove to be a good idea


Let us consider a scenario where you catch some exception and, in the exception handler, do some costly operation. You can write that code in either of the following ways.

Method-1 : Separate method call

public class Program
{
    public static void Main(string[] args)
    {
        try
        {
            using (DataStore ds = new DataStore())
            {
                // ...
            }
        }
        catch (Exception ex)
        {
            ErrorReporter(ex);
        }
    }

    private static void ErrorReporter(Exception ex)
    {
        string path = System.IO.Path.GetTempFileName();
        ErrorDumper ed = new ErrorDumper(path, ex);
        ed.WriteError();

        XmlDocument xmlDoc = new XmlDocument();
        xmlDoc.Load(path);
        RemoteErrorReporter er = new RemoteErrorReporter(xmlDoc);
        er.ReportError();
    }
}



Method-2 : Inline

public static void Main(string[] args)
{
    try
    {
        using (DataStore ds = new DataStore())
        {
            // ...
        }
    }
    catch (Exception ex)
    {
        string path = System.IO.Path.GetTempFileName();
        ErrorDumper ed = new ErrorDumper(path, ex);
        ed.WriteError();

        XmlDocument xmlDoc = new XmlDocument();
        xmlDoc.Load(path);
        RemoteErrorReporter er = new RemoteErrorReporter(xmlDoc);
        er.ReportError();
    }
}



The simple difference is that in the first case the exception-handling code is written as a separate method, while in the second case it is placed directly inline inside the catch block.


The question is which is better in terms of performance?


In case you have significant code and type references in the handler, and you expect the exception to be thrown rarely during an application's execution, then Method-1 is going to be more performant.


The reason is that just before executing a method, the whole method gets jitted. The jitted code contains stubs to the other methods it will call, but the jitter does not recursively jit those methods. This means that when Main gets called it gets jitted, but the method ErrorReporter is still not jitted. So in case the exception is never fired, all the code inside ErrorReporter never gets jitted. This might prove to be a significant saving in terms of time and space if the handling code is complex and refers to types not already referenced.


However, if the code is inline, then the moment Main gets jitted, all the code inside the catch block gets jitted too. This is expensive not only because it leads to jitting of code that is never executed, but also because all types referenced in the catch block are resolved, resulting in loading a bunch of dlls after searching through the disk. In our example above, System.Xml.dll and the other dll containing the remote error reporting get loaded even though they may never be used. Since disk access, assembly loading and type resolution are slow, this simple change can prove to give some saving.
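One way to watch this happen on the desktop CLR is to subscribe to the AppDomain.AssemblyLoad event and log assemblies as they arrive. The sketch below is my own (the type and method names are hypothetical, and the exact load timing varies with runtime version and NGen state); it mirrors Method-1, so the Xml assembly is typically resolved only around the first call into ErrorReporter:

```csharp
using System;

class JitLoadDemo
{
    static void Main()
    {
        // Log every assembly the runtime loads from this point on.
        AppDomain.CurrentDomain.AssemblyLoad += (s, e) =>
            Console.WriteLine("Loaded: " + e.LoadedAssembly.GetName().Name);

        try
        {
            throw new InvalidOperationException("demo");
        }
        catch (Exception ex)
        {
            // Because the XML work lives in a separate method, the Xml
            // assembly is resolved when ErrorReporter is jitted, i.e.
            // only if we actually reach this handler.
            ErrorReporter(ex);
        }
    }

    static void ErrorReporter(Exception ex)
    {
        // First reference to a System.Xml type in this program.
        var doc = new System.Xml.XmlDocument();
        doc.LoadXml("<error msg='" + ex.Message + "'/>");
        Console.WriteLine(doc.DocumentElement.GetAttribute("msg"));
    }
}
```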

Wednesday, July 09, 2008

Microsoft RoundTable

Our conference rooms have been fitted with this really weird looking device.

I had no clue what the thing was. Fortunately its box was still in the room along with the manual. It's called the Microsoft RoundTable and it is actually a 360-degree camera (with 5 cameras and 6 microphones). It comes with bundled software that lets all participants be visible to the other side in a live meeting in real time. It shows the whiteboard, and the software is intelligent enough to focus on and track the active speaker (using microphone and face recognition) and much, much more (a lot of MS Research work has gone into it). The video below gives you some idea, and head on over to this post for a review and inside view of the device.

Simply put, it's AWESOME.

Tuesday, July 08, 2008

Do namespace using directives affect Assembly Loading?


The simple answer is no, the inquisitive reader can read on :)

Close to 2 years back I had posted about the two styles of placing using directives, as follows

Style 1

namespace MyNameSpace
{
    using System;
    using System.Collections.Generic;
    using System.Text;
    // ...
}



Style 2

using System;
using System.Collections.Generic;
using System.Text;

namespace MyNameSpace
{
    // ...
}



and outlined the benefits of the first style (using directives inside the namespace). This post is not to re-iterate them.


This post is to figure out whether either of the styles has any bearing on the loading order of assemblies. Obviously at first look it clearly seems that it shouldn't, but this has caused some back and forth discussions over the web.


Scott Hanselman posted about a statement on the Microsoft StyleCop blog which states


"When using directives are declared outside of a namespace, the .Net Framework will load all assemblies referenced by these using statements at the same time that the referencing assembly is loaded.


However, placing the using statements within a namespace element allows the framework to lazy load the referenced assemblies at runtime. In some cases, if the referencing code is not actually executed, the framework can avoid having to load one or more of the referenced assemblies completely. This follows general best practice rule about lazy loading for performance.

Note, this is subject to change as the .Net Framework evolves, and there are subtle differences between the various versions of the framework."

This just doesn't sound right, because using directives have no bearing on assembly loading.


Hanselman did a simple experiment with the following code

using System;
using System.Xml;

namespace Microsoft.Sample
{
    public class Program
    {
        public static void Main(string[] args)
        {
            Guid g = Guid.NewGuid();
            Console.WriteLine("Before XML usage");
            Console.ReadLine();
            Foo();
            Console.WriteLine("After XML usage");
            Console.ReadLine();
        }

        public static void Foo()
        {
            XmlDocument x = new XmlDocument();
        }
    }
}



and then he watched the load timing using Process Explorer, moved the usings inside the namespace, and did the same. Both versions loaded System.Xml.dll only after he hit enter on the console, clearly indicating that in both cases it got lazy loaded.


Let me try to give a step-by-step rundown of how the whole type lookup of XmlDocument happens in .NETCF, which in turn should throw light on whether using directives have any bearing on assembly loading.



  1. When the Main method is jitted and run, System.Xml.dll is not yet loaded
  2. When the method Foo is called, the execution engine (referred to as EE) tries to JIT the method. As documented, the jitter only JITs methods that are about to be executed.
  3. The jitter checks whether the method Foo is managed (it could be native as well, due to mixed-mode support) and then whether it's already jitted (by a previous call); since it's not, it goes ahead with jitting it
  4. The jitter validates a bunch of stuff, like whether the class on which the method Foo is being called (in this case Microsoft.Sample.Program) is valid and has been initialized, stack requirements, etc.
  5. Then it tries to resolve the local variables of the method. It waits until this point to resolve the local variable type references so that it can save time and memory by not jitting/loading types that are referenced only by methods that are never executed
  6. Then it tries to resolve the type of the variable, which in this case is System.Xml.XmlDocument
  7. It checks whether that type is already in the cache, that is, whether it is already loaded
  8. Since it's not, it searches for the reference based on the type reference information
  9. This information contains the full type reference, including the assembly name, which in this case is System.Xml.dll, and also version information, strong name information, etc.
  10. All of the above, along with other information like the executing application's path, is passed to the assembly loader to load the assembly
  11. The usual assembly search sequence is used to look for the assembly; it is then loaded and the type reference subsequently gets resolved

If you look at the above steps, there is no dependency of assembly loading on using directives anywhere. Hence, at least on .NETCF, whether you put the usings outside or inside the namespace, the referenced assemblies get loaded exactly at the time of the first reference to a type from that assembly (step #5 above is the key).
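A purely programmatic variant of the experiment (no Process Explorer needed) is to scan AppDomain.CurrentDomain.GetAssemblies() before and after the first use of an XML type. The sketch below is mine, not Hanselman's code; on a desktop CLR without NGen images the first check would typically print False and the second True, though the exact assembly names differ between framework versions:

```csharp
using System;
using System.Linq;

namespace Microsoft.Sample
{
    // Moving the usings above inside this namespace changes nothing
    // below: the directives are purely lexical and leave the emitted
    // IL, and hence the load order, untouched.
    public class LoadCheck
    {
        static bool IsXmlLoaded()
        {
            return AppDomain.CurrentDomain.GetAssemblies()
                .Any(a => a.GetName().Name.Contains("Xml"));
        }

        public static void Main(string[] args)
        {
            Console.WriteLine("Xml loaded before Foo: " + IsXmlLoaded());
            Foo();
            Console.WriteLine("Xml loaded after Foo: " + IsXmlLoaded());
        }

        static void Foo()
        {
            var x = new System.Xml.XmlDocument(); // first Xml reference
        }
    }
}
```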

Monday, July 07, 2008

Auto generating Code Review Email for TFS


We use a small command line tool called crmail to auto-generate code review emails from shelvesets. I find the whole process very helpful and thought I'd share the process and the tool (which has some really cool features).

Features

  1. Automatic generation of the email from the shelveset details
  2. Hyperlinks point to TFS Web Access so that you can review code from machines without any tools installed, even without a source enlistment. Yes it's true!!! The only thing you need is your office's intranet access
  3. You can even use a Windows Mobile phone :) and even some non-MS browsers. Ok I guess I have sold this enough
  4. This is how the email looks, with all the details pointed out
  5. Effectively you can see the file diff, history, blame (annotate), shelveset details, associated bugs, everything from your browser, and the best thing is that each of these takes one click.
    This is how the file diff looks in the browser

Pre-reqs

  1. Team System Web Access (TSWA) 2008 power tool installed on your TFS server
  2. Outlook installed on the machine on which the email is generated
  3. Enlistment and TFS client installed on the machine on which the email is generated
  4. For reviewers there is no pre-req other than a browser and email reader.

Dev process

  1. The developer creates a shelveset after he is done with his changes. He ensures he fills in all the details, including the reviewers' email addresses separated by ';'
  2. He runs the tool with a simple command
    crmail shelvesetname
  3. The email gets generated and opened; he fills in additional information and fires send
  4. Done!!

Reviewers

Ok they just click on the email links. Since mostly these are managers what more do you expect out of them? Real devs will stick with firing up tfpt command line :)

Configuring the tool

  1. Download the binaries from here
  2. Unzip. Open the crmail.exe.config file and modify the values in it to point to your tfsserver and your code review distribution list (if you do not have one then make it empty)
  3. Check it in to some tools folder in your source control so that everyone in your team has access to it

Support

Self help is the best help :). Download the sources from here and enjoy. Buck Hodges' post on the direct link URLs will help in case you want to modify the sources to do more.

Friday, July 04, 2008

How does the .NET CF handle null references


What happens when we have code as below?

class B
{
    public virtual void Virt(){
        Console.WriteLine("Base::Virt");
    }
}

class Program
{
    static void Main(string[] args){
        B b = null;
        b.Virt(); // throws System.NullReferenceException
    }
}

Obviously a null reference exception is thrown. If you look at the IL, the call looks like

L_0000: nop
L_0001: ldnull
L_0002: stloc.0
L_0003: ldloc.0
L_0004: callvirt instance void ConsoleApplication1.B::Virt()
L_0009: nop
L_000a: ret

So in effect you'd expect the jitter to generate the following kind of code (in processor instructions)



if (b == null)
    throw new NullReferenceException
else
    b->Virt() // actually call safely using the this pointer

However, generating null checks for every call is going to lead to code bloat. So to work around this, on some platforms (e.g. .NETCF on WinCE 6.0 and above) the runtime uses the following approach



  1. Hook up the native access violation exception (WinCE 6.0 supports this) to a method in the execution engine (EE)
  2. Do not generate any null checking and directly generate calls through references
  3. In case the reference is null, a native AV is raised (an access violation, because the invalid address 0 is accessed) and the hook method is called
  4. At this point the EE checks to see whether the source of the access violation (native code) is inside a jitted code block. If yes, it creates the managed NullReferenceException and propagates it up the call chain.
  5. If it's outside, then obviously either the CLR itself or some other native component is crashing, and there is nothing the EE can do about it.

Thursday, July 03, 2008

C# generates virtual calls to non-virtual methods as well


Some time back I had posted about a case where non-virtual calls are used for virtual methods, and promised a post about the reverse scenario. This issue of C# generating the callvirt IL instruction even for non-virtual method calls keeps coming back on C# discussion DLs every couple of months. So here it goes :)

Consider the following code

class B
{
    public virtual void Virt(){
        Console.WriteLine("Base::Virt");
    }

    public void Stat(){
        Console.WriteLine("Base::Stat");
    }
}

class D : B
{
    public override void Virt(){
        Console.WriteLine("Derived::Virt");
    }
}

class Program
{
    static void Main(string[] args)
    {
        D d = new D();
        d.Stat(); // should emit the call IL instruction
        d.Virt(); // should emit the callvirt IL instruction
    }
}

The basic scenario is that a base class defines a virtual method and a non-virtual method. A call is made to the base using a derived class reference. The expectation is that the call to the virtual method (B.Virt) will go through the intermediate language (IL) callvirt instruction, and the one to the non-virtual method (B.Stat) through the call IL instruction.


However, this is not true, and callvirt is used for both. If we open the disassembly for the Main method using Reflector or ILDASM, this is what we see

L_0000: nop
L_0001: newobj instance void ConsoleApplication1.D::.ctor()
L_0006: stloc.0
L_0007: ldloc.0
L_0008: callvirt instance void ConsoleApplication1.B::Stat()
L_000d: nop
L_000e: ldloc.0
L_000f: callvirt instance void ConsoleApplication1.B::Virt()
L_0014: nop
L_0015: ret

The question is why? There are two reasons that have been put forward by the CLR team



  1. API change.
    The reason is that the .NET team wanted a change of a method (API) from non-virtual to virtual to be non-breaking. So in effect, since the call is generated as callvirt anyway, a caller need not be recompiled in case the callee changes to be a virtual method.

  2. Null checking.
    If a call is generated and the method body doesn't access any instance variable, then it is possible to call methods even on null objects successfully. This is currently possible in C++; see a post I made on this here.

    With callvirt there's a forced access to the this pointer, and hence the object on which the method is being called is automatically checked for null.


callvirt does come with an additional performance cost, but measurement showed that there's no significant performance difference between call with a null check and callvirt. Moreover, since the jitter has full metadata of the callee, while jitting the callvirt it can generate processor instructions to do a static call if it figures out that the callee is indeed non-virtual.


However, the compiler does try to optimize situations where it knows for sure that the target object cannot be null. E.g. for the expression i.ToString(); where i is an int, call is used to invoke the ToString method, because Int32 is a value type (cannot be null) and sealed.
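Both points can be observed from plain C# without opening the IL; in this sketch (my own types and messages) the callvirt-enforced null check makes even a non-virtual call throw, while the i.ToString() call compiles to a plain call because an int can never be null:

```csharp
using System;

class CallVirtDemo
{
    class B
    {
        public void Stat() { Console.WriteLine("Base::Stat"); }
    }

    static void Main()
    {
        B b = null;
        try
        {
            // Emitted as callvirt even though Stat is non-virtual,
            // so the null 'this' is detected before the body runs.
            b.Stat();
        }
        catch (NullReferenceException)
        {
            Console.WriteLine("NullReferenceException on non-virtual call");
        }

        int i = 42;
        // Int32 is a sealed value type, so the compiler emits a
        // plain 'call' here: 'i' can never be null.
        Console.WriteLine(i.ToString());
    }
}
```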