Bong Geek - Abhinaba Basu: 12/01/2006

Friday, December 15, 2006

switches and jump tables

In my last post I discussed how only constants can be used with C# switches. From the post's comments and from later discussions with other folks, I learnt something that came as a surprise. A lot of people working on managed code consider switch-case to be a stylistic variant of if-else, and that is all.

However, in the C/C++ world switch is not just a variant of if-else (nor is it in .NET); it can be compiled into a fast, O(1) dispatch, whereas an if-else chain is O(n). Saying that switch is just a better way to express multiple comparisons against the same variable is like saying Dictionary<TKey, TValue> is just another form of List<T>. It is not (a Dictionary can give you O(1) lookups).

For example, C restricts each case label to a constant expression. This lets the compiler generate an optimized jump table for the switch. Consider the following code.

switch(i) {
    case 4:...
    case 5:...
    case 6:...
    case 7:...
    // many more cases...
    default:...
}

For this code the generated machine code behaves like the steps below.

Compile-time jump table creation: For each case statement a fixed block of memory is reserved, say 8 bytes. These 8 bytes contain a jump (jmp) instruction to the location where the actual code for the case resides. The base address of this table is labeled, say, JMPTABLEBASE.
Normalization: The value of i is normalized as i = i - 4 (4 being the first case value).
Boundary check: If the normalized i is larger than the largest normalized case (7 - 4 = 3), or is negative, execution flows to default.
Jump: Execution jumps to the address JMPTABLEBASE + (i * 8).

As you can see, the whole thing happens in constant time.
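To make the steps above a bit more concrete, here is a rough sketch in C of what such a jump table amounts to, using an array of function pointers in place of raw jmp instructions. The handler names, dispatch, FIRST_CASE and LAST_CASE are all made up for illustration; a real compiler emits this directly in machine code, and its exact strategy will differ.

```c
/* Hypothetical handlers standing in for the bodies of case 4..7 and default. */
static const char *on_four(void)    { return "four"; }
static const char *on_five(void)    { return "five"; }
static const char *on_six(void)     { return "six"; }
static const char *on_seven(void)   { return "seven"; }
static const char *on_default(void) { return "default"; }

typedef const char *(*handler_fn)(void);

enum { FIRST_CASE = 4, LAST_CASE = 7 };

/* The "jump table": one slot per case value, indexed by (i - FIRST_CASE). */
static const handler_fn jump_table[LAST_CASE - FIRST_CASE + 1] = {
    on_four, on_five, on_six, on_seven
};

/* Dispatch in constant time: normalize, bounds-check, then index the table. */
const char *dispatch(int i)
{
    /* Normalize. Done in unsigned arithmetic so that values below
       FIRST_CASE wrap around to very large numbers. */
    unsigned int idx = (unsigned int)i - (unsigned int)FIRST_CASE;

    /* A single comparison therefore rejects both too-small and too-large values. */
    if (idx > (unsigned int)(LAST_CASE - FIRST_CASE))
        return on_default();

    /* Indexed jump: no chain of comparisons, whatever the number of cases. */
    return jump_table[idx]();
}
```

Calling dispatch(6) goes straight to the third slot, while dispatch(3) and dispatch(42) both fall through to the default handler after one comparison.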

Some embedded C compilers (like the TI C compiler) generate a separate section named .switch for the jump table. This section can later be placed in high-speed internal DARAM for faster execution, in case the switch needs such special treatment.

Obviously, compilers vary a lot in this respect.

Why can we only use constants in a switch-case statement?

The following code fails to compile with the error “A constant value is expected” for someStr, as it is not a constant string.

static void func(string str)
{
    switch(str)
    {
        case "Zaphod": Console.WriteLine("The king"); break;
        case someStr: Console.WriteLine("The coder"); break;
        default: Console.WriteLine("None"); break;
    }
}
static string someStr = "Noo"; // a field, not a compile-time constant

Here goes a long answer to this short question.

The reason is simple and yet involved. Let’s take the following valid code which only has constants and see how it works.

static void func(string str)
{
    switch(str)
    {
        case "Zaphod": Console.WriteLine("The king"); break;
        case "Abhinab": Console.WriteLine("The coder"); break;
        default: Console.WriteLine("None"); break;
    }
}

If we look at the generated IL, it looks something like this:

      L_0007: ldstr "Zaphod"
      L_000c: call bool string::op_Equality(string, string)
      L_0011: brtrue.s L_0022
      L_0013: ldloc.0
      L_0014: ldstr "Abhinab"
      L_0019: call bool string::op_Equality(string, string)
      L_001e: brtrue.s L_002f
      L_0020: br.s L_003c
      L_0022: ldstr "The king"
      L_0027: call void [mscorlib]System.Console::WriteLine(string)
      L_002c: nop
      L_002d: br.s L_0049
      L_002f: ldstr "The coder"
      L_0034: call void [mscorlib]System.Console::WriteLine(string)
      L_0039: nop
      L_003a: br.s L_0049
      L_003c: ldstr "None"

Note the calls to op_Equality at L_000c and L_0019. They show that even though we are using switch-case, the compiler ultimately converts the code into a chain of if-then-else. So the switch-case becomes something like

if (str == "Zaphod")
    Console.WriteLine("The king");
else if (str == "Abhinab")
    Console.WriteLine("The coder");
else
    Console.WriteLine("None");

If this is the case, then what is stopping the case statements from having non-constants? For a non-constant, the generated code could be something like if (str == someNonConstantStr), which is valid code.

The answer is simple. When the number of cases is larger, the generated code is very different and does not consist of if-then-else chains. Isn’t that obvious? Otherwise, why would anyone ever use switch-case, and why would we call switch-case faster?

Let’s see what happens when we have a larger number of cases, as follows.

switch(str)
{
    case "Zaphod": Console.WriteLine("The king"); break;
    case "Abhinab": Console.WriteLine("The coder"); break;
    case "Ford": Console.WriteLine("The Hitchhiker"); break;
    case "Trilian": Console.WriteLine("The traveler"); break;
    case "Marvin": Console.WriteLine("The Robot"); break;
    case "Agrajag": Console.WriteLine("The Dead"); break;
    default: Console.WriteLine("None"); break;
}

For this, the compiler first generates a class that looks like

internal class CompilerGeneratedClass
{
      internal static Dictionary<string, int> CompilerGenDict;
}

And then for the switch case the following code is generated.

int num1;
if (CompilerGeneratedClass.CompilerGenDict == null)
{
    // Built only once, the first time this code runs
    Dictionary<string, int> dictionary1 = new Dictionary<string, int>(6);
    dictionary1.Add("Zaphod", 0);
    dictionary1.Add("Abhinab", 1);
    ...
    CompilerGeneratedClass.CompilerGenDict = dictionary1;
}
if (CompilerGeneratedClass.CompilerGenDict.TryGetValue(
                                           str, out num1))
{
    // Integer switch, which can be compiled to a jump table
    switch (num1)
    {
    case 0: Console.WriteLine("The king"); return;
    case 1: Console.WriteLine("The coder"); return;
    case 2: Console.WriteLine("The Hitchhiker"); return;
    ...
    }
}
Console.WriteLine("None");

What this means is that the first time the function is called, a Dictionary with string keys and int values is created, and every case string is stored in it as a key with an integer stored against it as the value. Then, for the switch statement, the string is looked up in the dictionary and, if it is present, its integer value is returned. Using this number the compiler creates an efficient jump table and jumps to the target Console.WriteLine call.

Now the answer :). The strings are pre-stored in the dictionary. If the strings in the case statements were not constants, changes to them would not be reflected in the dictionary, and you would end up comparing against a stale value. To avoid this inconsistency, non-constant values are not supported at all.

Obviously, for dynamic values the dictionary cannot be used, so no optimization is possible for switch-case and one should use if-then-else anyway.
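The stale-value problem can be sketched in a few lines of C. This is only an illustration under my own assumptions, not what the C# compiler generates: case_table, table_add, table_lookup and demo_stale_label are hypothetical names, and the table does a linear scan instead of hashing to keep the sketch short. What matters is that the table takes a snapshot of each label when it is built.

```c
#include <string.h>

enum { MAX_KEYS = 8, MAX_LEN = 32 };

/* Hypothetical stand-in for the compiler-generated dictionary:
   built once, maps a case label to its case index. */
struct case_table {
    char keys[MAX_KEYS][MAX_LEN];
    int count;
};

/* Copy the current text of a label into the table, like dictionary.Add(label, index).
   Returns the assigned index, or -1 if the table is full or the label is too long. */
int table_add(struct case_table *t, const char *label)
{
    if (t->count >= MAX_KEYS || strlen(label) >= MAX_LEN)
        return -1;
    strcpy(t->keys[t->count], label);
    return t->count++;
}

/* Return the case index for str, or -1 meaning "default". */
int table_lookup(const struct case_table *t, const char *str)
{
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->keys[i], str) == 0)
            return i;
    return -1;
}

/* Returns 1 if the table went stale after the label variable changed. */
int demo_stale_label(void)
{
    char some_str[MAX_LEN] = "Noo";      /* the non-constant "case" label */
    struct case_table t = { .count = 0 };

    table_add(&t, "Zaphod");             /* index 0 */
    table_add(&t, some_str);             /* index 1, snapshot of "Noo" */

    strcpy(some_str, "Arthur");          /* the label changes afterwards */

    /* The table still matches the old text and misses the new one. */
    return table_lookup(&t, "Noo") == 1 && table_lookup(&t, "Arthur") == -1;
}
```

Here demo_stale_label returns 1: after the variable changes, the prebuilt table still answers for "Noo" and knows nothing about "Arthur", which is exactly the inconsistency that constant-only case labels rule out.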

Test Driven Development

A lot has already been said about Test Driven Development (TDD) by a lot of people, but I'd still like to add my 0.02 paisa.

We have an internal requirement of checking in unit tests along with the product code, and the code coverage for the unit tests needs to be high. Most of our developers have over 85% code coverage.

In my sources I decided to try out TDD. I used the following steps:

Write the method's prototype so that it matches the design doc, and throw a NotImplementedException in it.
Write unit tests in the Visual Studio Team System unit-test framework. I try to cover all the requirements, even the ones needed for negative testing, like passing an invalid handle, catching an ArgumentNullException and verifying that the correct argument is named in ArgumentNullException.ParamName.
After that I run the tests. All the tests obviously fail with lots of X marks.
Then I go about fixing each of the test failures by adding the functionality to the code.
After each fix (or a couple of them) I run the tests, and the red marks keep changing into green test-passed ticks.
Once I'm done, I run with code coverage and then add more tests if required to cover the blocks that were not touched by the tests.

Even though the system looks simple, it has helped me enormously by catching multiple bugs early. Even trivial tests, like tests for GetHashCode and operator overloads, found issues :) The fun of seeing all those X marks disappear one after the other brings in a childish zeal to get them done even faster.

The other benefit is that after every later code change I can easily run these tests as a sanity check.

Conditional Text

We moved to a new satellite TV provider some time back. It is time to pay the quarterly bill, so I dug up their manual to look for online payment options. Sure enough, there was a section on "Internet payments". I opened the section and it had one line: "For Internet payment options click here"!! How the hell am I supposed to click on paper?

The reason for the line is simple enough: they are just distributing printed copies of their online documentation.

I shared this with a couple of friends and the discussion soon turned to the issues with maintaining multiple versions of the same document. I figured out soon enough that they had not heard about conditional text, which is supported in most DTP software. I was on the dev team for Adobe FrameMaker and it was one of the features in the product. It works very much like the following C* kind of code

#if Web
     Console.WriteLine("Click <a href=\"http://www.abhinaba.com\">Here</a>");
#elif Doc
     Console.WriteLine("Visit http://www.abhinaba.com");
#endif

If the symbol Web is defined then the fancy Click Here link is printed; if Doc is defined instead, the plain URL is printed. Conditional content works very much like this. You can define document-wide variables and associate text, images (any supported content) with these variables. Later you switch one or more of these variables on or off to print out various versions of the same doc. So all your common content remains common and you have the ability to pick and choose from the rest.

No idea if Office supports this. But with the powerful collaboration features supported in Word, I highly suspect that this is indeed supported.

Change the world or go home

Saw this via Steve Clayton's blog. This is going to be my new wallpaper...

Funny messages are not always funny

I used to work in a company where Easter eggs were considered a feature and there was an official maintainer for them (I used to maintain both the Easter eggs listed here and more). I used to feel that Microsoft shouldn't have moved away from inserting Easter eggs and should've stuck with funny messages in its software. Microsoft Max (now canned) used to give really funny messages, like suggesting that I get coffee as the installation may take some time.

However, with time I have realized that funny messages are not always funny (even though I still believe Easter eggs are). If you look up the demographic usage data of Orkut, it is mostly used in Brazil (60%), and the 3rd highest usage is in India (12%) and growing astronomically. Server failures in Orkut are reported with a "Bad, Bad Server. No donut for you" message. Now the problem is that in India we do not eat donuts, and I have a ton of non-geek friends who've never been to the US and have no idea what a donut is. One of them got majorly irritated with the message.

I guess for free services where there are no paying customers it's OK to have this kind of funny, light-hearted message, but you still need to target your jokes well. Let's hope that in the future we have India servers throwing up "Bad Bad server, no Vada for you" messages :)

Sunday, December 03, 2006

Binary Banner

Cross posted from http://blogs.msdn.com/abhinaba/archive/2006/12/04/binary-banner.aspx

BoingBoing has a story of a store that sells t-shirts with offensive messages spelled out in binary. I think it's a nice way to generate an email signature or web-page banner where you can really say offensive stuff and get away with it. I wrote the following code in Ruby, which takes any word and spits out its binary ASCII representation.

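# Print the 8-bit binary form of one byte, most significant bit first.
def print_bin(ch)
  mask = 0x80
  while mask != 0
    print((ch & mask) == 0 ? "0" : "1")
    mask >>= 1
  end
end

# Emit one line of binary per command-line argument.
ARGV.each do |str|
  str.each_byte do |ch|
    print_bin ch
  end
  print "\n"
end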

My Ruby is completely rusty, so it took me some time to get this done. Later I tried writing the same code in C#. It took me over 3 times the LOC it took me in Ruby. I'm sure an experienced Ruby programmer can make this even simpler and more concise.

C:\MyStuff\Code\Ruby>BinaryBan.rb Krikkit
01001011011100100110100101101011011010110110100101110100

I have already started using it on my web-site.
