Bong Geek - Abhinaba Basu: 2008

Sunday, December 14, 2008

When the spell checkers attack

Today Boing Boing had a post about the Cupertino effect, in which spell-checkers erroneously change valid words. I have worked on the spell-checking features of Adobe Acrobat and Adobe FrameMaker, so I do appreciate the headache of getting them right.

I face the Cupertino effect most often while typing Indian names, because most word processors don't recognize them. For my wife's name, Somtapa, the offered spelling is Stomata, followed by Sumatra. So I started calling her Stomata until it got on her nerves and something very similar to the photo above happened to me.

Thursday, November 13, 2008

Is interlocked increment followed by comparison thread safe?

Sorry about the blog title, my imagination failed me :(.

In our internal alias someone asked the question "Is the following thread safe?"

if(Interlocked.Increment(ref someInt) == CONSTANT_VAL)
{
    doSomeStuff();
}

My instant reaction was no because even though the increment is done in a thread safe way using System.Threading.Interlocked class, the comparison that follows is not safe.

My reasoning was that the "if" expression can be broken down to the following operations

Fetch of someInt
Increment operation
Write back of someInt
Comparison

The first 3 are done inside the Increment method and it provides concurrency protection and hence cannot be interleaved by another instruction.

So if two threads (labelled A and B below) are running in parallel, I assumed that the following interleaving is possible

1. someInt is 4 and CONSTANT_VAL is 5
2. [A] Fetch of someInt -> someInt == 4
3. [A] Increment operation -> someInt == 5
4. [A] Write back of someInt -> someInt == 5
5. [B] Fetch of someInt -> someInt == 5
6. [B] Increment operation -> someInt == 6
7. [B] Write back of someInt -> someInt == 6
8. [A] Comparison -> compare 6 & CONSTANT_VAL
9. [B] Comparison -> compare 6 & CONSTANT_VAL

This means that the comparison on both threads will fail.

However, someone responded back that I was wrong as the return value is being used and not the written back value. This made me do some more investigation.

If I look at the JITted code, it looks like

if (Interlocked.Increment(ref someInt) == CONSTANT_VAL)
00000024  lea         ecx,ds:[002B9314h] 
0000002a  call        796F1221 
0000002f  mov         esi,eax 
00000031  cmp         esi,5

The call (at 0x0000002a) is to native code inside the CLR, which in turn calls the Win32 API (InterlockedIncrement).

However, the last 2 lines are the interesting ones. Register EAX contains the return value, and the comparison happens between that and CONSTANT_VAL. So even if the second thread has already changed the value of someInt, it doesn't have any effect, as the return value of the first increment is being used and not the someInt value in memory. So the first comparison (step 8 above) will actually compare CONSTANT_VAL against 5 and succeed.
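The same reasoning can be replayed in C++ as a rough sketch. Here std::atomic's fetch_add stands in for Interlocked.Increment (it returns the value before the add, so adding 1 gives the value this caller wrote), and the names kConstantVal, Outcome and replay_interleaving are made up for illustration. Both increments are forced to complete before either comparison runs, which is exactly the interleaving I was worried about.

```cpp
#include <atomic>

// Hypothetical stand-in for CONSTANT_VAL in the question above.
constexpr int kConstantVal = 5;

struct Outcome {
    bool first_matched;   // did the first caller's comparison succeed?
    bool second_matched;  // did the second caller's comparison succeed?
    int final_value;      // what is left in memory afterwards
};

// Replays the worrying interleaving: both increments finish before
// either comparison is evaluated.
Outcome replay_interleaving() {
    std::atomic<int> some_int{4};

    // Each caller keeps the value returned by its own atomic increment.
    int first_result = some_int.fetch_add(1) + 1;   // 5
    int second_result = some_int.fetch_add(1) + 1;  // 6

    // The comparisons use the returned values, not a re-read of some_int.
    return {first_result == kConstantVal,
            second_result == kConstantVal,
            some_int.load()};
}
```

Even though memory already holds 6 when the comparisons run, exactly one caller sees 5.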

Sunday, October 26, 2008

C/C++ Compile Time Asserts

The Problem

Run time asserts are fairly commonly used in C++. As the MSDN documentation for assert states

"(assert) Evaluates an expression and, when the result is false, prints a diagnostic message and aborts the program."

There is another type of asserts which can be used to catch code issues right at the time of compilation. These are called static or compile-time asserts. These asserts can be used to do compile time validations and are very effectively used in the .NET Compact Framework code base.

E.g. you have two types Foo and Bar and your code assumes (maybe for a reinterpret_cast) that they are of the same size. Since they live in separate places, there is always a possibility that someone modifies one without changing the other, and that results in some weird bugs. How do you express this assumption in code? Obviously you can do a run-time check like

assert(sizeof(Foo) == sizeof(Bar));

If that code is not hit at run time, the assert will not fire. It might only be caught later in testing. However, if you notice carefully, all of the information is available during compilation (both the types and the sizeof are resolved at compile time). So we should be able to do compile-time validation, with something to the effect of

COMPILE_ASSERT(sizeof(int) == sizeof(char));

This should be tested during compilation and hence whether the code is run or not the assert should fail.

The Solution

There are many ways to get this done. I will discuss two quick ways

Array creation

You can create a MACRO expression as follows

#define COMPILE_ASSERT(x) extern int __dummy[(int)(x)]

This macro works as follows

// compiles fine 
COMPILE_ASSERT(sizeof(int) == sizeof(unsigned)); 
// error C2466: cannot allocate an array of constant size 0
COMPILE_ASSERT(sizeof(int) == sizeof(char));

The first expression effectively expands to int __dummy[1] and compiles fine, but the latter expands to int __dummy[0] and fails to compile.

The advantage of this approach is that it works for both C and C++, however, the failure message is very confusing and doesn't indicate what the failure is for. It is left to the developer to visit the line of compilation failure to see that it's a COMPILE_ASSERT.

sizeof on incomplete type

This approach relies on explicit specialization of templates and the fact that sizeof on an incomplete type fails to compile.

Consider the following

namespace static_assert
{
    template <bool> struct STATIC_ASSERT_FAILURE;
    template <> struct STATIC_ASSERT_FAILURE<true> { enum { value = 1 }; };
}

Here we declared the generic type STATIC_ASSERT_FAILURE. As you can see, the type is incomplete (it has no definition). However, we do provide an explicit specialization of that type for the value true, while no specialization is provided for false. This means that sizeof(STATIC_ASSERT_FAILURE<true>) is valid but sizeof(STATIC_ASSERT_FAILURE<false>) is not. This can be used to create a compile time assert as follows

namespace static_assert
{
    template <bool> struct STATIC_ASSERT_FAILURE;
    template <> struct STATIC_ASSERT_FAILURE<true> { enum { value = 1 }; };

    template<int x> struct static_assert_test{};
}

#define COMPILE_ASSERT(x) \
    typedef ::static_assert::static_assert_test<\
        sizeof(::static_assert::STATIC_ASSERT_FAILURE< (bool)( x ) >)>\
            _static_assert_typedef_

Here the error we get is as follows

// compiles fine
COMPILE_ASSERT(sizeof(int) == sizeof(unsigned)); 
// error C2027: use of undefined type 'static_assert::STATIC_ASSERT_FAILURE<__formal>'
COMPILE_ASSERT(sizeof(int) == sizeof(char));

So the advantage is that STATIC_ASSERT_FAILURE is called out right at the point of failure, which makes the cause much more obvious.

The macro expansion is as follows

typedef static_assert_test< sizeof(STATIC_ASSERT_FAILURE<false>) > _static_assert_typedef_
typedef static_assert_test< sizeof(incomplete type) > _static_assert_typedef_

Similarly for true the type is not incomplete and the expansion is

typedef static_assert_test< sizeof(STATIC_ASSERT_FAILURE<true>) > _static_assert_typedef_
typedef static_assert_test< sizeof(valid type with one enum member) > _static_assert_typedef_
typedef static_assert_test< 1 > _static_assert_typedef_

Put it all together

All together the following source gives a good working point to create static or compile time assert that works for both C and C++

#ifndef _COMPILE_ASSERT_H_
#define _COMPILE_ASSERT_H_

#ifdef __cplusplus

#define JOIN( X, Y ) JOIN2(X,Y)
#define JOIN2( X, Y ) X##Y

namespace static_assert
{
    template <bool> struct STATIC_ASSERT_FAILURE;
    template <> struct STATIC_ASSERT_FAILURE<true> { enum { value = 1 }; };

    template<int x> struct static_assert_test{};
}

#define COMPILE_ASSERT(x) \
    typedef ::static_assert::static_assert_test<\
        sizeof(::static_assert::STATIC_ASSERT_FAILURE< (bool)( x ) >)>\
            JOIN(_static_assert_typedef, __LINE__)

#else // __cplusplus

#define COMPILE_ASSERT(x) extern int __dummy[(int)(x)]

#endif // __cplusplus

#define VERIFY_EXPLICIT_CAST(from, to) COMPILE_ASSERT(sizeof(from) == sizeof(to)) 

#endif // _COMPILE_ASSERT_H_

The only extra part is the JOIN macros. They just ensure that the typedef is using new names each time and doesn't give type already exists errors.
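For comparison, here is a small sketch of the same check using the static_assert keyword, assuming a compiler that supports C++11 or later. Note that because static_assert is a keyword there, the namespace name used above would have to be renamed on such compilers. Foo, Bar and VERIFY_SAME_SIZE are hypothetical names used only for this illustration.

```cpp
#include <cstdint>

// Hypothetical types standing in for the Foo and Bar discussed earlier.
struct Foo { std::int32_t value; };
struct Bar { std::uint32_t bits; };

// Evaluated at compile time; the message appears in the compiler error
// if the condition is false.
static_assert(sizeof(Foo) == sizeof(Bar), "Foo and Bar must have the same size");

// Same wrapper idea as VERIFY_EXPLICIT_CAST, built on the keyword.
// The # operator turns the type names into text for the message.
#define VERIFY_SAME_SIZE(from, to) \
    static_assert(sizeof(from) == sizeof(to), #from " and " #to " differ in size")

VERIFY_SAME_SIZE(Foo, Bar);
```

No unique typedef name is needed here, so the JOIN trick is not required with this form.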

More Reading

Boost static_assert has an even better implementation that takes care of many more scenarios and compiler quirks. Head over to the source at http://www.boost.org/doc/libs/1_36_0/boost/static_assert.hpp
Modern C++ Design by the famed Andrei Alexandrescu

Wednesday, October 22, 2008

Taking my job more seriously

I generally take my work very seriously. However, the following from SICP was still my favorite quote

"I think that it's extraordinarily important that we in computer science keep fun in computing. When it started out, it was an awful lot of fun. Of course, the paying customers got shafted every now and then, and after a while we began to take their complaints seriously. We began to feel as if we really were responsible for the successful, error-free perfect use of these machines. I don't think we are. I think we're responsible for stretching them, setting them off in new directions, and keeping fun in the house. I hope the field of computer science never loses its sense of fun. Above all, I hope we don't become missionaries. Don't feel as if you're Bible salesmen. The world has too many of those already. What you know about computing other people will learn. Don't feel as if the key to successful computing is only in your hands. What's in your hands, I think and hope, is intelligence: the ability to see the machine as more than when you were first led up to it, that you can make it more."

However, a couple of days back I was lying in a cold room with a doctor probing my thyroid gland with ultrasound beams. She went on explaining the details of the tumor that I had, pointing to the screen. In my mind I was thinking about the folks who have to code for these things and how careful they need to be.

The first thing I did when I came out of the room was to search for whether .NET Compact Framework gets used for these kinds of equipment. Of the hundreds of pages I hit, the one titled "Control System for Lung Ventilation Equipment with Windows CE , Microsoft .Net Compact Framework and Visual Studio Team System" struck me, mostly because it involved two products I personally coded for (NETCF and VSTS). There was even more life-saving equipment listed on the search page.

It was a very humbling experience. I made a little vow, I'll be more careful when I code from tomorrow.

Tuesday, October 21, 2008

Silverlight on Nokia S60 devices

In many of my blog posts (e.g. here and here) I refer to .NET Compact Framework and Symbian OS (S60), and obviously folks keep asking me via comments (or assume) that we are porting .NETCF to S60 devices. So I thought it's time to clarify :)

The short answer is that we are not porting .NETCF to S60 devices, but we are porting Silverlight to S60 devices. This was jointly announced by Microsoft and Nokia; read Nokia's press release here.

We are at a very early stage of development and it is very hard to tell how much will be in it (e.g. will it be SL v1.0 or v2.0). However, we are working hard, and as and when more details emerge I will share them on this blog. So keep reading.

Monday, October 20, 2008

Who halted the code

Today a bed time story involving un-initialized variable access and a weird coincidence.

A couple of weeks back I was baffled by a weird issue I was facing with the .NET Compact Framework Jitter. We were making the Jitter work for the Symbian OS (S60) emulator. The S60 emulator is a Windows process and hence we needed to bring up the i386 Jitter in our code base. After running some JITted code the application would stop executing with the message "Execution halted" in the stack trace.

Now obviously the user didn't halt anything and the program was a simple hello world.

After some debugging I found an interesting thing. For the debug build the Symbian C compiler memsets un-initialized variables to 0xcccc. This is fairly standard and is used to catch un-initialized variable access.

However, our Jit code had a bug such that in some scenarios it was not emitting function calls/returns correctly. So instead of returning we were executing arbitrary memory. But instead of some sort of access violation (or something equally bad) the execution was actually halting.

The reason was that we started executing un-initialized data, and this data is 0xcccc. The S60 debugger uses interrupt 3 (INT 3) for inserting breakpoints, and on i386 the single-byte opcode for INT 3 is 0xCC, which reveals the rest of the story

For i386 the instruction used for inserting breakpoint and the pattern used for un-initialized data matched and the debugger thought that it hit a breakpoint.
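The coincidence is easy to see in a few lines of C++. This is only an illustration: on x86 the one-byte encoding of INT 3 is 0xCC, and the function names and the 16-byte buffer below are invented for the sketch, not taken from the Jitter or the Symbian tool chain.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// One-byte x86 opcode for INT 3 (the breakpoint instruction).
constexpr std::uint8_t kInt3Opcode = 0xCC;
// Debug fill byte for un-initialized memory, as described in the post.
constexpr std::uint8_t kDebugFill = 0xCC;

// Returns true when every byte in the buffer would decode as INT 3 on x86.
bool decodes_as_breakpoints(const std::uint8_t* data, std::size_t size) {
    for (std::size_t i = 0; i < size; ++i) {
        if (data[i] != kInt3Opcode) return false;
    }
    return true;
}

// Simulates a debug-build "un-initialized" region and checks how a CPU
// that wandered into it would interpret the bytes.
bool uninitialized_region_looks_like_breakpoints() {
    std::uint8_t region[16];
    std::memset(region, kDebugFill, sizeof(region));
    return decodes_as_breakpoints(region, sizeof(region));
}
```

So a stray jump into such a region hits a breakpoint on the very first byte, which the debugger reports as a halt.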

Thursday, October 16, 2008

Hex signatures

Frequently in code we need to add bit patterns or magic numbers. Since hex numbers include the letters A-F, folks become creative and create all kinds of interesting words. Typical ones are 0xDEAD, 0xBEEF, 0xFEED or various combinations of these like 0xFEEDBEEF. However, someone actually used the following in a sample

0xbed1babe

No naming the guilty :)

Wednesday, October 15, 2008

Back To Basics: Finding your stack usage

Handling stack overflow is a critical requirement for most Virtual Machines. For .NET Compact framework (NETCF) it is more important because it runs on embedded devices with very little memory and hence little stack space.

The .NETCF Execution Engine (EE) needs to detect stack overflows and report it via StackOverflowException. To do this it uses some logic which we internally refer to as StackCheck (that will be covered in a later post). The algorithm needs fair prediction of stack usage for system APIs (e.g. Win32 APIs or for Symbian OS, S60 APIs).

Each time we target a new platform we do some measurements, in addition to referring to specs :), to find its stack characteristics. As we are currently making NETCF work on Symbian OS, we are doing these measurements again. So I thought of sharing how we go about measuring stack usage using simple watermarking.

The technique

Step 1:

On method entry, store the current and maximum stack values. These are typically available via system APIs; on Symbian they are exposed through the TThreadStackInfo class (iBase, iLimit and other members).

                       +---------------+
                       |               |
                       |               |
                       |               |
current stack ------>  +---------------+
                       |               |
                       |   Available   |
                       |    Stack      |
                       |               |
                       |               |
                       |               |
                       |               |
Stack limit ---------> +---------------+
                       |               |
                       |               |
                       .               .
                       .               .
                       .               .

Step 2:

Get a pointer to the current top of the stack. How to get it varies by target platform. Options include system APIs, reading the stack pointer register (e.g. the ESP register on x86), or simply creating a local variable on the stack and taking its address.

Then memset the whole region from the current stack position to the stack limit with some known pattern, e.g. 0xDEAD (a larger signature is a better approach, to ensure there is no accidental match)

                       +---------------+
                       |               |
                       |               |
                       |               |
current stack ------>  +---------------+
                       |     DEAD      |
                       |     DEAD      |
                       |     DEAD      |
                       |     DEAD      |
                       |     DEAD      |
                       |     DEAD      |
                       |     DEAD      |
                       |     DEAD      |
Stack limit ---------> +---------------+
                       |               |
                       |               |
                       .               .
                       .               .
                       .               .

Step 3:

Make an OS or whatever call you want to measure.

                       +---------------+
                       |               |
                       |               |
                       |               |
current stack ------>  +---------------+
                       |     1231      | --+
                       |     1231      |   |
                       |     D433      |   +--> Stack usage
                       |     D324      |   |
                       |     3453      | --+
                       |     DEAD      |
                       |     DEAD      |
                       |     DEAD      |
Stack limit ---------> +---------------+
                       |               |
                       |               |
                       .               .
                       .               .
                       .               .

Step 4:

When the call returns, the stack will have been modified. Iterate through the memory starting from the current stack pointer, looking for the first occurrence of the pattern you set in Step 2. This is the watermark, the point up to which the stack got used by the OS call. Subtract the watermark from the original entry point saved in Step 1 and you have the stack usage.
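The four steps can be tried out on an ordinary buffer before touching a real stack. The sketch below is a simulation under stated assumptions: a std::vector plays the role of the free stack region (index 0 is the current stack position, the last index is the stack limit), simulated_call is a made-up stand-in for the OS call being measured, and the pattern is a 32-bit signature chosen for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// A larger signature than 0xDEAD, to reduce accidental matches.
constexpr std::uint32_t kPattern = 0xDEADBEEF;

// Step 2: fill the free region with the known pattern.
void fill_watermark(std::vector<std::uint32_t>& free_stack) {
    for (auto& word : free_stack) word = kPattern;
}

// Step 3 stand-in (hypothetical): pretend a call dirtied the first
// `words` slots, growing from the current stack position toward the limit.
void simulated_call(std::vector<std::uint32_t>& free_stack, std::size_t words) {
    for (std::size_t i = 0; i < words && i < free_stack.size(); ++i) {
        free_stack[i] = static_cast<std::uint32_t>(0x1000 + i);
    }
}

// Step 4: scan from the current stack position for the first word that
// still holds the pattern; everything before it was used.
std::size_t measure_usage_bytes(const std::vector<std::uint32_t>& free_stack) {
    std::size_t used_words = 0;
    while (used_words < free_stack.size() && free_stack[used_words] != kPattern) {
        ++used_words;
    }
    return used_words * sizeof(std::uint32_t);
}

// Runs steps 2 to 4 on a 64-word region with a call that uses 5 words.
std::size_t demo_usage() {
    std::vector<std::uint32_t> free_stack(64);
    fill_watermark(free_stack);
    simulated_call(free_stack, 5);
    return measure_usage_bytes(free_stack);
}
```

One caveat with scanning from the current position: a callee that skips over part of its frame without writing to it can leave the pattern intact inside the used region, which would make this scan under-report.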

Friday, September 26, 2008

Tail call optimization

I had posted about tail calls and how .NET handles them before. However, there were some email exchanges and confusion on internal DLs around it, so I thought I'd try to elaborate more on how .NET goes about handling tail calls.

Let’s try to break this down a bit.

a) A piece of C# code contains tail recursion.

static void CountDown(int num)
{ 
    Console.WriteLine("{0} ", num); 
    if (num > 0) 
        CountDown(--num); 
}

b) The C# compiler does not optimize this and generates a normal recursive call in IL

 IL_000c:  ...
 IL_000d:  ...
 IL_000e:  sub
 IL_000f:  dup
 IL_0010:  starg.s num
 IL_0012:  call void TailRecursion.Program::CountDown(int32)

c) The Jitter on seeing this call doesn’t optimize it in any way and just inserts a normal call instruction for the platform it targets (call for x86)

However, if a compiler is smart enough to emit the tail. IL prefix before the recursive call, then things change.

a) Scheme code

(define (CountDown n)
    (if (= n 0)
         n
      (CountDown (- n 1))))

b) Hypothetical IronScheme compiler will generate (note for scheme it has to do the optimization)

 IL_000c:  ...
 IL_000d:  ...
 IL_000e:  sub
  tail.
 IL_0023:  call void TailRecursion.Program::CountDown(int32)
 IL_0029:  ret

c) Based on which JIT you are using and various other scenarios the JIT now may honour the tail. IL instruction and optimize this with a jmp when generating machine instruction

Please note the may in the very last point and refer to here for some instances where the optimization still might not take place…
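To make the effect of the optimization concrete, here is a C++ sketch of the Scheme CountDown above, next to the loop that a tail-call-optimizing compiler or JIT effectively turns it into. The function names are mine, and both versions assume n is not negative.

```cpp
// Tail-recursive form: the recursive call is the very last thing the
// function does, so its own frame is no longer needed afterwards.
int count_down_recursive(int n) {
    if (n == 0) return n;
    return count_down_recursive(n - 1);
}

// What the optimized version amounts to: the argument is updated in place
// and control jumps back to the top, so no new stack frame is pushed.
int count_down_loop(int n) {
    while (n != 0) {
        n = n - 1;
    }
    return n;
}
```

Without the optimization the recursive form uses one stack frame per step, which is why a large enough argument can overflow the stack; the loop form uses constant stack space.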

Tuesday, September 16, 2008

A* Pathfinding algorithm animation screen saver

I'm trying to learn WPF and IMO it is not a simple task. Previously, whenever I upgraded my UI technology knowledge it had been easy, as it built on a lot of pre-existing concepts. However, WPF is more of a disruptive change.

I decided to write a simple, fun application in an effort to learn WPF. Since I cannot write a BabySmash application, as Scott Hanselman has already done an awesome job of it, I decided to code up a simulation of the A* path finding algorithm and push it out as a screen saver. The final product looks as follows.

It shows a source and a destination with blockages (wall, mountain?) in between, and then uses the A* algorithm to find a path between them. This is the same kind of algorithm that is used in games like Age of Empires as workers move around collecting wood and other resources. The algorithm is documented here.
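For readers who want to see the shape of the algorithm, here is a compact C++ sketch of A* on a character grid. It is not the screen saver's code: it is simplified to 4-connected moves with unit cost and a Manhattan-distance heuristic (no diagonals), it borrows the cell letters from the board designer post (w wall, b block, s start, e end), and astar_steps is a name invented for this example.

```cpp
#include <cstdlib>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Returns the number of steps on a shortest 4-connected path from 's' to
// 'e', or -1 if there is no path. 'w' and 'b' cells cannot be entered.
int astar_steps(const std::vector<std::string>& board) {
    const int rows = static_cast<int>(board.size());
    if (rows == 0) return -1;
    const int cols = static_cast<int>(board[0].size());

    int sr = -1, sc = -1, er = -1, ec = -1;
    for (int r = 0; r < rows; ++r) {
        if (static_cast<int>(board[r].size()) != cols) return -1;  // not rectangular
        for (int c = 0; c < cols; ++c) {
            if (board[r][c] == 's') { sr = r; sc = c; }
            if (board[r][c] == 'e') { er = r; ec = c; }
        }
    }
    if (sr < 0 || er < 0) return -1;

    // Manhattan distance never overestimates on a 4-connected unit grid.
    auto heuristic = [&](int r, int c) {
        return std::abs(r - er) + std::abs(c - ec);
    };

    // Open list entries are (f = g + h, cell index), smallest f first.
    using Entry = std::pair<int, int>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> open;
    std::vector<int> g(rows * cols, -1);          // best known cost, -1 = unseen
    std::vector<bool> closed(rows * cols, false);

    g[sr * cols + sc] = 0;
    open.push({heuristic(sr, sc), sr * cols + sc});
    const int dr[] = {1, -1, 0, 0};
    const int dc[] = {0, 0, 1, -1};

    while (!open.empty()) {
        const int cell = open.top().second;
        open.pop();
        if (closed[cell]) continue;  // stale duplicate entry
        closed[cell] = true;
        const int r = cell / cols, c = cell % cols;
        if (r == er && c == ec) return g[cell];

        for (int k = 0; k < 4; ++k) {
            const int nr = r + dr[k], nc = c + dc[k];
            if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) continue;
            const char ch = board[nr][nc];
            if (ch == 'w' || ch == 'b') continue;
            const int next = nr * cols + nc;
            const int cost = g[cell] + 1;
            if (g[next] == -1 || cost < g[next]) {
                g[next] = cost;
                open.push({cost + heuristic(nr, nc), next});
            }
        }
    }
    return -1;  // open list exhausted without reaching 'e'
}
```

The closed flags correspond to the "closed cells" the animation shows, and the entries still in the priority queue are the cells currently being probed.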

Features

It supports plugging in your own scenes with help of the awesome screen/board designer I blogged about
Comes with a WPF and a console client to show the animation
Algorithm can be played with easily to change or tweak it.
Shows full animation including start, end, obstacle, closed cells, current path being probed and the final path
Multi-screen support. Each screen shows a different board being solved.

Limitations:

Obviously this was more of a quick and dirty hobby project and there remains a ton to work on. E.g.

Screen saver preview doesn't work.
Setting dialog is a sham.
The boards do not flip for vertically aligned screens
The XAML data binding is primitive and needs some work
The path can choose diagonally across cell even if it seems to cross over diagonal wall of obstacle.
Mouse move alone doesn't shut down the screen saver. You need to hit a
Many more :)

Download:

Download the final binaries from here. Unzip to a folder, right click on the *.scr and choose Install

Download sources (including VS 2008 sln) from here.

Enjoy.

Sunday, September 14, 2008

How Many Types are loaded for Hello World

Consider the following super simple C# code

namespace SmartDeviceProject1
{
    class Program
    {
        static void Main(string[] args)
        {
            System.Console.WriteLine("Hello");
        }
    }
}

Can you guess how many managed types get loaded to run this? I was doing some profiling of the .NET Compact Framework loader (for an entirely unrelated reason) and was surprised by the list that got dumped. 87 types**, never could've guessed that...

System.Object
System.ValueType
System.Type
System.Reflection.MemberInfo
System.Delegate
System.MarshalByRefObject
System.SystemException
System.Exception
System.Attribute
System.Collections.Hashtable
System.Enum
System.Reflection.BindingFlags
System.MulticastDelegate
System.Reflection.MemberFilter
System.Reflection.Binder
System.Reflection.TypeAttributes
System.DateTime
System.Collections.IDictionary
System.UnhandledExceptionEventHandler
System.AppDomainManager
System.Version
System.Decimal
System.Runtime.InteropServices.ComInterfaceType
System.Collections.ICollection
System.Collections.IEqualityComparer
System.AppDomainManagerInitializationOptions
System.ArithmeticException
System.ArgumentException
System.MissingMemberException
System.MemberAccessException
System.AppDomainSetup
System.Reflection.AssemblyName
System.Globalization.CultureInfo
System.Reflection.Assembly
System.Configuration.Assemblies.AssemblyHashAlgorithm
System.Configuration.Assemblies.AssemblyVersionCompatibility
System.Reflection.AssemblyNameFlags
System.Globalization.CultureTableRecord
System.Globalization.CompareInfo
System.Globalization.TextInfo
System.Globalization.NumberFormatInfo
System.Globalization.DateTimeFormatInfo
System.Globalization.Calendar
System.Globalization.BaseInfoTable
System.Globalization.CultureTable
System.Globalization.CultureTableData
System.Globalization.NumberStyles
System.Globalization.DateTimeStyles
System.Globalization.DateTimeFormatFlags
System.Globalization.TokenHashValue
System.Globalization.CultureTableHeader
System.Globalization.CultureNameOffsetItem
System.Globalization.RegionNameOffsetItem
System.Globalization.IDOffsetItem
System.TokenType
System.Char
System.IO.TextReader
System.IO.TextWriter
System.IFormatProvider
System.Console
System.RuntimeTypeHandle
System.NotSupportedException
System.Globalization.EndianessHeader
System.Reflection.Missing
System.RuntimeType
System.Threading.StackCrawlMark
System.Globalization.CultureTableItem
System.Int32
System.Security.CodeAccessSecurityEngine
System.AppDomain
System.LocalDataStoreMgr
System.Threading.ExecutionContext
System.LocalDataStore
System.Collections.ArrayList
System.Threading.SynchronizationContext
System.Runtime.Remoting.Messaging.LogicalCallContext
System.Runtime.Remoting.Messaging.IllogicalCallContext
System.Threading.Thread
System.Collections.Generic.Dictionary`2
System.Runtime.Remoting.Messaging.CallContextRemotingData
System.Runtime.Remoting.Messaging.CallContextSecurityData
System.Collections.Generic.IEqualityComparer`1
System.Array
System.RuntimeFieldHandle
System.Globalization.CultureTableRecord[]
System.Text.StringBuilder

**This is for the compact framework CLR. Your mileage will vary if you run the same on the desktop CLR.

Thursday, September 11, 2008

Designer for my path finding boards

I'm writing a small application (or rather a screen saver) that animates and demonstrates A* search algorithm. The idea is simple. On screen you see a start and end point and some random obstacles in between them. Then you see animation on how A* algorithm is used to navigate around the board to find a path (close to the shortest possible) between the two.

All of this is fairly standard. However, I got hit by a simple issue. I wanted to have the ability to design this board visually and also let my potential million users do the same. Obviously I don't have time to code up a designer and hence chose the all-time manager's favorite technique. Re-use :)

So the final solution I took was to use Microsoft Office Excel as the WYSIWYG editor. I created an xlsx file with the following conditional formatting, which colors cells based on their values.

Excel Conditional Formatting screen shot for blog

So in this case

w = Wall marking the end of the table
b = blocks/bricks/obstacle
s = start
e = end

Using this I can easily design the board. The designer in action looks as follows

Since the excel sheet has conditional formatting the user just types in s, e, b, w in the cells and they all light up visually. At the end he/she just saves the file using File => Save As and uses CSV format. This saves the file shown above as

w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w
w,,,,,,,,,,,,,,,,,,,,,,,b,,,,,,,,,,,,,,,,,,w
w,,,,,,,,,,,,,b,b,b,,,,,,,,b,,,,,,,,,,,,,,,,,,w
w,,,,s,,,,,,,,b,b,b,b,b,b,,,,,,b,,,,,,,,,,,,,,,,,,w
w,,,,,,,,,,b,b,b,b,,b,b,b,b,,,,,b,,,,,,,,,,,,,,,,,,w
w,,,,,,,,,b,b,b,,,,,,b,b,b,,,,b,,,,,,,,,,,,,,,,,,w
w,,,,,,,,,b,b,,,,,,,,b,b,,,,b,,,,,,,,,,,,,,,,,,w
w,,,,,,,,,b,b,,,,,,,,b,b,,,,b,,,,,,,,,,,,,,,,,,w
w,,,,,,,,,b,b,,,,,,,b,b,,,,,b,,,,,,,,,,,,,,,,,,w
w,,,,,,,,,,,,,,,,b,b,,,,,,b,,,,b,,,,,,,,,,,,,,w
w,,,,,,,,,,,,,,,b,b,,,,,,,b,,,,b,,,,,,,,,,,,,,w
w,,,,,,,,,,,,,,b,b,,,,,,,,b,,,,b,,,,,,,,,,,,,,w
w,,,,,,,,,,,,,b,b,,,,,,,,,b,,,,b,,,,,,,,,,,,,,w
w,,,,,,,,,,,,,b,b,,,,,,,,,b,,,,b,,,,,,,,,,,,,,w
w,,,,,,,,,,,,,b,b,,,,,,,,,b,,,,b,,,,,,,,,,,,,,w
w,,,,,,,,,,,,,b,b,,,,,,,,,b,,,,b,,,,,,,,,,,,,,w
w,,,,,,,,,,,,,b,b,,,,,,,,,b,,,,b,,,,,,,,,,,,,,w
w,,,,,,,,,,,,,b,b,,,,,,,,,b,,,,b,,,,,,,,,,,,,,w
w,,,,,,,,,,,,,,,,,,,,,,,b,,,,b,,,,,,,,,,,,,,w
w,,,,,,,,,,,,,b,b,,,,,,,,,b,,,,b,,,,,,,,,,,,,,w
w,,,,,,,,,,,,,,,,,,,,,,,b,,,,b,,,,,,,,,,,e,,,w
w,,,,,,,,,,,,,,,,,,,,,,,b,,,,b,,,,,,,,,,,,,,w
w,,,,,,,,,,,,,,,,,,,,,,,b,,,,b,,,,,,,,,,,,,,w
w,,,,,,,,,,,,,,,,,,,,,,,,,,,b,,,,,,,,,,,,,,w
w,,,,,,,,,,,,,,,,,,,,,,,,,,,b,,,,,,,,,,,,,,w
w,,,,,,,,,,,,,,,,,,,,,,,,,,,b,,,,,,,,,,,,,,w
w,,,,,,,,,,,,,,,,,,,,,,,,,,,b,,,,,,,,,,,,,,w
w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w

As you see the format is simply a row per line and every column separated by a comma. My simulator reads this file and renders using whatever renderer is configured (console or WPF).
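Reading such a file back takes very little code. The following C++ sketch is one way to do it, not the simulator's actual reader: parse_board_row and parse_board are hypothetical names, empty fields are mapped to a space to mean a free cell, and only the first character of each field is kept.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits one CSV line into cells; an empty field becomes ' ' (free cell).
std::vector<char> parse_board_row(const std::string& line) {
    std::vector<char> row;
    std::string field;
    std::istringstream fields(line);
    while (std::getline(fields, field, ',')) {
        row.push_back(field.empty() ? ' ' : field[0]);
    }
    // std::getline drops an empty field after a trailing comma; restore it.
    if (!line.empty() && line.back() == ',') row.push_back(' ');
    return row;
}

// Parses the whole CSV text into rows of cells, one row per line.
std::vector<std::vector<char>> parse_board(const std::string& csv_text) {
    std::vector<std::vector<char>> board;
    std::string line;
    std::istringstream lines(csv_text);
    while (std::getline(lines, line)) {
        // Files saved on Windows may carry CRLF line endings.
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (!line.empty()) board.push_back(parse_board_row(line));
    }
    return board;
}
```

For example, the line w,s,,w parses to the four cells 'w', 's', ' ' and 'w'.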

More about the A* simulator to come soon...

Tuesday, September 09, 2008

It's easy to claim that the world won't get destroyed today

Taken inside the Microsoft Campus Hyderabad

The clock is ticking and the Large Hadron Collider at CERN is going to get switched on today (9/10/2008). Even though there are speculations, CERN is claiming it's perfectly safe and the world won't end. But it's easy to claim that; who'll be around to prove them wrong in case they are :)

It's one of the coolest devices, at an operating temperature of < -270°C. But I'd get really angry if there's any disturbance to my birthday celebrations!!

Tuesday, August 26, 2008

Team Foundation Server tool to dump workspace details

I juggle around with a lot of workspaces. The reason is .NET Compact Framework is consumed in a whole bunch of things like Windows Mobile, Xbox, Zune, Nokia and most of them are on different source branches. On top of this active feature work happens in feature branches and there are other service branches to frequently peek into.

So the net effect is that I sometimes need to go to different folders and see what their source control workspace mapping is, or I need to dump details of workspaces on other computers and see if I can use one of them as a template to create a new workspace.

The thing I hate here is to run cd to that folder and do a tf workspace to bring up the GUI and then dismiss the UI. I don't like GUI when things can be done equally well on the console. So I quickly hacked up a tool that dumps workspace details onto the console, something as below

d:\>workspace.exe NETCF\F01
Name:      ABHINAB2_S60
Server:    TfsServer01
Owner:     Abhinaba
Computer:  ABHINAB2

Comment:

Working folders:

Active   $/NetCF/F01/tools     d:\NETCF\F01\tools
Active   $/NetCF/F01/source    d:\NETCF\F01\source

This tool can be used either to see the mapping of a folder or, given a workspace name and owner, to dump its details. It uses the exact same format as shown in the tf workspace UI.

Source

The source and a built binary are linked below. It is a single source file (Program.cs) and you should be able to pull it into any version of Visual Studio or even build it from the command line using CSC.

Using the tool

Build it or use the pre-built binary and edit the .config file to point it to your TFS server and you are all set to go.

Wednesday, August 20, 2008

Obsession with desktop continues

This is my new office setup. I have been pushing around a lot of code lately and felt I needed more real-estate to effectively do what I'm doing. So I hooked up another monitor. All 3 are standard HP LP1965 19" monitors.

However, since none of my video cards support 3 monitors and I have a whole bunch of computers to hook up I had to do some complex wiring around :). If I number the screens 1,2,3 from left to right this is how it works out

1 and 2 are connected to my main dev box (machine1)
2 and 3 are connected to the machine I use for emails and browsing (machine2)
2 actually goes through a 4-port KVM switch using which it can cycle through machine1, machine2 and two other less used machines
1 also goes through a 2-port KVM switch and connects an Xbox and machine1
I don't use the laptop at work directly, I connect to it using remote desktop.

Sweet :)

It's not as complex as it sounds. For my normal flow I just use the KVM to switch between machine1 and machine2. I rarely need to switch between the Xbox and machine1.

PS: Don't try to make much sense of the vintage books on the shelf; I plan to keep them for some more time before handing them off to some historian.

Sunday, August 17, 2008

Back to Basics: Using System.Threading.Interlocked is a great idea

I just saw some code which actually takes a lock to do a simple set operation. This is bad because taking locks is expensive and there is an easy alternative. The System.Threading.Interlocked class and its members can get the same job done, and much faster.

I wrote the following two methods, which increment a variable a million times. The first method does that by taking a lock and the other uses the Interlocked.Increment API.

static int IncrementorLock()
{
    int val = 0;
    object lockObject = new object();
    for (int i = 0; i < 1000000; ++i)
    {
        lock (lockObject)
        {
            val++;
        }
    }

    return val;
}

static int IncrementorInterlocked()
{
    int val = 0;
    for (int i = 0; i < 1000000; ++i)
    {
        System.Threading.Interlocked.Increment(ref val);
    }

    return val;
}

I then used the Visual Studio Team System Profiler (instrumented profiling) and got the following performance data.

Function Name	Elapsed Inclusive Time	Application Inclusive Time
IncrementorInterlocked()	1,363.45	134.43
IncrementorLock()	4,374.23	388.69

Even though this is a micro benchmark and uses a completely skewed scenario (a single thread, so the lock is never contended), it still drives home the simple point that using the Interlocked class is way faster.

The reason the interlocked version is faster is that instead of taking a .NET lock it calls straight into Win32 APIs like InterlockedIncrement, which update the variable atomically without acquiring a lock.

These Interlocked APIs completely depend on the support of the underlying OS. This is the reason you'd see that not all of the APIs are available uniformly across all versions of .NET (e.g. there is no uniform coverage over WinCE and Xbox). While implementing these for the Nokia platform (S60) we are hitting some scenarios where there is no corresponding OS API.
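
The benchmark above runs on a single thread, so it only measures cost. As a rough sketch of the correctness side, the snippet below has several threads bump one shared counter through Interlocked.Increment. The class and method names (InterlockedDemo, CountWithInterlocked) are made up for this illustration and are not from the code I profiled.

```csharp
using System;
using System.Threading;

public static class InterlockedDemo
{
    // Starts threadCount threads that each increment a shared counter
    // perThread times, waits for all of them, and returns the final value.
    public static int CountWithInterlocked(int threadCount, int perThread)
    {
        int counter = 0;
        Thread[] workers = new Thread[threadCount];

        for (int t = 0; t < threadCount; ++t)
        {
            workers[t] = new Thread(() =>
            {
                for (int i = 0; i < perThread; ++i)
                {
                    // Atomic read-modify-write; no lock object involved.
                    Interlocked.Increment(ref counter);
                }
            });
            workers[t].Start();
        }

        // Join every worker so all increments are visible before we read.
        foreach (Thread worker in workers)
        {
            worker.Join();
        }

        return counter;
    }

    public static void Main()
    {
        // 4 threads x 250,000 increments each.
        Console.WriteLine(CountWithInterlocked(4, 250000));
    }
}
```

Because every increment is atomic, no update is lost and the result is always threadCount * perThread. Swapping the call for a plain counter++ would make the total unpredictable.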

Sunday, July 20, 2008

String equality

akutz has one of the most detailed posts on string interning and equality comparison performance metrics I have ever seen. Head over to the post here

I loved his conclusion which is the crux of the whole story.

"In conclusion, the String class’s static Equals method is the most efficient way to compare two string literals and the String class’s instance Equals method is the most efficient way to compare two runtime strings. The kicker is that there must be 10^5 (100,000) string comparisons taking place before one method starts becoming more efficient than the next. Until then, use whichever method floats your boat."
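
For readers who haven't met the two Equals flavours or interning, here is a small sketch of what is being compared. It is my own illustration, not code from akutz's post, and the names (StringEqualityDemo, Compare) are hypothetical.

```csharp
using System;

public static class StringEqualityDemo
{
    public static bool[] Compare()
    {
        // A literal is interned by the runtime.
        string literal = "hello";
        // A string built at runtime has the same characters but is a separate object.
        string runtime = new string(new[] { 'h', 'e', 'l', 'l', 'o' });

        return new[]
        {
            string.Equals(literal, runtime),             // static Equals: compares values
            literal.Equals(runtime),                     // instance Equals: compares values
            object.ReferenceEquals(literal, runtime),    // different objects
            object.ReferenceEquals(literal, string.Intern(runtime)) // Intern returns the pooled literal
        };
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Compare()));
    }
}
```

Both Equals calls compare the characters, so they agree; only the reference checks reveal whether interning has made the two variables point at the same object.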

Kismat Konnection

I had a real hard time giving a title to this post.

The post was originally supposed to be about the fire I saw yesterday in a movie theatre Talkie Town (map). However it is not really about the fire, it is more about the people and how they reacted to it.

First things first, the fire.

Folks who know me know that I go to a movie theatre once in a couple of years; I prefer movies visiting me rather than the other way around. In the 4 years I've been in Hyderabad, this was my 3rd visit to a movie. So I can say it was a momentous occasion when my wife and I decided on impulse to see a movie in the 2 screen theatre in Miyapur called Talkie Town. The decision was heavily biased by the fact that my father-in-law was at home looking after our daughter.

Batman (or rather The Dark Knight) lost to my wife and I was forced to see this bollywood flick Kismat Konnection.

This is where all the fun started. Just after the interval (1.5 hours through, and yes bollywood movies are that long) people saw some light on the theatre ceiling. Soon the light spread and there was a small hole in the ceiling and we could actually see flames. Whatever was above the sound proofed ceiling had caught fire and the heat actually caused the material of the ceiling to burn.

Then the most amazing thing happened: 50% of the people didn't care. They were gleefully looking up and not even bothering to leave the theatre. I'm not talking about some false fire alarm, or smell of smoke, it was a real fire with flames and a burned hole in the ceiling!! The fire alarm didn't sound and no one from the theatre authority seemed to be around. I left and called some folks and they simply went to the terrace. All the noise now prompted some more people to leave but even at that time there were others happily watching as if the whole thing was a part of the movie.

Then an even more amazing thing happened: the folks from the theatre came and said that the fire had been doused and re-started the show. The hall was smoky and the AC was off. Me and another guy (yea there were some more sane people around) caught hold of an official and he simply said he had checked and could guarantee that everything was safe. I asked him how, since he couldn't prevent the fire in the first place, he could guarantee against a recurrence. He simply said I could get a refund of my ticket. Which I did and left the hall.

On the way back I had a revelation. From childhood I had seen that people around me had little care about safety in general, but this is the first time I figured out that people didn't even care about their own safety. How can a movie be worth taking the risk of sitting in a fire hazard zone and that too with small children on their lap is something I will never figure out.

A lot of things, including the traffic situation and the weird jaywalking I see around, suddenly make more sense to me.

Wednesday, July 09, 2008

Writing exception handlers as separate methods may prove to be a good idea

Let us consider a scenario where you catch some exception and in the exception handler do some costly operation. You can write that code in either of the following ways

Method-1 : Separate method call

public class Program
{
    public static void Main(string[] args)
    {
        try
        {
            using (DataStore ds = new DataStore())
            {
                // ...
            }
        }
        catch (Exception ex)
        {
            ErrorReporter(ex);
        }
    }

    private static void ErrorReporter(Exception ex)
    {
        string path = System.IO.Path.GetTempFileName();
        ErrorDumper ed = new ErrorDumper(path, ex);
        ed.WriteError();

        XmlDocument xmlDoc = new XmlDocument();
        xmlDoc.Load(path);
        RemoteErrorReporter er = new RemoteErrorReporter(xmlDoc);
        er.ReportError();
    }
}

Method-2 : Inline

public static void Main(string[] args)
{
    try
    {
        using (DataStore ds = new DataStore())
        {
            // ...
        }
    }
    catch (Exception ex)
    {
        string path = System.IO.Path.GetTempFileName();
        ErrorDumper ed = new ErrorDumper(path, ex);
        ed.WriteError();

        XmlDocument xmlDoc = new XmlDocument();
        xmlDoc.Load(path);
        RemoteErrorReporter er = new RemoteErrorReporter(xmlDoc);
        er.ReportError();
    }
}

The simple difference is that in the first case the exception handler is written as a separate method and in the second case it is placed directly inline inside the handler itself.

The question is which is better in terms of performance?

In case you do have significant code and type references in the handler and you expect the exception to be thrown rarely in an application execution, then Method-1 is going to be more performant.

The reason is that just before executing a method the whole method gets jitted. The jitted code contains stubs to the other methods it will call but it doesn't do a recursive jitting. This means when Main gets called it gets jitted but the method ErrorReporter is still not jitted. So in case the exception is never fired all the code inside ErrorReporter never gets jitted. This might prove to be a significant saving in terms of time and space if the handling code is complex and refers to types not already referenced.

However, if the code is inline then the moment Main gets jitted all the code inside the catch block gets jitted. This is expensive not only because it leads to jitting of code that is never executed but also because all types referenced in the catch block are also resolved, resulting in loading a bunch of dlls after searching through the disk. In our example above System.Xml.dll and the other dll containing remote error reporting get loaded even though they will never be used. Since disk access, loading assemblies and type resolution are slow, this simple change can give some saving.

Tuesday, July 08, 2008

Microsoft RoundTable

Our conference rooms have been fitted with this really weird looking device (click to enlarge).

I had no clue what the thing was. Fortunately its box was still placed in the room along with the manual. It's called the Microsoft RoundTable and it is actually a 360-degree camera (with 5 cameras and 6 microphones). It comes with bundled software that lets all participants be visible to the other side in a live meeting in real time. It shows the white board and the software is intelligent enough to focus on and track the active speaker (using microphone and face recognition) and much much more (lot of MS Research stuff has gone into it). The video below gives you some idea and head on to this post for some review and inside view of the device.

Simply put it's AWESOME

Monday, July 07, 2008

Do namespace using directives affect Assembly Loading?

The simple answer is no, the inquisitive reader can read on :)

Close to 2 years back I had posted about the two styles of coding using directives as follows

Style 1

namespace MyNameSpace
{
    using System;
    using System.Collections.Generic;
    using System.Text;
    // ...
}

Style 2

using System;
using System.Collections.Generic;
using System.Text;

namespace MyNameSpace
{
    // ...
}

and outlined the benefits of the first style (using directives inside the namespace). This post is not to re-iterate them.

This post is to figure out if either of the styles has any bearing on the loading order of assemblies. At first look it clearly seems that it shouldn't, but this has caused some back and forth discussions over the web.

Scott Hanselman posted about a statement on the Microsoft StyleCop blog which states

"When using directives are declared outside of a namespace, the .Net Framework will load all assemblies referenced by these using statements at the same time that the referencing assembly is loaded.

However, placing the using statements within a namespace element allows the framework to lazy load the referenced assemblies at runtime. In some cases, if the referencing code is not actually executed, the framework can avoid having to load one or more of the referenced assemblies completely. This follows general best practice rule about lazy loading for performance.

Note, this is subject to change as the .Net Framework evolves, and there are subtle differences between the various versions of the framework."

This just doesn't sound right because using directives have no bearing on assembly loading.

Hanselman did a simple experiment with the following code

using System;  
using System.Xml;  
  
namespace Microsoft.Sample  
{  
   public class Program  
   {  
      public static void Main(string[] args)  
      {  
         Guid g = Guid.NewGuid();  
         Console.WriteLine("Before XML usage");  
         Console.ReadLine();  
         Foo();  
         Console.WriteLine("After XML usage");  
         Console.ReadLine();  
      }  
  
      public static void Foo()  
      {  
         XmlDocument x = new XmlDocument();  
      }  
   }  
}

and then he watched the assembly loads using Process Explorer, moved the using inside the namespace and did the same. Both versions loaded System.Xml.dll after he hit enter on the console, clearly indicating that in both cases it got lazy loaded.

Let me try to give a step by step rundown of how the whole type look up of XmlDocument happens in .NETCF, which in turn would throw light on whether using directives have a bearing on assembly loading.

1. When the Main method is jitted and run, System.Xml.dll is not yet loaded
2. When method Foo is called the execution engine (referred to as EE) tries to JIT the method. As documented the Jitter only JITs methods that are to be executed.
3. The Jitter tries to see if the method Foo is managed (could be native as well due to mixed mode support) and then tries to see if it's already jitted (by a previous call); since it's not, it goes ahead with jitting it
4. The Jitter validates a bunch of stuff like whether the class on which the method Foo is being called (in this case Microsoft.Sample.Program) is valid, has been initialized, stack requirements, etc...
5. Then it tries to resolve the local variables of the method. It waits to resolve the local variable type references till this point so that it is able to save time and memory by not jitting/loading types that are referenced by methods that are never executed
6. Then it tries to resolve the type of the variable, which in this case is System.Xml.XmlDocument.
7. It sees if it's already in the cache, that is if that type is already loaded
8. Since that's not the case it tries to search for the reference based on the type reference information
9. This information contains the full type reference including the assembly name, which in this case is System.Xml.dll, and also version information, strong name information, etc...
10. All of the above information along with other information like the executing application's path is passed to the assembly loader to load the assembly
11. The usual assembly search sequence is used to look for the assembly and then it is loaded and the type reference subsequently gets resolved

If you see the above steps, there is in no way a dependency of assembly loading on using directives. Hence at least on .NETCF whether you put the using outside or inside the namespace you'd get the referenced assemblies loaded exactly at the time of first reference of a type from that assembly (step #5 above is the key).
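
If you'd rather watch the loads from inside the process than from Process Explorer, a sketch along the following lines should work on the desktop framework. It hooks the AppDomain.AssemblyLoad event and prints each assembly as it arrives. The class and method names (LazyLoadDemo, UseXml, IsLoaded) are invented for this example, the NoInlining attribute is my own precaution so that the XML-using method is jitted separately, and the exact assembly names printed will vary between framework versions.

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;
using System.Xml;

public static class LazyLoadDemo
{
    // Checks by simple name whether an assembly is loaded in this AppDomain.
    public static bool IsLoaded(string simpleName)
    {
        foreach (Assembly asm in AppDomain.CurrentDomain.GetAssemblies())
        {
            if (asm.GetName().Name == simpleName)
            {
                return true;
            }
        }
        return false;
    }

    // The only method that touches an XML type. NoInlining keeps the JIT
    // from folding it into the caller, so it is jitted on first call.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static string UseXml()
    {
        XmlDocument doc = new XmlDocument();
        return doc.GetType().Assembly.GetName().Name;
    }

    public static void Main()
    {
        // Print every assembly the runtime loads from here on.
        AppDomain.CurrentDomain.AssemblyLoad += (sender, e) =>
            Console.WriteLine("Loaded: " + e.LoadedAssembly.GetName().Name);

        Console.WriteLine("Before XML usage");
        string xmlAssembly = UseXml();
        Console.WriteLine("After XML usage, XmlDocument came from " + xmlAssembly);
    }
}
```

Moving the using directives inside a namespace block and running again is an easy way to check for yourself whether the point at which the "Loaded:" line appears changes.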

Sunday, July 06, 2008

Auto generating Code Review Email for TFS

We use a small command line tool called crmail to auto-generate code review email from shelveset. I find the whole process very helpful and thought I'd share the process and the tool (which has some really cool features).

Features

Automatic generation of the email from the shelveset details
Hyperlinks are put to TFS webaccess so that you can review code from machines without any tools installed, even without source enlistment. Yes it's true!!! The only thing you need is your office's intranet access
You can even use a Windows mobile phone :) and even some non MS browsers. Ok I guess I have sold this enuf
This is how the email looks like with all the details pointed out
Effectively you can see the file diff, history, blame (annotate), shelveset details, associated bugs, everything from your browser, and the best thing is that all of these take one click each.
This is how the file diff looks in the browser

Pre-reqs

Team System Web Access (TSWA) 2008 power tool installed on your TFS server
Outlook installed on the machine on which the email is generated
Enlistment and TFS client installed on the machine on which the email is generated
For reviewers there is no pre-req other than a browser and email reader.

Dev process

The developer creates a shelveset after he is done with his changes. He ensures he fills in all the details including the reviewers' email addresses, separated by ;
He runs the tool with a simple command
crmail shelvesetname
The email gets generated and opened; he fills in additional information and hits send
Done!!

Reviewers

Ok they just click on the email links. Since mostly these are managers what more do you expect out of them? Real devs will stick with firing up tfpt command line :)

Configuring the tool

Download the binaries from here
Unzip. Open the crmail.exe.config file and modify the values in it to point to your tfsserver and your code review distribution list (if you do not have one then make it empty)
Check it in to some tools folder in your source control so that everyone in your team has access to it

Support

Self help is the best help :), download the sources from here and enjoy. Buck Hodges' post on the direct link URLs would help you in case you want to modify the sources to do more.

Thursday, July 03, 2008

How does the .NET CF handle null reference

What happens when we have code as below

class B
{
    public virtual void Virt(){
        Console.WriteLine("Base::Virt");
    }
}

class Program
{
    static void Main(string[] args){
        B b = null;
        b.Virt(); // throws System.NullReferenceException
    }
}

Obviously we have a null reference exception being thrown. If you see the IL the call looks like

    L_0000: nop 
    L_0001: ldnull 
    L_0002: stloc.0 
    L_0003: ldloc.0 
    L_0004: callvirt instance void ConsoleApplication1.B::Virt()
    L_0009: nop 
    L_000a: ret

So in effect you'd expect the jitter to generate the following kind of code (in processor instruction)

if (b == null)
   throw new NullReferenceException
else
   b->Virt() // actually call safely using the this pointer

However, generating null checks for every call is going to lead to code bloat. So to work around this on some platforms (e.g. .NETCF on WinCE 6.0 and above) it uses the following approach

Hook up native access violation exception (WinCE 6.0 supports this) to a method in the execution engine (EE)
Do not generate any null checking and directly generate calls through references
In case the reference is null, a native AV (access violation) is raised as the invalid 0 address is accessed, and the hook method is called
At this point the EE checks to see if the source of the access violation (native code) is inside a jitted code block. If yes, it creates the managed NullReferenceException and propagates it up the call chain.
If it's outside, then obviously either the CLR itself or some other native component is crashing and there is nothing the EE can do about it.

Wednesday, July 02, 2008

C# generates virtual calls to non-virtual methods as well

Sometime back I had posted about a case where non-virtual calls are used for virtual methods and promised posting about the reverse scenario. This issue of C# generating callvirt IL instruction even for non-virtual method calls keeps coming back on C# discussion DLs every couple of months. So here it goes :)

Consider the following code

class B
{
    public virtual void Virt(){
        Console.WriteLine("Base::Virt");
    }

    public void Stat(){
        Console.WriteLine("Base::Stat");
    }
}

class D : B
{
    public override void Virt(){
        Console.WriteLine("Derived::Virt");
    }
}

class Program
{
    static void Main(string[] args)
    {
        D d = new D();
        d.Stat(); // should emit the call IL instruction
        d.Virt(); // should emit the callvirt IL instruction
    }
}

The basic scenario is that a base class defines a virtual method and a non-virtual method. A call is made to base using a derived class pointer. The expectation is that the call to the virtual method (B.Virt) will be through the intermediate language (IL) callvirt instruction and that to the non-virtual method (B.Stat) through call IL instruction.

However, this is not true and callvirt is used for both. If we open the disassembly for the Main method using reflector or ILDASM this is what we see

    L_0000: nop 
    L_0001: newobj instance void ConsoleApplication1.D::.ctor()
    L_0006: stloc.0 
    L_0007: ldloc.0 
    L_0008: callvirt instance void ConsoleApplication1.B::Stat()
    L_000d: nop 
    L_000e: ldloc.0 
    L_000f: callvirt instance void ConsoleApplication1.B::Virt()
    L_0014: nop 
    L_0015: ret

Question is why? There are two reasons that have been brought forward by the CLR team

API change.
The reason is that the .NET team wanted a change in a method (API) from non-virtual to virtual to be non-breaking. So in effect, since the call is anyway generated as callvirt, a caller need not be recompiled in case the callee changes to be a virtual method.
Null checking
If a call is generated and the method body doesn't access any instance variable then it is possible to even call on null objects successfully. This is currently possible in C++, see a post I made on this here.
With callvirt there's a forced access to this pointer and hence the object on which the method is being called is automatically checked for null.

callvirt does come with additional performance cost but measurement showed that there's no significant performance difference between call with a null check vs callvirt. Moreover, since the Jitter has full metadata of the callee, while jitting the callvirt it can generate processor instructions to do a static call if it figures out that the callee is indeed non-virtual.

However, the compiler does try to optimize situations where it knows for sure that the target object cannot be null. E.g. for the expression i.ToString(); where i is an int, the call instruction is used to call the ToString method because Int32 is a value type (cannot be null) and sealed.
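
The null checking point is easy to see for yourself. In the sketch below (class names Greeter and CallvirtNullDemo are made up for this illustration) the method is non-virtual and never touches this, yet calling it on a null reference still throws, because the C# compiler emits callvirt for it.

```csharp
using System;

public class Greeter
{
    // Non-virtual, and the body never reads any instance state.
    public string Hello()
    {
        return "hello";
    }
}

public static class CallvirtNullDemo
{
    // Returns true if calling the non-virtual method on null throws.
    public static bool ThrowsOnNull()
    {
        Greeter g = null;
        try
        {
            g.Hello(); // compiled as callvirt, so the reference is null checked
            return false;
        }
        catch (NullReferenceException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(ThrowsOnNull()); // prints True
    }
}
```

Had the compiler emitted a plain call here, the method body would run without ever dereferencing this and the null would go unnoticed.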

Thursday, June 26, 2008

Guy or a Girl

One interesting aspect of working in an internationally distributed team is that sometimes it gets difficult to make common judgements. E.g. when we see a name we inherently figure out whether it's a male or female name and refer to that person as such in email. The issue is that I cannot always make the same judgement in case of names from another country/culture.

In my previous team, in a long email thread someone continually referred to Khushboo as "he". Khushboo didn't correct him and it went on for some time until I pointed that out to him in a separate email. Today I was typing an email to someone and suddenly figured out I had no idea whether one of the people I was referring to is male or female. I took a wild guess and I'm waiting to get corrected.

Wednesday, June 18, 2008

Baby smash

What is the common thing between every programmer dad/mom? The moment they get onto a new UI platform they write a child proofing application for the keyboard.

Scott Hanselman has just posted his version, Baby Smash (via AmitChat). The funny thing is I've written one in WPF and so did my ex-manager.

Saturday, May 24, 2008

Stylecop has been released

Microsoft released the internal tool StyleCop to public under the fancy yet boring name of Microsoft Source Analysis for C#. Even though the name is boring the product is not.

You'll love this tool when it imposes consistent coding style across your team. You'll hate this tool when it imposes the same on you. The result is stunning looking, consistently styled code which your whole team can follow uniformly.

StyleCop has been in use for a long time internally in Microsoft and many teams mandate its usage. My previous team VSTT used it as well. The only crib I had is that it didn't allow single line getters and setters (and our team didn't agree to disable this rule either).

// StyleCop didn't like this one
public int Foo
{
    get { return this.foo; } // foo is the backing field; returning Foo itself would recurse forever
}

// StyleCop wanted this instead
public int Foo
{
    get
    {
        return this.foo;
    }
}

Read more about using StyleCop here. You can set this up to be run as a part of your build process as documented here. Since this is plugged in as an MSBuild project you can use it as a part of the Team Foundation Build process as well.

Let the style wars begin in team meetings :)

Lambda the ultimate

Whatever I said about lambda before is crap. Each time I use it I feel happy.

I had to fire a timer every so often, and this is what I can write with lambda...

dataStoreTimer = new Timer(new TimerCallback(
            (obj) => { (obj as AutoResetEvent).Set(); }), pollEvent, 100, 1000);

Sweet!!!

**BTW you can't blame me for having my shortest post just after the longest!!
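
For anyone who wants to run the one-liner, here is a self-contained sketch built around the same idea. The wrapper names (TimerLambdaDemo, WaitForFirstTick) and the timeout parameter are my additions for illustration; the 100 ms initial delay and 1000 ms period are the values from the snippet above.

```csharp
using System;
using System.Threading;

public static class TimerLambdaDemo
{
    // Starts a timer whose lambda callback signals the event handed to it
    // as the state object, then waits up to timeoutMs for the first tick.
    public static bool WaitForFirstTick(int timeoutMs)
    {
        using (AutoResetEvent pollEvent = new AutoResetEvent(false))
        using (Timer dataStoreTimer = new Timer(
            obj => ((AutoResetEvent)obj).Set(), // the whole callback, inline
            pollEvent,                          // passed to the callback as obj
            100,                                // first tick after 100 ms
            1000))                              // then every 1000 ms
        {
            return pollEvent.WaitOne(timeoutMs);
        }
    }

    public static void Main()
    {
        Console.WriteLine(WaitForFirstTick(5000)); // prints True
    }
}
```

The lambda converts implicitly to TimerCallback, so the explicit new TimerCallback(...) wrapper is optional. I also used a cast instead of as here, so a wrong state object fails loudly rather than with a null reference.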

Friday, May 23, 2008

Cell phone assault

Last two weeks my cell phone got assaulted thrice. First it was someone sending me a virus over bluetooth (a sis file actually). This happened when I was taking a photograph of my daughter with the cell phone camera in a restaurant (Aromas of China, City Center mall in Hyderabad).

The next one was bluetooth based advertisement messages in the Forum Mall in Bangalore. They were actually sending offers of the hour over bluetooth and I got 2 such messages.

The third incident was in the airport when someone was again trying to send me a trojan app and make me open it.

I was really surprised with the rapid growth of cell phone based attacks. Worst is few people know of this. My wife had no idea that you can actually send applications over bluetooth and that can infect the phone.

Thursday, May 22, 2008

Building Scriptable Applications by hosting JScript

The kind of food I should have, but I don't

If you have played around with large applications, I'm sure you have been intrigued by how they have been built to be extensible. There are multiple options

Develop your own extension mechanism where you pick up extension binaries and execute them.
One managed code example is here, where the application loads dlls (assemblies) from a folder and runs specific types from them. A similar unmanaged approach is to allow registration of guids and use COM to load types that implement those interfaces
Roll out your own scripting mechanism:
One managed example is here where on the fly compilation is used. With DLR hosting mechanism coming up this will be very easy going forward
Support standard scripting mechanism:
This involves hosting JScript/VBScript inside the application and exposing a document object model (DOM) to it. So anyone can just write standard JScript to extend the application very much like how JScript in a webpage can extend/program the HTML DOM.

Obviously the 3rd is the best choice if you are developing a native (unmanaged) solution. The advantages are many: a low learning curve (any JScript programmer can write extensions), built-in security, and low cost.

In this post I'll try to cover how you go about doing exactly that. I found little online documentation and took help of Kaushik from the JScript team to hack up some code to do this.

The Host Interface

To host JScript you need to implement the IActiveScriptSite. The code below shows how we do that stripping out the details we do not want to discuss here (no fear :) all the code is present in the download pointed at the end of the post). The code below is in the file ashost.h

class IActiveScriptHost : public IUnknown 
{
public:
    // IUnknown
    virtual ULONG __stdcall AddRef(void) = 0;
    virtual ULONG __stdcall Release(void) = 0;
    virtual HRESULT __stdcall QueryInterface(REFIID iid,
                                        void **obj) = 0;

    // IActiveScriptHost
    virtual HRESULT __stdcall Eval(const WCHAR *source, 
                                         VARIANT *result) = 0;
    virtual HRESULT __stdcall Inject(const WCHAR *name, 
                                         IUnknown *unkn) = 0;
};

class ScriptHost : 
    public IActiveScriptHost, 
    public IActiveScriptSite 
{
private:
    LONG _ref;
    IActiveScript *_activeScript;
    IActiveScriptParse *_activeScriptParse;

    ScriptHost(...){}

    virtual ~ScriptHost(){}
public:
    // IUnknown
    virtual ULONG __stdcall AddRef(void);
    virtual ULONG __stdcall Release(void);
    virtual HRESULT __stdcall QueryInterface(REFIID iid, void **obj);

    // IActiveScriptSite
    virtual HRESULT __stdcall GetLCID(LCID *lcid);
    virtual HRESULT __stdcall GetItemInfo(LPCOLESTR name,
        DWORD returnMask, IUnknown **item, ITypeInfo **typeInfo);
    virtual HRESULT __stdcall GetDocVersionString(BSTR *versionString);
    virtual HRESULT __stdcall OnScriptTerminate(const VARIANT *result,
        const EXCEPINFO *exceptionInfo);
    virtual HRESULT __stdcall OnStateChange(SCRIPTSTATE state);
    virtual HRESULT __stdcall OnEnterScript(void);
    virtual HRESULT __stdcall OnLeaveScript(void);
    virtual HRESULT __stdcall OnScriptError(IActiveScriptError *error);

    // IActiveScriptHost
    virtual HRESULT __stdcall Eval(const WCHAR *source,
                                           VARIANT *result);
    virtual HRESULT __stdcall Inject(const WCHAR *name, 
                                           IUnknown *unkn);
public:

    static HRESULT Create(IActiveScriptHost **host)
    {
        ...
    }

};

Here we are defining an interface IActiveScriptHost. ScriptHost implements IActiveScriptHost and also the required hosting interface IActiveScriptSite. IActiveScriptHost exposes 2 extra methods (Eval and Inject) that will be used from outside to easily host js scripts.

In addition ScriptHost also implements a factory method Create. This Create method does the heavy lifting of using COM querying to get the various interfaces it needs (IActiveScript, IActiveScriptParse) and stores them inside the corresponding pointers.

Instantiating the host

So the client of this host class creates the ScriptHost instance by using the following (see ScriptHostBase.cpp)

IActiveScriptHost *activeScriptHost = NULL;
HRESULT hr = S_OK;
HRESULT hrInit = S_OK;

hrInit = CoInitializeEx(NULL, COINIT_APARTMENTTHREADED);
if(FAILED(hrInit)) throw L"Failed to initialize";

hr = ScriptHost::Create(&activeScriptHost);
if(FAILED(hr)) throw L"Failed to create ScriptHost";

With this the script host is available through the activeScriptHost pointer and we already have the JScript engine hosted in our application

Evaluating Scripts

Post hosting we need to make it do something interesting. This is where the IActiveScriptHost::Eval method comes in.

HRESULT __stdcall ScriptHost::Eval(const WCHAR *source, 
                                   VARIANT *result)
{
    assert(source != NULL);

    if (source == NULL)
        return E_POINTER;

    return _activeScriptParse->ParseScriptText(source, NULL, 
                                  NULL, NULL, 0, 1, 
                                  SCRIPTTEXT_ISEXPRESSION, 
                                  result, NULL);
}

Eval accepts the text of the script, makes it execute using IActiveScriptParse::ParseScriptText and returns the result.

So effectively we can accept input from the console and evaluate it (or read a file and interpret the complete script in it).

while (true) 
{
    wcout << L">> ";
    getline(wcin, input);
    if (quitStr.compare(input) == 0) break;

    if (FAILED(activeScriptHost->Eval(input.c_str(), &result)))
    {
        throw L"Script Error";
    }
    if (result.vt == VT_I4) // VT_I4 (value 3): the result is a 32-bit integer
        wcout << result.lVal << endl;
}

So all this is fine and at the end you can run the app (which BTW is a console app) and this is what you can do.

JScript sample Host
q! to quit

>> Hello = 7
7
>> World = 6
6
>> Hello * World
42
>> q!
Press any key to continue . . .

So you have extended your app to do maths for you, or rather to run basic scripts, which is exciting but not of much value.

Extending your app

Once we are past hosting the engine and running scripts inside the application we need to go ahead with actually building the application's DOM and injecting it into the hosting engine so that JScript can extend it.

If you already have a native application which is built on COM (IDispatch) then you have nothing more to do. But let's pretend that we actually have nothing and need to build the DOM.

To build the DOM you need to create an IDispatch based DOM tree. There can be more than one root. In this post I'm not trying to cover how to build IDispatch based COM objects (which you'd do using ATL or some such other means). However, for simplicity we will roll out a hand written implementation which implements an interface as below.

class IDomRoot : public IDispatch 
{
public:
    // IUnknown
    virtual ULONG __stdcall AddRef(void) = 0;
    virtual ULONG __stdcall Release(void) = 0;
    virtual HRESULT __stdcall QueryInterface(REFIID iid, 
                                             void **obj) = 0;

    // IDispatch
    virtual HRESULT __stdcall GetTypeInfoCount( UINT *pctinfo) = 0;
    virtual HRESULT __stdcall GetTypeInfo( UINT iTInfo, LCID lcid,
                                           ITypeInfo **ppTInfo) = 0;
    virtual HRESULT __stdcall GetIDsOfNames( REFIID riid, 
                                      LPOLESTR *rgszNames,
                                      UINT cNames, LCID lcid,  
                                      DISPID *rgDispId) = 0;

    virtual HRESULT __stdcall Invoke( DISPID dispIdMember, REFIID riid, 
                                      LCID lcid, WORD wFlags, 
                                      DISPPARAMS *pDispParams, 
                                      VARIANT *pVarResult, 
                                      EXCEPINFO *pExcepInfo, 
                                      UINT *puArgErr) = 0;

    // IDomRoot
    virtual HRESULT __stdcall Print(BSTR str) = 0;
    virtual HRESULT __stdcall get_Val(LONG* pVal) = 0;
    virtual HRESULT __stdcall put_Val(LONG pVal) = 0;
};

At the top we have the standard IUnknown and IDispatch methods and at the end we have our DOM Root's own methods. It exposes a Print method that prints a string and a property called Val (with a get and a put method for that property).

The class DomRoot implements this interface, plus an additional static method named Create which is the factory for it. Once the object is created we inject it into the JScript scripting engine. So our final script host code looks as follows

IActiveScriptHost *activeScriptHost = NULL;
IDomRoot *domRoot = NULL;
HRESULT hr = S_OK;
HRESULT hrInit = S_OK;

hrInit = CoInitializeEx(NULL, COINIT_APARTMENTTHREADED);
if(FAILED(hrInit)) throw L"Failed to initialize";

// Create the host
hr = ScriptHost::Create(&activeScriptHost);
if(FAILED(hr)) throw L"Failed to create ScriptHost";

// create the DOM Root
hr = DomRoot::Create(&domRoot);
if(FAILED(hr)) throw L"Failed to create DomRoot";

// Inject the created DOM Root into the scripting engine
activeScriptHost->Inject(L"DomRoot", (IUnknown*)domRoot);

Here is what Inject does

map<std::wstring, IUnknown*> rootList;
typedef map<std::wstring, IUnknown*>::iterator MapIter;
typedef pair<std::wstring, IUnknown*> InjectPair;

HRESULT __stdcall ScriptHost::Inject(const WCHAR *name, 
                                     IUnknown *unkn)
{
    assert(name != NULL);

    if (name == NULL)
        return E_POINTER;

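    // Register the name with the engine; only remember the object if that worked
    HRESULT hr = _activeScript->AddNamedItem(name, SCRIPTITEM_GLOBALMEMBERS | 
                                                   SCRIPTITEM_ISVISIBLE);
    if (FAILED(hr))
        return hr;

    rootList.insert(InjectPair(std::wstring(name), unkn));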

    return S_OK;
}

In Inject we store the name of the object and the corresponding IUnknown in a map. Each time the script engine encounters a named object in the script it calls GetItemInfo with that object's name, and we then look the name up in the map and return the corresponding IUnknown

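HRESULT __stdcall ScriptHost::GetItemInfo(LPCOLESTR name,
                                    DWORD returnMask,
                                    IUnknown **item,
                                    ITypeInfo **typeInfo)
{
    // This host only hands out the object itself, never type information
    if ((returnMask & SCRIPTINFO_IUNKNOWN) == 0 || item == NULL)
        return TYPE_E_ELEMENTNOTFOUND;

    MapIter iter = rootList.find(name);
    if (iter == rootList.end())
        return TYPE_E_ELEMENTNOTFOUND;  // the name was never injected

    *item = iter->second;
    (*item)->AddRef();  // out-parameters are returned with their own reference
    return S_OK;
}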
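
The name-to-object bookkeeping behind Inject and GetItemInfo can be tried out without the Windows SDK. What follows is only a minimal sketch under stated assumptions: FakeUnknown and NamedItemTable are hypothetical stand-ins invented for illustration (they are not part of the sample code), and the reference counting shown is the usual COM convention rather than something the sample is known to do.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for IUnknown so the sketch builds without the Windows SDK.
struct FakeUnknown
{
    long refCount = 1;                      // the creator holds the first reference
    long AddRef()  { return ++refCount; }
    long Release() { return --refCount; }   // a real object would delete itself at 0
};

// Hypothetical helper mirroring the roles of ScriptHost::Inject and
// ScriptHost::GetItemInfo.
class NamedItemTable
{
public:
    // Remember the object under the given name; the table keeps its own reference.
    bool Inject(const wchar_t *name, FakeUnknown *unk)
    {
        assert(name != nullptr);
        if (unk == nullptr)
            return false;
        if (!items_.insert(std::make_pair(std::wstring(name), unk)).second)
            return false;                   // the name is already taken
        unk->AddRef();
        return true;
    }

    // Look the name up; on success the caller receives an AddRef'ed pointer.
    bool GetItem(const wchar_t *name, FakeUnknown **item) const
    {
        assert(name != nullptr && item != nullptr);
        *item = nullptr;
        std::map<std::wstring, FakeUnknown*>::const_iterator it = items_.find(name);
        if (it == items_.end())
            return false;
        *item = it->second;
        (*item)->AddRef();
        return true;
    }

private:
    std::map<std::wstring, FakeUnknown*> items_;
};
```

Whoever receives a pointer from GetItem is expected to call Release on it once done, which is the same contract the script engine follows for the IUnknown it gets from the host.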

After that the script engine calls into that object's IDispatch to look up properties and methods and invoke them.

The Whole Flow

By now we have seen a whole bunch of code. Let's see how the whole thing works together. Let's assume we have an extension written in JScript and it calls DomRoot.Val = 5; this is what happens to get the whole thing to work

During initialization we created the DomRoot object (DomRoot::Create), which implements IDomRoot, injected it into the script engine via AddNamedItem and stored it at our end in the rootList map.
We call activeScriptHost->Eval(L"DomRoot.Val = 5;", ...) to evaluate the script. Eval calls _activeScriptParse->ParseScriptText.
When the script parse engine sees the "DomRoot" name it figures out that the name is a valid name added with AddNamedItem and hence it calls its host's ScriptHost::GetItemInfo("DomRoot");
The host we have written looks up the same map filled during Inject and returns the object's IUnknown to the scripting engine. So at this point the scripting engine has a handle to our DOM root via an IUnknown to the DomRoot object
The scripting engine does a QueryInterface on that IUnknown to get the IDispatch interface from it
Then the engine calls IDispatch::GetIDsOfNames with the name of the property, "Val"
DomRoot's implementation of GetIDsOfNames returns the dispatch ID of the Val property (which is 2 in our case)
The script engine calls IDispatch::Invoke with that dispatch ID and a flag telling whether it wants the get or the set. In this case it's a set. Based on this, DomRoot redirects the call to DomRoot::put_Val
With this we have the full flow from the host to the script and back to the DOM
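
The last three steps (name to dispatch ID, then ID plus flag to accessor) can be sketched in portable C++. This is an illustration only and not the DomRoot code from the download: MiniDomRoot, DispId, InvokeKind and SetByName are hypothetical names, the real IDispatch signatures are far wider, and the only fact taken from the post is that Val has dispatch ID 2.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins for the COM types, so the sketch builds anywhere.
typedef long DispId;
const DispId kDispIdUnknown = -1;   // plays the role of DISPID_UNKNOWN
const DispId kDispIdVal = 2;        // the post says Val has dispatch ID 2
enum InvokeKind { kPropertyGet, kPropertyPut };

class MiniDomRoot
{
public:
    MiniDomRoot() : val_(0)
    {
        ids_[L"Val"] = kDispIdVal;
    }

    // Role of GetIDsOfNames: translate a member name into a dispatch ID.
    DispId GetIdOfName(const std::wstring &name) const
    {
        std::map<std::wstring, DispId>::const_iterator it = ids_.find(name);
        return it == ids_.end() ? kDispIdUnknown : it->second;
    }

    // Role of Invoke: route the ID plus the get/put flag to the right accessor.
    bool Invoke(DispId id, InvokeKind kind, long *value)
    {
        assert(value != nullptr);
        if (id != kDispIdVal)
            return false;               // no other member is known in this sketch
        if (kind == kPropertyGet)
            *value = get_Val();
        else
            put_Val(*value);
        return true;
    }

    long get_Val() const { return val_; }
    void put_Val(long v) { val_ = v; }

private:
    std::map<std::wstring, DispId> ids_;
    long val_;
};

// Roughly what the engine does for a statement such as "DomRoot.Val = 5;".
inline bool SetByName(MiniDomRoot &root, const std::wstring &name, long v)
{
    DispId id = root.GetIdOfName(name);
    return id != kDispIdUnknown && root.Invoke(id, kPropertyPut, &v);
}
```

Keeping the name lookup separate from the call lets a caller resolve a name once and reuse the ID for later calls.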

In action

JScript sample Host
q! to quit

>> DomRoot.Val = 5;
5
>> DomRoot.Val = DomRoot.Val * 10
50
>> DomRoot.Val
50
>> DomRoot.Print("The answer is 42");
The answer is 42

Source Code

First of all the disclaimer. Let me get it off my chest by saying that the DomRoot code is a super simplified COM object. It commits nothing less than sacrilege. You shouldn't treat it as sample code. I intentionally didn't do a full implementation so that you can step into it without the muck of IDispatchImpl or ATL getting in your way.

However, you can treat the script hosting part (ashost, ScriptHostBase) as sample code (that is the idea of the whole post :) )

The code organization is as follows

ashost.cpp, ashost.h - The Script host implementation
DomRoot.cpp, DomRoot.h - The DOM Root object injected into the scripting engine
ScriptHostBase.cpp - Driver

Note that in a real-life example the driver should load JScript files from a given folder and execute them.
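
A rough idea of what such a driver loop could look like, assuming a C++17 compiler with std::filesystem. None of this is in the downloadable sample: CollectScripts, ReadAll and RunExtensions are hypothetical helpers, and the eval callback merely stands in for a call to the host's Eval method.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Return every *.js file directly inside dir, sorted so the run order is stable.
std::vector<fs::path> CollectScripts(const fs::path &dir)
{
    std::vector<fs::path> scripts;
    if (!fs::is_directory(dir))
        return scripts;
    for (const fs::directory_entry &entry : fs::directory_iterator(dir))
    {
        if (entry.is_regular_file() && entry.path().extension() == ".js")
            scripts.push_back(entry.path());
    }
    std::sort(scripts.begin(), scripts.end());
    return scripts;
}

// Read a whole file into memory. Bytes only: converting to the wide string
// that the script engine expects is left out of this sketch.
std::string ReadAll(const fs::path &file)
{
    std::ifstream in(file, std::ios::binary);
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

// Feed each script to eval (a hypothetical stand-in for the host's Eval call)
// and return how many scripts were evaluated successfully.
std::size_t RunExtensions(const fs::path &dir,
                          const std::function<bool(const std::string &)> &eval)
{
    assert(eval && "an eval callback is required");
    std::size_t succeeded = 0;
    for (const fs::path &script : CollectScripts(dir))
    {
        if (eval(ReadAll(script)))
            ++succeeded;
    }
    return succeeded;
}
```

Sorting the file list is a design choice made here so that extensions load in a predictable order. A real host may prefer an explicit manifest instead.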

Download from here
