
Sunday, December 14, 2008

When the spell checkers attack

Frustrated?

Today Boing Boing had a post about the Cupertino effect, in which spell checkers erroneously replace valid words. I worked on the spell-checking features of Adobe Acrobat and Adobe FrameMaker, so I appreciate the headache of getting them right.

I run into the Cupertino effect most often when typing Indian names, because most word processors don't recognize them. For my wife's name, Somtapa, the suggested spelling is Stomata, followed by Sumatra. So I started calling her Stomata until it got on her nerves, and something very similar to the photo above happened to me.

Thursday, November 13, 2008

Is interlocked increment followed by comparison thread safe?

Party canceled after heavy downpour

Sorry about the blog title; my imagination failed me :(

On one of our internal aliases, someone asked the question: "Is the following thread safe?"

if (Interlocked.Increment(ref someInt) == CONSTANT_VAL)
{
    doSomeStuff();
}

My instant reaction was no: even though the increment is done in a thread-safe way using the System.Threading.Interlocked class, the comparison that follows is not.


My reasoning was that the "if" expression can be broken down into the following operations:



  1. Fetch of someInt
  2. Increment operation
  3. Write back of someInt
  4. Comparison

The first three happen inside the Increment method, which provides concurrency protection, so they cannot be interleaved with another thread's operations.


So if two threads, A and B, run in parallel, I assumed that the following interleaving is possible (steps 2-4 and 8 run on thread A, steps 5-7 and 9 on thread B):

  1. someInt is 4 and CONSTANT_VAL is 5
  2. A: Fetch of someInt      -> someInt == 4
  3. A: Increment operation   -> someInt == 5
  4. A: Write back of someInt -> someInt == 5
  5. B: Fetch of someInt      -> someInt == 5
  6. B: Increment operation   -> someInt == 6
  7. B: Write back of someInt -> someInt == 6
  8. A: Comparison            -> compare 6 & CONSTANT_VAL
  9. B: Comparison            -> compare 6 & CONSTANT_VAL

This means that the comparisons on both threads would fail.


However, someone responded that I was wrong, because the comparison uses the return value of Increment and not the value written back to memory. This made me investigate further.


The JITted code looks like this:

if (Interlocked.Increment(ref someInt) == CONSTANT_VAL)
00000024 lea ecx,ds:[002B9314h]
0000002a call 796F1221

0000002f mov esi,eax
00000031 cmp esi,5

The call at 0x0000002a goes to native code inside the CLR, which in turn calls the Win32 API InterlockedIncrement.


However, the last two lines are the interesting ones. Register EAX holds the return value of the call; it is copied into ESI, and the comparison happens between that copy and CONSTANT_VAL (5). So even if the second thread has already changed the value of someInt in memory, it has no effect, because the comparison uses the value returned by the first increment and not the current value of someInt. So the first comparison (step 8 above) actually compares CONSTANT_VAL against 5 and succeeds, which means the original code is thread safe.
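The same property can be sketched in C++, using std::atomic<int> as a stand-in for Interlocked.Increment (the post itself is about C# and the CLR, so this is an analogue, not the code discussed above). Pre-increment on a std::atomic<int> is an atomic read-modify-write that returns the new value, so each thread compares its own result instead of re-reading shared memory. The helper name CountThreadsSeeingTarget is hypothetical.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical helper: starts threadCount threads that each atomically
// increment a shared counter and compare the value that increment
// returned against target. Returns how many threads saw a match.
int CountThreadsSeeingTarget(int initial, int target, int threadCount)
{
    std::atomic<int> someInt{initial};
    std::atomic<int> hits{0};
    std::vector<std::thread> threads;

    for (int t = 0; t < threadCount; ++t)
    {
        threads.emplace_back([&] {
            // ++someInt is atomic and yields this thread's incremented value,
            // so every thread gets a distinct result. Comparing that result
            // (not a fresh load of someInt) is what makes the check safe.
            if (++someInt == target)
                ++hits;
        });
    }

    for (auto& th : threads)
        th.join();

    return hits.load();
}
```

Starting from 4 with a target of 5, exactly one thread sees 5 regardless of how the threads interleave. The racy variant would call someInt.fetch_add(1) and then compare someInt.load() against the target; that load can observe another thread's increment, which is the interleaving feared above.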

Sunday, October 26, 2008

C/C++ Compile Time Asserts

Noodles

The Problem

Run-time asserts are commonly used in C++. As the MSDN documentation for assert states:

"(assert) Evaluates an expression and, when the result is false, prints a diagnostic message and aborts the program."

There is another kind of assert that catches code issues right at compile time. These are called static or compile-time asserts. They can be used for compile-time validations and are used very effectively in the .NET Compact Framework code base.

For example, say you have two types, Foo and Bar, and your code assumes (maybe for a reinterpret_cast) that they are the same size. Since they are defined in separate places, there is always a possibility that someone modifies one without changing the other, resulting in weird bugs. How do you express this assumption in code? Obviously you can do a run-time check like

assert(sizeof(Foo) == sizeof(Bar));

If that code is never executed, the assert never fires; the problem might only be caught later in testing. However, all of the information is available at compile time (both the types and their sizes are resolved during compilation), so we should be able to validate this during compilation with something to the effect of

COMPILE_ASSERT(sizeof(int) == sizeof(char));

This is evaluated during compilation, so the assert fails whether or not the code ever runs.


The Solution


There are many ways to get this done. I will discuss two quick ones:


Array creation


You can create a macro as follows:

#define COMPILE_ASSERT(x) extern int __dummy[(int)(x)]

This macro works as follows

// compiles fine 
COMPILE_ASSERT(sizeof(int) == sizeof(unsigned));
// error C2466: cannot allocate an array of constant size 0
COMPILE_ASSERT(sizeof(int) == sizeof(char));

The first expression effectively expands to extern int __dummy[1] and compiles fine, but the latter expands to extern int __dummy[0] and, on compilers that reject zero-sized arrays such as MSVC, fails to compile.


The advantage of this approach is that it works in both C and C++. However, the failure message is confusing and doesn't indicate what the failure is about; the developer has to visit the line that failed to compile to see that it's a COMPILE_ASSERT.
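One portability caveat: GCC accepts zero-length arrays as an extension, so with that compiler the macro above may not fail at all. A commonly used variant maps a false condition to a negative array size, which standard C and C++ compilers (GCC included) reject. A minimal sketch; the names COMPILE_ASSERT_NEG and compile_assert_dummy are hypothetical:

```cpp
// Hypothetical variant of the array trick: a true condition declares an
// array of size 1, a false one an array of size -1. A negative size is
// always a compile error, unlike size 0, which GCC accepts as an extension.
#define COMPILE_ASSERT_NEG(x) extern int compile_assert_dummy[(x) ? 1 : -1]

// Compiles: int and unsigned int always have the same size.
COMPILE_ASSERT_NEG(sizeof(int) == sizeof(unsigned));

// Would fail to compile on typical platforms (array size -1):
// COMPILE_ASSERT_NEG(sizeof(int) == sizeof(char));
```

Repeating the same extern declaration with size 1 is legal, so the macro can be used any number of times; the error message is still just as unhelpful as the original's.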


sizeof on incomplete type


This approach relies on explicit specialization of class templates and on the fact that applying sizeof to an incomplete type fails to compile.


Consider the following

namespace static_assert
{
template <bool> struct STATIC_ASSERT_FAILURE;
template <> struct STATIC_ASSERT_FAILURE<true> { enum { value = 1 }; };
}

Here we declare the class template STATIC_ASSERT_FAILURE. As you can see, the primary template is declared but never defined, so it is an incomplete type. However, we do provide an explicit specialization for the value true, but not for false. This means that sizeof(STATIC_ASSERT_FAILURE<true>) is valid but sizeof(STATIC_ASSERT_FAILURE<false>) is not. This can be used to create a compile-time assert as follows:

namespace static_assert
{
template <bool> struct STATIC_ASSERT_FAILURE;
template <> struct STATIC_ASSERT_FAILURE<true> { enum { value = 1 }; };

template<int x> struct static_assert_test{};
}

#define COMPILE_ASSERT(x) \
typedef ::static_assert::static_assert_test<\
sizeof(::static_assert::STATIC_ASSERT_FAILURE< (bool)( x ) >)>\
_static_assert_typedef_

Here the error we get is:

// compiles fine
COMPILE_ASSERT(sizeof(int) == sizeof(unsigned));
// error C2027: use of undefined type 'static_assert::STATIC_ASSERT_FAILURE<__formal>'
COMPILE_ASSERT(sizeof(int) == sizeof(char));

The advantage is that STATIC_ASSERT_FAILURE is called out right in the error message at the point of failure, which makes the cause much easier to figure out.


For a false condition, the macro expands as follows:



  1. typedef static_assert_test< sizeof(STATIC_ASSERT_FAILURE<false>) >  _static_assert_typedef_
  2. typedef static_assert_test< sizeof(incomplete type) >  _static_assert_typedef_

Similarly, for true the type is complete and the expansion is:



  1. typedef static_assert_test< sizeof(STATIC_ASSERT_FAILURE<true>) >  _static_assert_typedef_
  2. typedef static_assert_test< sizeof(valid type with one enum member) >  _static_assert_typedef_
  3. typedef static_assert_test< 1 > _static_assert_typedef_

Put it all together


Putting it all together, the following source is a good starting point for a static (compile-time) assert that works in both C and C++:

#ifndef _COMPILE_ASSERT_H_
#define _COMPILE_ASSERT_H_

#ifdef __cplusplus

// Two-level join so that __LINE__ is expanded before token pasting
#define JOIN( X, Y ) JOIN2(X,Y)
#define JOIN2( X, Y ) X##Y

namespace static_assert
{
    template <bool> struct STATIC_ASSERT_FAILURE;
    template <> struct STATIC_ASSERT_FAILURE<true> { enum { value = 1 }; };

    template<int x> struct static_assert_test{};
}

#define COMPILE_ASSERT(x) \
    typedef ::static_assert::static_assert_test<\
        sizeof(::static_assert::STATIC_ASSERT_FAILURE< (bool)( x ) >)>\
        JOIN(_static_assert_typedef, __LINE__)

#else // __cplusplus

#define COMPILE_ASSERT(x) extern int __dummy[(int)(x)]

#endif // __cplusplus

#define VERIFY_EXPLICIT_CAST(from, to) COMPILE_ASSERT(sizeof(from) == sizeof(to))

#endif // _COMPILE_ASSERT_H_

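The only extra parts are the JOIN macros and VERIFY_EXPLICIT_CAST. JOIN appends the current line number (__LINE__) to the typedef name, so each COMPILE_ASSERT declares a new name and doesn't cause "type already exists" errors; the JOIN/JOIN2 indirection is needed so that __LINE__ is expanded to a number before ## pastes the tokens. VERIFY_EXPLICIT_CAST is a convenience wrapper for the Foo/Bar size check from the beginning of this post.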
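For comparison, C++11 later made static_assert a language keyword (and C11 added the equivalent _Static_assert). One side effect is that a namespace named static_assert, as in the code above, no longer compiles under C++11 or later and would need renaming. A minimal sketch of the built-in form; Foo, Bar and VERIFY_SAME_SIZE are hypothetical names mirroring the post's example and VERIFY_EXPLICIT_CAST:

```cpp
#include <cstdint>

// Hypothetical types that the code assumes have identical sizes,
// e.g. because one is reinterpret_cast to the other.
struct Foo { std::int32_t value; };
struct Bar { std::uint32_t value; };

// Built-in compile-time assert: compilation fails with this message
// if the condition is false.
static_assert(sizeof(Foo) == sizeof(Bar), "Foo and Bar must have the same size");

// Same idea as VERIFY_EXPLICIT_CAST, built on the keyword; #from and #to
// stringize the macro arguments into the error message.
#define VERIFY_SAME_SIZE(from, to) \
    static_assert(sizeof(from) == sizeof(to), "size mismatch: " #from " vs " #to)

VERIFY_SAME_SIZE(int, unsigned);
```

The custom message removes the main weakness of both macro tricks: the compiler reports exactly which assumption broke.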


More Reading



  1. Boost static_assert has an even better implementation that takes care of many more scenarios and compiler quirks. Head over to the source at http://www.boost.org/doc/libs/1_36_0/boost/static_assert.hpp
  2. Modern C++ Design by the famed Andrei Alexandrescu

Wednesday, October 22, 2008

Taking my job more seriously

Durga puja

I generally take my work very seriously. Still, the following quote from SICP has been my favorite:

"I think that it's extraordinarily important that we in computer science keep fun in computing. When it started out, it was an awful lot of fun. Of course, the paying customers got shafted every now and then, and after a while we began to take their complaints seriously. We began to feel as if we really were responsible for the successful, error-free perfect use of these machines. I don't think we are. I think we're responsible for stretching them, setting them off in new directions, and keeping fun in the house. I hope the field of computer science never loses its sense of fun. Above all, I hope we don't become missionaries. Don't feel as if you're Bible salesmen. The world has too many of those already. What you know about computing other people will learn. Don't feel as if the key to successful computing is only in your hands. What's in your hands, I think and hope, is intelligence: the ability to see the machine as more than when you were first led up to it, that you can make it more."

However, a couple of days back I was lying in a cold room while a doctor probed my thyroid gland with ultrasound. She kept explaining the details of my tumor, pointing at the screen. In my mind I was thinking about the folks who have to write the code for these machines and how careful they need to be.

The first thing I did when I came out of the room was search for whether the .NET Compact Framework gets used in this kind of equipment. Of the hundreds of pages I hit, the one titled "Control System for Lung Ventilation Equipment with Windows CE , Microsoft .Net Compact Framework and Visual Studio Team System" struck me, mostly because it involved two products I personally coded for (NETCF and VSTS). Even more life-saving equipment was listed on the search results page.

It was a very humbling experience. I made a little vow: from tomorrow, I'll be more careful when I code.

Tuesday, October 21, 2008

Silverlight on Nokia S60 devices

Frustrated?

In many of my blog posts (e.g. here and here) I refer to the .NET Compact Framework and Symbian OS (S60), and folks keep asking me via comments (or assume) that we are porting .NETCF to S60 devices. So I thought it's time to clarify :)

The short answer is that we are not porting .NETCF to S60 devices, but we are porting Silverlight to them. This was jointly announced by Microsoft and Nokia; read Nokia's press release here.

We are at a very early stage of development, and it is hard to tell how much will be in it (e.g. whether it will be SL v1.0 or v2.0). However, we are working hard, and as more details emerge I will share them on this blog. So keep reading.

Monday, October 20, 2008

Who halted the code

Fisher men

Today's bedtime story involves uninitialized variable access and a weird coincidence.

A couple of weeks back I was baffled by a weird issue with the .NET Compact Framework Jitter. We were making the Jitter work on the Symbian OS (S60) emulator. The S60 emulator is a Windows process, so we needed to bring up the i386 Jitter in our code base. After running some JITted code, the application would stop executing with the message "Execution halted" in the stack trace.

Obviously the user hadn't halted anything, and the program was a simple Hello World.

After some debugging I found something interesting. In debug builds, the Symbian C compiler fills uninitialized variables with 0xcccc. This is fairly standard practice and is used to catch uninitialized variable access.

However, our JIT had a bug: in some scenarios it was not emitting function calls/returns correctly, so instead of returning we were executing arbitrary memory. But instead of an access violation (or something equally bad), execution was actually halting.

The reason was that we had started executing uninitialized data, and that data was 0xcccc. The S60 debugger uses interrupt 3 (INT 3) to insert breakpoints, and the following line reveals the rest of the story:

On i386, the instruction used for inserting breakpoints (INT 3, opcode 0xCC) and the pattern used for uninitialized data matched, so the debugger thought it had hit a breakpoint.
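The coincidence is easy to check: on x86, INT 3 is encoded as the single byte 0xCC, and a 0xcccc fill is nothing but repeated 0xCC bytes. A tiny sketch with a hypothetical helper, FillDecodesAsInt3, that tests whether every byte of a 16-bit fill value is that opcode:

```cpp
#include <cstdint>
#include <cstring>

// x86 opcode for INT 3, the one-byte breakpoint instruction.
constexpr std::uint8_t kInt3Opcode = 0xCC;

// Hypothetical helper: returns true if every byte of a 16-bit debug fill
// value is the INT 3 opcode, i.e. stray code that jumps into such memory
// executes a run of breakpoints. Checking every byte makes the result
// independent of the machine's endianness.
bool FillDecodesAsInt3(std::uint16_t fill)
{
    unsigned char bytes[sizeof fill];
    std::memcpy(bytes, &fill, sizeof fill);
    for (unsigned char b : bytes)
    {
        if (b != kInt3Opcode)
            return false;
    }
    return true;
}
```

FillDecodesAsInt3(0xCCCC) is true, while a signature such as 0xDEAD is not.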

Thursday, October 16, 2008

Hex signatures

Tooth paste advertisement

Frequently in code we need to add bit patterns or magic numbers. Since hex digits include the letters A-F, folks get creative and spell all kinds of interesting words. Typical ones are 0xDEAD, 0xBEEF, 0xFEED, or combinations of these such as 0xFEEDBEEF. However, someone actually used the following in a sample:

0xbed1babe

No naming the guilty :)

Wednesday, October 15, 2008

Back To Basics: Finding your stack usage

Beads

Handling stack overflow is a critical requirement for most virtual machines. For the .NET Compact Framework (NETCF) it is even more important, because it runs on embedded devices with very little memory and hence little stack space.

The .NETCF Execution Engine (EE) needs to detect stack overflows and report them via a StackOverflowException. To do this it uses some logic we internally refer to as StackCheck (to be covered in a later post). The algorithm needs a fair estimate of the stack usage of system APIs (e.g. Win32 APIs or, for Symbian OS, S60 APIs).

Each time we target a new platform, we take some measurements (in addition to referring to the specs :)) to find its stack characteristics. Since we are currently making NETCF work on Symbian OS, we are doing these measurements again. So I thought I'd share how we measure stack usage using simple watermarking.

The technique

Step 1:

On method entry, store two values: the current stack pointer and the stack limit. These are typically available via system APIs; on Symbian they are exposed through the TThreadStackInfo class (iBase, iLimit and other members).

                       +---------------+
                       |               |
                       |               |
                       |               |
current stack -------> +---------------+
                       |               |
                       |   Available   |
                       |     Stack     |
                       |               |
                       |               |
                       |               |
                       |               |
Stack limit ---------> +---------------+
                       |               |
                       |               |
                       .               .
                       .               .
                       .               .

Step 2:

Get the current stack pointer. How to do this varies by target platform. Options include system APIs, reading the stack pointer register (e.g. ESP on x86), or simply declaring a local variable on the stack and taking its address.

Then memset the whole region from the current stack pointer to the stack limit with a known pattern, e.g. 0xDEAD (a larger signature is better, since it reduces the chance of an accidental match).

                       +---------------+
                       |               |
                       |               |
                       |               |
current stack -------> +---------------+
                       |     DEAD      |
                       |     DEAD      |
                       |     DEAD      |
                       |     DEAD      |
                       |     DEAD      |
                       |     DEAD      |
                       |     DEAD      |
                       |     DEAD      |
Stack limit ---------> +---------------+
                       |               |
                       |               |
                       .               .
                       .               .
                       .               .

Step 3:

Make the OS call (or whatever call) you want to measure.

                       +---------------+
                       |               |
                       |               |
                       |               |
current stack -------> +---------------+
                       |     1231      | --+
                       |     1231      |   |
                       |     D433      |   +--> Stack usage
                       |     D324      |   |
                       |     3453      | --+
                       |     DEAD      |
                       |     DEAD      |
                       |     DEAD      |
Stack limit ---------> +---------------+
                       |               |
                       |               |
                       .               .
                       .               .
                       .               .

Step 4:

When the call returns, the part of the stack it used will have been overwritten. Iterate through the memory starting from the current stack pointer, looking for the first occurrence of the pattern you set in Step 2. This is the watermark: the point up to which the OS call used the stack. Subtract the watermark from the stack pointer saved in Step 1 and you have the stack usage.
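The four steps can be sketched as a simulation. Painting and scanning the real stack is platform-specific (on Symbian the bounds come from TThreadStackInfo), so this hedged sketch uses an ordinary array as a stand-in for the free stack region: index 0 is the stack limit, the last element sits just below the saved stack pointer, and a callback plays the role of the measured call. All names (kWatermark, PaintRegion, MeasureUsageWords, MeasureCall) are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// 32-bit signature painted into the unused region; a wider pattern makes
// an accidental match less likely than a 16-bit 0xDEAD.
constexpr std::uint32_t kWatermark = 0xDEADDEADu;

// Step 2: paint the whole free region with the signature.
void PaintRegion(std::vector<std::uint32_t>& region)
{
    for (auto& word : region)
        word = kWatermark;
}

// Step 4: in this model the stack grows toward index 0, so the word next
// to the saved stack pointer is region.back(). Walk toward the limit and
// stop at the first word that still holds the signature; every word
// before it was written by the call.
std::size_t MeasureUsageWords(const std::vector<std::uint32_t>& region)
{
    for (std::size_t i = region.size(); i > 0; --i)
    {
        if (region[i - 1] == kWatermark)
            return region.size() - i;
    }
    return region.size(); // no signature left: the whole region was used
}

// Steps 1-4: the region stands in for the free stack recorded in Step 1,
// and 'call' stands in for the call being measured (Step 3).
std::size_t MeasureCall(std::size_t regionWords,
                        const std::function<void(std::vector<std::uint32_t>&)>& call)
{
    std::vector<std::uint32_t> region(regionWords);
    PaintRegion(region);
    call(region);
    return MeasureUsageWords(region);
}
```

Because the scan stops at the first untouched word, a call that reserves stack (say, a local buffer) without writing all of it can be under-reported; scanning up from the limit for the deepest overwritten word is the more conservative alternative.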

Friday, September 26, 2008

Tail call optimization

wall

I had posted about tail calls and how .NET handles them before. However, there were some email exchanges and confusion on internal DLs about it, so I thought I'd elaborate on how .NET goes about handling tail calls.

Let’s try to break this down a bit.

a) A piece of C# code contains tail recursion:

static void CountDown(int num)
{
    Console.WriteLine("{0} ", num);
    if (num > 0)
        CountDown(--num);
}


b) The C# compiler does not optimize this and generates a normal recursive call in IL:

IL_000c:  ...
IL_000d:  ...
IL_000e:  sub
IL_000f:  dup
IL_0010:  starg.s num
IL_0012:  call void TailRecursion.Program::CountDown(int32)


c) On seeing this call, the Jitter doesn't optimize it in any way and just emits a normal call instruction for the target platform (call on x86).


However, if a compiler is smart enough to emit the tail. IL prefix, things change.


a) Scheme code:

(define (CountDown n)
  (if (= n 0)
      n
      (CountDown (- n 1))))


b) A hypothetical IronScheme compiler would generate the following (note that Scheme requires proper tail calls, so for Scheme the compiler has to do this optimization):

IL_000c:  ...
IL_000d:  ...
IL_000e:  sub
          tail.
IL_0023:  call void TailRecursion.Program::CountDown(int32)
IL_0029:  ret


c) Depending on which JIT you are using, and on various other conditions, the JIT may now honour the tail. prefix and turn the call into a jmp when generating machine code.


Please note the "may" in the last point, and refer to here for some cases where the optimization still might not take place.
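What honouring tail. buys can be illustrated in C++ (an analogue, not CLR code; C++ compilers may perform tail-call optimization at higher optimization levels but are not required to). Because the recursive call is the last thing CountDown does, its frame can be reused: replacing the call with a jump back to the top of the function makes it equivalent to a loop. The output vector parameter is a hypothetical addition that replaces Console.WriteLine so the sketch is testable.

```cpp
#include <vector>

// Tail-recursive version, mirroring the C# CountDown above.
void CountDown(int num, std::vector<int>& out)
{
    out.push_back(num);
    if (num > 0)
        CountDown(num - 1, out); // tail call: nothing runs after it
}

// What a jmp-based tail call effectively turns it into: the parameter is
// updated and control jumps back to the start, reusing a single frame,
// so stack usage no longer grows with num.
void CountDownLoop(int num, std::vector<int>& out)
{
    for (;;)
    {
        out.push_back(num);
        if (num <= 0)
            return;
        num = num - 1;
    }
}
```

Both produce 3, 2, 1, 0 for an input of 3; only the recursive one needs num + 1 stack frames when no optimization takes place.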

Tuesday, September 16, 2008

A* Pathfinding algorithm animation screen saver

I'm trying to learn WPF, and IMO it is not a simple task. Previously, whenever I upgraded my UI technology knowledge it was easy, because the new technology built on a lot of pre-existing concepts. However, WPF is more of a disruptive change.

I decided to write a simple, fun application as part of the effort to learn WPF. Since I cannot write a BabySmash application (Scott Hanselman has already done an awesome job with it), I decided to code up a simulation of the A* path-finding algorithm and ship it as a screen saver. The final product looks as follows.

AStar

It shows a source and a destination with blockages (wall, mountain?) in between, and then uses the A* algorithm to find a path between them. This is the same kind of algorithm used in games like Age of Empires as workers move around collecting wood and other resources. The algorithm is documented here.
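For readers new to A*, here is a compact grid sketch in C++. It is not the screen saver's code: it assumes 4-directional movement with a cost of 1 per step and a Manhattan-distance heuristic, and it borrows the board letters from the designer post (b and w for obstacles, s and e for start and end). FindPathLength is a hypothetical name.

```cpp
#include <cstdlib>
#include <functional>
#include <queue>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical helper: returns the number of steps on a shortest path
// from 's' to 'e', or -1 if there is none. 'b' and 'w' cells are
// obstacles; moves are up/down/left/right with a cost of 1 each.
int FindPathLength(const std::vector<std::string>& board)
{
    const int rows = static_cast<int>(board.size());
    int sr = -1, sc = -1, er = -1, ec = -1;
    for (int r = 0; r < rows; ++r)
    {
        for (int c = 0; c < static_cast<int>(board[r].size()); ++c)
        {
            if (board[r][c] == 's') { sr = r; sc = c; }
            if (board[r][c] == 'e') { er = r; ec = c; }
        }
    }
    if (sr < 0 || er < 0)
        return -1;

    auto blocked = [&](int r, int c) {
        return r < 0 || r >= rows || c < 0 ||
               c >= static_cast<int>(board[r].size()) ||
               board[r][c] == 'b' || board[r][c] == 'w';
    };
    // Manhattan distance never overestimates the remaining cost for
    // 4-way unit moves, so the first time 'e' is popped its cost is optimal.
    auto h = [&](int r, int c) { return std::abs(r - er) + std::abs(c - ec); };

    // g holds the best known cost from the start (-1 = not reached yet).
    std::vector<std::vector<int>> g(rows);
    for (int r = 0; r < rows; ++r)
        g[r].assign(board[r].size(), -1);

    // Open set: min-heap ordered by f = g + h.
    using Node = std::tuple<int, int, int, int>; // f, g, row, col
    std::priority_queue<Node, std::vector<Node>, std::greater<Node>> open;
    g[sr][sc] = 0;
    open.emplace(h(sr, sc), 0, sr, sc);

    const int dr[] = {1, -1, 0, 0};
    const int dc[] = {0, 0, 1, -1};
    while (!open.empty())
    {
        const Node node = open.top();
        open.pop();
        const int cost = std::get<1>(node);
        const int r = std::get<2>(node);
        const int c = std::get<3>(node);
        if (r == er && c == ec)
            return cost;
        if (cost > g[r][c])
            continue; // stale entry: a cheaper route to this cell was found

        for (int k = 0; k < 4; ++k)
        {
            const int nr = r + dr[k];
            const int nc = c + dc[k];
            if (blocked(nr, nc))
                continue;
            const int ng = cost + 1;
            if (g[nr][nc] < 0 || ng < g[nr][nc])
            {
                g[nr][nc] = ng;
                open.emplace(ng + h(nr, nc), ng, nr, nc);
            }
        }
    }
    return -1;
}
```

The heuristic is what distinguishes A* from plain Dijkstra: cells that point toward the end are expanded first, which is what the animation of closed cells and probed paths makes visible. Allowing diagonal moves would need a different heuristic and move cost.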

Features

  1. It supports plugging in your own scenes with the help of the awesome screen/board designer I blogged about
  2. Comes with a WPF and a console client to show the animation
  3. The algorithm is easy to play with, change or tweak.
  4. Shows full animation including start, end, obstacle, closed cells, current path being probed and the final path
  5. Multi-screen support. Each screen shows a different board being solved.

Limitations:

Obviously this was a quick-and-dirty hobby project, and a ton of work remains. For example:

  1. Screen saver preview doesn't work.
  2. The settings dialog is a sham.
  3. The boards do not flip for vertically aligned screens
  4. The XAML data binding is primitive and needs some work
  5. The path can move diagonally between cells even where it appears to cut through a diagonal wall of obstacles.
  6. Mouse move alone doesn't shut down the screen saver. You need to hit a
  7. Many more :)

Download:

Download the final binaries from here. Unzip them to a folder, right-click the *.scr file and choose Install.

Download sources (including VS 2008 sln) from here.

Enjoy.

Sunday, September 14, 2008

How Many Types are loaded for Hello World

Fairy princess

Consider the following super simple C# code:

namespace SmartDeviceProject1
{
    class Program
    {
        static void Main(string[] args)
        {
            System.Console.WriteLine("Hello");
        }
    }
}

Can you guess how many managed types get loaded to run this? I was profiling the .NET Compact Framework loader (for an entirely unrelated reason) and was surprised by the list that got dumped: 87 types**. I never could've guessed that...



  1. System.Object
  2. System.ValueType
  3. System.Type
  4. System.Reflection.MemberInfo
  5. System.Delegate
  6. System.MarshalByRefObject
  7. System.SystemException
  8. System.Exception
  9. System.Attribute
  10. System.Collections.Hashtable
  11. System.Enum
  12. System.Reflection.BindingFlags
  13. System.MulticastDelegate
  14. System.Reflection.MemberFilter
  15. System.Reflection.Binder
  16. System.Reflection.TypeAttributes
  17. System.DateTime
  18. System.Collections.IDictionary
  19. System.UnhandledExceptionEventHandler
  20. System.AppDomainManager
  21. System.Version
  22. System.Decimal
  23. System.Runtime.InteropServices.ComInterfaceType
  24. System.Collections.ICollection
  25. System.Collections.IEqualityComparer
  26. System.AppDomainManagerInitializationOptions
  27. System.ArithmeticException
  28. System.ArgumentException
  29. System.MissingMemberException
  30. System.MemberAccessException
  31. System.AppDomainSetup
  32. System.Reflection.AssemblyName
  33. System.Globalization.CultureInfo
  34. System.Reflection.Assembly
  35. System.Configuration.Assemblies.AssemblyHashAlgorithm
  36. System.Configuration.Assemblies.AssemblyVersionCompatibility
  37. System.Reflection.AssemblyNameFlags
  38. System.Globalization.CultureTableRecord
  39. System.Globalization.CompareInfo
  40. System.Globalization.TextInfo
  41. System.Globalization.NumberFormatInfo
  42. System.Globalization.DateTimeFormatInfo
  43. System.Globalization.Calendar
  44. System.Globalization.BaseInfoTable
  45. System.Globalization.CultureTable
  46. System.Globalization.CultureTableData
  47. System.Globalization.NumberStyles
  48. System.Globalization.DateTimeStyles
  49. System.Globalization.DateTimeFormatFlags
  50. System.Globalization.TokenHashValue
  51. System.Globalization.CultureTableHeader
  52. System.Globalization.CultureNameOffsetItem
  53. System.Globalization.RegionNameOffsetItem
  54. System.Globalization.IDOffsetItem
  55. System.TokenType
  56. System.Char
  57. System.IO.TextReader
  58. System.IO.TextWriter
  59. System.IFormatProvider
  60. System.Console
  61. System.RuntimeTypeHandle
  62. System.NotSupportedException
  63. System.Globalization.EndianessHeader
  64. System.Reflection.Missing
  65. System.RuntimeType
  66. System.Threading.StackCrawlMark
  67. System.Globalization.CultureTableItem
  68. System.Int32
  69. System.Security.CodeAccessSecurityEngine
  70. System.AppDomain
  71. System.LocalDataStoreMgr
  72. System.Threading.ExecutionContext
  73. System.LocalDataStore
  74. System.Collections.ArrayList
  75. System.Threading.SynchronizationContext
  76. System.Runtime.Remoting.Messaging.LogicalCallContext
  77. System.Runtime.Remoting.Messaging.IllogicalCallContext
  78. System.Threading.Thread
  79. System.Collections.Generic.Dictionary`2
  80. System.Runtime.Remoting.Messaging.CallContextRemotingData
  81. System.RuApplication starting
  82. System.Collections.Generic.IEqualityComparer`1
  83. Runtime.Remoting.Messaging.CallContextSecurityData
  84. System.Array
  85. System.RuntimeFieldHandle
  86. System.Globalization.CultureTableRecord[]
  87. System.Text.StringBuilder

**This is for the .NET Compact Framework CLR. Your mileage will vary if you run the same code on the desktop CLR.

Thursday, September 11, 2008

Designer for my path finding boards

I'm writing a small application (or rather a screen saver) that animates and demonstrates the A* search algorithm. The idea is simple. On screen you see a start point, an end point and some random obstacles between them. Then an animation shows how the A* algorithm navigates the board to find a path (close to the shortest possible) between the two.

All of this is fairly standard. However, I hit a simple issue. I wanted to be able to design the board visually and also let my potential million users do the same. Obviously I don't have time to code up a designer, so I chose the all-time manager's favorite technique: reuse :)

The solution I settled on was to use Microsoft Office Excel as the WYSIWYG editor. I created an xlsx file with the following conditional formatting, which colors each cell based on its value.

Excel Conditional Formatting screen shot for blog

So in this case

  1. w = Wall marking the end of the table
  2. b = blocks/bricks/obstacle
  3. s = start
  4. e = end

Using this I can easily design the board. The designer in action looks as follows

Excel Designer For A* blog screenshot

Since the Excel sheet has conditional formatting, the user just types s, e, b or w into the cells and they light up. At the end, he/she saves the file using File => Save As, choosing the CSV format. The board shown above is saved as:

w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w
w,,,,,,,,,,,,,,,,,,,,,,,b,,,,,,,,,,,,,,,,,,w
w,,,,,,,,,,,,,b,b,b,,,,,,,,b,,,,,,,,,,,,,,,,,,w
w,,,,s,,,,,,,,b,b,b,b,b,b,,,,,,b,,,,,,,,,,,,,,,,,,w
w,,,,,,,,,,b,b,b,b,,b,b,b,b,,,,,b,,,,,,,,,,,,,,,,,,w
w,,,,,,,,,b,b,b,,,,,,b,b,b,,,,b,,,,,,,,,,,,,,,,,,w
w,,,,,,,,,b,b,,,,,,,,b,b,,,,b,,,,,,,,,,,,,,,,,,w
w,,,,,,,,,b,b,,,,,,,,b,b,,,,b,,,,,,,,,,,,,,,,,,w
w,,,,,,,,,b,b,,,,,,,b,b,,,,,b,,,,,,,,,,,,,,,,,,w
w,,,,,,,,,,,,,,,,b,b,,,,,,b,,,,b,,,,,,,,,,,,,,w
w,,,,,,,,,,,,,,,b,b,,,,,,,b,,,,b,,,,,,,,,,,,,,w
w,,,,,,,,,,,,,,b,b,,,,,,,,b,,,,b,,,,,,,,,,,,,,w
w,,,,,,,,,,,,,b,b,,,,,,,,,b,,,,b,,,,,,,,,,,,,,w
w,,,,,,,,,,,,,b,b,,,,,,,,,b,,,,b,,,,,,,,,,,,,,w
w,,,,,,,,,,,,,b,b,,,,,,,,,b,,,,b,,,,,,,,,,,,,,w
w,,,,,,,,,,,,,b,b,,,,,,,,,b,,,,b,,,,,,,,,,,,,,w
w,,,,,,,,,,,,,b,b,,,,,,,,,b,,,,b,,,,,,,,,,,,,,w
w,,,,,,,,,,,,,b,b,,,,,,,,,b,,,,b,,,,,,,,,,,,,,w
w,,,,,,,,,,,,,,,,,,,,,,,b,,,,b,,,,,,,,,,,,,,w
w,,,,,,,,,,,,,b,b,,,,,,,,,b,,,,b,,,,,,,,,,,,,,w
w,,,,,,,,,,,,,,,,,,,,,,,b,,,,b,,,,,,,,,,,e,,,w
w,,,,,,,,,,,,,,,,,,,,,,,b,,,,b,,,,,,,,,,,,,,w
w,,,,,,,,,,,,,,,,,,,,,,,b,,,,b,,,,,,,,,,,,,,w
w,,,,,,,,,,,,,,,,,,,,,,,,,,,b,,,,,,,,,,,,,,w
w,,,,,,,,,,,,,,,,,,,,,,,,,,,b,,,,,,,,,,,,,,w
w,,,,,,,,,,,,,,,,,,,,,,,,,,,b,,,,,,,,,,,,,,w
w,,,,,,,,,,,,,,,,,,,,,,,,,,,b,,,,,,,,,,,,,,w
w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w,w

As you can see, the format is simply one row per line, with columns separated by commas. My simulator reads this file and renders it using whichever renderer is configured (console or WPF).
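Reading this format back is simple. A minimal C++ sketch (ParseBoard and the Board struct are hypothetical, not the simulator's actual types): split each line on commas, treat an empty field as a free cell, and pad short rows so the grid is rectangular, since the rows above do not all have the same number of fields.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory board: one char per cell, '.' for an empty cell,
// otherwise the first letter typed in Excel (w, b, s or e).
struct Board
{
    std::vector<std::string> cells;
    int startRow = -1, startCol = -1, endRow = -1, endCol = -1;
};

// Reads the CSV produced by Excel's Save As: one board row per line,
// one cell per comma-separated field.
Board ParseBoard(std::istream& in)
{
    Board board;
    std::string line;
    std::size_t width = 0;
    while (std::getline(in, line))
    {
        if (!line.empty() && line.back() == '\r')
            line.pop_back(); // tolerate Windows line endings
        if (line.empty())
            continue;

        std::string row;
        std::istringstream fields(line);
        std::string field;
        while (std::getline(fields, field, ','))
            row += field.empty() ? '.' : field[0];

        const int r = static_cast<int>(board.cells.size());
        for (std::size_t c = 0; c < row.size(); ++c)
        {
            if (row[c] == 's') { board.startRow = r; board.startCol = static_cast<int>(c); }
            if (row[c] == 'e') { board.endRow = r; board.endCol = static_cast<int>(c); }
        }
        width = std::max(width, row.size());
        board.cells.push_back(row);
    }

    // Rows can have different numbers of fields (getline also drops a
    // trailing empty field), so pad every row to the widest one.
    for (auto& row : board.cells)
        row.resize(width, '.');
    return board;
}
```

Keeping the start and end coordinates alongside the grid saves the renderer and the path finder from rescanning the board.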

More about the A* simulator to come soon...

Tuesday, September 09, 2008

It's easy to claim that the world won't get destroyed today

Taken inside the Microsoft Campus Hyderabad

The clock is ticking: the Large Hadron Collider at CERN is going to be switched on today (9/10/2008). Despite the speculation, CERN claims it's perfectly safe and the world won't end. But that's easy to claim; who'll be around to prove them wrong if they are? :)

It's one of the coolest devices around, with an operating temperature below -270°C. But I'd get really angry if anything disturbed my birthday celebrations!!

Tuesday, August 26, 2008

Team Foundation Server tool to dump workspace details

Bangalore airport

I juggle a lot of workspaces. The reason is that the .NET Compact Framework is consumed by a whole bunch of products, like Windows Mobile, Xbox, Zune and Nokia, and most of them are on different source branches. On top of this, active feature work happens in feature branches, and there are other service branches I frequently need to peek into.

So the net effect is that I sometimes need to go to different folders and see what their source control workspace mapping is, or dump the details of workspaces on other computers to see if I can use one as a template for a new workspace.

What I hate here is having to cd to that folder, run tf workspace to bring up the GUI, and then dismiss the UI. I don't like GUIs when things can be done equally well on the console. So I quickly hacked up a tool that dumps workspace details to the console, something like this:

d:\>workspace.exe NETCF\F01
Name: ABHINAB2_S60
Server: TfsServer01
Owner: Abhinaba
Computer: ABHINAB2

Comment:

Working folders:

Active $/NetCF/F01/tools d:\NETCF\F01\tools
Active $/NetCF/F01/source d:\NETCF\F01\source





This tool can be used either to see the mapping of a folder or, given a workspace name and owner, to dump its details. It uses exactly the same format as shown in the tf workspace UI.


Source


The source and a built binary are linked below. It is a single source file (Program.cs), and you should be able to pull it into any version of Visual Studio or even build it from the command line using CSC.



  1. Source
  2. Binary

Using the tool

Build it (or use the pre-built binary), edit the .config file to point it at your TFS server, and you are all set to go.

Wednesday, August 20, 2008

Obsession with desktop continues

Desktop

This is my new office setup. I have been pushing around a lot of code lately and felt I needed more real estate to work effectively, so I hooked up another monitor. All 3 are standard HP LP1965 19" monitors.

However, since none of my video cards support 3 monitors and I have a whole bunch of computers to hook up, I had to do some complex wiring :). If I number the screens 1, 2, 3 from left to right, this is how it works out:

  • 1 and 2 are connected to my main dev box (machine 1)
  • 2 and 3 are connected to the machine I use for email and browsing (machine 2)
  • 2 actually goes through a 4-port KVM switch, through which it can cycle between machine 1, machine 2 and two other less-used machines
  • 1 also goes through a 2-port KVM switch that connects an Xbox and machine 1
  • I don't use my work laptop directly; I connect to it using Remote Desktop.

Sweet :)

It's not as complex as it sounds. In my normal flow I just use the KVM to switch between machine 1 and machine 2. I rarely need to switch between the Xbox and machine 1.

PS: Don't try to make much sense of the vintage books on the shelf; I plan to keep them for some more time before handing them off to a historian.

Sunday, August 17, 2008

Back to Basics: Using System.Threading.Interlocked is a great idea

Bound to give you diabetes :)

I just saw some code that takes a lock to do a simple set operation. This is bad, because taking locks is expensive and there is an easy alternative: the System.Threading.Interlocked class and its members can get the same job done, much faster.

I wrote the following two methods, which increment a variable a million times. The first does it by taking a lock, and the other uses the Interlocked.Increment API.

static int IncrementorLock()
{
    int val = 0;
    object lockObject = new object();
    for (int i = 0; i < 1000000; ++i)
    {
        lock (lockObject)
        {
            val++;
        }
    }

    return val;
}

static int IncrementorInterlocked()
{
    int val = 0;
    for (int i = 0; i < 1000000; ++i)
    {
        System.Threading.Interlocked.Increment(ref val);
    }

    return val;
}


I then used the Visual Studio Team System Profiler (instrumented profiling) and got the following performance data.

Function Name               Elapsed Inclusive Time    Application Inclusive Time
IncrementorInterlocked()    1,363.45                  134.43
IncrementorLock()           4,374.23                  388.69


Even though this is a micro-benchmark with a completely skewed scenario, it still drives home the simple point that the Interlocked class is much faster: roughly 3x in both columns above.


The interlocked version is faster because, instead of using .NET locks, it directly calls Win32 APIs like InterlockedIncrement, which update the variable atomically (typically with a single lock-prefixed instruction on x86) without acquiring and releasing a lock.


These Interlocked APIs depend completely on support from the underlying OS. This is why not all of the APIs are available uniformly across all versions of .NET (e.g. there is no uniform coverage on WinCE and Xbox). While implementing them for the Nokia platform (S60), we are hitting some scenarios where there is no corresponding OS API.
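For readers outside .NET, the same two styles can be sketched in C++, assuming std::mutex as the analogue of lock and std::atomic<int>::fetch_add as the analogue of Interlocked.Increment. This only shows the two approaches side by side; it makes no claim about the timings in the table above.

```cpp
#include <atomic>
#include <mutex>

// Increments a counter a million times, taking a mutex around each update
// (the analogue of C#'s lock statement).
int IncrementorLock()
{
    int val = 0;
    std::mutex lockObject;
    for (int i = 0; i < 1000000; ++i)
    {
        std::lock_guard<std::mutex> guard(lockObject);
        ++val;
    }
    return val;
}

// Same loop using an atomic read-modify-write instead of a lock
// (the analogue of Interlocked.Increment).
int IncrementorInterlocked()
{
    std::atomic<int> val{0};
    for (int i = 0; i < 1000000; ++i)
        val.fetch_add(1);
    return val.load();
}
```

As in the C# version, both loops run on one thread, so the comparison measures the fixed cost of each mechanism rather than behavior under contention.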

Sunday, July 20, 2008

String equality

Flowers At the Botanical park

akutz has one of the most detailed posts on string interning and equality comparison performance metrics I have ever seen. Head over to the post here.

I loved his conclusion, which is the crux of the whole story.

"In conclusion, the String class’s static Equals method is the most efficient way to compare two string literals and the String class’s instance Equals method is the most efficient way to compare two runtime strings. The kicker is that there must be 10^5 (100,000) string comparisons taking place before one method starts becoming more efficient than the next. Until then, use whichever method floats your boat."

Kismat Konnection

Flowers At the Botanical park

<sorry for the total off-topic />

I had a real hard time giving a title to this post.

The post was originally supposed to be about the fire I saw yesterday in a movie theatre, Talkie Town (map). However, it is not really about the fire; it is more about the people and how they reacted to it.

First things first, the fire.

Folks who know me know that I go to a movie theatre once every couple of years; I prefer movies visiting me rather than the other way around. In the 4 years I've been in Hyderabad, this was my 3rd visit to a movie theatre. So I can say it was a momentous occasion when my wife and I decided on impulse to see a movie in the 2-screen theatre in Miyapur called Talkie Town. The decision was heavily influenced by the fact that my father-in-law was at home looking after our daughter.

Batman (or rather The Dark Knight) lost to my wife, and I was forced to see this Bollywood flick, Kismat Konnection.

This is where all the fun started. Just after the interval (1.5 hours through, and yes, Bollywood movies are that long) people saw some light on the theatre ceiling. Soon the light spread, there was a small hole in the ceiling and we could actually see flames. Whatever was above the soundproofed ceiling had caught fire, and the heat had caused the ceiling material itself to burn.

Then the most amazing thing happened: 50% of the people didn't care. They were gleefully looking up and not even bothering to leave the theatre. I'm not talking about some false fire alarm or a smell of smoke; it was a real fire with flames and a burned hole in the ceiling!! The fire alarm didn't sound and no one from the theatre management seemed to be around. I left and called some folks and they simply went to the terrace. All the noise now prompted some more people to leave, but even then there were others happily watching as if the whole thing was a part of the movie.

Then an even more amazing thing happened: the folks from the theatre came, said that the fire had been doused and restarted the show. The hall was smoky and the AC was off. Another guy and I (yes, there were some more sane people around) caught hold of an official, and he simply said he had checked and could guarantee that everything was safe. I asked him how, since he couldn't prevent the fire in the first place, he could guarantee against a recurrence. He simply said I could get a refund for my ticket, which I did, and left the hall.

On the way back I had a revelation. Since childhood I have seen that people around me care little about safety in general, but this is the first time I figured out that people don't even care about their own safety. How a movie can be worth the risk of sitting in a fire hazard zone, and that too with small children on their laps, is something I will never figure out.

A lot of things, including the traffic situation and the weird jaywalking I see around, suddenly make more sense to me.

Wednesday, July 09, 2008

Writing exception handlers as separate methods may prove to be a good idea

Flowers At the Botanical park

Let us consider a scenario where you catch some exception and do some costly operation in the exception handler. You can write that code in either of the following ways:

Method-1 : Separate method call

public class Program
{
public static void Main(string[] args)
{
try
{
using (DataStore ds = new DataStore())
{
// ...
}
}
catch (Exception ex)
{
ErrorReporter(ex);
}
}

private static void ErrorReporter(Exception ex)
{
string path = System.IO.Path.GetTempFileName();
ErrorDumper ed = new ErrorDumper(path, ex);
ed.WriteError();

XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load(path);
RemoteErrorReporter er = new RemoteErrorReporter(xmlDoc);
er.ReportError();
}
}



Method-2 : Inline

public static void Main(string[] args)
{
try
{
using (DataStore ds = new DataStore())
{
// ...
}
}
catch (Exception ex)
{
string path = System.IO.Path.GetTempFileName();
ErrorDumper ed = new ErrorDumper(path, ex);
ed.WriteError();

XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load(path);
RemoteErrorReporter er = new RemoteErrorReporter(xmlDoc);
er.ReportError();
}
}



The simple difference is that in the first case the exception handler is written as a separate method and in the second case it is placed directly inline inside the handler itself.


The question is which is better in terms of performance?


If you have significant code and type references in the handler, and you expect the exception to be thrown rarely during an application's execution, then Method-1 is going to be more performant.


The reason is that just before a method executes for the first time, the whole method gets jitted. The jitted code contains stubs to the other methods it will call, but the jitter doesn't recursively jit them. This means that when Main gets called it gets jitted, but ErrorReporter is still not jitted. So if the exception is never thrown, none of the code inside ErrorReporter ever gets jitted. This can be a significant saving in terms of time and space if the handling code is complex and refers to types not already referenced elsewhere.


However, if the code is inline, then the moment Main gets jitted all the code inside the catch block gets jitted. This is expensive not only because it leads to jitting of code that is never executed, but also because all types referenced in the catch block are resolved as well, which results in loading a bunch of DLLs after searching through the disk. In our example above, System.Xml.dll and the other DLL containing the remote error reporting types get loaded even though they will never be used. Since disk access, assembly loading and type resolution are slow, this simple change can yield some savings.

Tuesday, July 08, 2008

Microsoft RoundTable

Our conference rooms have been fitted with this really weird looking device (click to enlarge).

I had no clue what the thing was. Fortunately its box was still in the room along with the manual. It's called the Microsoft RoundTable and it is actually a 360-degree camera (with 5 cameras and 6 microphones). It comes with bundled software that lets all participants be visible to the other side in a live meeting in real time. It shows the whiteboard, and the software is intelligent enough to focus on and track the active speaker (using the microphones and face recognition), and much much more (a lot of MS Research stuff has gone into it). The video below gives you some idea, and head on to this post for a review and an inside view of the device.

Simply put, it's AWESOME

Monday, July 07, 2008

Do namespace using directives affect Assembly Loading?

Hyderabad Microsoft Campus

The simple answer is no, the inquisitive reader can read on :)

Close to 2 years back I had posted about the following two styles of placing using directives:

Style 1

namespace MyNameSpace
{
using System;
using System.Collections.Generic;
using System.Text;
// ...
}



Style 2

using System;
using System.Collections.Generic;
using System.Text;

namespace MyNameSpace
{
// ...
}



and outlined the benefits of the first style (using directives inside the namespace). This post is not going to reiterate them.


This post tries to figure out whether either style has any bearing on the loading order of assemblies. At first look it clearly shouldn't, but this has caused some back and forth discussion over the web.


Scott Hanselman posted about a statement on the Microsoft StyleCop blog, which states


"When using directives are declared outside of a namespace, the .Net Framework will load all assemblies referenced by these using statements at the same time that the referencing assembly is loaded.


However, placing the using statements within a namespace element allows the framework to lazy load the referenced assemblies at runtime. In some cases, if the referencing code is not actually executed, the framework can avoid having to load one or more of the referenced assemblies completely. This follows general best practice rule about lazy loading for performance.

Note, this is subject to change as the .Net Framework evolves, and there are subtle differences between the various versions of the framework."

This just doesn't sound right, because using directives have no bearing on assembly loading.


Hanselman did a simple experiment with the following code:

using System;  
using System.Xml;

namespace Microsoft.Sample
{
public class Program
{
public static void Main(string[] args)
{
Guid g = Guid.NewGuid();
Console.WriteLine("Before XML usage");
Console.ReadLine();
Foo();
Console.WriteLine("After XML usage");
Console.ReadLine();
}

public static void Foo()
{
XmlDocument x = new XmlDocument();
}
}
}



He then watched the assembly loading using Process Explorer, moved the using directives inside the namespace and did the same. In both cases System.Xml.dll was loaded only after he hit Enter on the console, clearly indicating that in both cases it got lazy loaded.


Let me try to give a step by step rundown of how the whole type look up of XmlDocument happens in .NETCF which in turn would throw light on whether using directives have bearing on assembly loading.



  1. When the Main method is jitted and run, System.Xml.dll is not yet loaded
  2. When method Foo is called the execution engine (referred to as EE) tries to JIT the method. As documented the Jitter only JITs methods that are to be executed.
  3. The Jitter tries to see if the method Foo is managed (could be native as well due to mixed mode support) and then tries to see if it's already Jitted (by a previous call), since it's not it goes ahead with jitting it
  4. The jitter validates a bunch of stuff like whether the class on which the method Foo is being called (in this case Microsoft.Sample.Program) is valid, been initialized, stack requirements, etc...
  5. Then it tries to resolve the local variables of the method. It waits to resolve the local variable type reference till this point so that it is able to save time and memory by not Jitting/loading types that are referenced by methods that are never executed
  6. Then it tries to resolve the type of the variable, which in this case is System.Xml.XmlDocument.
  7. It sees if it's already in the cache, that is if that type is already loaded
  8. Since it's not the case it tries to search for the reference based on the type reference information
  9. This information contains the full type reference including the assembly name, which in this case is System.Xml.dll, and also version information, strong name information, etc...
  10. All of the above information along with other information like the executing application's path is passed to the assembly loader to load the assembly
  11. The usual assembly search sequence is used to look for the assembly and then it is loaded and the type reference subsequently gets resolved

If you look at the above steps, there is no dependency of assembly loading on using directives. Hence, at least on .NETCF, whether you put the using directives outside or inside the namespace, the referenced assemblies get loaded exactly at the time a type from that assembly is first referenced (step #5 above is the key).
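To make step #5 concrete, here is a toy C++ model of the lookup sequence. Everything in it (TypeRef, Method, ToyRuntime) is hypothetical and heavily simplified; it is not .NETCF code. The point it illustrates is that a type reference already carries its assembly name, so no using directive is ever consulted, and an assembly is only loaded when a method that references one of its types is jitted.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: a type reference in metadata already names its assembly.
struct TypeRef
{
    std::string assembly; // e.g. "System.Xml"
    std::string fullName; // e.g. "System.Xml.XmlDocument"
};

// A method, reduced to the types its local variables refer to.
struct Method
{
    std::string name;
    std::vector<TypeRef> localTypes;
};

class ToyRuntime
{
public:
    // Steps 2-3: a method is jitted on its first call only.
    void Call(const Method& m)
    {
        if (jitted_.count(m.name))
            return; // already jitted by a previous call
        jitted_.insert(m.name);
        for (const TypeRef& t : m.localTypes)
            Resolve(t); // step 5: locals are resolved only now
    }

    bool IsLoaded(const std::string& assembly) const { return loaded_.count(assembly) != 0; }
    std::size_t LoadCount() const { return loaded_.size(); }

private:
    void Resolve(const TypeRef& t)
    {
        if (types_.count(t.fullName))
            return; // step 7: type already in the cache
        // Steps 8-11: the reference itself says which assembly to load.
        loaded_.insert(t.assembly); // no-op if that assembly is already loaded
        types_.insert(t.fullName);
    }

    std::set<std::string> jitted_;
    std::set<std::string> loaded_;
    std::set<std::string> types_;
};
```

With a Main whose locals only use System.Guid, calling it loads nothing from System.Xml; the first call to a Foo that uses XmlDocument loads System.Xml, and a second call finds Foo already jitted and loads nothing new.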

Sunday, July 06, 2008

Auto generating Code Review Email for TFS

Hyderabad Microsoft Campus

We use a small command line tool called crmail to auto-generate code review emails from a shelveset. I find the whole process very helpful and thought I'd share the process and the tool (which has some really cool features).

Features

  1. Automatic generation of the email from the shelveset details
  2. Hyperlinks point to TFS Web Access, so you can review code from machines without any tools installed, even without a source enlistment. Yes, it's true!!! The only thing you need is access to your office's intranet
  3. You can even use a Windows Mobile phone :) and even some non-MS browsers. OK, I guess I have sold this enough
  4. This is how the email looks, with all the details pointed out
  5. Effectively, you can see the file diff, history, blame (annotate), shelveset details, associated bugs, everything from your browser, and the best thing is that each of these takes just one click.
    This is how the file diff looks in the browser

Pre-reqs

  1. Team System Web Access (TSWA) 2008 power tool installed on your TFS server
  2. Outlook installed on the machine on which the email is generated
  3. Enlistment and TFS client installed on the machine on which the email is generated
  4. For reviewers there is no pre-req other than a browser and email reader.

Dev process

  1. The developer creates a shelveset after he is done with his changes. He makes sure to fill in all the details, including the reviewers' email addresses (semicolon separated)
  2. He runs the tool with a simple command
    crmail shelvesetname
  3. The email gets generated and opened; he fills in additional information and hits Send
  4. Done!!

Reviewers

OK, they just click on the email links. Since mostly these are managers, what more do you expect out of them? Real devs will stick with firing up the tfpt command line :)

Configuring the tool

  1. Download the binaries from here
  2. Unzip. Open the crmail.exe.config file and modify the values in it to point to your tfsserver and your code review distribution list (if you do not have one then make it empty)
  3. Check it in to some tools folder in your source control so that everyone in your team has access to it

Support

Self help is the best help :), download the sources from here and enjoy. Buck Hodges' post on the direct link URLs would help you in case you want to modify the sources to do more.

Thursday, July 03, 2008

How does .NET CF handle null references?

Hyderabad Microsoft Campus

What happens when we have code like the following?

using System;

class B
{
public virtual void Virt(){
Console.WriteLine("Base::Virt");
}
}

class Program
{
static void Main(string[] args){
B b = null;
b.Virt(); // throws System.NullReferenceException
}
}

Obviously a NullReferenceException is thrown. If you look at the IL, the call looks like this:

L_0000: nop
L_0001: ldnull
L_0002: stloc.0
L_0003: ldloc.0
L_0004: callvirt instance void ConsoleApplication1.B::Virt()
L_0009: nop
L_000a: ret

So in effect you'd expect the jitter to generate the following kind of code (as processor instructions):



if (b == null)
throw new NullReferenceException
else
b->Virt() // actually call safely using the this pointer

However, generating null checks for every call is going to lead to code bloat. To work around this, on some platforms (e.g. .NETCF on WinCE 6.0 and above) the runtime uses the following approach:



  1. Hook up native access violation exception (WinCE 6.0 supports this) to a method in the execution engine (EE)
  2. Do not generate any null checking and directly generate calls through references
  3. If the reference is null, a native AV is raised (an access violation, since the invalid address 0 is accessed) and the hook method is called
  4. At this point the EE checks whether the source of the access violation (native code) is inside a jitted code block. If yes, it creates the managed NullReferenceException and propagates it up the call chain.
  5. If it's outside, then obviously either the CLR itself or some other native component is crashing, and there is nothing it can do about it.
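Step 4 is the interesting decision. Here is a minimal C++ sketch of it, assuming the EE keeps a list of address ranges for the code it has jitted. CodeRange, FaultAction and ClassifyAccessViolation are hypothetical names chosen for illustration, not .NETCF internals, and the real hook has to deal with much more (exception records, thread state, unwinding).

```cpp
#include <cstdint>
#include <vector>

// Hypothetical bookkeeping: half-open address range [start, end) of one jitted method.
struct CodeRange
{
    std::uintptr_t start;
    std::uintptr_t end;
};

enum class FaultAction
{
    ThrowNullReference, // fault came from jitted code: raise the managed exception
    NotOurs             // fault came from the CLR itself or another native component
};

// Called from the access-violation hook with the address of the faulting instruction.
FaultAction ClassifyAccessViolation(const std::vector<CodeRange>& jitted,
                                    std::uintptr_t faultingPc)
{
    for (const CodeRange& r : jitted)
    {
        if (faultingPc >= r.start && faultingPc < r.end)
            return FaultAction::ThrowNullReference;
    }
    return FaultAction::NotOurs;
}
```

Keeping the check this cheap matters because the common path (a non-null reference) pays nothing at all; only the rare faulting case walks the range list.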

Wednesday, July 02, 2008

C# generates virtual calls to non-virtual methods as well

Hyderabad Microsoft Campus

Some time back I had posted about a case where non-virtual calls are used for virtual methods, and promised to post about the reverse scenario. This issue of C# generating the callvirt IL instruction even for non-virtual method calls keeps coming back on C# discussion DLs every couple of months. So here it goes :)

Consider the following code

using System;

class B
{
public virtual void Virt(){
Console.WriteLine("Base::Virt");
}

public void Stat(){
Console.WriteLine("Base::Stat");
}
}

class D : B
{
public override void Virt(){
Console.WriteLine("Derived::Virt");
}
}

class Program
{
static void Main(string[] args)
{
D d = new D();
d.Stat(); // should emit the call IL instruction
d.Virt(); // should emit the callvirt IL instruction
}
}

The basic scenario is that a base class defines a virtual method and a non-virtual method, and both are called through a derived class reference. The expectation is that the call to the virtual method (B.Virt) will go through the intermediate language (IL) callvirt instruction and the call to the non-virtual method (B.Stat) through the call IL instruction.


However, this is not true: callvirt is used for both. If we open the disassembly of the Main method using Reflector or ILDASM, this is what we see:

L_0000: nop
L_0001: newobj instance void ConsoleApplication1.D::.ctor()
L_0006: stloc.0
L_0007: ldloc.0
L_0008: callvirt instance void ConsoleApplication1.B::Stat()
L_000d: nop
L_000e: ldloc.0
L_000f: callvirt instance void ConsoleApplication1.B::Virt()
L_0014: nop
L_0015: ret

The question is why? There are two reasons that have been put forward by the CLR team:



  1. API change.
    The .NET team wanted changing a method (API) from non-virtual to virtual to be a non-breaking change. Since the call is generated as callvirt anyway, a caller need not be recompiled if the callee changes to be a virtual method.

  2. Null checking
    If call is generated and the method body doesn't access any instance variable, then it is possible to call the method successfully even on a null object. This is possible in C++ (even though it is undefined behavior there); see a post I made on this here.

    With callvirt there's a forced access to the this pointer, and hence the object on which the method is called is automatically checked for null.


callvirt does come with an additional performance cost, but measurements showed that there's no significant performance difference between call with a null check and callvirt. Moreover, since the jitter has the full metadata of the callee, while jitting the callvirt it can generate processor instructions for a direct (static) call if it figures out that the callee is indeed non-virtual.


However, the compiler does try to optimize situations where it knows for sure that the target object cannot be null. E.g. for the expression i.ToString(), where i is an int, call is used to invoke the ToString method because Int32 is a value type (cannot be null) and is sealed.
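To put the jitter optimization described above in C++ terms, here is a sketch of what callvirt to a method known to be non-virtual can boil down to: keep the null check that callvirt guarantees, then make a direct call with no vtable lookup. NullReferenceException and CallvirtStat are hypothetical stand-ins for illustration, not CLR code or actual jitter output.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for System.NullReferenceException.
struct NullReferenceException : std::runtime_error
{
    NullReferenceException()
        : std::runtime_error("Object reference not set to an instance of an object.") {}
};

struct B
{
    // Non-virtual, so the call target is known when the caller is compiled.
    std::string Stat() const { return "Base::Stat"; }
};

// What callvirt on a known non-virtual method can reduce to:
// the null check callvirt guarantees, followed by a direct call.
std::string CallvirtStat(const B* b)
{
    if (b == nullptr)
        throw NullReferenceException();
    return b->Stat(); // direct call, no vtable involved
}
```

The null check is the only thing callvirt adds over call here, which is why the measured difference is small.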

Thursday, June 26, 2008

Guy or a Girl

Hyderabad Microsoft Campus

One interesting aspect of working in an internationally distributed team is that it sometimes gets difficult to make common judgements. E.g. when we see a name we inherently figure out whether it's a male or female name and refer to that person as such in email. The issue is that I cannot always make the same judgement for names from another country/culture.

In my previous team, in a long email thread, someone continually referred to Khushboo as "he". Khushboo didn't correct him and it went on for some time until I pointed it out to him in a separate email. Today I was typing an email and suddenly figured out I had no idea whether one of the people I'm referring to is male or female. I took a wild guess and I'm waiting to get corrected.

Wednesday, June 18, 2008

Baby smash

Waiting in the Microsoft lobby

What do all programmer dads/moms have in common? The moment they get onto a new UI platform they write a child proofing application for the keyboard.

Scott Hanselman has just posted his version, Baby Smash (via AmitChat). The funny thing is I've written one in WPF and so did my ex-manager.

Saturday, May 24, 2008

Stylecop has been released

me

Microsoft released the internal tool StyleCop to public under the fancy yet boring name of Microsoft Source Analysis for C#. Even though the name is boring the product is not.

You'll love this tool when it imposes consistent coding style across your team. You'll hate this tool when it imposes the same on you. The result is stunning looking, consistently styled code which your whole team can follow uniformly.

StyleCop has been in use internally in Microsoft for a long time and many teams mandate its use. My previous team, VSTT, used it as well. The only crib I had is that it didn't allow single line getters and setters (and our team didn't agree to disable this rule either).

// (foo is a private int backing field; returning Foo itself would recurse forever)

// StyleCop didn't like this one
public int Foo
{
get { return foo; }
}

// StyleCop wanted this instead
public int Foo
{
get
{
return foo;
}
}

Read more about using StyleCop here. You can set it up to run as part of your build process as documented here. Since it plugs in as an MSBuild project, you can use it as part of the Team Foundation Build process as well.


Let the style wars begin in team meetings :)

Lambda the ultimate

GunGun

Whatever I said about lambdas before is crap. Each time I use them I feel happy.

I had to fire a timer every so often, and this is what I can do with a lambda...

dataStoreTimer = new Timer(new TimerCallback(
(obj) => { (obj as AutoResetEvent).Set(); }), pollEvent, 100, 1000);

Sweet!!!


**BTW you can't blame me for having my shortest post just after the longest!!

Friday, May 23, 2008

Cell phone assault

Visakhapatnam - Ramakrishna beach

In the last two weeks my cell phone got assaulted three times. First it was someone sending me a virus over Bluetooth (a SIS file actually). This happened when I was taking a photograph of my daughter with the cell phone camera in a restaurant (Aromas of China, City Center mall in Hyderabad).

The next one was Bluetooth-based advertisement messages in the Forum Mall in Bangalore. They were actually sending offers of the hour over Bluetooth and I got 2 such messages.

The third incident was at the airport, when someone again tried to send me a trojan app and make me open it.

I was really surprised by the rapid growth of cell phone based attacks. Worse, few people know about this. My wife had no idea that you can actually send applications over Bluetooth and that they can infect the phone.

Thursday, May 22, 2008

Building Scriptable Applications by hosting JScript

The kind of food I should have, but I don't

If you have played around with large applications, I'm sure you have wondered how they are built to be extensible. There are multiple options:

  1. Develop your own extension mechanism where you pick up extension binaries and execute them.
    One managed code example is here, where the application loads DLLs (assemblies) from a folder and runs specific types from them. A similar unmanaged approach is to allow registration of GUIDs and use COM to load types that implement the required interfaces
  2. Roll out your own scripting mechanism:
    One managed example is here, where on-the-fly compilation is used. With the DLR hosting mechanism coming up, this will be very easy going forward
  3. Support standard scripting mechanism:
    This involves hosting JScript/VBScript inside the application and exposing a document object model (DOM) to it. So anyone can just write standard JScript to extend the application very much like how JScript in a webpage can extend/program the HTML DOM.

Obviously the 3rd is the best choice if you are developing a native (unmanaged) solution. The advantages are many: a low learning curve (any JScript programmer can write extensions), built-in security and low cost.

In this post I'll try to cover how you go about doing exactly that. I found little online documentation and took help of Kaushik from the JScript team to hack up some code to do this.

The Host Interface

To host JScript you need to implement the IActiveScriptSite interface. The code below shows how we do that, stripping out the details we do not want to discuss here (no fear :) all the code is present in the download pointed to at the end of the post). The code below is in the file ashost.h

class IActiveScriptHost : public IUnknown 
{
public:
// IUnknown
virtual ULONG __stdcall AddRef(void) = 0;
virtual ULONG __stdcall Release(void) = 0;
virtual HRESULT __stdcall QueryInterface(REFIID iid,
void **obj) = 0;

// IActiveScriptHost
virtual HRESULT __stdcall Eval(const WCHAR *source,
VARIANT *result) = 0;
virtual HRESULT __stdcall Inject(const WCHAR *name,
IUnknown *unkn) = 0;

};

class ScriptHost :
public IActiveScriptHost,
public IActiveScriptSite
{
private:
LONG _ref;
IActiveScript *_activeScript;
IActiveScriptParse *_activeScriptParse;

ScriptHost(...){}

virtual ~ScriptHost(){}
public:
// IUnknown
virtual ULONG __stdcall AddRef(void);
virtual ULONG __stdcall Release(void);
virtual HRESULT __stdcall QueryInterface(REFIID iid, void **obj);

// IActiveScriptSite
virtual HRESULT __stdcall GetLCID(LCID *lcid);
virtual HRESULT __stdcall GetItemInfo(LPCOLESTR name,
DWORD returnMask, IUnknown **item, ITypeInfo **typeInfo);

virtual HRESULT __stdcall GetDocVersionString(BSTR *versionString);
virtual HRESULT __stdcall OnScriptTerminate(const VARIANT *result,
const EXCEPINFO *exceptionInfo);
virtual HRESULT __stdcall OnStateChange(SCRIPTSTATE state);
virtual HRESULT __stdcall OnEnterScript(void);
virtual HRESULT __stdcall OnLeaveScript(void);
virtual HRESULT __stdcall OnScriptError(IActiveScriptError *error);

// IActiveScriptHost
virtual HRESULT __stdcall Eval(const WCHAR *source,
VARIANT *result);
virtual HRESULT __stdcall Inject(const WCHAR *name,
IUnknown *unkn);

public:

static HRESULT Create(IActiveScriptHost **host)
{
...
}


};

Here we are defining an interface IActiveScriptHost. ScriptHost implements IActiveScriptHost and also the required hosting interface IActiveScriptSite. IActiveScriptHost exposes 2 extra methods, Eval and Inject, that will be used from outside to easily host JScript scripts.


In addition, ScriptHost also implements a factory method, Create. This method does the heavy lifting of using COM querying to get the various interfaces it needs (IActiveScript, IActiveScriptParse) and stores them in the corresponding pointers.


Instantiating the host


So the client of this host class creates the ScriptHost instance by using the following (see ScriptHostBase.cpp):

IActiveScriptHost *activeScriptHost = NULL;
HRESULT hr = S_OK;
HRESULT hrInit = S_OK;

hrInit = CoInitializeEx(NULL, COINIT_APARTMENTTHREADED);
if(FAILED(hrInit)) throw L"Failed to initialize";

hr = ScriptHost::Create(&activeScriptHost);
if(FAILED(hr)) throw L"Failed to create ScriptHost";

With this the script host is available through the activeScriptHost pointer, and we have the JScript engine hosted in our application.


Evaluating Scripts


Once hosting is in place, we need to make it do something interesting. This is where the IActiveScriptHost::Eval method comes in.

HRESULT __stdcall ScriptHost::Eval(const WCHAR *source, 
VARIANT *result)
{
assert(source != NULL);

if (source == NULL)
return E_POINTER;

return _activeScriptParse->ParseScriptText(source, NULL,
NULL, NULL, 0, 1,
SCRIPTTEXT_ISEXPRESSION,
result, NULL);
}

Eval accepts a text of the script, makes it execute using IActiveScriptParse::ParseScriptText and returns the result.


So effectively we can accept input from the console and evaluate it (or read a file and interpret the complete script in it).

while (true) 
{
wcout << L">> ";
getline(wcin, input);
if (quitStr.compare(input) == 0) break;

if (FAILED(activeScriptHost->Eval(input.c_str(), &result)))
{
throw L"Script Error";
}
if (result.vt == VT_I4) // 32-bit integer result
wcout << result.lVal << endl;
}

So all this is fine, and at the end you can run the app (which BTW is a console app); this is what you can do.

JScript sample Host
q! to quit

>> Hello = 7
7
>> World = 6
6
>> Hello * World
42
>> q!
Press any key to continue . . .


So you have extended your app to do maths for you, or rather to run basic scripts, which, even though exciting, is not of much value.


Extending your app


Once we are past hosting the engine and running scripts inside the application we need to go ahead with actually building the application's DOM and injecting it into the hosting engine so that JScript can extend it.


If you already have a native application which is built on COM (IDispatch), then you have nothing more to do. But let's pretend that we actually have nothing and need to build the DOM.


To build the DOM you need to create an IDispatch based DOM tree. There can be more than one root. In this post I'm not trying to cover how to build IDispatch based COM objects (which you'd do using ATL or some such other means). However, for simplicity we will roll out a hand written implementation which implements an interface as below.

class IDomRoot : public IDispatch 
{
// IUnknown
virtual ULONG __stdcall AddRef(void) = 0;
virtual ULONG __stdcall Release(void) = 0;
virtual HRESULT __stdcall QueryInterface(REFIID iid,
void **obj) = 0;

// IDispatch
virtual HRESULT __stdcall GetTypeInfoCount( UINT *pctinfo) = 0;
virtual HRESULT __stdcall GetTypeInfo( UINT iTInfo, LCID lcid,
ITypeInfo **ppTInfo) = 0;
virtual HRESULT __stdcall GetIDsOfNames( REFIID riid,
LPOLESTR *rgszNames,
UINT cNames, LCID lcid,
DISPID *rgDispId) = 0;

virtual HRESULT __stdcall Invoke( DISPID dispIdMember, REFIID riid,
LCID lcid, WORD wFlags,
DISPPARAMS *pDispParams,
VARIANT *pVarResult,
EXCEPINFO *pExcepInfo,
UINT *puArgErr) = 0;

// IDomRoot
virtual HRESULT __stdcall Print(BSTR str) = 0;
virtual HRESULT __stdcall get_Val(LONG* pVal) = 0;
virtual HRESULT __stdcall put_Val(LONG pVal) = 0;

};


 


At the top we have the standard IUnknown and IDispatch methods, and at the end we have our DOM root's methods (Print, get_Val and put_Val). It implements a Print method that prints a string and a property called Val (with a get and a set method for that property).


The class DomRoot implements this interface, plus an additional method named Create, which is the factory to create it. Once we are done creating it, we inject this object into the JScript scripting engine. So our final script host code looks as follows:

IActiveScriptHost *activeScriptHost = NULL;
IDomRoot *domRoot = NULL;
HRESULT hr = S_OK;
HRESULT hrInit = S_OK;

hrInit = CoInitializeEx(NULL, COINIT_APARTMENTTHREADED);
if(FAILED(hrInit)) throw L"Failed to initialize";

// Create the host
hr = ScriptHost::Create(&activeScriptHost);
if(FAILED(hr)) throw L"Failed to create ScriptHost";

// create the DOM Root
hr = DomRoot::Create(&domRoot);
if(FAILED(hr)) throw L"Failed to create DomRoot";

// Inject the created DOM Root into the scripting engine
activeScriptHost->Inject(L"DomRoot", (IUnknown*)domRoot);

This is what happens inside Inject:

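std::map<std::wstring, IUnknown*> rootList;
typedef std::map<std::wstring, IUnknown*>::iterator MapIter;
typedef std::pair<std::wstring, IUnknown*> InjectPair;

HRESULT __stdcall ScriptHost::Inject(const WCHAR *name,
IUnknown *unkn)
{
assert(name != NULL);

if (name == NULL)
return E_POINTER;

// Tell the engine about the name; it asks for the object later via GetItemInfo
HRESULT hr = _activeScript->AddNamedItem(name, SCRIPTITEM_GLOBALMEMBERS |
SCRIPTITEM_ISVISIBLE);
if (FAILED(hr))
return hr;

// Remember which object backs this name
rootList.insert(InjectPair(std::wstring(name), unkn));

return S_OK;
}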




In Inject we store the name of the object and the corresponding IUnknown in a std::map. Each time the script encounters that name in its code, the engine calls GetItemInfo with it, and we look it up in the map and return the corresponding IUnknown.

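HRESULT __stdcall ScriptHost::GetItemInfo(LPCOLESTR name,
DWORD returnMask,
IUnknown **item,
ITypeInfo **typeInfo)
{
// Clear whichever out-params the engine passed in
if (item != NULL) *item = NULL;
if (typeInfo != NULL) *typeInfo = NULL;

MapIter iter = rootList.find(name);
if (iter == rootList.end())
return TYPE_E_ELEMENTNOTFOUND;

if (returnMask & SCRIPTINFO_IUNKNOWN)
{
if (item == NULL)
return E_POINTER;
*item = (*iter).second;
(*item)->AddRef(); // the caller releases what we hand out
}

// This host does not supply an ITypeInfo
return S_OK;
}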

The engine then queries that object for IDispatch and uses it to look up and call the object's properties and methods.
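The Inject/GetItemInfo bookkeeping can be tried out without COM. The sketch below is a portable model, not the post's code: FakeUnknown and NamedItemRegistry are hypothetical names, and a plain counter stands in for AddRef/Release. It shows the two rules the host has to follow: remember each injected object under its name, and hand back an AddRef'd pointer on lookup because the engine will Release it.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for IUnknown: only the reference count is modelled.
struct FakeUnknown {
    long refs = 1;
    void AddRef() { ++refs; }
    void Release() { --refs; }
};

// Hypothetical model of the bookkeeping behind ScriptHost::Inject and
// ScriptHost::GetItemInfo. std::map is an ordered tree, so lookups are O(log n).
class NamedItemRegistry {
public:
    // Like Inject: remember the object under the name passed to AddNamedItem.
    void Inject(const std::wstring& name, FakeUnknown* unk) {
        assert(!name.empty() && unk != nullptr);
        items_[name] = unk;
    }

    // Like GetItemInfo: find the name and return an AddRef'd pointer, because
    // the caller (the script engine) owns the out-parameter and will Release it.
    // Returns nullptr where a real host would return TYPE_E_ELEMENTNOTFOUND.
    FakeUnknown* Lookup(const std::wstring& name) {
        auto it = items_.find(name);
        if (it == items_.end()) return nullptr;
        it->second->AddRef();
        return it->second;
    }

private:
    std::map<std::wstring, FakeUnknown*> items_;
};
```

Injecting an object under L"DomRoot" and looking it up once raises its count from 1 to 2; it drops back to 1 when the caller releases its reference, while a lookup of an unknown name leaves every count untouched.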


The Whole Flow


By now we have seen a whole bunch of code, so let's see how it all works together. Assume we have an extension written in JScript that executes DomRoot.Val = 5;. This is what happens behind the scenes:



  1. During initialization we created the DomRoot object (DomRoot::Create), which implements IDomRoot, injected it into the script engine via AddNamedItem, and stored it on our side in the rootList map.
  2. We call activeScriptHost->Eval(L"DomRoot.Val = 5;", ...) to evaluate the script. Eval calls _activeScriptParse->ParseScriptText.
  3. When the script engine sees the name "DomRoot", it recognizes it as a valid name added with AddNamedItem and calls its host's ScriptHost::GetItemInfo("DomRoot").
  4. Our host looks up the name in the map filled during Inject and returns the corresponding IUnknown to the scripting engine. At this point the engine holds a reference to our DOM root through an IUnknown pointer.
  5. The scripting engine calls QueryInterface on that IUnknown to get its IDispatch interface.
  6. The engine then calls IDispatch::GetIDsOfNames with the property name "Val".
  7. DomRoot's implementation of GetIDsOfNames returns the dispatch ID of the Val property (2 in our case).
  8. The script engine calls IDispatch::Invoke with that dispatch ID and a flag saying whether it wants a property get or a property put. Here it is a put, so DomRoot routes the call to DomRoot::put_Val.
  9. This completes the round trip: from the host into the script and back out to the DOM.
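Steps 6 to 8 boil down to a name-to-ID lookup followed by a dispatch on that ID plus a flag. Here is a COM-free sketch of that routing. The flag values match the real DISPATCH_METHOD, DISPATCH_PROPERTYGET and DISPATCH_PROPERTYPUT constants, but FakeDomRoot, AssignProperty and ReadProperty are hypothetical, and Print's dispatch ID of 1 is an assumption (the post only gives Val's ID of 2). A real Invoke receives its arguments as VARIANTs inside a DISPPARAMS structure rather than as a plain long and string.

```cpp
#include <cassert>
#include <string>

// Flag values mirror DISPATCH_METHOD / _PROPERTYGET / _PROPERTYPUT in oleauto.h.
enum DispatchFlags { kMethod = 0x1, kPropertyGet = 0x2, kPropertyPut = 0x4 };
constexpr int kUnknownName = -1;

// Hypothetical, COM-free model of DomRoot's IDispatch routing.
class FakeDomRoot {
public:
    static constexpr int kDispIdPrint = 1;  // assumption: not stated in the post
    static constexpr int kDispIdVal = 2;    // the post gives 2 for Val

    // Step 7: like GetIDsOfNames, map a member name to its dispatch ID.
    int GetIdOfName(const std::string& name) const {
        if (name == "Print") return kDispIdPrint;
        if (name == "Val") return kDispIdVal;
        return kUnknownName;  // real COM: DISP_E_UNKNOWNNAME
    }

    // Step 8: like Invoke, route on the dispatch ID and the get/put/method flag.
    // Returns false where real COM would return DISP_E_MEMBERNOTFOUND.
    bool Invoke(int dispId, int flags, long arg, long* result, const std::string& text) {
        assert(result != nullptr);
        if (dispId == kDispIdVal && (flags & kPropertyPut)) { put_Val(arg); return true; }
        if (dispId == kDispIdVal && (flags & kPropertyGet)) { *result = get_Val(); return true; }
        if (dispId == kDispIdPrint && (flags & kMethod)) { Print(text); return true; }
        return false;
    }

    std::string printed;  // captures Print output so it can be inspected

private:
    void put_Val(long v) { val_ = v; }
    long get_Val() const { return val_; }
    void Print(const std::string& s) { printed += s; }
    long val_ = 0;
};

// Roughly what the engine does for "obj.<name> = value;": resolve, then put.
bool AssignProperty(FakeDomRoot& obj, const std::string& name, long value) {
    int id = obj.GetIdOfName(name);
    if (id == kUnknownName) return false;
    long unused = 0;
    return obj.Invoke(id, kPropertyPut, value, &unused, "");
}

// Roughly what the engine does when a script reads "obj.<name>".
bool ReadProperty(FakeDomRoot& obj, const std::string& name, long* out) {
    int id = obj.GetIdOfName(name);
    return id != kUnknownName && obj.Invoke(id, kPropertyGet, 0, out, "");
}
```

Calling AssignProperty(root, "Val", 5) and then ReadProperty(root, "Val", &v) mirrors the DomRoot.Val = 5; and DomRoot.Val lines of the session shown in the next section.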

In action

JScript sample Host
q! to quit

>> DomRoot.Val = 5;
5
>> DomRoot.Val = DomRoot.Val * 10
50
>> DomRoot.Val
50
>> DomRoot.Print("The answer is 42");
The answer is 42

 


Source Code


First of all, the disclaimer. Let me get it off my chest: the DomRoot code is a super-simplified COM object. It commits nothing less than sacrilege, so you shouldn't treat it as sample code. I intentionally didn't do a full implementation so that you can step into it without the muck of IDispatchImpl or ATL getting in your way.


However, you can treat the script hosting part (ashost, ScriptHostBase) as sample code (that is the idea of the whole post :) )


The code is organized as follows:


ashost.cpp, ashost.h - The Script host implementation
DomRoot.cpp, DomRoot.h - The DOM Root object injected into the scripting engine
ScriptHostBase.cpp - Driver


Note that in a real-life example the driver should load JScript files from a given folder and execute them.
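A minimal sketch of the loading half of such a driver, assuming C++17 std::filesystem. LoadScripts and ScriptFile are hypothetical names, and the files are read as raw bytes. Each source string would still need to be converted to a wide string (for example with MultiByteToWideChar) and handed to the host's Eval method.

```cpp
#include <algorithm>
#include <cassert>
#include <filesystem>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical record for one extension script found on disk.
struct ScriptFile {
    std::string path;
    std::string source;
};

// Hypothetical helper: collect every *.js file directly inside `folder`
// (non-recursive), sorted by path so extensions run in a predictable order.
std::vector<ScriptFile> LoadScripts(const fs::path& folder) {
    assert(fs::is_directory(folder));
    std::vector<ScriptFile> scripts;
    for (const auto& entry : fs::directory_iterator(folder)) {
        if (!entry.is_regular_file() || entry.path().extension() != ".js") continue;
        std::ifstream in(entry.path(), std::ios::binary);
        std::ostringstream buffer;
        buffer << in.rdbuf();  // slurp the whole file
        scripts.push_back({entry.path().string(), buffer.str()});
    }
    std::sort(scripts.begin(), scripts.end(),
              [](const ScriptFile& a, const ScriptFile& b) { return a.path < b.path; });
    return scripts;
}
```

Sorting matters because directory_iterator makes no ordering guarantee; a stable order keeps extensions that depend on each other's globals from behaving differently between runs.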


Download from here