Monday, December 28, 2009

Back to Basics: Memory leaks in managed systems


Someone contacted me through my blog about his managed application, whose working set keeps growing and eventually leads to an out-of-memory condition. At one point in the email he states that he is using .NET, and hence there should surely be no leaks. I have also talked with other folks in the past who think likewise.

However, this is not really true. To get to the bottom of this, we first need to understand what the GC does. Do read up on http://blogs.msdn.com/abhinaba/archive/2009/01/20/back-to-basics-why-use-garbage-collection.aspx.

In summary, the GC keeps track of object usage and collects/removes the objects that are no longer referenced/used by any other objects. It ensures that it doesn’t leave dangling pointers. You can find out how it does this at http://blogs.msdn.com/abhinaba/archive/2009/01/25/back-to-basic-series-on-dynamic-memory-management.aspx

However, there is a catch in the statement above. The GC can only remove objects that are not in use. Unfortunately, it’s easy to write code that results in objects never being completely de-referenced.

Example 1: Event Handling (discussed in more detail here).

Consider the following code:

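EventSink sink = new EventSink();
EventSource src = new EventSource();

// src now holds a delegate that references sink
src.SomeEvent += sink.EventHandler;
src.DoWork();

// Drop our own reference to sink (the event subscription is left in place)
sink = null;
// Force collection
GC.Collect();
GC.WaitForPendingFinalizers();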

In this example, at the point where we force a GC, our code no longer references sink (we explicitly set sink = null). Even so, sink will not be collected. Because sink.EventHandler is subscribed to src.SomeEvent, src holds a reference to sink (so that it can call back into sink.EventHandler when src.SomeEvent fires), and as long as src is reachable, that reference stops sink from being collected.
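The usual remedy is to unsubscribe with -= before dropping your own reference. Below is a minimal sketch of that, assuming simple definitions for EventSource and EventSink (the post doesn't show them, so these classes, the SubscriberCount helper and EventLeakDemo are hypothetical). Since GC timing isn't deterministic, the sketch inspects the event's invocation list, which is exactly the reference that keeps sink alive.

```csharp
using System;

// Hypothetical definitions of the types used in the snippet above.
public class EventSource
{
    public event EventHandler SomeEvent;

    public void DoWork()
    {
        // Raise the event; each subscribed delegate keeps its target object alive.
        EventHandler handler = SomeEvent;
        if (handler != null)
            handler(this, EventArgs.Empty);
    }

    // Hypothetical helper: number of delegates (and hence targets) src still references.
    public int SubscriberCount
    {
        get
        {
            EventHandler handler = SomeEvent;
            return handler == null ? 0 : handler.GetInvocationList().Length;
        }
    }
}

public class EventSink
{
    public int Calls;

    public void EventHandler(object sender, EventArgs e)
    {
        Calls++;
    }
}

public static class EventLeakDemo
{
    // Returns the subscriber count before and after unsubscribing.
    public static int[] Run()
    {
        EventSink sink = new EventSink();
        EventSource src = new EventSource();

        src.SomeEvent += sink.EventHandler;
        src.DoWork();
        int before = src.SubscriberCount;   // src -> delegate -> sink

        // Unsubscribe before dropping our reference, so src no longer keeps sink alive.
        src.SomeEvent -= sink.EventHandler;
        int after = src.SubscriberCount;

        sink = null;                        // now nothing references sink
        return new int[] { before, after };
    }
}
```

An empty invocation list only means src no longer holds sink; when sink is actually reclaimed is still up to the GC.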


Example 2: Mutating objects in a collection (discussed here)


There can be even more involved cases. About two years ago I saw an issue where objects were being placed inside a Dictionary and later retrieved, used and discarded. Retrieval was done using the object key. The flow was something like this:



  1. Create the object and put it in a Dictionary
  2. Later, get the object using its key
  3. Call some functions on the object
  4. Get the object by key again and remove it

The object was not immutable. Using it in step 3 modified some of its fields, and one of those fields was also used to calculate the object's hash code (in the overridden GetHashCode). As a result, the Remove call in step 4 didn’t find the object and it remained inside the dictionary. Can you guess why changing a field that is used in GetHashCode prevents the object from being retrieved from the dictionary? Check out http://blogs.msdn.com/abhinaba/archive/2007/10/18/mutable-objects-and-hashcode.aspx to find out why this happens.
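The four steps can be reproduced with a small sketch. MutableKey and MutableKeyDemo are hypothetical stand-ins, not the code from the original issue; the key point is that Id is both mutable and used by GetHashCode/Equals.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical key type: Id is mutable AND participates in GetHashCode/Equals.
public class MutableKey
{
    public int Id;

    public MutableKey(int id) { Id = id; }

    public override int GetHashCode() { return Id.GetHashCode(); }

    public override bool Equals(object obj)
    {
        MutableKey other = obj as MutableKey;
        return other != null && other.Id == Id;
    }
}

public static class MutableKeyDemo
{
    // Returns whether Remove succeeded; remaining is the entry count afterwards.
    public static bool Run(out int remaining)
    {
        Dictionary<MutableKey, string> cache = new Dictionary<MutableKey, string>();

        MutableKey key = new MutableKey(1);
        cache.Add(key, "payload");        // 1. stored under hash code 1

        string value = cache[key];        // 2. lookup works, hash code is still 1

        key.Id = 2;                       // 3. "using" the object mutates a hashed field

        bool removed = cache.Remove(key); // 4. searches for hash code 2 and finds nothing
        remaining = cache.Count;          // the entry, and the object, stay referenced
        return removed;
    }
}
```

Making the hashed fields readonly, or removing the entry before mutating the object, avoids the problem.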


There are many more examples where this can happen.


So we can conclude that memory leaks are common in managed code as well, but they typically happen a bit differently: some references are not cleared when they should have been, so the GC finds these objects still referenced by others and does not collect them.

Monday, December 21, 2009

Some things I have learnt about SW development


Working in the developer division is very exciting because I can relate so well with the customers. However, that is also an issue. You relate so well that you tend to develop strong opinions that can cloud your view. While working in the Visual Studio Team System team, and now in the .NET Compact Framework team, I have learnt some lessons. I thought I’d share some of them.

I am not *the* customer (or rather the only customer)
Even though I represent the customer in different avatars (sometimes as a developer, sometimes as an office worker, sometimes just as a geek), I am not THE only customer that the product targets. When your product is going to be used by 100,000 developers/testers, the average or even the predominant usage is going to be very different from what you think it will be. Sitting through a usability study is a very humbling experience. For example, I believed that everyone should just be using incremental search.

Consistency is important
Geeks and bloggers tend to over-tout coolness. While a cool user experience (UX) seems awesome, it is frequently overdone, and what seems cool on casual use quickly becomes tiresome. Consistency, on the other hand, lets your users get on board faster and lets them spend time doing stuff they care about instead of learning things that should just work.

Consistency doesn’t mean that every application needs to look exactly like Notepad. Expression Blend is a great example: it looks refreshingly cool (which appeals to designers) and at the same time provides an experience that is consistent with other Windows apps.

Learn to let go
“If there’s not something you can actively do on a project, if it’s something you can only influence and observe, then you have to learn to provide your feedback, and then let go and allow other people to fail… People don’t learn if they never make any mistakes” ~ Raymond Chen on Microspotting

Corporate software development is very different from indie development. Large software projects have a bunch of people/teams involved, and the collective opinion will not necessarily match yours. At some point you need to learn to let go and do what is required. For example, I can debate what a Debug Assert dialog should look like or do. However, there are other folks whose job is to design and think about the UX. As a developer I need to give my input, and once the call has been made it’s my job to provide the best engineering solution that implements that UX**.

 

** Do note the debug assert dialog is a fictitious example; I never worked on the IDE side.

Sunday, December 20, 2009

Indic Language Input


If you have tried inputting Indian languages in Windows, you know it’s a major pain. That is particularly sad because Windows comes with very good support for Indian languages. I had almost given up using my native language Bengali on a computer because of this. Even when I was creating the About Page for this blog and wanted to have a version in Bengali, I had to cut it short because typing it out was so painful.

There are web-based tools like the Google Transliteration tool that work well for entering text into web pages where they are integrated (e.g. Orkut). However, I wanted a solution that spans the desktop, so that I can use it for, say, writing a post using Windows Live Writer.

Enter the Microsoft Indic Language Input tool. Head over to the link and install the desktop version. You can install the various languages individually (currently Bengali, Hindi, Kannada, Malayalam, Tamil and Telugu are supported). I personally installed the Bengali and Hindi versions.

Since I am on Windows 7, which comes with complex language support pre-installed, I didn’t need to do anything special. However, on older OSes like XP you need to do some extra steps, which are described under the Getting Started link on that page.

Once you are set up, you can keep the Windows Language Bar floating on the desktop. The tool extends the language bar so that you can enter Indic languages on an English keyboard via transliteration.


Go to the application where you want to enter an Indic language and switch to Bengali (or any of the other five supported Indic languages) using this language bar. Start typing বেঙ্গলি on the English keyboard and the tool transliterates it. The moment you hit a word terminator like space, it inserts the Bengali word.


I tried some difficult words like কিংকর্তববিমূঢ় and it worked amazingly well.


I had a very good experience with the tool. The only issue I faced was that it was extremely slow with some WPF apps like the Seesmic Twitter client. However, I learnt from the dev team that they are aware of the issue (it affects some specific WPF apps and not WPF in general). I hope they fix it before they RTM (the tool is in Beta).

Tip: You can hit Alt+Shift to cycle through the languages in the toolbar without having to use your mouse (which is handy if you are typing in a mix of languages).

Friday, December 04, 2009

NETCF: Count your bytes correctly


Someone reported a stress-related issue in which the code tried to allocate a 5MB array and do some processing on it, but failed with an OOM (Out of Memory) exception. The report also stated that there was way more than 5MB available on the device, so surely it was some bug in the Execution Engine.

Here’s the code.

try
{
    long GlobalFileSize = 5242660; // roughly 5MB
    // This ~5MB allocation succeeds
    byte[] result = new byte[GlobalFileSize];
    // GetString allocates further memory, proportional to the input size
    string payload = Encoding.UTF8.GetString(result, 0, result.Length);
    System.Console.WriteLine("len " + payload.Length);
}
catch (Exception e)
{
    System.Console.WriteLine("Exception " + e);
}

The problem is that the user didn’t count his bytes well. The array is 5MB and it actually gets allocated correctly. The problem is the memory complexity of UTF8.GetString, which allocates further memory based on its input. In this particular case the allocation pattern goes something like this:

5MB  -- Byte array allocation (as expected, and it succeeds)
5MB  -- Used by GetString call
10MB -- Used by GetString call
5MB  -- Used by GetString call
10MB -- Used by GetString call


So GetString needed a whopping 30MB, and the total allocation was around 35MB, which was simply not available on the device.
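If the data can be processed piecewise (an assumption; the report doesn't say what the payload was used for), one way to avoid that spike is to decode it in fixed-size chunks with a Decoder instead of building one giant string. ChunkedDecode and CountChars are hypothetical names, and the sketch assumes the GetChars overload with a flush parameter is available on your target framework. The Decoder carries partial multi-byte sequences across calls, so chunk boundaries can fall anywhere.

```csharp
using System;
using System.Text;

public static class ChunkedDecode
{
    // Decodes UTF-8 bytes chunk by chunk and returns the total number of chars produced.
    // Only one small char buffer is allocated, instead of a string for the whole input.
    public static long CountChars(byte[] data, int chunkSize)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException("chunkSize");

        Decoder decoder = Encoding.UTF8.GetDecoder();
        // Worst-case chars for chunkSize bytes, including state left over from earlier chunks.
        char[] chars = new char[Encoding.UTF8.GetMaxCharCount(chunkSize)];
        long total = 0;

        for (int offset = 0; offset < data.Length; offset += chunkSize)
        {
            int count = Math.Min(chunkSize, data.Length - offset);
            bool last = offset + count >= data.Length;
            // Incomplete multi-byte sequences are kept in the decoder until the next call.
            int produced = decoder.GetChars(data, offset, count, chars, 0, last);
            // Process chars[0..produced) here instead of building one huge string.
            total += produced;
        }
        return total;
    }
}
```

Peak extra memory is then bounded by the chunk size rather than by the input size, at the cost of processing the text in pieces.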


Moral of the story: count your bytes well, preferably with a tool like the Remote Performance Monitor.