Friday, October 28, 2011

Windows Phone Mango: Under the hood of Fast Application Switch


Fast Application Switch, or FAS, is kind of tricky for application developers to handle. There is a ton of documentation on how developers need to handle the various FAS-related events. I really liked the video http://channel9.msdn.com/Events/DevDays/DevDays-2011-Netherlands/Devdays059 which walks through the entire FAS experience (jump to around 8:30).

In this post I want to talk about how the CLR (Common Language Runtime or .NET runtime) handles FAS and what that means for your application, especially the Active –> Dormant –> Active flow. Most of the documentation and presentations quickly skip over this with the vague “The application is made dormant”. This is equivalent to saying “witches use brooms to fly”. What the navigation mechanism is, or how the broom is propelled, are the more important questions which no one seems to answer (given the time of year, I just couldn’t resist :P). Do note that most developers can just follow the coding guidelines for FAS and never need to care about this. However, a few developers, especially the ones developing multi-threaded apps and using threading primitives, may need to care about this. Hence this post.

Design Principle

The entire Multi-threading design was made to ensure the following

Principle 1: Pre-existing WP7 apps shouldn’t break on Mango.
Principle 2: When an application is sent to the background it shouldn’t consume any resources
Principle 3: Application should be resumed fast (hence the name FAS)

As you’ll see, these played a vital role in the design discussed below.

States

The states an application goes through are documented in http://msdn.microsoft.com/en-us/library/ff817008(VS.92).aspx

Execution Model Diagram for Windows Phone 7.5

CLR Design

The diagram below captures the various phases used to run down the application to make it dormant and to re-activate it later. It gives the flow of an application as it goes through the Active –> Dormant –> Active states (e.g. the application was running, the user launches another application and then uses the back button to go back to the first application).

image

Deactivated

The Deactivated event is sent to the application to notify it that the user is navigating away from the application. After this there are 3 possible outcomes: the application will either remain dormant, get tombstoned, or get terminated. Since there is no way to know which will happen, the application should store its transient state in PhoneApplicationPage.State and its persistent state in some persistent store like IsolatedStorage or even the cloud. However, do note that the application has 10 seconds to handle the Deactivated event. In the 3 possible situations, this is how the stored data is used on the way back:

  1. Deactivate –> Dormant –> Active
    In this situation the entire process was intact in memory and only its execution was stopped (more about it below). In the Activated event the application can just check the IsApplicationInstancePreserved property. If this is true then the application is coming back from the Dormant state and can just use the in-memory state. Nothing needs to be re-read.
  2. Deactivate –> Dormant –> Tombstoned –> Active
    In this case the application’s in-memory state is gone. However, the PhoneApplicationPage.State is serialized back. So the application should read persistent user data from IsolatedStorage or other permanent sources in the Activated event. At the same time it can use the PhoneApplicationPage.State in the OnNavigatedTo event.
  3. Deactivate –> Dormant –> Terminated
    This case is no different from the application being re-launched. So in the Launching event the user data needs to be re-created from the permanent store. PhoneApplicationPage.State is empty in this case.

The above supports the #1 principle of not breaking pre-existing WP7 apps. A WP7 app would’ve been designed without considering the dormant stage. Hence it would’ve just skipped the #1 option. So the only issue will be that a WP7 app will result in re-creating the application state each time and not get the benefit of the Dormant stage (it will get the performance of Tombstoning but not break in Mango).
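The three cases above boil down to a small decision about where to read state from. Here is a minimal sketch of that decision, written in Python purely as a language-neutral model (on the phone this logic would live in the C# Launching and Activated handlers). The function name restore_plan and the labels "persistent_store" and "page_state" are hypothetical names for illustration, not part of any Windows Phone API.

```python
from enum import Enum


class ResumeKind(Enum):
    FROM_DORMANT = "dormant"        # process stayed in memory
    FROM_TOMBSTONE = "tombstoned"   # process was torn down, page state kept
    FRESH_LAUNCH = "launch"         # terminated, or launched for the first time


def restore_plan(event, instance_preserved=False):
    """Model of which state sources an app should read for a lifecycle event.

    event: "Launching" or "Activated" (the event names used in this post).
    instance_preserved: models the IsApplicationInstancePreserved flag.
    Returns (ResumeKind, list of hypothetical source labels to read).
    """
    if event == "Launching":
        # Case 3: same as a re-launch, page state is empty.
        return ResumeKind.FRESH_LAUNCH, ["persistent_store"]
    if event == "Activated":
        if instance_preserved:
            # Case 1: in-memory state is intact, nothing to re-read.
            return ResumeKind.FROM_DORMANT, []
        # Case 2: re-read persistent data, then use the page state.
        return ResumeKind.FROM_TOMBSTONE, ["persistent_store", "page_state"]
    raise ValueError("unknown lifecycle event: %r" % (event,))
```

A pre-Mango app that never looks at the flag effectively always takes the tombstone branch, which is why it keeps working but misses the dormant fast path.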

Post this event the main thread never transitions to user code (e.g. no events are triggered). The requirement on the application for deactivate is that

  1. It shouldn’t run any user code post this point. This means it should voluntarily stop background threads, cancel timers it started and so on
  2. This event needs to be handled in 10 seconds

If the app continues to run code after this point (e.g. in another thread) and modifies any application state, then that state cannot be persisted, as there will be no subsequent Deactivated-type event.
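One way to meet the "stop voluntarily" requirement is a cooperative stop flag that every background loop checks. The sketch below shows the pattern in Python with threading.Event; it is an illustration of the idea only, not Windows Phone code. Worker, on_deactivated and save_state are hypothetical names, and the 1 second join timeout is an arbitrary choice made to stay well inside the 10 second budget.

```python
import threading


class Worker:
    """Background loop that can be asked to stop cooperatively."""

    def __init__(self):
        self._stop_requested = threading.Event()
        self.iterations = 0
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def _run(self):
        # Check the flag on every pass so a stop request is honoured quickly.
        while not self._stop_requested.is_set():
            self.iterations += 1
            # Wait on the flag instead of sleeping, so stop() wakes us at once.
            self._stop_requested.wait(0.01)

    def stop(self, timeout=1.0):
        """Request a stop and wait for the thread; True if it really exited."""
        self._stop_requested.set()
        self._thread.join(timeout)
        return not self._thread.is_alive()

    def is_running(self):
        return self._thread.is_alive()


def on_deactivated(workers, save_state):
    """Model of a Deactivated handler: stop the workers first, then save."""
    results = [w.stop() for w in workers]  # stop all, even if one is slow
    save_state()                           # state can no longer change now
    return all(results)
```

Stopping the workers before saving matters: if a thread is still mutating state while it is being saved, the saved copy can already be stale by the time the handler returns.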

Paused

This event is an internal event that is not visible to the application. If the application adhered to the above guideline it shouldn’t care about it anyway.

The CLR does some interesting stuff on this event. Adhering to the “no resource consumption” principle is very important. Consider that the application had used ManualResetEvent.WaitOne(timeout). This timeout can expire while the application is dormant. If that happened it would result in some code running when the application is dormant. This is not acceptable because the phone may be behind a locked screen and this context switch can get the phone out of a low-power state. To handle this the runtime detaches Waits and Thread.Sleep calls at Paused. It also cancels all Timers so that no Timer callbacks happen after this Paused event.

Since the Paused event is not visible to the application, it should assume that this detach will happen some time after Deactivated. This is completely transparent to user code. As far as user code is concerned, it’s just that these handles do not time out and sleeps do not return while the application is dormant. The same WaitHandle objects and Thread.Sleep calls start working as before after the application is activated (more about timeout adjustment below).

This is also the place where other parts of the tear-down happen, e.g. asynchronous network calls are cancelled and media is stopped.

Note that the background user threads can continue to execute. Obviously that is a problem because the user code is supposed to voluntarily stop them at Deactivated.

Freeze

Besides user code there is a lot of other managed code running in the system. This includes, but is not limited to, Silverlight managed code and XNA managed code. Some time after Paused all managed code is required to stop. This is called the CLRFreeze. At this point the CLR freezes or blocks all managed execution, including user background threads. To do that it uses the same mechanism as used for foreground GC. In a later post I’ll cover the different mechanics NETCF and the desktop CLR use to stop managed execution.

Around freeze the application enters the Dormant stage where it’s in 0 CPU utilization mode.

Thaw

Managed threads stopped at Freeze are re-started at this point.

Resuming

At Resuming, the WaitHandles and Thread.Sleep calls detached in Paused are re-attached. Timeout adjustments are also made during this time. Consider that the user had two handles on which the user code started Waits with 5 second and 10 second timeouts. 3 seconds after starting the Waits the application is made dormant. When the application is re-activated, the Waits are restarted with the amount of timeout that remained at the point the application was deactivated. So essentially in the case below the first Wait is restarted with 2 seconds and the latter with 7. This ensures that the relative gap between Sleeps and Waits is maintained.

image
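The adjustment itself is simple arithmetic. Here is a minimal sketch of it in Python, using the 5 second and 10 second example from above. The function name remaining_timeouts is hypothetical, and clamping at zero is my own assumption for illustration; this models the arithmetic only, not the runtime’s actual implementation.

```python
def remaining_timeouts(timeouts_s, elapsed_before_dormant_s):
    """Timeouts (in seconds) to restart each wait with at Resuming.

    timeouts_s: original timeouts of waits that all started together.
    elapsed_before_dormant_s: time the waits ran before going dormant.
    Time spent dormant is deliberately not subtracted.
    """
    # Clamp at zero (an assumption): a wait cannot resume with negative time.
    return [max(0, t - elapsed_before_dormant_s) for t in timeouts_s]


# The example from the text: 5 s and 10 s waits, dormant after 3 s.
first, second = remaining_timeouts([5, 10], 3)
# The gap between the two waits is the same before and after (5 seconds).
gap_preserved = (second - first) == (10 - 5)
```

Note that how long the application stayed dormant does not appear anywhere in the calculation, which is exactly why the waits appear paused from the user code’s point of view.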

Note timers are still not restarted.

Activated

This is the event the application receives. It is required to re-build its state when the activation is from Tombstone, or just re-use the state in memory when the activation is from the Dormant stage.

Resumed

This is the final stage of FAS. This is where the CLR restarts the Timers. The idea behind the late start of timers is that they are essentially asynchronous callbacks. So the callbacks are not sent until the application is activated (has built its state) and is ready to consume those callbacks.

Conclusion

  1. Ideally the application developer takes care of FAS by properly supporting the various events like Deactivated and Activated.
  2. Background threads continue to run after the Deactivated event. This might lead to issues by corrupting application state and losing state changes. Handle this by terminating the background threads at Deactivated.
  3. While the application is being made dormant, Waits, Sleeps and Timers are deactivated. They are later re-activated with timeout adjustments. This happens transparently to user code.
  4. Not all waiting primitives are time adjusted. E.g. Thread.Join(timeout) is not adjusted.

Tuesday, August 02, 2011

The C Word


I have really thought a lot about posting this. This is a very personal post. I could either keep it to myself or share it to let others going through the same know it’s OK to be scared. If you are here to read about programming and other fun stuff, go save yourself some time and skip this post.

The three weeks of wait

My wife was having a low-grade fever for 3 weeks. Unfortunately it coincided with a number of small symptoms. This made the spine specialist diagnose a slightly herniated disc, the internal medicine doctor diagnose some sort of infection and the allergist diagnose a pain killer (Diclofenac) related allergy. Later when nothing seemed to clear it up, we were told it’s most likely an auto-immune disease. Test after test revealed nothing other than an elevated ESR and CRP. Apparently there was an inflammation going on inside her body. This is when we decided to cancel her home trip to India as the 24 hour travel was simply not possible with the back pain associated with the herniated disc.

All this while the doctors kept going by the book. I feel this has something to do with the fact that doctors in the US are forced to be in a state of perfect equilibrium due to the opposing forces of lawsuits from patients (doctor didn’t do enough) and push-back from insurance companies (doctor is doing too much). We were told time and again that since her fever was not over 101F and had not been there for more than 3 weeks, no further testing was required. Read up Mayo Clinic or Wikipedia on Fever of Unknown Origin and you’d see this as the criteria for getting concerned. However, my wife all the while could easily tell that something else was going on and that she was feeling different. In other countries the doctors would have run at least a chest X-ray by this time. IMO the risk of additional radiation was easily worth it given her condition.

By this time we had been to five specialists and were waiting for the sixth, the rheumatologist to make the final diagnosis of some auto-immune disease. The spine specialist grudgingly decided to do an additional MRI (the second) of her entire pelvis. We went for that second MRI on Saturday 7/23/2011. Our primary care physician however, was worried. He kept calling us to enquire how she was feeling. While on the way back from the MRI, he called us up again and asked us to come visit him on Monday, for further evaluation and maybe some full body scan. We could feel the edge in his voice. May he be blessed for the care he showed.

We were learning that the hardest part of being patients was really to have the patience.

The end of wait

We went home and I asked my wife to try some Yoga breathing exercises called Pranayama. She tried that and went to sleep. At this point in time we crossed the 3 weeks threshold and thankfully her fever crossed the 101F mark. Next day was a Sunday. She woke up feeling worse and had a weird feeling in her chest and a huge rash on her leg. Really worried we rushed to the Bellevue Overlake ER room.

Within minutes her vitals were checked and she was put on a bed. The doctor re-diagnosed some auto-immune disease and set up an appointment for the next day. Then he did something which should’ve been done weeks earlier: he ordered a chest X-ray. He left the room and went to casually check the X-ray, then came rushing back.

He asked my wife when she had the last X-ray. It was in 2007. The doctor’s face told us something was terribly wrong. He said there is a tumor in-between her heart and lungs. Clinically it looks like a Lymphoma. My wife had a vasovagal response, which essentially means she had a fit and fainted; her pressure hit 45/60 and her pulse was at 40. Somehow for me everything slowed down, as if life was in slow-motion. I could process everything going on around me, I could talk to the doctor and figure out the next steps. No idea how I pulled that off. Our life changed at that very point. The C word entered our lives. I thanked myself for asking her to do the breathing exercise, which exercised her lungs enough to surface the discomfort that sent us to the ER. Her left lung was only half opening due to the tennis ball sized tumor.

She had a CAT scan within 10 minutes. Now we had the 3D image of the tumor, and it looked like a Lymphoma to the doctor. I sat with him for 15 minutes seeing the slides myself. I asked all the questions I could think of. Biopsies were ordered and we headed home, in stunned silence. She was only 34 and was perfectly healthy 3 weeks back, travelling across Washington to Palouse.

The kind words from the ER doctor did too little to console my wife. However, I vaguely remember coming home, ordering a bunch of camera equipment for portrait photography and the Kindle version of the book The Emperor of All Maladies: A Biography of Cancer. The book was recommended by the ER doctor while talking about how much cancer medicine has progressed. This is a Pulitzer-winning work by another bong, Siddhartha Mukherjee, who seemed to have poor skills in choosing and hence studied in all of Stanford, Oxford, Columbia and Harvard.

I really have no idea why I bought that stuff, and I didn’t even remember doing so until I got some emails informing me that I had indeed ordered it.

Beginning of waiting again

The very next day, Monday 7/25/2011, the Oncologist called us and told us that he had set up the first fine needle aspiration Biopsy (FNAC). As my wife was in the Radiology room getting her Biopsy done, I got a phone call from the spine specialist. Remember he ordered a pelvic MRI on Friday. He started by saying he saw some abnormality on the sacral bone. I told him about the Lymphoma and he said they should be related and asked for the contact info of our Oncologist.

So now Cancer has spread as well. For the first time in my life I publicly broke down sobbing un-controllably in the Overlake Hospital Café. I then rushed up to meet our oncologist for the first time and handed over the MRI discs to him.

Two days later on Wednesday 7/27/2011 we got to know that not enough tissue was available from the biopsy. They can see the abnormal cancer tissue but for further sub-typing they need to have more samples. So Biopsy of the sacral bone was ordered as well. Thursday we got that done. The doctor called us many times letting us know of the progress and that the best lab in NW in Seattle is going through the slides. Finally we got the full diagnosis of Nodular Sclerosing Hodgkin’s Lymphoma and it’s Stage IV.

Then on advice from friends we looked up Seattle Cancer Care Alliance and we are going there for treatment. Right now they are relooking at all the cytology slides to re-confirm the disease and my wife is going through a bank of tests to ensure she is ready for the chemo that’s going to start in the next week.

My wife wants to know why it took the most advanced medical system in the world 4 weeks to reach the diagnosis, especially when the test required was just a chest X-ray. All the while five doctors chose to just wait around when she could point with her finger to the exact spot on her spine where the second cancer tumor was. Why was it not caught in her last yearly exam? The reason is simple: even the basic tests are not done in the US in yearly exams. I am sure it has something to do with the costs. In India a yearly exam is extremely thorough. I am sure they go overboard and expose people to X-rays and such, but had this been going on then, it’d surely have been caught.

However, once this was diagnosed things started moving extremely swiftly. Doctors kept calling other doctors and pathologists. Multiple people were examining and cross examining the CT video and cytology slides. Doctors were calling us back to let us know of the progress and telling us to be brave.

How are we coping

I have known my wife for the last 50% of my life. She was my high-school friend, and we fell in love just before moving on to college. Marrying a high-school sweetheart has its pros and cons. She knows all my crushes and flings which I foolishly confided in a friend who turned into a wife later :). She has been there hand-holding and steering me through all the while. She stood beside me and continually supported me as I rose from an average student in an unknown college to get into the top 10 in the university. I completed my BS-Physics and then chose Computer Science; she was beside me in the queue to submit the application. She was hanging out in the hotel lobby when I interviewed with Microsoft 7 years back in Delhi. She was always there.

After getting my first job at Texas Instruments I just couldn’t cope with the distance. We got married in 3 months and she joined me in Bangalore after 3 more months. We had by that time known each other for 6 years. Our 10th marriage anniversary is just 6 months away. We are going to hit 16 years of togetherness. In this time we have moved homes across 3 cities in India, moved across the oceans and settled in the US. We had our daughter, who is 6 years old. In this time she lost her mother to cancer and I lost my father to COPD and my thyroid to suspected cancer tumors. So we have had a roller coaster ride, a ride we thoroughly enjoyed.

Role Reversal

Now the roles have reversed. It’s now my turn to fight the battle and steer her through. Unfortunately the human body is not just a computer for which I could write a garbage collector to go clean up all the garbage-cells. I am used to having all the nodes of the decision tree fully expanded. Mistakes usually involve retracing one’s steps and re-routing. Neither of these works in this situation.

However, I plan to be agile, do my research and work with the doctors to provide the best care my wife can get. I will surely learn a lot of lessons of life on the way.

Time to count the blessings

I could crib about how life is unfair and how this shouldn’t be happening to me, but that doesn’t really add value. When I look around I can actually see a lot of blessings that have been showered on us. I also learnt that this is the time to move pride out of the way and ask for help; there are so many around us who just need the cue to do something.

  1. We have had a great life together. Frankly there is no good time to get a disease like Cancer. It’s not as if at 60 we’d be ready. One is never ready and this timing is as bad as any other.
  2. Our daughter holds our sanity together. Whenever we get all bogged down she comes up with the weirdest question, which becomes thrice as funny when asked by someone missing three of her front teeth.
  3. We literally haven’t cooked food for the last 4 weeks. There’s always been one neighbor or friend coming to our house with something to eat. People whom we know for barely weeks have been there for us. So it’s idly-sambhar for breakfast and paratha with alloo-gobi in dinner.
  4. We have never had to get a sitter or drop our daughter at day-care as we rush from one hospital to another. She has always been at one of her friends’ on a play-date, little aware of the turmoil we are going through.
  5. We just had to ask for a change of apartment (one with elevator access). The management team of the community we live in attacked this issue with full force. They’ve made so many exceptions to make things easy for us that I cannot thank them enough.
  6. Thankfully we chose to come to the US. One of the world’s best cancer research centers is just 30 minutes from where I live.
  7. I have one of the best employers in the world. Microsoft’s medical insurance has been phenomenal till now. I feel sad that it’s not going to be that great from 2013 (change in policy).
  8. My manager has been super supportive. I’ve given many weekends and nights to work, to pull off deadlines. Now it seems so worthwhile.

What am I asking for

Nothing really. If you are religious, just pray to whomever you believe in to give us the strength to pull through. This post was just so that I can get this whole thing out of my system and concentrate on solving the issue at hand. If you are a friend or family, just bear with me if I am suddenly withdrawn or plain irritating.

Monday, July 04, 2011

Palouse Photography Trip

From Wikipedia, “The Palouse is a region of the northwestern United States, encompassing parts of southeastern Washington, north central Idaho and, in some definitions, extending south into northeast Oregon. It is a major agricultural area, primarily producing wheat and legumes”. But that doesn’t really tell you what Palouse is :). My daughter named it more aptly as Greenland and the photo below is explanatory.

 

It’s a beautiful land of rolling green (or yellow, based on season) hills and a photographer’s delight. It’s one of the most beautiful landscapes I have ever seen. It’s easily worth the 5-6 hour drive from Western Washington cities like Seattle. The receptionist at the hotel we stayed in said that they’ve had photographers from as far as Australia visit them.

Last week (6/16/2011) I went on a 2 day trip to the Palouse area in Eastern Washington. The primary plan was to just drive around and take some great shots. This post captures the quick details about what went right, what went wrong and some tips about how to make the trip family friendly.

Good Time to Visit

I asked around and checked Flickr for Palouse photos from different times of the year. I think early to mid June is a pretty good time to see the excellent green colors in the fields. August is great for capturing the golden yellow pre-harvest colors.

Where to Go

My agenda was to cover Palouse water falls, the Colfax-Steptoe-Garfield-Palouse-Pullman pentagon. I took the following route. Click around to see the main routes travelled.

image

On the Way

On the way (and back) you should stop at the great I90 rest areas. They are spread around every 30-40 miles and there are blue signs on the way indicating where they are (e.g. Rest Area Next Right) and how far the next one is (Next Rest Area 34 miles). Many of these serve free hot coffee and cookies and all of them have rest rooms. After gorging on the super hot coffee and home made cookies made by the dear old ladies who run these places, do remember to donate generously in the donation boxes. We carried our camping stove and had some nice warm noodle soup for lunch at the rest area.

We then stopped at the Palouse falls (B in the map). As with most remote sites with dirt roads, the GPS picked some really bad roads for the routing (Nunamaker Rd). Avoid this road and follow the signs and the route shown in the map, which is to continue straight on 260 and take a left onto 261. This goes into the Palouse Falls State Park. The Park has around 10 campsites and has a huge waterfall plummeting down 190ft. A covered picnic shelter with tables, water and barbeque pits is available here. The camp sites are first come only and there is a resident host as well. There is water but no power hookup at the sites. There are some short hiking opportunities as well where you can hike 0.5 miles up towards the waterfall top. However, do note that the hiking routes have some points where there is significant exposure to heights and you gotta be very careful. There is also no safe way to descend to the bottom of the falls even though there seem to be some trails that go down.

There are a lot of cute Marmots and a lot of huge insects and snakes all around.

The Palouse

We didn’t stay in the Falls area. Instead we went ahead to Colfax which is around an hour away. I really liked the Best Western Wheatland Inn at Colfax where we stayed. We paid around $110 for the night. It had a nice hot breakfast, good rooms with TV and a nice warm indoor swimming pool. I cared about these because my wife and kid would spend a lot of time indoors as I’d drive around on the dirt roads for the next day or so. Just opposite the hotel were a store, Taco Bell, Subway and other eating joints. I have had other friends who stayed in Pullman (30 mins south of Colfax), which has many more hotel options because it’s home to Washington State University. However, I’d recommend staying in Colfax due to its proximity to Steptoe Butte and other interesting areas.

After checking in I went to Steptoe Butte for the customary sunset shot, which however was utterly ruined by rain and clouds. Steptoe Butte is a very important geological feature and a must visit. So much so that other geological features of its type are called steptoes. Steptoe is a 400 million year old rock sticking up high over the rolling hills of Palouse. Amazing views of light and shadow on the Palouse and patterns of wheat harvest can be seen from the top.

Post sunset I got back to the hotel. The next morning I mainly drove around inside the pentagon, then came back for some lunch at the hotel. After some more driving around we headed back via I90.

When driving around, the key thing to remember is that all state routes have almost no shoulders and no place to stop. So even if a route offers a good view, you cannot stop to photograph. Also the broken down barns are not really on the main routes. So you need to pick up the dirt roads and drive on them. These are gravel roads and cars skid a lot on them. So a 4x4 car is a good thing to have but not really mandatory. The route I mainly drove on is in the map linked above. However, it shows only the main route taken; you’d need to continually divert from it to get to the side roads as well.


Some nice roads to try

  1. Palouse Albion Road. Get into the side roads for broken garages and barns
  2. I tried and really liked the Chief Kamiakin Park >Fugate Road > S Palouse River Rd. However,
  3. Hume Road (From SR 195 to Oakesdale)

The photographs at http://www.flickr.com/photos/abhinaba/sets/72157626870022427 are manually geocoded, so that should give an approximate idea on where they were taken.

The Way Back

We came back via a different route because we were already close to Oakesdale and were told that the road from there to I90 had some nice views. We took the Oakesdale > Thornton > St. John > Sprague > Seattle via I90 West route.

We did make some stops on the way back. One was the scenic viewpoint just before crossing the Columbia river gorge. The other was the Ginkgo Petrified Forest State Park visitor center, which is just after the gorge. It contains some fantastic samples of petrified tree trunks, where Ginkgo trees were transformed to rock (silica) after being covered by lava flows some 15 million years back. Also from the interpretive center you can see the banks of the Columbia river. We had some fun counting the layers of lava visible on the banks, where each layer corresponds to a major lava-flow incident millions of years back.

Back Home

The trip was awesome and covered 400 million year old rocks, 15 million year old tree trunks, 3 month old crops and everything in between. I plan to go back again soon to capture the golden hues of the pre-harvest crops.

Friday, June 24, 2011

Secure Conversion of your personal Microsoft Office Documents to Kindle

Farming plane in Palouse, WA

I love my Kindle 3 and really enjoy its readability and portability. However, a lot of my reading is confidential docs and papers which don’t come from the Kindle store. I had to spend some time to figure out the best way to get those onto the Kindle securely. Since I use the latest Microsoft document format (docx), a bunch of the commonly recommended tools like Calibre didn’t work for me.

In case you are not worried about security and do not mind sending your document to Amazon for free conversion and transfer to your Kindle, then just visit http://www.amazon.com/gp/help/customer/display.html?nodeId=200375630 and thanks for stopping by on this blog :)

However, in case the document is confidential and you don’t want to send it to Amazon, read on.

Using PDF

Kindle 3 supports PDF natively. So you can just transfer a PDF to it by connecting the Kindle to your PC over USB. Save the document as PDF using, say, Microsoft Office’s save to PDF (File –> Save As (format PDF)) and then attach your Kindle to the PC. It will show up as a Removable Storage drive and you can copy the PDF to it.

However, the document might turn out with tiny text and you’d need to scroll all around on the Kindle to view it. This is far from acceptable and I use the following steps to get around that.

Install CutePDF Writer from http://cutepdf.com/Products/CutePDF/writer.asp. This installs a virtual printer on your PC called “CutePDF Writer”. Now print any document to this printer (File –> Print) and it will create a PDF file on your computer. While printing I use the following settings

image

I use a small page size of A6 to ensure good readability on the Kindle. Once you click on the Print button CutePDF shows a file save as dialog using which you can directly point it to the attached Kindle storage drive. Experiment for what page size works best for the particular kind of document and Kindle you have (plain vs Dx).

Using MobiPocket Creator

Download MobiPocket Creator from http://www.mobipocket.com/en/DownloadSoft/ProductDetailsCreator.asp.  Launch MobiPocket and then click on Import –> MS Word Document

image

Use the Browse button and choose the document in the file open dialog. However, do note that you may need to change the file type filter to *.* instead of *.doc if you are opening the latest docx format.

image

Once you have selected the document click on Import. This will get the document imported to MobiPocket.

image

Click on Build and then “Open folder containing eBook” and click OK. From the folder that opens copy the *.opf and *.prc to the Kindle’s document folder. Detach the kindle from the PC. The document should show up on the Kindle home screen.

Tuesday, June 14, 2011

WP7 Mango: The new Generational GC

In my previous post “Mark-Sweep collection and how does a Generational GC help” I discussed how a generational Garbage Collector (GC) works and how it helps in reducing collection latencies which show up as long load times (startup as well as other load situations like game level load) and gameplay or animation jitter/glitches. In this post I want to discuss how those general principles apply to the WP7 Generational GC (GenGC) specifically.

Generations and Collection Types

We use 2 generations on the WP7 referred to as Gen0 and Gen1. A collection could be any of the following 4 types

  1. An ephemeral or Gen0 collection that runs frequently and only collects Gen0 objects. Objects surviving the Gen0 collection are promoted to Gen1
  2. Full mark-sweep collection that collects all managed objects (both Gen1 and Gen0)
  3. Full mark-sweep-compact collection that collects all managed objects (both Gen1 and Gen0)
  4. Full-GC with code-pitch. This is run under severe low memory and can even throw away JITed code (something that desktop CLR doesn’t support)

The list above is in the order of increasing latency (or time they take to run)

Collection triggers

GC triggers are unchanged and are as outlined in my previous post WP7: When does the GC run. The distinction between collection types #2 and #3 above is that at the end of every full GC the collector evaluates memory fragmentation and may run the memory compactor as well.

  1. After significant allocation
    After a significant amount of managed allocation the GC is started. The amount today is 1MB (called the GC quanta) but is open to change. This GC can be ephemeral or full. In general it’s an ephemeral collection. However, it becomes a full collection in the following cases
    1. After significant promotion of objects from Gen0 to Gen1 the collections become full collections. Today 5MB of promotion triggers a full GC (again this number is subject to change).
    2. Application’s total memory usage is close to the maximum memory cap that apps have (very little free memory left). This indicates that the application will get terminated if the memory utilization is not cut-back.
    3. Piling up of native resources. We use different heuristics like native to managed memory ratio and finalizer queue heuristics to detect if GC needs to turn to full collection to release native resources being held-up due to Gen0 only collections
  2. Resource allocation failure
    Any resource allocation failure means that the system is under memory pressure, hence such collections are always full collections. This can lead to code pitch as well
  3. User code triggered GC
    User code can start collections via the System.GC.Collect() managed API. This results in a full collection as documented by that API. We have not added the method overload System.GC.Collect(generation). Hence there is no way for the developer to start an ephemeral or Gen0-only collection
  4. Sharing server initiated
    The sharing server can detect phone-wide memory issues and start a GC in all running managed processes. These are full GCs and can potentially pitch code as well.
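The trigger rules listed above can be summarised as a small decision function. This is only a toy model in Python, not CLR code: the 1MB and 5MB figures come from the text, while the function name, its parameters and the boolean flags standing in for the memory-cap and native-resource heuristics are assumptions made for illustration.

```python
MB = 1024 * 1024
GC_QUANTUM = 1 * MB            # allocation between collections (figure from the post)
FULL_GC_PROMOTION = 5 * MB     # promotion that forces a full GC (figure from the post)


def choose_collection(allocated_since_gc, promoted_since_full_gc=0,
                      near_memory_cap=False, native_pressure=False,
                      allocation_failed=False, explicit_request=False):
    """Return 'none', 'ephemeral' or 'full' under the toy policy.

    near_memory_cap and native_pressure are hypothetical flags standing in
    for the real heuristics, which the post does not spell out.
    explicit_request covers both GC.Collect() and sharing-server requests.
    """
    # Allocation failures and explicit requests always give a full collection.
    if allocation_failed or explicit_request:
        return "full"
    # Less than one GC quantum allocated: nothing to do yet.
    if allocated_since_gc < GC_QUANTUM:
        return "none"
    # An allocation-triggered GC escalates to full in three cases.
    if (promoted_since_full_gc >= FULL_GC_PROMOTION
            or near_memory_cap or native_pressure):
        return "full"
    return "ephemeral"


print(choose_collection(allocated_since_gc=1 * MB))  # → ephemeral
print(choose_collection(allocated_since_gc=1 * MB,
                        promoted_since_full_gc=5 * MB))  # → full
```

Whether a full collection also compacts or pitches code is decided later, at the end of the collection, so it is left out of this sketch.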

 

So from all of the above, the 3 key takeaways are

  1. Low memory or memory cap related collections are always full-collections. These could also turn out to be the more costly compacting collection and/or pitch JITed code
  2. Collections are in general ephemeral and become full-collection after significant object promotion
  3. No fundamental changes to the GC trigger policies. So an app written for WP7 will not see any major change in the number of GCs that happen. Some GCs will be ephemeral and others will be full GCs.

 

Write Barriers/Card-table

As explained in my previous post, to keep track of Gen1 to Gen0 references we use a write-barrier/card-table.

Card-table can be visualized as a memory bitmap. Each bit in the card-table covers n bytes of the net address space. Each such bit is called a Card. For managed reference updates like A.b = C, in addition to JITing the real assignment, calls are added to write-barrier functions. The write barrier locates the Card corresponding to the address of the write and sets it. Later during collection the collector checks all Gen-1 objects covered by a set card-bit and marks the Gen-0 objects referenced from those objects.

This essentially brings in two additional costs to the system.

  1. Memory cost of adding those calls to the WB in the JITed code
  2. Cost of executing the write barrier while modifying reference

Both of the above are optimized to ensure they have minimum execution impact. We only JIT calls to the WB when absolutely required and even then we have an overhead of a single instruction to make the call. The WBs are hand-tuned assembly code to ensure they take minimum cycles. In effect the net hit on process memory due to write barriers is way less than 0.1%. The execution hit in real-world application scenarios is also not in general measurable (other than in really targeted testing).
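The card-table bookkeeping described in this section can be pictured with a small Python toy model. It is not the phone CLR's implementation: the card size of 256 bytes is an assumed value (the text only says "n bytes"), the class and method names are made up for illustration, and one byte per card is used here where a real table would pack cards into bits.

```python
CARD_SIZE = 256  # assumed number of bytes covered by one card


class CardTable:
    def __init__(self, heap_size):
        # One flag per card, rounded up so the whole address range is covered.
        self.cards = bytearray((heap_size + CARD_SIZE - 1) // CARD_SIZE)

    def write_barrier(self, written_address):
        """Runs alongside every reference store: set the card covering the write."""
        self.cards[written_address // CARD_SIZE] = 1

    def dirty_ranges(self):
        """Return the (start, end) address ranges the collector has to scan."""
        return [(index * CARD_SIZE, (index + 1) * CARD_SIZE)
                for index, dirty in enumerate(self.cards) if dirty]

    def clear(self):
        """Reset every card once a collection has consumed the table."""
        for index in range(len(self.cards)):
            self.cards[index] = 0


table = CardTable(heap_size=4096)
table.write_barrier(0x10)    # reference store into an object at address 0x10
table.write_barrier(0x20)    # same card as 0x10, so nothing new is set
table.write_barrier(0x500)   # 0x500 = 1280, which falls in card 5
print(table.dirty_ranges())  # → [(0, 256), (1280, 1536)]
```

The barrier itself is a single index computation and a store, which is why its cost per reference update can be kept so small.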

Differences from desktop

In principle both the desktop GC and the WP7 GC are similar in that they use mark-sweep generational GC. However, there are differences based on the fact that the WP7 GC targets a more constrained device.

  1. 2 generations as opposed to 3 on the desktop
  2. No background or incremental collection supported on the phone
  3. WP7 GC has additional logic to track and handle application policies like application memory caps and total memory utilization
  4. The phone CLR uses a very different memory layout which is pooled and not linear. So there is no concept of a Large Object Heap, and the lifetime of large objects is handled no differently from that of other objects
  5. No support for particular generation collection from user code

Thursday, June 09, 2011

WP7 Mango: Mark-Sweep collection and how does a Generational GC help

About a month back we announced that in the next release of Windows Phone 7 (codenamed Mango) we will ship a new garbage collector in the CLR. This garbage collector (GC) is a generational garbage collector.

This post is a “back to basics” post where I’ll take a simplified look at how a mark-sweep-compact GC works and how adding generational collection helps in boosting its performance. In later posts I’ll try to elaborate on the specifics of the WP7 generational GC and how to ensure you get the best performance out of it.

Object Reference Graph

Objects in memory can be considered to be a graph, where every object is a node and the references from one object to another are edges. Something like

image

To use an object the code should be able to “reach” it via some reference. These are called reachable objects (in blue). Objects like a method’s local variable, method parameters, global variables, objects held onto by the runtime (e.g. GCHandles), etc. are directly reachable. They are the starting points of reference chains and are called the roots (in black).

Other objects are reachable if there are some references to them from roots or from other objects that can be reached from the roots. So Object4 is reachable due to the Object2->Object4 reference. Object5 is reachable because of the Object1->Object3->Object5 reference chain. All reachable objects are valid objects and need to be retained in the system.

On the other hand Object6 is not reachable and is hence garbage, something that the GC should remove from the system.
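Reachability as described here is a plain graph traversal. The Python sketch below is a toy model: the edges Object2->Object4 and Object1->Object3->Object5 come from the text, while the assumption that the roots point at Object1 and Object2, and the function name, are mine.

```python
from collections import deque


def reachable(roots, references):
    """Return the set of objects reachable from the roots."""
    seen = set()
    queue = deque(roots)
    while queue:
        obj = queue.popleft()
        if obj in seen:
            continue
        seen.add(obj)
        # Follow every outgoing reference of this object.
        queue.extend(references.get(obj, ()))
    return seen


# Edges named in the text; Object6 has no incoming reference.
references = {
    "Object1": ["Object3"],
    "Object2": ["Object4"],
    "Object3": ["Object5"],
    "Object6": [],
}
roots = ["Object1", "Object2"]  # assumed: the roots reference Object1 and Object2

all_objects = {"Object1", "Object2", "Object3", "Object4", "Object5", "Object6"}
garbage = all_objects - reachable(roots, references)
print(sorted(garbage))  # → ['Object6']
```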

Mark-Sweep-Compact GC

A garbage collector can locate garbage like Object6 in various ways. Some common ways are reference-counting, copying-collection and Mark-Sweep. In this section let’s take a more pictorial view of how mark-sweep works.

Consider the following object graph

1

At first the GC pauses the entire application so that the object graph is not being mutated (as in no new objects or references are being created). Then it goes into the mark phase. In mark phase the GC traverses the graph starting at the roots and following the references from object to object. Each time it reaches an object through a reference it flips a bit in the object header indicating that this object is marked or in other words reachable (and hence not garbage). At the end everything looks as follows

2

So the 2 roots and the objects A, C, D are reachable.

Next it goes into the sweep phase. In this phase it starts from the very first object and examines the header. If the header’s mark bit is set it means that it’s a reachable object and the sweep resets that bit. If the header’s bit is not set, it’s not reachable and is flagged as garbage.

3

So B and E get flagged as garbage. Hence these areas are free to be used for other objects or can be released back to the system

4
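The mark and sweep phases walked through above can be sketched as a toy model in Python. The objects A to E and the outcome (A, C and D survive, B and E are garbage) follow the text; the exact edges between them are not given in the post, so the ones used here are assumptions, as are the class and function names.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []        # outgoing references
        self.marked = False   # stands in for the mark bit in the object header


def mark(roots):
    """Mark phase: flip the mark bit on everything reachable from the roots."""
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj.marked:
            continue
        obj.marked = True
        stack.extend(obj.refs)


def sweep(heap):
    """Sweep phase: walk the heap linearly and collect the unmarked objects."""
    garbage = []
    for obj in heap:
        if obj.marked:
            obj.marked = False    # reset the bit for the next collection
        else:
            garbage.append(obj)
    return garbage


a, b, c, d, e = (Obj(name) for name in "ABCDE")
heap = [a, b, c, d, e]
a.refs.append(c)   # assumed edge A -> C
c.refs.append(d)   # assumed edge C -> D
b.refs.append(e)   # assumed edge B -> E: a reference held by garbage keeps nothing alive
roots = [a, c]     # assumed targets of the two roots

mark(roots)
garbage = sweep(heap)
print([obj.name for obj in garbage])  # → ['B', 'E']
```

Note that the mark phase only touches reachable objects, whereas the sweep phase touches every object on the heap.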

This is where the GC is done and it may resume the execution of the application. However, if too many of those holes (released objects) are created in the system, the memory gets fragmented. To reduce memory fragmentation, the GC may compact the memory by moving objects around. Do note that compaction doesn’t happen for every GC; it is run based on some fragmentation heuristics.

5

Both C and D are moved here to squeeze out the hole left by B. At the same time all references to these objects in the system are also updated to point to the correct location.
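Compaction with reference fix-up can be illustrated with a sliding compactor over a toy heap. This Python sketch is not how the phone CLR lays out memory: the addresses, object sizes and edges are invented for the example, and only the outcome (C and D slide down over the hole left by B) follows the text.

```python
def compact(heap, live_names):
    """Slide live objects toward address 0 and patch references between them."""
    forwarding = {}   # old address -> new address
    next_free = 0
    survivors = []
    # Pass 1: decide where every live object will end up.
    for obj in heap:
        if obj["name"] in live_names:
            forwarding[obj["address"]] = next_free
            next_free += obj["size"]
            survivors.append(obj)
    # Pass 2: move the objects and rewrite every reference they hold.
    for obj in survivors:
        obj["address"] = forwarding[obj["address"]]
        obj["refs"] = [forwarding[ref] for ref in obj["refs"]]
    return survivors, forwarding


# Hypothetical layout in address order; refs hold the address of the target.
heap = [
    {"name": "A", "address": 0,  "size": 16, "refs": [48]},  # A -> C
    {"name": "B", "address": 16, "size": 32, "refs": []},    # garbage
    {"name": "C", "address": 48, "size": 16, "refs": [64]},  # C -> D
    {"name": "D", "address": 64, "size": 16, "refs": []},
    {"name": "E", "address": 80, "size": 16, "refs": []},    # garbage
]

survivors, forwarding = compact(heap, {"A", "C", "D"})
print([(obj["name"], obj["address"]) for obj in survivors])
# → [('A', 0), ('C', 16), ('D', 32)]
print(survivors[0]["refs"])  # → [16]
```

A real collector would also have to patch the roots through the same forwarding table, which is omitted here.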

One important thing to note here is that unlike native objects, managed objects can move around in memory due to compaction, and hence holding direct references to them (a.k.a. memory pointers) is not normally possible. In case this is ever required, e.g. a managed buffer is passed to, say, the microphone driver’s native code to copy recorded audio into, the GC has to be notified that the corresponding managed object cannot be moved during compaction. If the GC runs a compaction and the object moves during that microphone PCM data copy, memory corruption will happen because the buffer being copied into has moved. To stop that, a GCHandle has to be created for that object with GCHandleType.Pinned to notify the GC that the corresponding object should never move.

On WP7 the interfaces to these peripherals and sensors are wrapped by managed interfaces, so the WP7 developer doesn’t really have to do these things; they are taken care of under the hood by those managed interfaces.

The performance issue

As mentioned before, during the entire GC the execution of the application is stopped. So as long as the GC is running the application is frozen. This isn’t a problem in general because the GC runs pretty fast and infrequently. So small latencies of the order of 10-20ms are not really noticeable.

However, with WP7 the capability of the device in terms of CPU and memory drastically increased. Games and large Silverlight applications started coming up which used close to 100 MB of memory. As memory use increases, the number of objects, and of references between them, grows sharply as well. In the scheme explained above the GC has to traverse each and every object and its references to mark them and later remove the garbage via sweep. So the GC time also increases drastically and becomes a function of the net working set of the application. This results in very large pauses in large XNA games and SL applications, which finally manifest as long startup times (as the GC runs during startup) or glitches during game play/animation.

Generational Approach

If we take a look at a simplified allocation pattern of a typical game (actually other apps are also similar), it looks somewhat like below

image

The game has a large steady state memory which contains much of its steady state data (which is not released) and then per-frame it goes on allocating/de-allocating some more data, e.g. for physics, projectiles, frame-data. To collect this memory layout the traditional GC has to walk or traverse the entire 50+ MB of data to locate the garbage in it. However, most of the data it traverses will almost never be garbage and will remain in use.

This application behavior is used for the Generational GC premise

  1. Most objects die young
  2. If an object survives collection (that is doesn’t die young) it will survive for a very long time

Using these premises the generational GC tries to segregate the managed heap into older and younger generations of objects. The younger generation, called Gen-0, is collected in each GC (premise 1); this is called the Ephemeral or Gen0 collection. The older generation is called Gen-1. The GC rarely collects Gen-1 as the probability of finding garbage in it is low (premise 2).

image

So essentially the GC becomes free of the burden of the net working set of the application.

Most GCs will be ephemeral GCs that only traverse the recently allocated objects, hence the GC latency remains very low. Post collection the surviving objects are promoted to the higher generation. Once a lot of objects are promoted, the higher generation starts becoming full and then a full collection is run (which collects both gen-1 and gen-0). However, due to premise 1, the ephemeral collections find a lot of garbage in their runs and hence promote very few objects. This means the growth rate of the higher generation is low and hence full collections will run very infrequently.

Ephemeral/Gen-0 collection

Even in an ephemeral collection the GC needs to deterministically find all objects in Gen-0 which are not reachable. This means the following objects need to survive a Gen-0 collection

  1. Objects directly reachable from roots
  2. Root –> Gen0 –> Gen-0 objects (indirectly reachable from roots)
  3. Objects referenced from Gen1 to Gen0

Now #1 and #2 pose no issues, as in the ephemeral GC we will scan all roots and Gen-0 objects anyway. However, to find objects in Gen1 which reference objects in Gen-0, we would have to traverse and look into all Gen1 objects. This would defeat the very purpose of segregating the memory into generations. To handle this, the write-barrier + card-table technique is used.

The runtime maintains a special table called the card-table. Each time a reference is stored in managed code, e.g. a.b = c;, the code JITed for this assignment also updates an internal data structure called the CardTable to capture that write. Later during the ephemeral collection, the GC looks into that table to find all the ‘a’ which took new references. If that ‘a’ is a gen-1 object and ‘c’ a gen-0 object then it marks ‘c’ recursively (which means all objects reachable from ‘c’ are also marked). This technique ensures that without examining all the gen-1 objects we can still find all live objects in Gen-0. However, the cost paid is

  1. Updating object reference is a little bit more costly now
  2. Having old objects take references to new objects increases GC latency (more about this in later posts)

Putting it all together, the traditional GC would traverse all references shown in the diagram below. However, an ephemeral GC can work without traversing the huge number of Red references.

image

It scans all the Roots to Gen-0 references (green) directly. It traverses all the Gen1->Gen0 references (orange) via the CardTable data structure.
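Putting the pieces of this section together, an ephemeral collection can be modelled in a few lines of Python. This is a deliberately simplified toy, not the WP7 collector: the card table is reduced to a set of objects that were written to, every Gen-0 survivor is promoted after a single collection, and all class and method names are hypothetical.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.generation = 0
        self.refs = []


class Heap:
    def __init__(self):
        self.gen0, self.gen1 = [], []
        self.card_table = set()   # objects written to since the last GC

    def new(self, name):
        obj = Obj(name)
        self.gen0.append(obj)
        return obj

    def write_ref(self, holder, target):
        """Models `holder.field = target`, write barrier included."""
        holder.refs.append(target)
        self.card_table.add(holder)

    def ephemeral_collect(self, roots):
        # Start from the roots plus the Gen-1 objects recorded by the barrier.
        stack = list(roots)
        for holder in self.card_table:
            if holder.generation == 1:
                stack.extend(holder.refs)
        live = set()
        while stack:
            obj = stack.pop()
            # Never traverse into Gen-1: that is the saving of the scheme.
            if obj.generation != 0 or obj in live:
                continue
            live.add(obj)
            stack.extend(obj.refs)
        collected = [obj.name for obj in self.gen0 if obj not in live]
        for obj in self.gen0:
            if obj in live:
                obj.generation = 1    # promote the survivor
                self.gen1.append(obj)
        self.gen0 = []
        self.card_table.clear()
        return collected


heap = Heap()
old = heap.new("old")
heap.ephemeral_collect(roots=[old])   # 'old' survives and is promoted to Gen-1
young = heap.new("young")
temp = heap.new("temp")
heap.write_ref(old, young)            # Gen-1 -> Gen-0 reference, caught by the barrier
print(heap.ephemeral_collect(roots=[old]))  # → ['temp']
```

In the second collection `young` is found only through the card table, since the traversal never looks inside the Gen-1 object `old`.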

Conclusion

  1. Generational GC reduces the GC latency by avoiding looking up all objects in the system
  2. Most collections are gen-0 or ephemeral collections and are of low latency. This ensures fast startup and low latency in game play
  3. However, based on how many objects are being promoted, full GCs are run sometimes. When they do run, they exhibit the same latency as a full GC on the previous WP7 GC

Given the above, most applications will see a startup and in-execution perf boost without any modification. E.g. today if an application allocates 5 MB of data during startup and the GC runs after every MB of allocation then it traverses 15 MB (1 + 2 + 3 + 4 + 5). However, with GenGC it might get away with traversing as little as 5 MB.
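The arithmetic in this example generalises easily. The Python sketch below assumes, as the example does, that nothing allocated during startup is freed, that a GC runs after every 1 MB of allocation, and that in the best case an ephemeral GC only walks the newest megabyte; the function names are made up for illustration.

```python
def traversed_without_generations(total_mb, quantum_mb=1):
    """Each GC walks everything allocated so far: 1 + 2 + ... + total."""
    return sum(range(quantum_mb, total_mb + 1, quantum_mb))


def traversed_with_generations_best_case(total_mb, quantum_mb=1):
    """Best case: each ephemeral GC walks only the newest quantum."""
    return quantum_mb * (total_mb // quantum_mb)


print(traversed_without_generations(5))         # → 15
print(traversed_with_generations_best_case(5))  # → 5
```

The first figure grows quadratically with the amount allocated while the second grows linearly, so the gap widens for larger startup allocations.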

In addition, game developers especially can optimize their in-gameplay allocations such that during the entire game play there is no full collection and hence only low-latency ephemeral collections happen.

How well the generational scheme works depends on a lot of parameters and has some nuances. In the subsequent posts I will dive into the details of our implementation, some of the design choices we made, and what developers need to do to get the most value out of it.

Wednesday, January 05, 2011

WP7: When does GC Consider a Local Variable as Garbage

 

Consider the following code that I received

static void Foo()
{
    TestClass t = new TestClass();
    List<object> l = new List<object>();
    l.Add(t); // Last use of l

    WeakReference w = new WeakReference(t); // Last use of t

    GC.Collect();
    GC.WaitForPendingFinalizers();

    Console.WriteLine("Is Alive {0}", w.IsAlive);
}

If this code is run on the desktop it prints Is Alive False. This is very similar to the observation made by a developer in http://stackoverflow.com/questions/3161119/does-the-net-garbage-collector-perform-predictive-analysis-of-code. This happens because beyond the last usage of l and t, the desktop GC considers l and t to be garbage and collects them.


Even though this seems trivial, it isn’t. During JITing of this method the JITer performs code analysis and ensures that runtime tables are updated to reflect whether at a particular time a given local variable is alive or not (i.e. whether it is still used beyond that point). This needs a bunch of data structures to be maintained (lifetime tables) and also dynamic analysis of code.


In case of Windows Phone 7 (or other platforms using NETCF) for the same code you’d get Is Alive True. This means the NETCF GC considers the object to be alive even beyond its last usage in the function.


Due to the limited resources available on devices, NETCF does not do this dynamic analysis. So we trade off some runtime performance to ensure faster JITing (and hence faster application startup). All local variables of all functions in the call stacks of all managed threads in execution during garbage collection are considered live (and hence are object roots) and are not collected. In NETCF, the moment the function Foo returns and the stack is unwound, the local objects on it, including t and l, become garbage.


Do note that both of these implementations follow the ECMA specification (ECMA 334 Section 10.9) which states



“For instance, if a local variable that is in scope is the only existing reference to an object, but that local variable is never referred to in any possible continuation of execution from the current execution point in the procedure, an implementation might (but is not required to) treat the object as no longer in use.”


Hence even though on the desktop CLR it is not important to set variables to null, it is sometimes required to do so on WP7. Had t and l been set to null, the GC would’ve collected them. So the guidance for WP7 is



  1. In most cases setting local variables to null is not required. Especially if the function returns fairly soon
  2. If you have locals that hold onto very large data structures and that function will remain executing for a long time then set those variables to null when you are done using them (e.g. in between the last usage of a variable in a large function and calls to a bunch of web-services in the same function which will take a long time to return).
  3. Use dispose patterns where required. It’s important on devices to free up limited resources as soon as possible.
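The effect of clearing locals can be demonstrated with an analogy in Python. This is not CLR or NETCF code: it relies on the behaviour of the CPython interpreter, which, much like the NETCF behaviour described above, keeps an object alive for as long as a local variable in a running function still refers to it. The function and class names are made up for illustration.

```python
import gc
import weakref


class TestClass:
    pass


def foo(clear_locals):
    t = TestClass()
    l = [t]
    w = weakref.ref(t)
    if clear_locals:
        # Equivalent of setting the locals to null when done with them.
        t = None
        l = None
    gc.collect()
    # True while the object is still alive at this point in the function.
    return w() is not None


print(foo(clear_locals=False))  # → True
print(foo(clear_locals=True))   # → False
```

As with the guidance above, clearing the locals only matters when the function keeps running for a long time after its last use of a large object.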