Tuesday, June 14, 2011

WP7 Mango: The new Generational GC

In my previous post “Mark-Sweep collection and how does a Generational GC help” I discussed how a generational Garbage Collector (GC) works and how it helps in reducing collection latencies which show up as long load times (startup as well as other load situations like game level load) and gameplay or animation jitter/glitches. In this post I want to discuss how those general principles apply to the WP7 Generational GC (GenGC) specifically.

Generations and Collection Types

We use 2 generations on the WP7 referred to as Gen0 and Gen1. A collection could be any of the following 4 types

  1. An ephemeral or Gen0 collection that runs frequently and only collects Gen0 objects. Object surviving the Gen0 collection is promoted to Gen1
  2. Full mark-sweep collection that collects all managed objects (both Gen1 and Gen0)
  3. Full mark-sweep-compact collection that collects all managed objects (both Gen1 and Gen0)
  4. Full-GC with code-pitch. This is run under severe low memory and can even throw away JITed code (something that desktop CLR doesn’t support)

The list above is in the order of increasing latency (or time they take to run)

Collection triggers

GC triggers are the same and as outlined in my previous post WP7: When does the GC run. The distinction between #2 and #3 above is that at the end of all full-GC the collector considers the memory fragmentation and can potentially run the memory compactor as well.

  1. After significant allocation
    After significant amount of managed allocation the GC is started. The amount today is 1MB (called GC quanta) but is open to change. This GC can be ephemeral or full-GC. In general it’s an ephemeral collection. However, it might be a full collection under the following cases
    1. After significant promotion of objects from Gen0 to Gen1 the collections become full collections. Today 5MB of promotion triggers a full GC (again this number is subject to change).
    2. Application’s total memory usage is close to the maximum memory cap that apps have (very little free memory left). This indicates that the application will get terminated if the memory utilization is not cut-back.
    3. Piling up of native resources. We use different heuristics like native to managed memory ratio and finalizer queue heuristics to detect if GC needs to turn to full collection to release native resources being held-up due to Gen0 only collections
  2. Resource allocation failure
    All resource allocation failure means that the system is under memory pressure and hence such collections are always full collection. This can lead to code pitch as well
  3. User code triggered GC
    User code can start collections via the System.GC.Collect() managed API. This results in a full collection as documented by that API. We have not added the method overload System.GC.Collect(generation). Hence there is no way for the developer to start a ephemeral or Gen0 only collection
  4. Sharing server initiated
    Sharing server can detect phone wide memory issue and start GC in all managed processes running. These are full-GC and can potentially pitch code as well.


So from all of the above, the 3 key takeaways are

  1. Low memory or memory cap related collections are always full-collections. These could also turn out to be the more costly compacting collection and/or pitch JITed code
  2. Collections are in general ephemeral and become full-collection after significant object promotion
  3. No fundamental changes to the GC trigger policies. So an app written for WP7 will not see any major changes to the number of GC’s that happen. Some GC will be ephemeral and others will be full-GCs.


Write Barriers/Card-table

As explained in my previous post, to keep track of Gen1 to Gen0 reference we use write-barrier/card-table.

Card-table can be visualized as a memory bitmap. Each bit in the card-table covers n bytes of the net address space. Each such bit is called a Card. For managed reference updates like  A.b = C in addition to JITing the real assignment, calls are added to Write-barrier functions. This  write barrier locates the Card corresponding to the address of write and sets it. Later during collection the collector checks all Gen-1 objects covered by a set card-bit and marks Gen-0 references in those objects.

This essentially brings in two additional cost to the system.

  1. Memory cost of adding those calls to the WB in the JITed code
  2. Cost of executing the write barrier while modifying reference

Both of the above are optimized to ensure they have minimum execution impact. We only JIT calls to WB when absolutely required and even then we have an overhead of a single instruction to make the call. The WB are hand-tuned assembly code to ensure they take minimum cycles. In effect the net hit on process memory due to write barriers is way less than 0.1%. The execution hit in real-world applications scenarios is also not in general measureable (other than real targeted testing).

Differences from desktop

In principle both the desktop GC and the WP7 GC are similar in that they use mark-sweep generational GC. However, there are differences based on the fact that the WP7 GC targets a more constrained device.

  1. 2 generations as opposed to 3 on the desktop
  2. No background or incremental collection supported on the phone
  3. WP7 GC has additional logic to track and handle application policies like application memory caps and total memory utilization
  4. The phone CLR uses a very different memory layout which is pooled and not linear. So no concept of Large Object Heap. So lifetime of large objects is no different
  5. No support for particular generation collection from user code

No comments: