Sunday, February 08, 2009

Back to Basics: Optimizing reference counting garbage collection

Mesmerized by magic

One of the comments on my previous post on reference counting GC prompted this post. The comment goes as

“Ref counting GC seems to have some hard issues. How does it still get used?”

The advantages of reference counting GC is that it allows reclamation instantly after the object is no longer in use and that it spreads the GC cost across execution and hence does not suffer from the longer pauses like mark-sweep GC does. These makes it compelling to get used in some scenarios (e.g. interactive systems). However, it does suffer from some major drawbacks. In this post I will skim through the issues and how they are worked around in real systems.

Cyclic references

The fact that cyclic references cannot be handled by general refcounting mechanism is touted as it’s biggest draw back. A variety of approaches has been attempted to solve this. The two main categories of approaches include either ways to distinguish pointers that form a part of the cycle from those that do not or using back up mark-sweep GC.

Using a backup mark-sweep GC works pretty well in handling cyclic references. The idea is to have a regular refcounting GC that just ignores cyclic references. It fails to cleanup cyclic references that just builds up over time. When periodically the back up mark-sweep GC is run which cleans up objects with non-zero refcount that are not externally reachable.

Reducing the number of reference counting

Reference counting makes pointer manipulation very costly because every referencing/de-referencing results in book keeping to happen. One common approach to work around this, is to figure out what are the reference counting that can be safely deferred to be handled later.

For a lot of systems most references are taken in function local variables which takes references as the function is called and releases them as the function returns. Even inside functions iterators can walk over data structures like lists. Refcounting these can be very costly. These reference needn’t be explicitly counted right away as they  on the stack and can be accessed later.

However, for global variables and even class member variables assignments the reference count is updated.When on de-referencing an object the refcount goes to 0 it cannot be reclaimed because there can be some stack variable references to it (which was not counted). To handle this the object is placed on a special list which contains all objects whose ref count has hit 0 (e.g. ZeroRefCountList).

With this mode the execution can proceed as usual and every so often a special routine is fired which walks all the active thread’s call stack to locate the stack variable and then update object ref count as follows

void UpdateRefCount()
// Stack object references are not refcounted and so bump up their
// refcount for all their occurrences

foreach (object obj in stack)

// At this point the ZeroRefCountList objects are updated to
// their latest ref count values. So 0 means really 0 refcount
// and hence free them recursively

foreach (object obj in ZeroRefCountList)
if(refCount(obj) == 0)
foreach(Object child in obj.GetChildren)

// reset the refcount to get it back to the state
// where it was before calling UpdateRefCount

foreach (object obj in stack)

The above routine is a condensed version of mark and sweep with the differences that it needn’t walk the entire memory and is not performance intensive.

Reduce space overhead

In the worst case every pointer in the system can point to a the same object and hence the refCount field is typically pointer sized so that it can store all the reference’s count. However, in reality this is seldom the case and being so conservative might result in a lot of space overhead per object. Say for a 32 bit system it means storing a 32 bit value for every object. For small sized objects that can mean upto 50% overhead.

To handle this a conservative estimate can be made to the maximum number of reference that an object can have. Let’s assume it to be 256. This needs a refcount field of 8 bits (1 byte) and that is stored in the object header. Post this the refcount is made as follows

void RefCountIncrement(Object obj)
if(obj.refCount < 256)

void RefCountDecrement(Object obj)
if(obj.refCount < 256)

The above algorithm allows correct refcounting for all objects whose refcount never goes upto 256. The moment it does it’s refcount sticks to 256 and is never incremented or decremented. Typically these objects are fairly smaller in number but if they exist they are not reclaimed (sticks to the system). Once such accumulations of these happen in significant number (which will take a lot of time) a backup tracing GC is run to sweep the memory of these objects.

Why use backup mark-sweep GC as backup?

A bunch of the solutions mentioned above use mark-sweep GC as a solution. The question is if they are to be used then why not completely switch to mark-sweep and dump refcounting. The main reason is that refcounting act an an optimization for mark-sweep GC reducing system pauses (remember they typically halt execution completely) introduced by them and making them run less often.

No comments: