Monday, January 26, 2009

Back To Basics: Reference Counting Garbage Collection

Blowing the candle

This is Part 3 in a series of post on GC, visit the list here.

Reference counting (refcounting) GC is one of the two primary GC mechanisms widely used. The basic workings of this GC is pretty simple and based on counting the number of reference to a memory block (or object) from other blocks. Each time memory is created or a reference to the object is assigned it’s refcount is bumped up. Each time a pointer is assigned away from it the refcount is reduced. When refcount of the block goes to 0, it means that no pointer references it and hence it is un-reachable and is garbage and can be reclaimed.

Let’s assume RefCount(VOID *pMem) gives us the reference count of the object at pMem. Then the following code shows how the reference count of a specific object in memory varies with the code flow.

Object * obj1 = new Object(); // RefCount(obj1) starts at 1
Object * obj2 = obj1; // RefCount(obj1) incremented to 2 as new reference is added
Object * obj3 = new Object();

obj2 = NULL; // RefCount(obj1) decremented to 1 as ref goes away
obj1 = obj3; // RefCount(obj1) decremented to 0 and can be collected

As the execution flow mutates the state of the program, the refcount of each object is updated transparently and the block is freed when refcount hits 0. As you might guess this record keeping and collection can be implemented in various ways. Here I discuss one of them…

Storage structure and allocation

For each object the refcount has to be stored in some place. Since this field will be accessed very often it’s best to keep it inside or close to the object to preserve spatial locality of reference which has a lot of impact on performance (by ensuring good cache hit ratio). One of the approach we can take (which is most common) is to add a small header to the start of every object allocated. So each time allocation is asked for a object we allocate sizeof(header) + sizeof(type to be allocate). The memory layout looks like


As apparent from the diagram the pointer returned is at the start of the actual data and not the start of the data allocated. This means that the caller isn’t even aware of the header and can use the data as is. However the garbage collector knows that every object has this header and can use any object pointer to access this header.

The header structure we choose is

struct MemHeader
UINT32 refCount;

The allocation routine (in pseudo C) looks like

// cb is the number of bytes to be allocated
PVOID GC_Alloc(size_t cb)
// allocate MemHeader + cb but cast it to MemHeader
MemHeader* pHdr = (MemHeader*)PlatformAlloc(MEMHEADERSIZE + cb);
if (pHdr == NULL)
return NULL;

// set the initial refCount
pHdr->refCount = 1;

// increment the pointer by the size of MemHeader
// and make it point to the start of the actual data

return (PVOID)pHdr;

When the GC wants to access the header it can simply use

inline MemHeader * GetHeader(PVOID pMem)
return ((MemHeader*)pMem) - 1;

Counting references

For every pointer updation the affected objects’ reference count has to be updated. E.g. for a pointer assignment of sort l = r the following needs to be done

void GC_UpdateRef(PVOID *l, PVOID r)
// used for assignment of l = r

// increment ref count for r-value as an additional reference is being taken
if (r != NULL)

// decrease ref count of l-value as it’s releasing it’s previous reference

// do the final assignment
*l = r;

However, this function has to be called for all pointer assignments and even when pointers go out of scope. This is the tricky part and is achieved by various ways

  1. Compilers can insert call to GC_UpdateRef for all pointer assignment and also when references go out of scope (like it inserts call to destructors/finalizers)
  2. For languages that support operator overriding this behavior can be added from outside. However, that might need some changes to the type system like making all garbage collectable types inherit from some common type (which then has the overriding added).

Freeing objects

The final piece is actually freeing objects when their reference count goes down to 0.

VOID GC_ReleaseRef(PVOID pMem)
if (pMem == NULL)

// get the memory header and decrement the refcount
MemHeader *pHdr = GetHeader(pMem);

// if ref count has gone down to zero, free it
if (pHdr->refCount == 0)

Even though this seems straightforward it is actually lacking a crucial block. Consider the following reference graph


Here if object 1 goes out of reference and gets reclaimed by GC_ReleaseRef then recursively we need to check (and free if required) all other references held by it. So in this particular case object 4 shouldn’t be freed (as object 2 has a ref to it) but both Object 3 and Object 5 needs to be freed. So the full method would be something like

VOID GC_ReleaseRef(PVOID pMem)
if (pMem == NULL) return;
MemHeader *pHdr = GetHeader(pMem);
if (pHdr->refCount == 0)
foreach(PVOID pChild in Get_Child(pHdr))


The additional part are the lines in red. The assumption is that Get_Child returns all other references held by the object and we recursively free them. However, actually implementing Get_Child is hard. One of the mechanism used it to maintain compiler/type-system generated reference mask which I will cover that as a part of mark-sweep garbage collection.

Advantage/disadvantage of reference counting GC

Reference counting GC has a primary disadvantage. Which is used on it’s own it completely fails to handle circular references. Consider the following reference graph


Here Object2 refers to Object3 and vice-versa. Now if Object 1 releases the reference marked in red then both Object2 and Object3 becomes un-reachable and garbage. However, since they refer to each other both will have a refCount of 1 and will not get freed. To handle this situation ref counting GC is never used on it’s own. It is used in conjunction with some other GC like mark-sweep which is run periodically to clean these circular reference leaks.

Refcounting GC adds significant code/perf bloat as for every assignment calls to update refCount has to be added. Moreover, for multi-threaded system it can prove to be a major issue as locks have to be taken when updating refCount. Even the most performant atomic operations (e.g. Win32 InterlockedIncrement) are costly when used repeatedly. The header space cost which seems petty also eats into the system because for small objects they cause significant overhead. E.g. for an allocated int32 it adds 100% overhead (as the header is also 32 bit value).

Advantage besides being easy to implement is that objects gets re-claimed the moment they become garbage. This means if objects support destructors/finalizers then these are run pretty soon and system resources held by them return to the OS faster.

Reference counting GC ensures that the collection is distributed over the whole period of execution and hence for interactive system doesn’t result in system freezes like say mark-sweep GC does.

No comments: