Sunday, February 01, 2009

Back To Basics: Copying Garbage Collection

Banana leaf

This post is Part 5 in the series of posts on Garbage Collection (GC). Please see the index here.

In my previous post I have discussed reference counting and mark-sweep garbage collector in some detail. This post is a quick intro into the copying GC. I’ll not be going into the gory details because this is not much used in the .NET world.

The copying GC conceptually breaks the memory into two regions. Let’s call the ping and pong as shown below.


The buffer Ping is initially used for allocation of memory. It maintains a pointer at the end of the last allocated memory and for each new request simply returns that pointer and then increments it to point to the new end. So at the beginning when no memory is allocated it points to the bottom. After a bunch of allocation this is what the it looks like


Allocation is very fast as it involves only incrementing a pointer. However, after a bunch of allocations the situation will arrive where current-Pointer + requested > Ping-top. This means no more memory can be allocated from Ping. It is at this time the GC is run.

Garbage collection involves enumerating the roots and then copying every live object from Ping to Pong. Since objects are moved in memory all their references have to be updated as well. After the copy the buffers are switched so that subsequent allocations can now happen from Pong. Ping remains clean for the next copy cycle.

For the example set above lets assume when the GC is run Object 1 and 3 are live but 2 isn’t. After GC runs the memory layout becomes


Advantages and disadvantages

The primary advantage is that allocation is extremely fast (almost equal to stack allocation) because out of memory checks is just a pointer check and actual allocation is just incrementing the free pointer. Since the GC compacts memory it maintains good locality of reference resulting in better cache hit ratios and hence faster execution.

The main disadvantage with copying GC is that almost double the memory space is required. Even though it is argued that in virtual memory based systems it shouldn’t be a problem because the un-used memory pages will simply be paged out to the disk, the argument is not essentially true. Specially during GC phase both the memory half's are touched (due to ping <-> pong copy) and the number of page-faults generated are much larger than other GC schemes.

No comments: