The Itty-Bitty Garbage Collector

The Itty-Bitty Garbage Collector

Robbert Haarman

2022-10-22

Automatic memory management for small-memory computers


Introduction

How much code does it take to implement a garbage collector? It depends, of course, on the requirements. The Itty-Bitty Garbage Collector (IBGC) is a garbage collector designed for systems where available memory is limited (perhaps a few tens of kilobytes) and we want to limit both the collector's memory overhead and its code size. The current proof-of-concept implementation takes about 120 lines of C code (per SLOCCount), supports objects of multiple sizes, and manages allocated memory cells using pre-allocated tags, of which it uses 3 bits per cell.


How Does It Work?

The Itty-Bitty Garbage Collector manages memory as a contiguous array of cells, each (at least) large enough to hold an address. Each allocation spans 1 or more cells. IBGC associates a tag with every cell in the memory it manages. Of each tag, IBGC uses three bits:

  1. A mark bit, used to mark a cell as reachable or unreachable. The mark bit is only significant for the first cell in an allocation.
  2. A pointer bit, which is 1 if IBGC should treat the cell as containing the address of another object.
  3. A continuation bit, which is 1 if the cell is followed by another cell that is part of the same allocation.

In the current implementation, each tag is an 8-bit byte, so that there are bits in the tag which are not used by IBGC. These bits are made available to the program to store additional information. For example, an implementation of a dynamically typed programming language could store type information in a cell's tag.

A program wishing to use the Itty-Bitty Garbage collector begits by calling ibgc_init(). This puts all managed memory on the free list. The program can then allocate memory using alloc(). To reclaim memory that can safely be deallocated, the program must call gc_trace() for every root object. IBGC will then use the mark bit on every allocation that can be reached from the root to indicate that the allocation is reachable. Once all reachable allocations have been determined, the program calls gc_reclaim(), which returns all memory in unreachable allocations to the free list. Adjacent spans of free memory are merged together. Finally, the program must run mark_tag ^= MARK_MASK, which inverts the meaning of the mark bit, ensuring that all allocations are marked as unreachable before the next call to gc_trace().

Tracing starts at the allocation whose address is passed to gc_trace(). It looks at every cell in the allocation. If the pointer bit is set for the cell, the pointed-to allocation is also traced. This is done without a recursive call to gc_trace(); instead, pointer reversal is used, temporarily replacing the values in the cells being traced with values that allow us to find our way back. The continuation bits in the tags are used to determine when we are done tracing an allocation. The exact details are described in a comment in the source code.


Code

The source code for IBGC is available from its Git repository. You are free to use, modify, and distribute it subject to the terms of the MIT license. Please see the README for instructions on how to use IBGC.


Future Work

The current implementation uses 32-bit cells and 8-bit tags, with 1 bit of the tag freely usable by the program. This is definitely not the most space-efficient arrangement possible, particularly given that IBGC needs only 3 bits for the first cell of an allocation and 2 bits for subsequent cells. Some possibilities to explore include:

  • 16-bit cells and 4-bit tags.
  • 32-bit cells and 8-bit tags, with 5 bits available to the program.

Related Work

SectorLISP's ABC Garbage Collector

The ABC Garbage Collector in SectorLISP takes up just 40 bytes of 8086 machine code. Cons cells are allocated by decrementing a cons stack pointer. Garbage collection works as follows:

  1. Before evaluation, the cons stack is at position A.
  2. After evaluation, the cons stack is at position B.
  3. Copy all objects reachable from the evaluation result that are between A and B. This moves the cons stack pointer to position C.
  4. The region between A and B is now free. Copy the objects between B and C into this region. Set the new cons pointer to A - B + C.

SectorLISP imposes a number of restrictions to make this work. First of all, all objects that are subject to garbage colection are cons cells (SectorLISP has one other type called atom, which exists in a different part of memory and does not participate in garbage collection). Secondly, cons cells are immutable. This means it is impossible to make previously allocated cons cells point to newer cons cells. This is why only conses between A and B have to be copied and traced, and C - B cannot be greater than B - A.

GC in 50 lines

Jason Orendorff's GC in 50 lines provides a garbage collector which fits in fewer than 50 lines of C++ code (42, according to SLOCCount). That includes code which will automatically collect garbage if an allocation is attempted while no memory is available (which IBGC does not do). To get this small, GC in 50 lines imposes the limitation that all objects must be of the same size and type:


struct Object {
    Object* head;
    Object* tail;
    bool marked;
};

Here, head and tail may be either a pointer to another Object, or nullptr. This avoids the need to distinguish pointer values from non-pointer values: every value is a pointer.

Robert van Engelen's Tinylisp

Robert van Engelen's Tinylisp contains a gc function that is just a single line, which sets Lisp's stack pointer to point at the current global environment. The global environment is a list which exists on the stack, as do all Lisp values. Because Tinylisp does not support mutation of lists, the global environment can only contain references to datums that exist above it on the stack, and therefore will be preserved by gc. Because Tinylisp's gc does no tracing, it must be run after every evaluation, else garbage will accumulate that will never be collected.

Valid XHTML 1.1! Valid CSS! Viewable with Any Browser