The Itty-Bitty Garbage Collector
2022-10-22
Automatic memory management for small-memory computers
Introduction
How much code does it take to implement a garbage collector? It depends, of course, on the requirements. The Itty-Bitty Garbage Collector (IBGC) is a garbage collector designed for systems where available memory is limited (perhaps a few tens of kilobytes) and we want to limit both the collector's memory overhead and its code size. The current proof-of-concept implementation takes about 120 lines of C code (per SLOCCount), supports objects of multiple sizes, and manages allocated memory cells using pre-allocated tags, of which it uses 3 bits per cell.
How Does It Work?
The Itty-Bitty Garbage Collector manages memory as a contiguous array of cells, each (at least) large enough to hold an address. Each allocation spans 1 or more cells. IBGC associates a tag with every cell in the memory it manages. Of each tag, IBGC uses three bits:
- A mark bit, used to mark a cell as reachable or unreachable. The mark bit is only significant for the first cell in an allocation.
- A pointer bit, which is 1 if IBGC should treat the cell as containing the address of another object.
- A continuation bit, which is 1 if the cell is followed by another cell that is part of the same allocation.
In the current implementation, each tag is an 8-bit byte, so that there are bits in the tag which are not used by IBGC. These bits are made available to the program to store additional information. For example, an implementation of a dynamically typed programming language could store type information in a cell's tag.
A program wishing to use the Itty-Bitty Garbage collector begits by
calling ibgc_init(). This puts all managed memory on the free
list. The program can then allocate memory using
alloc()
. To reclaim memory that can safely be
deallocated, the program must call gc_trace()
for every
root object. IBGC will then use the mark bit on every allocation that
can be reached from the root to indicate that the allocation is
reachable. Once all reachable allocations have been determined, the
program calls gc_reclaim()
, which returns all memory
in unreachable allocations to the free list. Adjacent spans of
free memory are merged together. Finally, the program must run
mark_tag ^= MARK_MASK
, which inverts the meaning of
the mark bit, ensuring that all allocations are marked as
unreachable before the next call to gc_trace()
.
Tracing starts at the allocation whose address is passed to
gc_trace()
. It looks at every cell in the allocation.
If the pointer bit is set for the cell, the pointed-to allocation
is also traced. This is done without a recursive call to
gc_trace()
; instead, pointer reversal is used,
temporarily replacing the values in the cells being traced with
values that allow us to find our way back. The continuation bits
in the tags are used to determine when we are done tracing an
allocation. The exact details are described in a comment in
the source code.
Code
The source code for IBGC is available from its Git repository. You are free to use, modify, and distribute it subject to the terms of the MIT license. Please see the README for instructions on how to use IBGC.
Future Work
The current implementation uses 32-bit cells and 8-bit tags, with 1 bit of the tag freely usable by the program. This is definitely not the most space-efficient arrangement possible, particularly given that IBGC needs only 3 bits for the first cell of an allocation and 2 bits for subsequent cells. Some possibilities to explore include:
- 16-bit cells and 4-bit tags.
- 32-bit cells and 8-bit tags, with 5 bits available to the program.
Related Work
SectorLISP's ABC Garbage Collector
The ABC Garbage Collector in SectorLISP takes up just 40 bytes of 8086 machine code. Cons cells are allocated by decrementing a cons stack pointer. Garbage collection works as follows:
- Before evaluation, the cons stack is at position A.
- After evaluation, the cons stack is at position B.
- Copy all objects reachable from the evaluation result that are between A and B. This moves the cons stack pointer to position C.
- The region between A and B is now free. Copy the objects between B and C into this region. Set the new cons pointer to A - B + C.
SectorLISP imposes a number of restrictions to make this work. First of all, all objects that are subject to garbage colection are cons cells (SectorLISP has one other type called atom, which exists in a different part of memory and does not participate in garbage collection). Secondly, cons cells are immutable. This means it is impossible to make previously allocated cons cells point to newer cons cells. This is why only conses between A and B have to be copied and traced, and C - B cannot be greater than B - A.
GC in 50 lines
Jason Orendorff's GC in 50 lines provides a garbage collector which fits in fewer than 50 lines of C++ code (42, according to SLOCCount). That includes code which will automatically collect garbage if an allocation is attempted while no memory is available (which IBGC does not do). To get this small, GC in 50 lines imposes the limitation that all objects must be of the same size and type:
struct Object {
Object* head;
Object* tail;
bool marked;
};
Here, head and tail may be either a pointer to another Object,
or nullptr
. This avoids the need to distinguish pointer values
from non-pointer values: every value is a pointer.
Robert van Engelen's Tinylisp
Robert van Engelen's Tinylisp contains a gc function that is just a single line, which sets Lisp's stack pointer to point at the current global environment. The global environment is a list which exists on the stack, as do all Lisp values. Because Tinylisp does not support mutation of lists, the global environment can only contain references to datums that exist above it on the stack, and therefore will be preserved by gc. Because Tinylisp's gc does no tracing, it must be run after every evaluation, else garbage will accumulate that will never be collected.