NAME

src/gc/incremental_ms.c - Incremental mark and sweep garbage collection

DESCRIPTION

The following comments describe a new garbage collection scheme for Parrot.

The scheme of this algorithm is described in the literature with these keywords:

 - non-copying, mark & sweep
 - incremental
 - realtime
 - incremental update with write barrier

Further we might try this optimization

 - treadmill optimization or
 - implict reclamation

Drawbacks of the current mark and sweep collector.

 * can take arbitrary time to complete (1s for 1 Meg objects)
 * can't be used in multi-threaded Parrot
 * works fast for plain (non-aggregate) objects but suffers badly
   for nested aggregates or HLL objects
 * the sweep phase takes time proportional to the allocated storage

INCREMENTAL GARBAGE COLLECTION

Terms

object
An item like a buffer header or a PMC which is managed by Parrot's dynamic memory system.
aggregate
An object that possibly holds references to other objects. For example an array, hash, or reference PMC.
tri-color marking
All objects have one of three colors: white, grey, or black.At the beginning of a GC mark run all objects are white (not yet visited). During marking objects are greyed (visited - found alive), but their contents isn't yet scanned. A fully scanned grey object gets finally colored black. It will not again be rescanned in this run.Only aggregates can be grey, non-containers are blackened immediately.Objects on the free-list are sometimes denoted having the color off-white or ecru.
GC
In Parrot tree the copying garbage collector that recycles string and buffer memory. Configure.pl has a switch to use a malloc library instead, which makes string and buffer memory non-moving.
collector
The reclamation system.
mutator
The normal operation of the program which may or may not change the collectors view of objects.
incremental
Garbage collection and normal program operation is interleaved. This guarantees short and bounded pause times. Garbage collection doesn't significantly interrupt program execution, collector and mutator are running pseudo-parallel.
root set
All structures in the interpreter that might point to objects. E.g. stacks, globals, and of course the registers. All objects the interpreter works with, are directly or indirectly reachable starting from the root set.
the tri-color invariant
At no time a black object may reference a white one directly. Actually this is the strong incarnation of the invariant - all paths from black objects to white objects lead over at least one grey object.The weak tri-color invariant is: there is at least one such path to a white object, so that it's reachable.The strong invariant is the basic idea of mark and sweep too. But as the mutator isn't running during mark phase, the invariant is never violated.Due to this invariant, after the root set has been marked and when all greyed objects are marked (blackened), the white objects have to be dead.
paint it black
Or, which color do new objects have?Actually this should be tunable. Or it depends. If objects are born white and die immediately, they get collected in the same GC cycle. OTOH when these objects are stored into an existing (black) array, we have to do more work to keep the tri-color invariant valid.Anyway, when allocating new objects white, the collector must run more often or must do more work per increment to make the algorithm stop somewhen.
write barrier
To keep the tri-color invariant valid all pointer stores into black objects have to be tracked. If a white object would be stored into a black array, and this object isn't refered to by another object it would get collected. The write barrier greys the white object, so that it get scanned later or alternatively greys the aggregate for a rescan. The latter can be better, if a sequence of such stores would happen.

Data structure overview

The incremental mark and sweep collector has an additional structure in the arena_base that keeps track of the collector's state. Pool and arena structures are unchanged. Only the allocation of new arena blocks is done much more fine grained in e.g. 8K blocks.

Implicit reclamation (optional)

from-space
The graph of all objects found live during the last collection.
to-space
The work area of the collector. During marking live objects are "moved" from the from-space into the to-space. This is the same as the text_for_GC list used in src/gc/api.c. The to-space is initially empty. During marking it gets greyed and finally all reachable objects are black.
free-list
New objects are allocated from the free-list. The free-list is adjacent to the to-space. Allocating a new objects thus means, moving the free pointer one word forward and paint the new object black.

All objects get two additional pointers (forward, backward) and are arranged like in this scheme:

    <-- allocation direction         marking -->
            |                          |
  [w] <--> [w] <--> [b] <--> [b] <--> [g] <--> [g] <--> [w] <-> [w]

            ^        ^                 ^                 ^
            |        |                 |                 |
   free-list-ptr     to-space          scan-pointer      from-space

Objects get "moved" during collection by rearranging the doubly-linked object pointers. At the end of a mark run (when the last grey object is blackened), the from-space and the free-list are merged serving as the new free-list of the next GC cycle. This operation is just a few pointer manipulations that replaces the sweep phase of a mark and sweep collector.

Phases of operation

a) initialization
After interpreter creation the GC system is initialized by marking parts of the root set (globals, internal structures).
b) program operation
For each bunch of allocated objects (A) the collector does k.A work, for some constant k > 1. As new objects are allocated black the number of whites is reduced steadily. This means that the throttle factor k could be less then one too, but this could highly increase average memory usage.To keep the memory usage limited k > 1 must hold.
c) near the end of a mark phase
The rest of the root set is scanned, i.e. the registers. By deferring scanning of registers all temporaries that might have exist somewhen just stay unscanned - they will be collected in this mark phase, if we allocate new objects white or in the next mark phase.
d) finishing a mark phase
The current sweep of the whole arena is done, or with implicit reclamation:Garbage gets appended to the free-list by merging the unscanned from-space with the free-list, these objects are all considered white. All other items are in the to-space and are black. These objects constitute the from-space of the new collection cycle.Now he meaning of the black bit is reversed effectively setting the new from-space to white.The next mark phase is initialized in one step a) and the new cycle starts.Alternatively the mutator could run and allocate objects for some time, without starting the collector again, if there are plenty of free objects on all free-lists.
e) collect buffer memory
Finally, we might trigger a collect run on string and buffer memory if there is an impending shortage of resources. While the copying compactor is rather independent of the collector that cleans object headers, it's more efficient to collect buffer memory when the live information is accurate. This avoids copying of dead buffer memory.

Comparison with our current mark and sweep collector

  MS  ... mark and sweep (stop-the-world)
  IMS ... incremental mark and sweep
  IMIR .. incremental mark implicit reclamation

                       MS                 IMS               IMIR
  ------------------------------------------------------------------------
  operation            stop-the-world     incremental       incremental
  time per mark phase  unbounded          bounded           bounded
  size overhead        1 word             1 word            2 words
  time overhead        O(2*live + dead)   O(2*live + dead)  O(live)   2)

Notes:

  2) it should be possible to mark containers at once by using the
     information of the from-space pointers and tracking changes
     to the aggregate.

Implementation details and unsorted remarks

the object graph
The MS and IMS scheme use the next_for_GC pointer for keeping track of references. This interferes with the freeze functionality, which can use the same pointer to keep track of visited objects.IMIR has a dedicated pointer pair to build the object graph.
Greying objects
Greying objects is done depth-first. This has much better cache locality then visiting an object again much later. In the picture above this means that grey objects are inserted at the left end of the mark chain immediately to the right of the object that gets blackened.
big aggregates
Greying has to be done in increments. Big aggregates can't have a mark vtable that could run arbitrarily long. This means that the GC system must know the layout of arrays, hashes, and objects. This is currently true for arrays and objects but not for hashes. But the latter need some refactoring of internals anyway.To avoid visiting all aggregate elements, it could be better to track the graph of old aggregates by using a write barrier for all writes into the array. This would basically create a generational collector. The old generation (the aggregate) isn't scanned. But changes to this "old generation" are tracked and reflected in the collectors graph of objects.
timely destruction
The interpreter arena has a count of currently active objects that need timely destruction. When during scope exit an high priority sweep is triggered, we have basically two cases:1) all of these objects were already seen in this mark phase - the scope exit can continue.2) Not all objects were seen - they might be alive or not. This means that the mark phase must run to the end to decide, if these objects are alive (or again until all are found alive).To increase performance its likely that we need some additional information that keeps track of the location of such objects and just try to mark paths to objects that need timely destruction.
concurrent or parallel collection
As the described algorithm is already incremental its well-suited for parallel collection in a multi-threaded Parrot. The work of greying objects can be done in parallel by atomically handling a bunch of objects to another thread. After doing some increments of marking, these objects then get returned to the shared to-space. The parallel collection is finished when the last object is blackened and all threads have reached the thread barrier rendezvous point. (Please note the very different meaning of barrier here).But also a single-threaded Parrot can vastly take advantage by running increments of the collection during waiting for I/O completion or during a sleep opcode.

FUNCTIONS

static void gc_ims_add_free_object
Add object to_add to the free_list in the given pool. pool-num_free_objects> has to be updated by the caller.
static void *gc_ims_get_free_object
Get a new object off the free_list in the given pool.
static void gc_ims_alloc_objects
Allocate new objects for the given pool.
static void gc_ims_pool_init
Initializes a pool by setting the appropriate function pointers to add, get, and allocate objects.
static void parrot_gc_ims_deinit
Shuts down this GC system.
void Parrot_gc_ims_init
Initialize the state structures of the gc system. Called immediately before creation of memory pools. This function must set the function pointers for add_free_object_fn, get_free_object_fn, alloc_objects_fn, and more_objects_fn.
static void parrot_gc_ims_reinit
Reinitialize the collector for the next collection cycle.
static void parrot_gc_ims_mark
Mark a bunch of children.The work depends on item counts with and without a next_for_GC field. The former are marked immediately, only the latter need real work here.
static int sweep_cb
Callback to sweep a header pool (see Parrot_forall_header_pools).
static void parrot_gc_ims_sweep
Free unused objects in all header pools.TODO split work per pool.
static int collect_cb
Callback to collect a header pool (see Parrot_forall_header_pools).
static int parrot_gc_ims_collect
Run the copying collector in memory pools, if it could yield some free memory.
static void parrot_gc_ims_run_increment
Run one increment of collection. This function is triggered by object allocation.
static void parrot_gc_ims_run
Interface to Parrot_do_gc_run. flags is one of:
  GC_lazy_FLAG   ... timely destruction
  GC_finish_FLAG ... run until live bits are clear
void Parrot_gc_ims_wb
Write barrier called by the GC_WRITE_BARRIER macro. Always when storing a white object into a black aggregate, either the object must be greyed or the aggregate must be rescanned -- so grey it.

SEE ALSO

src/gc/api.c, include/parrot/gc_api.h, include/parrot/pobj.h,

HISTORY

Initial version by leo (2004.08.12 - 2004.08.15)