Rust GC Design

TODO

GC History

The role of garbage collection (GC) in Rust has been heavily revised during the language's design. It was originally deeply integrated into the language, complete with dedicated syntax; then over time we found ways to lessen the dependency on GC, and then finally remove it from the language entirely.

Why support GC

There remain reasons to support GC in Rust.

Gc versus Rc

Libraries have largely replaced uses of GC with reference counting (e.g. via smart pointers like Rc<T> and Arc<T>). But reference counting is inherently prone to leaking of cyclic structure, and incurs a ref-count maintenance burden. Regarding the latter:

there is runtime overhead maintaining the reference-counts, and
there is also an ergonomic burden for the programmer: the associated smart-pointers (e.g. Rc<T>) cannot implement Copy, so uses of Rc<T> in Rust is often peppered with calls to the .clone() method, making such references in Rust less ergonomic than a well-integrated Gc<T>.

Interop wants GC support

An important domain for Rust is integrating a Rust library or program with some other application framework. An obvious example of this is Servo's use of the SpiderMonkey Virtual Machine for its Javascript support.

Servo is relying on SpiderMonkey's garbage collection for memory management, not only for Javascript values, but even for native-code DOM objects.

As described in that post (in the section entitled "Custom static analysis"), one can copy an unrooted pointer onto a local variable on the stack, and at that point, the SpiderMonkey garbage collector no longer knows about that root.

However, proper GC support in Rust would include a stack tracing mechanism for discovering roots on the stack. This mechanism could be extended to allow such root-discovery to be communicated to other collectors such as SpiderMonkey's.

Integrating Rust and GC

So, let us assume that we do indeed want garbage collector support in Rust.

The GC API has distinct audiences

I only recently have come to realize that the use cases outlined above come from two fundamentally different audiences.

The first audience is made up of Rust application and library developers, who want garbage collection in Rust to expose them solely to an ergonomic (and efficient) smart-pointer. This group wants to call Gc::new(T) (instead of Rc::new(T)), and wants the use of Gc<T> to be essentially just as if they were using a special variant of the reference type &T whose lifetime was magically guaranteed to last as long as necessary.

The second audience is made up of Rust integrators, who are seeking to deploy or embed Rust in contexts where memory is already being managed by some external library that is not attempting to integrate with the Rust language and runtime. The primary concern for this group is that their memory management system is supported at all.

The reason that these audiences are distinct comes down to the requirement from the first audience that use of Gc<T> be ergonomic.

Ergonomics, Deref, and Object Pinning

To provide programmer ergonomics, the main trick that smart pointers like Rc<T> use is an implementation of the Deref trait:

impl<T> Deref for Rc<T> where T: ?Sized { type Target = T; fn deref<'a>(&'a self) -> &'a T { ... } }

With this implementation in place, the overloaded dereference operator *expr is extended with an implementation that, given an &Rc<T> will yield a &T. This may not sound like much on its own, but it indirectly provides two programmer conveniences:

"dot-operations" (i.e. method calls recv.m(..) and field dereferences recv.f) will automatically dereference the recv expression as much as necessary to find the associated method or field.
coercions from &S to &T will automatically dereference the S as much as necesasry to extract a &T.

(For more discussion, see TRPL.)

Support for the above API is, as far as I can tell, a strict requirement for an ergonomic Gc<T> in Rust today.

(In other words, if Gc<T> cannot implement Deref, then only other option is to extend the Rust language with some alternative overloading of *expr that Gc<T> can support. This may be an interesting avenue to explore in the long-term, but is not a realistic short-term solution.)

But, supporting Deref poses a problem for some memory management strategies, and thus may exclude some Rust integrators.

One important goal for garbage collection in Rust is that it should support "copying" or "compacting" garbage collectors that move some objects in memory.

Copying GC

In a fully-copying GC, the collector may freely rearrange objects in memory. The address assigned to an object may change whenever control is yielded to the collector (e.g. via a GC allocation request or via all threads reaching a GC safepoint). Relocating an object from one location to another is often called "forwarding" that object. A collection that forwards all live objects is called a "full compaction."

In a partly-copying GC, some objects may be forwarded, but the collector is designed so that on every collection, there may exist some objects that cannot be forwarded. Some such systems support dynamically shifting objects to and from non-relocatable status; we refer to marking an object as non-relocatable for a limited dynamic extent as "pinning" the object.

There are a fair amount of details, but the most important thing is to allow scenarios like this:

before garbage collection:

STACK            GC-HEAP (pages start at e.g. address 0x100000)
                +------+
root A -------> A   D ----+
                |      |  |
                |   C -------+
                |      |  |  |
                +------+  |  |
                          |  |
                +------+  |  |
                B      |  |  |
                |      |  |  |
                |"datb"|  |  |
                +------+  |  |
                          |  |
                +------+  |  |
            +-> C      |<----+
            |   |      |  |
            |   |"datc"|  |
            |   +------+  |
            |             |
            |   +------+  |
            |   D      |<-+
            +----- C   |
                |"datd"|
                +------+

after garbage collection:

STACK            GC-HEAP (pages start at e.g. address 0x200000)

                +------+
root A'-------> A'  D'----+
                |      |  |
                |   C'-------+
                |      |  |  |
                +------+  |  |
                          |  |
                +------+  |  |
                D'     |<-+  |
            +----- C'  |     |
            |   |"datd"|     |
            |   +------+     |
            |                |
            |   +------+     |
            +-> C'     |<----+
                |      |
                |"datc"|
                +------+

In this picture, the collector has started from the root that points to the object "A", and then gone from there to discover the objects "C" and "D" that "A" points to. It copies the objects to new locations in memory (thus the picture added prime marks to all such relocated objects, "A'", "C'", and "D'"), and also reclaimed the memory associated with the object "B", which was unreachable.

(The picture notes that the GC-HEAP here may start at an entirely different address, since the act of copying the memory may have involved moving all of the reachable storage to new memory pages, and then freeing the old pages.)

The Problem with Deref

The problem with a picture like the above is: how do we support the Deref trait here?

An implementation of the Deref trait for Gc<T> will require that we temporarily acquire references &T that point directly into the GC heap.

So if we start with something like this:

STACK            GC-HEAP (pages start at e.g. address 0x100000)
                +---------+
 Gc(A) -------> A  Gc(D) ----+
                |         |  |
                |  Gc(C) -------+
                |         |  |  |
                +---------+  |  |
                             |  |
                +---------+  |  |
                B         |  |  |
                |         |  |  |
                |  "datb" |  |  |
                +---------+  |  |
                             |  |
                +---------+  |  |
            +-> C         |<----+
            |   |         |  |
            |   |  "datc" |  |
            |   +---------+  |
            |                |
            |   +---------+  |
            |   D         |<-+
            +----- Gc(C)  |
                |  "datd" |
                +---------+

We might very well end up accessing any of the Gc<T> pointers in the above picture via Deref, extracting a &T. For example, we might extract the second field of the "A" object and deref it to a reference to the object "C":

STACK            GC-HEAP (pages start at e.g. address 0x100000)
                +---------+
 Gc(A) -------> A  Gc(D) ----+
                |         |  |
  &C ---+       |  Gc(C) -------+
        |       |         |  |  |
        |       +---------+  |  |
        |                    |  |
        |       +---------+  |  |
        |       B         |  |  |
        |       |         |  |  |
        |       |  "datb" |  |  |
        |       +---------+  |  |
        |                    |  |
        |       +---------+  |  |
        +-> +-> C         |<----+
            |   |         |  |
            |   |  "datc" |  |
            |   +---------+  |
            |                |
            |   +---------+  |
            |   D         |<-+
            +----- Gc(C)  |
                |  "datd" |
                +---------+

The problem is that when the garbage collector collector now runs, it would not be safe to try to move the object "C" to a new location.

Keep in mind that the &C above may in turn be cast to a *C or even a usize. One fundamental principle that I want to preserve is what I call "The Frozen Reference Property": library code can safely assume that a reference &'a T can be treated as accessible memory of type T, with a constant address for the extent of the lifetime 'a.
We could also consider lifting the Frozen Reference Property. This would allow us to write pointers of the form &T. But as Niko noted during the whistler work week:
- Doing this would require that we tell LLVM to not store such pointers in registers during any extent where a GC might occur.
- And it would make casting &T to *T even more unsafe than it already is
- note that in Rust today we do not actually require an unsafe{.rust} block for such casts!
Note also that references migrates can be sent between threads (if their contents are Sync).

Using Pinning to solve the Deref problem

However, it would be sound, in the above scenario, to leave "C" where it was, but copy the other reachable objects to new locations

STACK            GC-HEAP
                             +---------+
 Gc(A') -------------------> A' Gc(D') ---+
                             |         |  |
  &C ---+                    |  Gc(C)  |  |
        |                    |         |  |
        |                    +---------+  |
        |                                 |
        |                    +---------+  |
        |                    D'        |<-+
        |                    |  Gc(C) ------+
        |                    |  "datd" |    |
        |                    +---------+    |
        |                                   |
        |       +---------+                 |
        +-----> C         | <---------------+
                |         |
                |  "datc" |
                +---------+

This act, of forcing the object "C" to keep its current location, is known as "pinning".

Pinning provides a convenient way for us to support the Deref trait on Gc<T> in Rust: the deref() method on Gc would pin the referenced object on the GC heap, and return a smart pointer GcRef<T>. GcRef<T> would also implement Deref (and would return &T), but GcRef<T> would also carry a destructor that unpinned the referenced object. The &T provided by deref'ing GcRef<T> would not be allowed to outlive the GcRef<T> itself, so this would ensure that values remain pinned for the extent of the time that a &T has been lent out.

Support for pinning is provided by some but not all garbage collectors.

Smart Pointer: No (one) silver bullet

As discussed in the previous section, Deref-support is a likely requirement for ergonomic Gc<T>.

But, some garbage collectors do not support pinning of arbitrary objects. For example, the SpiderMonkey garbage collector does not support pinning.

Some reasons pinning is sometimes unsupported:

It is useful in a generational collector to be able to assume on every collection that the nursery area (where most objects are initially allocated) will always be entirely evacuated.
A bump-pointer allocation scheme that assumes that the allocation buffer ends at a fixed location is incompatible with a system that allows arbitrary objects to be pinned to their current memory location. (Note that not all bump-pointer schemes need be so restrictive.)

So, does this rule out Rust support for SpiderMonkey and other GC's?

My answer is: No, it just rules out ergonomic support for such collectors.

We are not required to provide a single type that is meant to be all things to all GC's.

In particular, I recommend that we provide (at least) two types.

(These types may be structs that Rust provides in the standard library or a gc crate, or they may end up being traits that the GC implementations are expected to implement and provide via some associated type. Time will tell.)

For Ergonomics: Gc

For the audience desiring ergonomics, we provide Gc<T> that supports the Deref trait as described above. It will only work with some GC implementation strategies. We can provide a reference implementation of one that supports the key feature (pinning) while also exercising object-relocation as much as possible (in order to try to flush out bugs where unsafe-code is holding on to references into the GC heap via unsafe pointers).

For Interop: VolatileGc

For the audience desiring interop, we provide VolatileGc<T> that does not support Deref, and in fact has a number of other restrictions.

struct VolatileGc<T> { ... }
impl VolatileGc<T> {
    /// Returns a pointer to the `T` that is allocated on the GC heap.
    /// The caller *must* ensure that no garbage collector activity
    /// occurs while the returned value is still in use.
    ///
    /// In particular,
    /// the caller must ensure that no GC-allocation takes place
    /// (since such allocation could cause a garbage collection).
    ///
    /// Depending on the GC details, the caller may also need to
    /// ensure that no GC read- or write-barriers are crossed
    /// while the returned value is still in use
    /// (since in some GC's that may cause a garbage collection).
    ///
    /// Finally, depending on the GC details, (namely whether it
    /// is a GC managing memory for multiple threads), the caller may need
    /// to ensure that no so-called "safepoints" are crossed
    /// (because doing so may cause a garbage collection
    /// to be initiated across the coordinating threads).
    fn ptr(&self) -> *mut T
}

This type is clearly very restricted in utility. It also may be difficult to enforce the described requirements (e.g. higher-order procedures may cause an allocation while there is still a *mut T to the GC heap in use).

In the long term Rust may want to add an effect system, even just one that integrates with the lint system, to try to detect errors of this kind.

In the short term, calls to the ptr method can be hidden behind abstraction barriers that enforce the requirements, for example as sketched below:

struct List<X> { car: X, cdr: Option<VolatileGc<List<X>>> }

impl<X> VolatileGc<List<X>> {
    fn car(&self) -> X where X:Copy {
        unsafe { (*self.ptr()).car }
    }
    fn cdr(&self) -> Option<Self> {
        unsafe { (*self.ptr()).cdr.clone() } // (see below)
    }

    fn set_car(&self, x: X) {
        unsafe { (*self.ptr()).x = x; }
    }
    fn set_cdr(&self, y: Option<Self>) {
        unsafe { (*self.ptr()).y = y; }
    }
}

Side Question: Should VolatileGc implement Copy? It would preclude e.g. a backing collector that is implemented via a reference-counting system. But I think we need to draw the line somewhere, and I am willing for that line to include the condition "if you are a Gc, your references must themselves be copyable."

Integrating and Insulating the GC from other Rust APIs

Perhaps the previous section jumped the gun a little bit, diving into the details of what kind of smart-pointer to provide without providing any perspective of how this all fits into Rust as a whole.

Let me try to correct that now.

How GC works in the abstract

GC-allocated memory is often described as an "object graph": GC-tracked references are edges, and the nodes are objects (values allocated on the GC heap) plus a special set of nodes (the "root set") representing the processor registers and stack(s).

The task of a garbage collector is to identify a connected component that includes the root set, and then discard the remaining heap-allocated objects (aka the "garbage"). The "reachable objects" make up the smallest such connected component; in the general case a given collection may retain a larger connected component because it is expensive to identify the smallest one. Any objects that are unreachable but are not discarded are "floating garbage" that has survived the collection attempt.

A garbage collector typically works by doing a graph traversal, starting from the root set, and then working its way across the reachable nodes on the GC heap. Each time it reaches a node, it needs to identify its outgoing edges: this means it needs to be able to extract the GC-tracked references for any object on the GC heap.

Breaking it down

So, from the above description, we can identify two important steps:

Identifying the root set to start off any GC
Given a GC-heap allocated object, trace through its GC-tracked references

We will discuss the second step first, as the first step will end up having some relevant overlap.

Tracing through the GC-heap

The tracing process will be responsible for extracting the Gc-refs within an GC-heap allocated object (and potentially rewriting them to reflect an object's new location due to a copy)

In the common case we would like for this code to be automatically generated by the compiler based on its type structure. I.e. given a struct definition:

struct List<X> { car: X, cdr: Option<Gc<List<X>>> }

when gc-support is enabled, the compiler should generate code to do the following on an instance of List<X>, where X is traceable:

impl<X> Trace for List<X> {
    fn trace(&mut self, gc_state: &mut GcState) {
        self.car.trace(gc_state);
        match *self {
            None => {}
            Some(ref mut g) => gc_state.enqueue(g),
        }
    }
}

(We can probably reuse much of the derive machinery for this.)

To be clear, here is what I am envisioning for the gc_state's enqueue method:

impl GcState {
    /// Marks `gc_ref` as reachable; if this is the first time we have
    /// encountered it, enqueues it for future tracing,
    ///
    /// Note that "marking" `gc_ref` may actually imply forwarding its
    /// associated heap-object to a new location, or, if it was already
    /// forwarded, may imply rewriting it to point to its new home.
    fn enqueue<T>(&mut self, gc_ref: &mut Gc<T>) { ... }
}

We should also allow a user to provide their own implementation (and potentially override the automatically generated one) since some types will use reprsentations like *const X where we will not be able to derive the tracing strucure properly.

Identifying the Root Set

Identifying the root set is an interesting problem, because we should allow references like Gc<T> to be held in boxed objects, like Box<Gc<T>> or Vec<Gc<T>>.

This means that it would not suffice to simply scan the stack to find the root set, as one might think from skimming the pictures used above to illustrate copying gc.

A more realistic picture looks like this:

STACK            RUST-HEAP            GC-HEAP
                                     +---------+
 Gc(A) ----------------------------> A         |
                                     |  Gc(C) -----+
                                     |         |   |
                                     +---------+   |
                                                   |
                 +-Box-----+                       |
Box(B) --------> B         |         +---------+   |
                 |  Gc(C) ---------> C         | <-+
                 +---------+         |         |
                                     |  Gc(X) -----+
            +-------------------------- Box(O) |   |
            |                        |         |   |
            |                        +---------+   |
            |                                      |
            |                        +---------+   |
            |    +-Vec-----+         G         |   |
            |    V         | <--------- Vec(V) |   |
            |    |         |         +---------+   |
            |    +---------+                       |
            |                                      |
            |    +-Box-----+                       |
            +--> O         |         +---------+   |
                 |  Gc(X) ---------> X         | <-+
                 +---------+         | "datae" |
                                     +---------+

In the above picture, the root set needs to include both the Gc(A) on the stack and the Gc(C) held in the Box(B).

The garbage collector is not responsible for directly managing the memory in the column labelled "RUST-HEAP." Those boxes will be handled by the system allocator. So, regardless of whether the GC copies objects, those boxes should stay in the same place.

What the GC may need to do, however, is rewrite the Gc-references in the boxes (and the Gc(A) stack as objects are copied. (This seems reasonable, as long as none of the Gc<T>'s in question have curently lent-out any &T-references, as discussed above.

(In addition, the GC, or some other entity, will need to run Drop code when it collects objects. For example, after determining that the object "G" on the GC-heap above is garbage, we will also need to call Vec::drop to deallocate the object labelled a"V" that "G" owns on the Rust-heap.)

A further twist is added when we consider that in general, the references on the stack will not have nice types like Box<B>. In general unsafe code may only be holding a *mut B or *const B that it is planning to later convert back into a Box<B>. Therefore we must also handle cases like this:

STACK            RUST-HEAP            GC-HEAP

                 +-Box-----+
addr B --------> B         |         +---------+
                 |  Gc(C) ---------> C         |
                 +---------+         |   ...   |
                                     |         |
                                     +---------+

In any case, if we assume that we can somehow trace values (on the stack, on the Rust-heap, and on the GC-heap), and we impose a restriction that Gc<T> values are only allowed to be written into memory that is managed by Rust, then we should be able to extract all of the roots in a picture like the above.

Tracing through the Rust-heap

We control the Rust memory allocator.

Some clients will want to have direct access to malloc and free (or rather, their jemalloc-equivalents that we expose), and we can provide that to those clients, assuming they are willing to forego Gc<T> support (or are willing to jump through other hoops to get it back; see below)

But in general, most clients will be happy using higher-level API's, i.e. something along the lines of the typed_alloc module of RFC PR 244.

If we start with the assumption that all data on the Rust-heap is going through a high-level API like that, then the runtime can store meta-data mapping every address to the allocated block and an associated root-extractor.

The root extractor is much like a tracer, but is solely in charge of finding instances of Gc<T> and enqueuing them in the GcState as outlined above.

This should suffice for cases such as outlined in the picture:

STACK            RUST-HEAP            GC-HEAP

                 +-Box-----+
addr B --------> B         |         +---------+
                 |  Gc(C) ---------> C         |
                 +---------+         |   ...   |
                                     |         |
                                     +---------+

Perhaps we should just have one Trace trait that serves for both root-extraction (on the stack and Rust-heap) and for tracing the GC-heap itself. I have not thought too carefully about that yet.

Internal Question: address on Rust-heap

Internal Question: assuming we do indeed trace through raw addresses on the stack (a detail that in the past some people have claimed we can avoid, but I'm going to assume it for now), will we also trace through raw addresses that we encounter in objects on the Rust heap? In other words, are we planning to handle this case:

STACK            RUST-HEAP            GC-HEAP

                 +-Box-----+
addr A --------> A         |
                 | addr B -----+
                 +---------+   |
                               |
                 +-Box-----+   |
                 B         | <-+     +---------+
                 |  Gc(C) ---------> C         |
                 +---------+         |   ...   |
                                     |         |
                                     +---------+

I currently assume the answer to this question is "yes".

But a reasonable person might argue that our reasons for tracing through raw pointers on the stack do not extend to arbitrary Rust heap-allocated values. And that answer is probably fine too.

indeed, while tracing the stack might be reasonable fast, tracing the full tree structure implied by following raw-addresses on the Rust-heap may be unreasonably slow.

Anyway, we just need to make sure our story is consistent here.

Tracing through Trait Objects

One detail we have not yet addressed is how to handle trait-objects, whose underlying data-structure is not exposed in their static type.

Presumably Gc-allocated trait objects will have some type of the form Gc<Trait>. The GC allocator can ensure that whenever an object is allocated on the GC heap, it includes a word for method dispatch on the allocated value itself; this will handle tracing instances of Gc<Trait>.

Another problem is how to deal with trait objects on the Rust heap. They too may contain Gc<_>-references. However, as far as I can tell, we should be able to use the same allocator meta-data that is used to resolve arbitrary addresses, as shown in Tracing through the Rust-heap.

During the whistler work week, there seemed to be some potential concern that "all objects must be treated as if they contain roots".
Presumably the issue here is that we do not want to waste time tracing through the Rust heap in a vain search for a Gc<_>-ref that does not exist on a chain of trait-objects.
However, I do not see a way around this.
In any case, we will probably employ tricks in the allocator meta-data to ensure that values with types that cannot ever contain Gc-roots are tagged so that any attempt to trace them immediately returns; such tricks should work for trait objects, at least at the first level.
- (It does not resolve the chain of trait-objects problem, though, since an inner reference to a trait-object will cause the container to be flagged as potentially holding a GC-root.)

Opt-in or Opt-out Trace bound

There was the condition "where X is traceable" on the above text. I chose not to encode this via a bound like impl <X:Trace>, because my assumption is that if we are to have any hope of GC-support actually being used by Rust code in the wild, then support-for-tracing needs to be something that gets added to standard data-types without the original designer having to account for it in their design.

In other words, Trace could and should be a defaulted trait that is automatically added to type parameters, analogous to the Sized bound.

As a counterpart to this, we will also have a defaulted Opaque trait bound that enforces the contraint that if X: Opaque, then X cannot own any Gc<_> values. This means that such an X: Opaque is trivially traceable (you just skip it).

So, when you are implementing a generic container type Container<X> and you do not want to think about the case where X = Gc<T>, you can impose the bound: Container<X:Opaque>. Then a trivial Trace implementation suffices for Container<X>.
We might also allow a library to generalize their type parameters (to allow types that do not implement trace-support) in the same manner that is done for the Sized bound:
```
impl <X: ?Trace> Foo<X> { ... }
```
Of course we hope that libraries will not find this necessary.

Implementation and Delivery strategy

I suspect that the is overlap between the two types outlined in Smart Pointer: No (One) Silver Bullet. For example, I see no reason that the Gc<T> type (or trait) should not itself extend (or implement) the VolatileGc<T> type (or trait). Having some sort of subtyping or coercion relationship between the two types may help people who want to write maximally-generic code, even if doing so is extremely unergonomic.

However, I also suspect that the main client of VolatileGc<T>, for the short term, is Servo and SpiderMonkey. That client has no need to work with stable Rust. Therefore, the VolatileGc<T> type could probably remain unstable for a long time, which would give us room to iterate and refine its design.

I do not know if the same story holds for Gc<T>. I suspect if we do not stabilize it as soon as we can, then we may have problems with unsafe native code making assumptions about their input types that simply do not hold. (But then again, we may already partway down this hole anyway, and thus it may not matter how quickly we attempt to stabilize this type and provide a reference implementation.)

In any case, I recommend that we attempt to develop the reference collector that integrates with Gc<T> in parallel with the VolatileGc<T> support for SpiderMonkey, to guard against building in assumptions about the tracing mechanism that are tied to one collector or the other.

pnkfelix/gc-notes.md