Skip to content

Instantly share code, notes, and snippets.

@nikomatsakis
Last active July 14, 2017 11:16
Show Gist options
  • Save nikomatsakis/dfc27b28cd024eb25054b52bb11082f2 to your computer and use it in GitHub Desktop.
Save nikomatsakis/dfc27b28cd024eb25054b52bb11082f2 to your computer and use it in GitHub Desktop.

The plan is to develop non-lexical lifetimes in parallel with borrowck.

The vision for how NLL integrates into the compiler pipeline is that we will move all region processing to occur after MIR construction. This means that the existing regionck code will go away completely (or almost completely). Instead, when we do the first round of type-checking, we will just use "erased" regions ('erased), just as we do in trans. But we don't have to worry about this yet -- we can prototype NLL while still keeping the existing region code in place (in fact, I think we must, so that the compiler keeps working!). What it does mean is that we can start building an NLL inference pass that basically completely ignores the existing regions, except for those in the types of arguments (later we will want to change this: specifically, if the user hand-wrote regions into types of local variables or as casts and so forth, we do need to respect that, but we can ignore this detail to start).

I envision adding a compiler flag -Znll which will enable "non-lexical lifetimes inference". This flag will cause us to insert a MIR pass, probably at the end of the MIR_VALIDATED suite. This new MIR pass -- let's call it nll for now, though I guess eventually we can just call it regionck -- will be performing our region inference, and also processing various correct assertions we can use for unit testing.

A note on terminology: region vs lifetime

You've probably noticed by now that the compiler uses the term region instead of lifetime most of the time. Region is the traditional name in the academic literature. To me, the two are basically interchangeable at this point -- but I'll keep using region since it seems to come more easily when I'm talking about the basic implementation details.

Preparing the MIR

The pass will probably begin by making a clone of the MIR so it can mutate it without disturbing the rest of the compiler. We can then run a variant of the existing erase_regions pass. erase_regions is a MIR visitor that walks over a MIR and (in place) rewrites all the types to remove references to regions and replace them with 'erased. I think what we want to do is to remove all regions and replace them with ReVar instances. To do that, we'll probably have to abuse the existing type and region inference machinery. Let me diverge a bit to explain how that works.

Inference machinery

The compiler allocates types that contain "type inference byproducts" from other types in a separate, temporary arena. This allows us to free all those types once type inference is done. Since we plan to use ReVar regions, we need to setup one of those temporary arenas. Right now, the only way to do that is to create an inference context -- this will in turn create (as a part of that) a region inference context. Of course, we are now in the process of prototyping the replacement for the existing region inference, so it's not entirely desirable to also have the existing region inference floating around -- but actually it doesn't do us any real harm, I don't think.

To create an inference context, you do this:

tcx.infer_ctxt().enter(|infcx| {
    // In here, you have an inference context `infcx`.
    // Once `enter()` returns, the `infcx` will be freed, which
    // means that all the temporary inference types will be freed
    // too.
})

You would basically do this as the very first thing in your inference pass, and the entirety of your inference pass goes inside of that closure. Then when you are done, the closure will exit, and all the temporary memory we've allocated will get freed.

A note on lifetimes

Many types in the compiler have three lifetimes. These are precisely to differentiate between three kinds of data:

  • global data ('gcx, but -- unfortunately -- sometimes referred to as 'tcx, for legacy reasons). This is data that persists for basically the entire compilation (at least up until LLVM kicks in). Examples of such data are the HIR but also global types like i32.
  • inference data (typically 'tcx). This is data that persists just for the inference session (i.e., during that closure we created above).
  • short-lived data (typically just 'a). This is used to represent the lifetime of random references that are stored in the struct and only have to live as long as the struct is in use (they may of course live longer, but the struct doesn't care). Often 'a is the intersection of a lot of random lifetimes.

When we are in the inference context, if we access the tcx (type context) through infcx.tcx, we will get back a type with three lifetime parameters: TyCtxt<'a, 'gcx, 'tcx>. This type is in fact just a couple of references, and looking at its definition may help you to see the three roles more clearly:

pub struct TyCtxt<'a, 'gcx: 'a+'tcx, 'tcx: 'a> {
    gcx: &'a GlobalCtxt<'gcx>,
    interners: &'a CtxtInterners<'tcx>
}

First off, the struct fields are both references, and their lifetimes are both 'a. In this sense 'a doesn't represent the lifetime of any of the data stored in the type context, it just represents the lifetime of our reference to the tcx.

Next, the gcx field refers to a struct (the GlobalCtxt) containing the global data for the entire compilation. You can see it is parameters with 'gcx.

Finally, the interners refers to the interning tables that are specific to this inference session, and you see that it is parameterized by 'tcx.

Back to the rewriting of regions with variables

OK, so, once we've setup our inference context, we can walk over the regions that appear in the MIR and replace them with fresh variables. As I said, this is similar to the existing erase_regions pass, but instead of invoking erase_regions, we would want to write a similar function that replaces regions with fresh variables.

The erase_regions helper is defined in ty/fold.rs. It is actually kind of complex, because it is dealing with various details pertaining to the global and local arenas I was talking about earlier; these mostly don't relate to us. We can build our "replace all regions with a fresh inference variable" function instead using the more general fold_regions helper.

Basically we still need to write a MIR visitor like erase-regions, let's call it renumber_regions, but it wants to invoke a method that does something like this:

self.infcx.tcx.fold_regions(&value, &mut false, |_region| {
    // Note that we do not care about the incoming `_region`.
    // This is the result from the old inference, and we just
    // want to ignore that completely!
    self.infcx.next_region_var(infer::MiscVariable(span))
})

Some notes on this snippet:

  • I am assuming that this code is being put inside of a visitor, which is what self refers to. This visitor will want to carry a reference to the inference context (hence self.infcx).
  • You may be wondering about that sort of odd &mut false parameter to fold_regions(). For some weird reason, fold_regions() takes a mutable boolean parameter to tell you if it encountered "bound regions". Without going into the details, we don't care about that here, so we can just pass &mut false to make a temporary boolean that fold_regions() can mutate and we never have to observe the result.
  • Finally, when we create a region variable, we have to give it an "origin". This is information meant to tell the region inferencer why the region variable was created, so that it can produce good error messages (ha! in theory, anyway). I'm just using the "miscellaneous variable" thing here, but it may be useful to thread down a span for where the type appeared, it we have ready access to one.

So once we have this region eraser, the idea would be to apply it to our cloned MIR so that we now have a copy of the MIR will all regions replaced with fresh variables. Each variable will have a unique index and eveything.

Generating constraints

At this point, we will want to start implementing the algorithm proper. The way I see it, we'll have a data structure (similar to the one in the NLL prototype) that is tracking constraints on these region variables. I would probably start by ignoring the type-related constraints (i.e., those from subtyping and borrows) and just focus on the ones derived from liveness. That means we need to do a liveness computation, first of all, and then add in "includes this point" constraints for the regions in every variable.

Assertion framework

Once we're done with that, we need some way to test the output! My vision here is not that concrete. We'll have to figure out exactly what it should look like. But the high-level idea is to make test cases that use special rustc_ attributes or special errors to encode assertions.

For example, to test the liveness, we might do something like this. Imagine that we make a special attribute #[rustc_liveness_unit_test] (this will be permanently feature-gated). This flag causes the compiler to emit "errors" indicating what is live or dead at various points. We might trigger such errors at every point where (e.g.) a unit value like () is created. Then we could write a compile-fail test like this:

#[rustc_liveness_unit_test]
fn foo() {
  let x = 22;
  
  (); //~ ERROR live: [x] dead: []

  println!("{}", x);

  (); //~ ERROR live: [] dead: [x]
}

We can dig more into this when we get there. =)

So, some of the separable bits of work here

Here are some things that could be individual PRs:

  • Creating a -Znll flag and inserting an (empty, for now) MIR pass. It occurs to me I didn't give any hints about how to add debugflags; it's fairly straightforward. You have to modify this macro and then you can access the flag through tcx.sess.opts.nll.
    • I would insert the MIR pass unconditionally, but have it just do nothing is this debugging flag is not set.
  • Writing the "region renumberer" that creates the inference context and rewrites all the regions with fresh variables.
  • Writing the "liveness" code that computes, for each MIR local variable/temporary etc, if it is live at a particular point.
    • I didn't go into this. Later.
    • Actually, we can probably build on this to replace the existing liveness code, I just realized, but let's leave that for now.
  • Adding the constraints due to liveness, and adding some form of infrastructure to test them.
  • Adding the constraints due to subtyping, hopefully re-using the previous infrastructure.
  • etc.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment