The plan is to develop non-lexical lifetimes in parallel with borrowck.
The vision for how NLL integrates into the compiler pipeline is that
we will move all region processing to occur after MIR
construction. This means that the existing regionck code will go away
completely (or almost completely). Instead, when we do the first round
of type-checking, we will just use "erased" regions (`'erased`), just
as we do in trans. But we don't have to worry about this yet -- we can
prototype NLL while still keeping the existing region code in place
(in fact, I think we must, so that the compiler keeps working!). What
it does mean is that we can start building an NLL inference pass
that basically completely ignores the existing regions, except for
those in the types of arguments (later we will want to change this:
specifically, if the user hand-wrote regions into types of local
variables or in casts and so forth, we do need to respect that, but
we can ignore this detail to start).
I envision adding a compiler flag `-Znll` which will enable
"non-lexical lifetimes inference". This flag will cause us to insert a
MIR pass, probably at the end of the `MIR_VALIDATED` suite. This new
MIR pass -- let's call it `nll` for now, though I guess eventually we
can just call it `regionck` -- will perform our region inference, and
also process various correctness assertions we can use for unit testing.
You've probably noticed by now that the compiler uses the term region instead of lifetime most of the time. Region is the traditional name in the academic literature. To me, the two are basically interchangeable at this point -- but I'll keep using region since it seems to come more easily when I'm talking about the basic implementation details.
The pass will probably begin by making a clone of the MIR so it can
mutate it without disturbing the rest of the compiler. We can then run
a variant of the existing `erase_regions` pass. `erase_regions` is a
MIR visitor that walks over a MIR and (in place) rewrites all the
types to remove references to regions and replace them with
`'erased`. I think what we want to do is to remove all regions and
replace them with `ReVar` instances. To do that, we'll
probably have to abuse the existing type and region inference
machinery. Let me digress a bit to explain how that works.
The compiler allocates types that contain "type inference byproducts"
from other types in a separate, temporary arena. This allows us to
free all those types once type inference is done. Since we plan to use
`ReVar` regions, we need to set up one of those temporary arenas. Right
now, the only way to do that is to create an inference context -- this
will in turn create (as a part of that) a region inference context. Of
course, we are now in the process of prototyping the replacement
for the existing region inference, so it's not entirely desirable to
also have the existing region inference floating around -- but
actually it doesn't do us any real harm, I don't think.
To create an inference context, you do this:
```rust
tcx.infer_ctxt().enter(|infcx| {
    // In here, you have an inference context `infcx`.
    // Once `enter()` returns, the `infcx` will be freed, which
    // means that all the temporary inference types will be freed
    // too.
})
```
You would basically do this as the very first thing in your inference pass, and the entirety of your inference pass goes inside of that closure. Then when you are done, the closure will exit, and all the temporary memory we've allocated will get freed.
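To see the shape of that pattern in isolation, here is a self-contained sketch (all names here are invented for illustration, not rustc's actual API): a context owning temporary allocations that live only for the duration of a closure, and are freed when the closure exits.

```rust
// Toy analogue of the `enter` pattern: `InferCtxt` and
// `with_infer_ctxt` are invented names for this sketch.
struct InferCtxt {
    temp: Vec<String>, // stands in for the temporary inference arena
}

fn with_infer_ctxt<R>(f: impl FnOnce(&mut InferCtxt) -> R) -> R {
    let mut infcx = InferCtxt { temp: Vec::new() };
    f(&mut infcx)
    // `infcx` (and everything in its arena) is dropped here,
    // so temporaries cannot escape the closure.
}

fn main() {
    let n = with_infer_ctxt(|infcx| {
        infcx.temp.push("some temporary".to_string());
        infcx.temp.len() // only plain data may flow out
    });
    assert_eq!(n, 1);
}
```

The design point is the same one `enter()` enforces: results that escape the closure cannot contain references into the temporary arena.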
Many types in the compiler have three lifetimes. These are precisely to differentiate between three kinds of data:

- global data (`'gcx`, but -- unfortunately -- sometimes referred to as `'tcx`, for legacy reasons). This is data that persists for basically the entire compilation (at least up until LLVM kicks in). Examples of such data are the HIR but also global types like `i32`.
- inference data (typically `'tcx`). This is data that persists just for the inference session (i.e., during that closure we created above).
- short-lived data (typically just `'a`). This is used to represent the lifetime of random references that are stored in the struct and only have to live as long as the struct is in use (they may of course live longer, but the struct doesn't care). Often `'a` is the intersection of a lot of random lifetimes.
When we are in the inference context, if we access the `tcx` (type
context) through `infcx.tcx`, we will get back a type with three
lifetime parameters: `TyCtxt<'a, 'gcx, 'tcx>`. This type is in fact
just a couple of references, and looking at its definition may help
you to see the three roles more clearly:
```rust
pub struct TyCtxt<'a, 'gcx: 'a+'tcx, 'tcx: 'a> {
    gcx: &'a GlobalCtxt<'gcx>,
    interners: &'a CtxtInterners<'tcx>
}
```
First off, the struct fields are both references, and their lifetimes
are both `'a`. In this sense `'a` doesn't represent the lifetime of
any of the data stored in the type context, it just represents the
lifetime of our reference to the tcx.

Next, the `gcx` field refers to a struct (the `GlobalCtxt`) containing the
global data for the entire compilation. You can see it is parameterized with `'gcx`.

Finally, the `interners` field refers to the interning tables that are specific
to this inference session, and you see that it is parameterized by `'tcx`.
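To make the "lifetime of the data" versus "lifetime of our reference to it" distinction concrete, here is a self-contained toy analogue of the pattern (everything here is invented for illustration; rustc's real types are much richer). The `'g` parameter plays the role of `'gcx`, and `'a` is just the borrow of the handle:

```rust
// Long-lived data: `'g` is the lifetime of the data itself.
struct GlobalCtxt<'g> {
    names: Vec<&'g str>, // stands in for globally interned types
}

// Short-lived handle: `'a` is only the lifetime of our reference.
struct Handle<'a, 'g: 'a> {
    gcx: &'a GlobalCtxt<'g>,
}

impl<'a, 'g: 'a> Handle<'a, 'g> {
    // Note the return type: the data outlives the handle
    // (`'g`, not `'a`), just as types interned in the global
    // arena outlive any particular `TyCtxt` reference.
    fn lookup(&self, i: usize) -> &'g str {
        self.gcx.names[i]
    }
}

fn main() {
    let gcx = GlobalCtxt { names: vec!["i32", "u64"] };
    let name = {
        let handle = Handle { gcx: &gcx };
        handle.lookup(0)
        // `handle` is dropped here, but the `&'g str` survives.
    };
    assert_eq!(name, "i32");
}
```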
OK, so, once we've set up our inference context, we can walk over the
regions that appear in the MIR and replace them with fresh
variables. As I said, this is similar to the existing `erase_regions`
pass, but instead of invoking `erase_regions`, we would want to write
a similar function that replaces regions with fresh variables.
The `erase_regions` helper is defined in `ty/fold.rs`. It is
actually kind of complex, because it is dealing with various details
pertaining to the global and local arenas I was talking about earlier;
these mostly don't relate to us. We can build our "replace all regions with
a fresh inference variable" function instead using the more general
`fold_regions` helper.
Basically we still need to write a MIR visitor like `erase_regions`,
let's call it `renumber_regions`, but it wants to invoke a method that
does something like this:
```rust
self.infcx.tcx.fold_regions(&value, &mut false, |_region| {
    // Note that we do not care about the incoming `_region`.
    // This is the result from the old inference, and we just
    // want to ignore that completely!
    self.infcx.next_region_var(infer::MiscVariable(span))
})
```
Some notes on this snippet:
- I am assuming that this code is being put inside of a visitor, which is what `self` refers to. This visitor will want to carry a reference to the inference context (hence `self.infcx`).
- You may be wondering about that sort of odd `&mut false` parameter to `fold_regions()`. For some weird reason, `fold_regions()` takes a mutable boolean parameter to tell you if it encountered "bound regions". Without going into the details, we don't care about that here, so we can just pass `&mut false` to make a temporary boolean that `fold_regions()` can mutate and we never have to observe the result.
- Finally, when we create a region variable, we have to give it an "origin". This is information meant to tell the region inferencer why the region variable was created, so that it can produce good error messages (ha! in theory, anyway). I'm just using the "miscellaneous variable" thing here, but it may be useful to thread down a span for where the type appeared, if we have ready access to one.
So once we have this region renumberer, the idea would be to apply it to our cloned MIR so that we now have a copy of the MIR with all regions replaced with fresh variables. Each variable will have a unique index and everything.
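To illustrate the renumbering idea in isolation, here is a self-contained sketch with toy types (`Region`, `Ty`, `RegionVid`, and `Renumberer` are all invented for this example; rustc's real types and `fold_regions` machinery are richer). Every region encountered is replaced with a fresh, uniquely indexed variable, deliberately ignoring whatever region was there before:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct RegionVid(u32);

#[derive(Debug, Clone)]
enum Region {
    Erased,
    Var(RegionVid),
}

// A toy type grammar: just `i32` and references `&'r T`.
#[derive(Debug, Clone)]
enum Ty {
    I32,
    Ref(Region, Box<Ty>),
}

struct Renumberer {
    next: u32,
}

impl Renumberer {
    // Every call hands out a distinct index, so each region
    // position in the (cloned) MIR gets its own variable.
    fn next_region_var(&mut self) -> Region {
        let vid = RegionVid(self.next);
        self.next += 1;
        Region::Var(vid)
    }

    // Rebuild the type, replacing each region with a fresh
    // variable; the old region is ignored on purpose.
    fn renumber(&mut self, ty: &Ty) -> Ty {
        match ty {
            Ty::I32 => Ty::I32,
            Ty::Ref(_old_region, inner) => {
                Ty::Ref(self.next_region_var(), Box::new(self.renumber(inner)))
            }
        }
    }
}

fn main() {
    let mut r = Renumberer { next: 0 };
    // `&& i32` has two region positions, so two fresh variables.
    let ty = Ty::Ref(
        Region::Erased,
        Box::new(Ty::Ref(Region::Erased, Box::new(Ty::I32))),
    );
    let renumbered = r.renumber(&ty);
    println!("{:?}", renumbered);
    assert_eq!(r.next, 2);
}
```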
At this point, we will want to start implementing the algorithm proper. The way I see it, we'll have a data structure (similar to the one in the NLL prototype) that tracks constraints on these region variables. I would probably start by ignoring the type-related constraints (i.e., those from subtyping and borrows) and just focus on the ones derived from liveness. That means we need to do a liveness computation first of all, and then add in "includes this point" constraints for the regions in every variable.
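As a concrete (if toy) illustration of the liveness computation, here is a self-contained backward dataflow pass over straight-line code. The `Stmt` representation is invented for this sketch and is far simpler than rustc's MIR (no control flow), but the core equation is the standard one: a variable is live before a statement if it is used there, or live after and not overwritten.

```rust
use std::collections::HashSet;

// Toy statement: the variables it uses, and the one it defines (if any).
struct Stmt {
    uses: Vec<&'static str>,
    def: Option<&'static str>,
}

// Backward dataflow for straight-line code:
//   live_before(s) = (live_after(s) - def(s)) ∪ uses(s)
// Returns the live-before set at each statement.
fn liveness(stmts: &[Stmt]) -> Vec<HashSet<&'static str>> {
    let mut live: HashSet<&'static str> = HashSet::new();
    let mut result = vec![HashSet::new(); stmts.len()];
    for (i, stmt) in stmts.iter().enumerate().rev() {
        if let Some(d) = stmt.def {
            live.remove(d); // a definition kills liveness above it
        }
        for u in &stmt.uses {
            live.insert(*u); // a use makes the variable live
        }
        result[i] = live.clone();
    }
    result
}

fn main() {
    // Models: let x = 22; println!("{}", x); ();
    let stmts = vec![
        Stmt { uses: vec![], def: Some("x") }, // x = 22
        Stmt { uses: vec!["x"], def: None },   // the println uses x
        Stmt { uses: vec![], def: None },      // trailing ()
    ];
    let live = liveness(&stmts);
    assert!(live[1].contains("x")); // x is live just before its use
    assert!(live[2].is_empty());    // x is dead after its last use
}
```

The liveness-derived constraints would then say, roughly, that the region variables appearing in `x`'s type must "include" every point where `x` is live.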
Once we're done with that, we need some way to test the output! My vision
here is not that concrete. We'll have to figure out exactly what it should look
like. But the high-level idea is to make test cases that use special
`rustc_` attributes or special errors to encode assertions.
For example, to test the liveness, we might do something like
this. Imagine that we make a special attribute
`#[rustc_liveness_unit_test]` (this will be permanently
feature-gated). This attribute causes the compiler to emit "errors"
indicating what is live or dead at various points. We might trigger
such errors at every point where (e.g.) a unit value like `()` is
created. Then we could write a `compile-fail` test like this:
```rust
#[rustc_liveness_unit_test]
fn foo() {
    let x = 22;
    (); //~ ERROR live: [x] dead: []
    println!("{}", x);
    (); //~ ERROR live: [] dead: [x]
}
```
We can dig more into this when we get there. =)
Here are some things that could be individual PRs:

- Creating a `-Znll` flag and inserting an (empty, for now) MIR pass. It occurs to me I didn't give any hints about how to add debug flags; it's fairly straightforward. You have to modify this macro and then you can access the flag through `tcx.sess.opts.nll`.
  - I would insert the MIR pass unconditionally, but have it just do nothing if this debugging flag is not set.
- Writing the "region renumberer" that creates the inference context and rewrites all the regions with fresh variables.
- Writing the "liveness" code that computes, for each MIR local variable/temporary etc., if it is live at a particular point.
  - I didn't go into this. Later.
  - Actually, we can probably build on this to replace the existing liveness code, I just realized, but let's leave that for now.
- Adding the constraints due to liveness, and adding some form of infrastructure to test them.
- Adding the constraints due to subtyping, hopefully re-using the previous infrastructure.
- etc.