I have to insist on using explicit lifetimes for spec'ing, both because I think it makes the rules a lot clearer and because it separates the safety rules from the language semantics. Let me introduce lifetimes through a series of rules:
- All expressions with a storage location (things you can take a ref to) have an implicit, language-defined lifetime. These lifetimes tend to be pretty simple. If you say
void M() {
    int x;
}
The lifetime of int x is the same as the lifetime of method M. This is true for all locals of struct type. For fields of types, if the type is a class then the lifetime is the "global" (heap) lifetime. If it's a field of a struct, it's the lifetime of the containing struct. You can think of lifetimes as being part of the type, so while it looks like the type of x is int, it's really (int, $M), where $M is the lifetime of the method M.
- The previous definition says that variables have the lifetime of the containing method. Methods also have lifetimes. The lifetime of a method is a unique lifetime that is smaller than the lifetime of its caller. Main has a special, non-global lifetime that we won't name because it's not important.
- Rule (2) above states that a method has a unique lifetime that's smaller than its caller's. But methods can have multiple callers! Once again, let's think about lifetimes as types. If we have a method which has different types depending on how it's called, we already have a mechanism for that in the language. Generics! In fact, (2) is slightly wrong. Methods do not have unique lifetimes that are smaller than the callers -- method instantiations have a unique lifetime that's smaller than the caller, which implicitly passes its lifetime as part of the instantiation of the callees. Fortunately, we don't need to give names to any of these lifetimes, as they are language-defined, cannot be changed, and aren't (currently) useful to refer to.
The above basically sums up the language without refs. There's not much to say, because lifetimes are never in conflict without refs: structs copy, so the lifetimes of source and destination don't have to match, and there's only one heap lifetime, which is always the same.
Now, let's introduce refs. More rules.
- Refs don't fall into the previous definitions. First, they have two lifetimes: the lifetime of the ref variable itself, which basically behaves like a regular variable, and the lifetime of the referent. We'll mostly refer here to the lifetime of the referent, because we don't need any new rules for the variable itself: it looks like a struct variable.
- Ref variables also don't have a unique, language-defined lifetime. They take the lifetime from their initializer. If you say
void M() {
    int x = 0;
    ref int r = ref x;
    ref int r2 = ref (new int[] { 0 })[0];
}
Then the lifetime of r is the lifetime of x, which is the lifetime of M. The lifetime of r2, however, is the lifetime of the first array element -- which is located on the heap. So the lifetime of r2 is the global lifetime. This matters for the next rule.
- Ref variables must not have a longer lifetime than the storage they point to. This matters once we have ref-reassignment, as we could re-assign a ref to a location with a shorter lifetime than the ref itself.
- It's now useful to have a language to talk about lifetimes explicitly. Let's use generic notation, since lifetimes are basically types. We would write the previous example as:
void M() {
    int x = 0;
    ref<$M> int r = ref x;
    ref<$global> int r2 = ref (new int[] { 0 })[0];
}
We can relate $M and $global by saying that $global is a longer lifetime than all other lifetimes, so it can be assigned to all other lifetimes. This corresponds to type variance. Longer lifetimes are basically subtypes of shorter lifetimes.
- What about ref parameters? They're interesting because they have lifetimes that come from the outside, i.e. they depend on the caller. This looks just like the method lifetime problem we discussed before. But now we need to have explicit parameterization for our new syntax.
void M<$a, $b>(ref<$a> int x, ref<$b> int y) { ... }
The lifetimes go at the beginning of the method's type parameter list, prefixed with $.
- Since users don't write these lifetimes, they're implied for normal C# programs. The rule in C# is that, for all ref parameters, one lifetime is created for all the parameters and return types. So
ref int M(ref int x, ref int y) { ... }
would actually be
ref<$a> int M<$a>(ref<$a> int x, ref<$a> int y) { ... }
- Lifetime safety follows the normal generic rules. So if the program would type check, it's safe.
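As a rough sketch of how the rules above show up in C# today: there is no explicit lifetime syntax, so the annotations only appear in comments. The names RefLifetimeDemo, Pick, Run and Escape are made up for illustration, and the error description in the last comment is my reading of the compiler's behavior, not quoted output.

```csharp
using System;

static class RefLifetimeDemo
{
    // Implicitly ref<$a> int Pick<$a>(ref<$a> int x, ref<$a> int y):
    // one lifetime covers both parameters and the return.
    static ref int Pick(ref int x, ref int y) => ref x;

    public static int Run()
    {
        int local = 1;        // storage with the lifetime of this call
        int[] heap = { 2 };   // element storage with the global (heap) lifetime

        // $a is instantiated with the shorter lifetime, so r may only be used
        // inside Run even though one argument lives on the heap.
        ref int r = ref Pick(ref local, ref heap[0]);
        r = 42;               // writes through to local

        // Ref reassignment to storage that lives at least as long is allowed.
        r = ref heap[0];
        r = 7;                // writes through to heap[0], local is unchanged

        return local + heap[0];
    }

    // Rejected by the compiler if uncommented: the returned ref might alias `local`.
    // static ref int Escape(int[] heap)
    // {
    //     int local = 1;
    //     return ref Pick(ref local, ref heap[0]);
    // }
}
```

Calling RefLifetimeDemo.Run() returns 49 (42 from local plus 7 from heap[0]). The reverse reassignment, pointing a ref that started on the heap at local, is the case rule (6) exists to forbid.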
OK, we're now completely caught up to C# 7.0 (ref locals and ref returns, before ref structs).
There's one very important thing we can see already: the set of possible signatures that we can express in C# is far smaller than the set of legal, safe signatures that would be possible with explicit annotations. Without the scoped keyword there's really only one signature we can write for any method with ref parameters and returns. If we add scoped, then we effectively force the scoped ref parameter to have a different lifetime than the return. So
ref int M(scoped ref int x, ref int y) { ... }
translates to
ref<$a> int M<$a, $b>(ref<$b> int x, ref<$a> int y) { ... }
For a method with N ref parameters and a ref return type, we have at most 2^N possible signatures, since each parameter is either scoped or not. Even restricting ourselves to one lifetime variable per ref parameter and return value, there are (N+1)^(N+1) possible signatures we could represent with explicit lifetimes. So we have to ensure that the combinations we want to allow can be expressed in our notation, as there's no possibility that we could express all safe options with just the scoped notation.
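To make the effect of scoped concrete, here is a small sketch in real C# (this assumes C# 11 or later). The names ScopedDemo, KeepSecond, Forward and Run are hypothetical, and the lifetime annotation in the comment is just the notation from this post, not anything the compiler accepts.

```csharp
using System;

static class ScopedDemo
{
    static readonly int[] s_cell = { 5 };   // heap storage, global lifetime

    // Roughly ref<$a> int KeepSecond<$a, $b>(ref<$b> int x, ref<$a> int y).
    static ref int KeepSecond(scoped ref int x, ref int y)
    {
        y = Math.Max(x, y);   // reading through x is fine
        return ref y;         // `return ref x;` would not compile
    }

    // Passing a local for x does not restrict the result, because the
    // returned ref's lifetime is tied only to y.
    static ref int Forward()
    {
        int local = 3;
        return ref KeepSecond(ref local, ref s_cell[0]);
    }

    public static int Run()
    {
        ref int r = ref Forward();
        r = 9;                // writes through to s_cell[0]
        return s_cell[0];
    }
}
```

Without scoped on x, Forward would be rejected, since under the single-lifetime default the result could alias local.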
Next, let's add ref structs.
Ref structs have lifetime variables, just like ref parameters. Unlike ref parameters, we put the lifetime variables in the type definition, with the other generic parameters, e.g.
ref struct RS<$a> {
    ref<$a> int Item;
}
I believe today the defaults for ref structs are the same as the defaults for ref variables. So
RS M(RS rx, RS ry) { ... }
translates to
RS<$a> M<$a>(RS<$a> rx, RS<$a> ry) { ... }
I think this holds for combinations as well, i.e.
ref int M(Span<int> rs, ref int x) { ... }
is
ref<$a> int M<$a>(Span<$a, int> rs, ref<$a> int x) { ... }
For ref-to-ref structs, presumably we could use the same rule. The question is whether this is flexible enough for all scenarios. If we have a list of things we think should be allowed, they could probably be type-checked with this rule.
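For reference, a minimal sketch of the ref struct default in real C# (assuming C# 11 or later, since it uses a ref field). RefStructDemo, Choose, Run and Escape are made-up names, RS mirrors the type above, and the lifetime annotations exist only in comments.

```csharp
using System;

static class RefStructDemo
{
    // Corresponds to ref struct RS<$a> { ref<$a> int Item; } in the notation above.
    public ref struct RS
    {
        public ref int Item;
        public RS(ref int item) { Item = ref item; }
    }

    // Implicitly RS<$a> Choose<$a>(RS<$a> rx, RS<$a> ry).
    static RS Choose(RS rx, RS ry) => rx.Item >= ry.Item ? rx : ry;

    public static int Run()
    {
        int a = 1, b = 2;
        RS chosen = Choose(new RS(ref a), new RS(ref b));
        chosen.Item = 10;     // writes through to b
        return a + b;         // 1 + 10
    }

    // Rejected by the compiler if uncommented: the result may wrap a ref to `local`.
    // static RS Escape(RS outer)
    // {
    //     int local = 0;
    //     return Choose(outer, new RS(ref local));
    // }
}
```

Because both parameters and the return share $a, the result of Choose is treated as if it could wrap either argument, which is why Escape has to be rejected even when the method body would in fact pick outer.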
Looks good to me, only a few notes:
We actually already have this constraint in C#: it's the : of a generic constraint clause. One subtlety is that bigger lifetimes are actually the subtype, though, so the longer lifetime is the one that goes on the left of the :.
This looks right, but technically I don't think you need a new rule: this is just how generics work, and if you model lifetimes as generics, this falls out naturally.
And yeah, ref readonly could probably be safe, but we don't allow that in C# (because we didn't add support when we added ref readonly) so I don't think we should allow it for lifetimes either until we add it for both.
Those scope names are fine for adding clarity, but I just want to note that they're unnecessary. The variables are generic so there's no particular restriction on what lifetime goes in there (calling method, calling method of the calling method, etc.) as long as the constraint where $r < $c is satisfied.
Yup. But this is to be expected when we chose to add only one "scoped" modifier to the language. The complexity of lifetimes we can represent is really, really small compared to the space of safe lifetimes. I doubt there's a particularly clever thing we could design here as each method and struct needs to be type-checked separately and therefore can't use any information beyond its own signature to decide the lifetime variable assignment. We just don't have many bits of information to work with.
If we eventually end up with too many scenarios to express, I think we need to bring this back to LDM for recommendations. If we want more functionality we'll need more bits of information.