Language design question: I wonder if we can replace or supplant the notion of "lifetimes" in languages like Rust with a notion of "parent objects" (or "context objects").
In Rust (and most languages), iterators into vectors are (ptr, end)
pairs. Having a notion of "parent" objects allows us to make them plain ptr
values, and additionally allows for safely appending to vectors while iterating.
Some background first. In Rust, iterators into vectors (technically, into slices) are internally represented as (ptr, end)
pairs. Thus if we create an iterator,
let mut i = vec.iter();
the compiler copies vec.end
into i.end
. This is a strange duplication of data -- in a hand-written loop in C, we would just use a single pointer as our iterator, and we wouldn't need to copy vec.end
.
But storing i.end
is technically necessary, because we could reassign
if cond { i = vec2.iter(); }
and now i
needs to remember which vector it came from -- we don't know at compile time if iteration ends at vec.end
or vec2.end
.
Of course, if we don't have any reassignment like this, the compiler might elide the i.end
pointer opportunistically. But annoyingly, this isn't guaranteed.
Rust typing logic seems to be roughly: if i
's lifetime is 'a
, we can assign an iterator from any object whose lifetime 'b
outlives 'a
. But in practice, when we pass iterators around we usually know precisely what vector they came from. So if we were to design a language from scratch, perhaps instead of saying "i
is an iterator with lifetime 'a
", we'd want to say "i
is an iterator derived from vec
[which had better outlive i
]". This would imply that we can reassign i
to be any iterator into vec
, but we can no longer reassign it to be an iterator into vec2
.
In other words, right now i
's type is Iter<'a, T: 'a>
, but perhaps we could make a language where i
's type contains a reference to a specific object, like Iter<vec: Vec>
(with T=vec::Item
). Then we can change the implementation for Iter::is_empty()
from
self.ptr == self.end // use .end field on instance
to
// statically get vec from our type, and pull .end from it
self.ptr == self::vec.end
Three things motivate this (beyond saving a few bytes for the iter.end
pointer):
-
Lifetime error messages are sometimes hard to read. I suspect that this is because lifetimes are not usually first-class objects in our minds. Saying "
i
is a child ofvec
" might be easier to understand than making an equivalent statement about lifetimes. -
I see an antipattern (happening in most languages) of data structures containing redundant parent references that need to be kept consistent. E.g. I might write
struct Blog { posts: Vec<Post> } struct Post { blog: &Blog // parent reference // ... fields ... }
just so I can pass around
&Post
references and still get at the containingBlog
. Ideally the compiler could handle this for us: When the parent blog is known at compile time, we don't need the parent reference. When it isn't known at compile time, we might (perhaps implicitly) represent&Post
references as not a single pointer, but as a pair of pointers toBlog
andPost
, thereby making the&blog
parent reference external to thePost
structure. -
The idea of "parent" and "child" objects also opens the door to allowing pushing into vectors while iterating.
To recap the problem, say you want to write:
for &item in vec { if (cond) { vec.push(...); } }
Recall that vectors are
(ptr, length, capacity)
tuples, and thatvec.push()
might reallocate the vector when there is no remaining capacity. When this happens, thefor
loop's iterator into thevec
becomes invalid. Rust will therefore reject the code above, while C++ will compile but segfault.But perhaps we can make it work!
In the code example above, an iterator can get its parent vector. Let's imagine that conversely, a vector might be able to get its children iterators. Then
push()
can update all outstanding iterators whenever it reallocates. So here's how we might definevec.push()
:#[inline] fn push(&mut self, value: T) { if out_of_capacity { // ... reallocate for more capacity ... for iter in &mut self::outstanding_iterators { // Update each iter to point into the new memory block! iter.ptr += (new_ptr - old_ptr) } } // ... append value... }
We might want to ensure that
self::outstanding_iterators
is known at compile time, so the loop can always be elided or unrolled.
So what are people's thoughts on this? Has this kind of "parent" thing been done? Is it doable with any existing language? It seems like it heavily overlaps with lifetime logic, so perhaps people who have designed lifetime-based languages like Rust have some thoughts about the tradeoffs here?
Discussion: https://twitter.com/jo_liss/status/844697741276184576