Recently I stumbled over the blog post The Inconceivable Types of Rust.
It stuck with me, the ideas were great, but felt unpolished, so I thought about it and I want to share those thoughts with you. In this post I aim to outline what these concepts are and their applications, what syntax I feel fits well, and broadly how one might design a type checker for this.
Let's start with partial types, these already kind of exist in Rust in, what Considerations on Codecrafting called it, a shadow type system. A partial types is the type of variable, that has been partially moved or initialized (or borrowed or variant). These are currently undescribable and only exist within functions, a way to write them could be:
struct Origin {
a: usize,
b: Option<String>,
c: (u8, f32),
}
type Partial = Origin{a,c{1}};
You just write, what parts are (partially) there. Writing nothing ({}) works for any type, including builtins like usize and & and generics and Self. (This syntax is taken from CoC with minor adjustments) These types of course have the same layout as their origins.
Before we move on to variant types, I need to explain the concept of weak coercion. Rust currently has only one type of coercion, which is always explicit, let's call it strong coercion. Weak coercion would be implicit, but can't fully change type like strong coercion (! -> T, &Vec<T> -> &[T]), it can only change to other partial, variant and borrowed types of the same origin.
Partial types can only coerce if the fields coerce, this is not the case for the previous examples, as you can't get a type from nothing (T{} -/> T), this will only work with variants and borrows. These will be explained in their sections.
Variant types are the enum equivalent to partial types. They give the possible variants it could be. These don't exist yet, but they would have some nice benefits:
- Auto traits and movability only care for the active parts of a type. Your type could have a start variant, that can move, but after a use (a poll) can't.
- If there is explicitly only one variant, you could allow access to its fields.
- You wouldn't need to check variants that explicitly aren't there. No more
match { A => {...}, B => unreachable!() }
.
These could be written:
enum Origin {
A,
B{a: usize, b: &'static str},
C{a: char, b: Vec<usize>},
}
type Partial = Origin::A | Origin::B{b};
You just write which variants it could be (even if they're partial). Each variant can only be listed once, types can't be partial and whole. These types again have the layout of their origin.
These can weakly coerce into any variant type that contains at least its variants (T::A -> T::A | T::B, T::B -> T). If there are partials, they can also coerce (T::A{} -/> T, T::A{a::B} -> T::A{a} | T::B).
Values can be borrowed, in which case they turn into their borrowed type, while producing a reference. Like their reference counterparts, there are two borrowed types, shared or mutable. These types also kind of exist in a shadow type system currently, we could write them:
type Origin = usize;
type Borrowed = !Origin;
type BorrowedMut = !'a mut Origin;
You write them similar to the references, but with ! instead of &. (This syntax is taken from CoC) They do not take external/generic lifetimes, but create their own unique lifetime (this eliminates the need for bound/portable lifetimes).
These can always weakly coerce into their origin, invalidating any reference using their lifetime (or a subref 'a: 'b) in the process (!'a T -> T && &'a T -> (&'a T){}). To make this work with the borrow checking rules, we just need to allow creating shared references from a shared borrowed type with its lifetime.
CoC introduced an owned reference &own in their post. I find this syntax too implicit: What is the type of the owned borrowed value afterwards.
I would rather introduce a more explicit type mutable reference. This reference would have a current and an end type. References are required to drop whole, meaning that current and end need to be the same type when they go out of scope or are invalidated. A normal mutable reference would just be a reference with the same start and end. Start and end are always of the same origin.
This enables us to:
- move out of mutable references.
- do remote initialization, avoiding buffer overflows, and movability problems.
- correctly annotate drop, making it safe to call.
- have Linear types (only with the different drop, implementing !Drop and having your own interface for emptying).
You could write it/use it:
enum Origin {
A,
B{a: usize, b: String},
};
type TypeMutRef<'a> = &'a mut Origin::A->Origin::B{a};
impl<T> Box<T{}> {
pub fn new() -> Self {..}
pub fn init<F: FnOnce(&mut T{}->T)>(self, f: F) -> Box<T> {..}
}
You just write the start going to the end.
I talked about a different drop and one that annotates it correctly, so which is it? Both.
The trait itself just has the &mut self
replaced with &mut self->Self{}
,
but it now would be implemented automatically as an auto trait and necessary to drop a non-empty type (not T{}).
It would be like this:
trait Drop {
fn drop(&mut self->Self{});
}
enum Normal {
A,
B{a: usize, b: Box<str>},
}
// Automatically generate for any type
impl Drop for Normal {
fn drop(&mut self->Self{}) {
match self {
A => {},
B{a,b} => {},
}
}
}
impl Drop for Normal::B {
fn drop(&mut self->Self{}) {
match self {
B{a,b} => {},
}
}
}
impl Drop for Normal::B{a} {
fn drop(&mut self->Self{}) {
match self {
B{a} => {}
}
}
}
// etc.
struct Manual {
p: *Normal,
}
// Manually implemented
impl Drop for Manual {
fn drop(&mut self->Self) {
let Manual{p} = self;
unsafe {dealloc(p, Layout::new::<Normal>())};
}
}
// No automatic impls for Partials/Variants if there is a manual impl on origin
struct Linear {
v: Manual,
}
// Negative implementaion
impl !Drop for Linear {}
// No automatic impl for Partials/Variants if there is a negative impl on origin
impl Linear {
fn use(self) {
// do something like log
let Linear{v} = self;
}
}
I have thought broadly about a how to make a function scope type checker for this:
- You would first create a packed mapping of the types' Variants, Partials and Borrows, so that you can do weak coercion checks with just some logic operations.
- Then you add generic types and lifetimes, the types are opaque and just need space for the empty/full bit and an optional reference to its lifetime if borrowed, as well as the type of borrow.
- Desugar (and extend lifetimes).
- After that, you break the function into parts a la Polonius, encoding the program flow in a (flat) graph.
- Now is the time for type inference.
- When all the types are inferred, we go quickly through all blocks and connections, taking note of any type changes (not checking them just yet). That includes moves, borrows, match/if-let statements (they turn it into the variant), assignments, etc. (possibly parallel)
- We resolve which types the parts have at their start by going through the network, visiting each part only once, and applying type changes in sequence, having multiple types per variable if needed. (Here could be lifetime extension as well, drop insertion, etc.)
- At last we actually check things (possibly in parallel). For every part, we try to coerce the values into one per variable, then step through and try to apply any coercions necessary.
- Finally, if any errors were found, we generate the appropriate messages, else we can compile!
As I am not an expert on language design, there are likely to be errors somewhere in my logic, but at least it's something, with some optimizations.
Considerations on Codecrafting had quite the good idea, there were some snags like bound lifetimes, but they are surprisingly possible to implement. I don't expect this to actually get into the language (again, I am just some student with an interest in Rust), but maybe this informs/inspires someone to improve the parts of the type system, that caused these thoughts.
There are some more things that didn't fit in well, like how it makes sense to make the never type coerce weakly into anything (unsoundness doesn't matter, as this will never be reached).
There needs to be an exception to moving, if it reassigns to itself (changes variants, but not fields; maybe only for repr("C")), to make Futures truly writable without unsafe.
While trying out my typechecker (by hand), I noticed, that adding multiple lifetimes makes things easier. A reference with multiple lifetimes lives as long as the shortest. Written &'1'2 T.
Another thing to consider would be adding partial and variant primitives Have a usize/pointer in a given range, have arrays initialized in a given range, etc.
A Forget trait would be great to properly define the behaviour of type mutable references within the typesystem, as well as for linear types.
There are complications with the type mutable reference and panics (what doesn't have some with it?). In my opinion the best option would be to on unwind places first try dropping it, then try forgetting it, and otherwise abort or maybe not compile.
Thanks for the write-up! I like the ideas. You might be interested in https://ia0.github.io/unsafe-mental-model/ (regarding mutable references in particular) and possibly my thesis about unifying type systems http://phd.ia0.fr/. I also started a discussion recently about strong updates which would work well with partial, variant, and borrowed types https://internals.rust-lang.org/t/strong-updates-status/21074.