Skip to content

Instantly share code, notes, and snippets.

@nmsmith
Last active March 12, 2025 01:45
Show Gist options
  • Save nmsmith/cdaa94aa74e8e0611221e65db8e41f7b to your computer and use it in GitHub Desktop.
Save nmsmith/cdaa94aa74e8e0611221e65db8e41f7b to your computer and use it in GitHub Desktop.

Update March 2025: I am working on a newer version of this proposal, that aims to simplify the model substantially. It's still based on "regions", but I am no longer tying a region to a specific path in the ownership tree. When I have finished that proposal, I will put a link here.

Abstract

In this document, I present a novel model for memory-safe references that aims to significantly improve upon the Rust-inspired model that Mojo currently uses.

The major advancement is to eliminate the "aliasing XOR mutability" restriction amongst references, and replace it with a similar restriction applied to lifetime parameters. This model is significantly more expressive, allowing many of the memory-safe programs that Rust rejects to compile successfully, without resorting to escape hatches.

This document is meant to be a high-level overview, not a specification. Some details are omitted for the sake of brevity, others are omitted because the design is not fully finished. The design has been under development for several months, and although it is not finished, I am now confident that it is going to work.

My goal is to make Mojo references simpler, without sacrificing performance.

Note: I am not a Modular employee. Ultimately, Modular will decide whether this design should be adopted.

Intended audience

This document is intended for the Mojo community. I assume moderate familiarity with Mojo, and basic familiarity with Rust.

Memory-safe pointers don't need to be an "advanced" feature

Mojo aims to bridge the divide between "scripting" and "systems programming". For scripting, Mojo adopts the syntax and high-level abstractions of Python, and for systems programming, Mojo offers powerful features such as compile-time metaprogramming, and deep support for SIMD and GPUs.

Scripting languages and systems languages begin to butt heads when it comes to memory management. Scripting languages aim to free the programmer from needing to think about memory, including memory safety. They typically achieve this by offering a garbage collector. Conversely, systems languages aim to utilize hardware resources effectively, and garbage collection undermines this goal.

Rust has demonstrated that it is possible for a language to offer memory safety without garbage collection, however Rust is notorious for being difficult to master, and for heavily restricting the kinds of programs that a programmer can write. This philosophy is at odds with scripting languages like Python.

This proposal is an attempt to demonstrate that memory safety without GC can be simple enough that it feels closer to scripting than systems programming. We should be able to teach it to beginners, not just to experts. Ideally, when a Mojo user needs to construct a "reference" to something, they will reach for a zero-overhead reference, rather than a garbage-collected one.

I'm confident we can achieve this, without any major trade-offs.

What is a "lifetime" supposed to model?

To start, we need to reconsider what a "lifetime" is.

Lifetimes were first introduced by Rust, and Mojo has followed suit—although they have been recently renamed to "origins".

The best way to understand these concepts is to forget about their names, and think about what purpose they serve.

When we associate a reference or pointer with a "lifetime" parameter, we are attempting to model (in the type system) the set of variables that a pointer can target.

That's it.

In my opinion, any other explanation is wildly off-track.

By "variable", I mean a memory location storing an instance of a struct. The variable could be a local variable, a field, or an element of a list.

This is a very simple model. Every programmer knows what a variable is. The idea that a pointer can only point to a variable drawn from a set of alternatives is not complicated.

So where does the complexity begin?

Well, first of all, neither Rust nor Mojo actually model a lifetime as a set of variables. Instead, they model it as an untyped memory region that holds the pointer's target.

What's stopping us from modelling a set of variables directly? To be honest, I have no idea—I have to assume that it's a historical accident. Rust's model was the outcome of years of research and experimentation. Experimentation is often slow and messy, and the main goal was just to develop something that worked. Amongst all of that effort, perhaps the idea that a "lifetime" should model a set of variables just never arose. That said, the Rust designers have seemingly begun moving toward this model in recent years.

Let me give a concrete example. Suppose you have people: List[Person], and you want to select certain people from the list, producing a List[Pointer[???]]. How might I describe the set of valid targets?

Here are two syntaxes that I believe are relatively "obvious":

  • List[Pointer[people.items]]
  • List[Pointer[{people*}]]

In the first syntax, people.items is meant to denote the "set" of all items of the list. This is not a runtime data structure; it's a compile-time value that serves the same role as a "lifetime". However, unlike a Rust lifetime, it communicates the type of the variables being targeted. That's why Pointer only has one parameter.

  • Note: For now, I am just providing a high-level overview. We will dig into the details later on.

In the second syntax, {people*} makes it clear that we are parameterizing the pointer type by a conceptual set. The idea behind the {}-based syntax and the postfix * operator (which is meant to remind people of Python's "unpacking" operator) is that it gives us a way to map over sets and merge them together:

  • List[Pointer[{people*.name, animals*.name, default_name}]]

This type is meant to denote a pointer that points to either:

  • The name field of any Person stored in the list named people.
  • The name field of any Animal stored in the collection named animals.
  • The variable default_name.

Later on, I will explain the semantics behind this syntax. For the moment, let's just focus on the merits of the syntax itself.

The set of targets associated with the pointer type can be visualized as follows: target region

To simplify the visualization, I've assumed each list contains three items. The variables that match the expression {people*.name, animals*.name, default_name} have been highlighted in gold. Each pointer parameterized by this set can only target one of these 7 strings.

We need a term to describe this set of targets. Mojo currently uses the term "origin", but in my opinion that carries the wrong meaning. An origin is a singular location where something comes from, but what we've been discussing is a set of locations that a pointer can point to, or equivalently: the set of values that a pointer can take on. Notably, a memory location doesn't need to be accessed in order to construct a pointer to it, so it's not accurate to say that a pointer "came from" the location that it points to.

There aren't many viable terms. After having considered a lot of alternatives over the last 9 months, I've narrowed it down to three contenders:

  1. Region. A region is a set of variables, i.e. a set of memory locations. Every pointer is constrained to point within a specific region.
  2. Domain. A domain is the set of values (addresses) that the pointer can take on. Every pointer is constrained to take on a value in the given domain.
  3. Group. A group is a set of variables. Every pointer is constrained to point to a member of a group.

Later on, we'll talk about overlapping regions/domains/groups, and disjoint regions/domains/groups. These adjectives and nouns work well together, whereas it's less clear to me what the term "overlapping origin" is supposed to mean.

In this proposal I use the term "region", although upon reflection, I think it creates come confusion between the concepts of "overlapping memory" and "overlapping sets". For this reason, I'll probably switch to the term "group" in future drafts.

Where do mutability and aliasing enter the picture?

Parameterizing pointers by regions allows both the programmer and the compiler to reason about the "non-local effects" of operating on a variable. For example:

x = String(...)
y = String(...)

p: Pointer[{x, y}] = Pointer(...)

foo(x^)          # We might want/need to deinitialize x
print(p[])       # This operation is NOT safe

x = String(...)  # We can reinitialize x later on
print(p[])       # This operation is now safe again

By modelling the fact that the pointer p points to either x or y, we can prevent p from being dereferenced at moments where those variables are potentially uninitialized. If the variables are reinitialized afterwards, it becomes safe to use p again.

Readers familiar with Rust might be wondering: how do we declare the mutability of p? And why aren't the final two statements rejected as a borrowing error? Aren't we violating the "aliasing XOR mutability" rule?

Well... why should the last two statements be rejected? If we exclude the unsafe print statement, this program is perfectly valid, and there's no reason to reject it. The ideal is to reject only programs that are susceptible to memory errors. We want as few false positives as possible, and "aliasing XOR mutability" generates a LOT of false positives.

If we take a few steps back: our goal is not to enforce "aliasing XOR mutability". Our goal is for the programmer and the compiler to know which variables/references/pointers can alias each other, and to ensure that every variable is in a valid state before it is accessed. The above program (sans the invalid print) meets those requirements.

In fact, there exist many reasonable programs—including programs running on millions of computers right now—that construct a large number of aliasing pointers and use them to perform mutation. Here's one such program:

x = String(...)
y = String(...)

pointers: List[Pointer[{x, y}]] = List(Pointer(y), Pointer(x), Pointer(x))

foo(x^)
x = String(...)

for p in pointers:
    p[] = "empty"

There's clearly no "aliasing XOR mutability" here, and we don't need it, because this program is perfectly well-behaved. Ideally we can just let the programmer write this program, and avoid chastising them for breaking a rule that is ultimately irrelevant. To achieve this, we should focus on identifying the root cause of pointer invalidation. Notably, the root cause is NOT "aliasing and mutability".

With that in mind, let's discuss how to handle dangling pointers. This is best conceptualized in terms of dynamic containers. A dynamic container is a container that stores a dynamically-changing number of items, and/or items whose type changes dynamically. The two archetypal dynamic containers are:

  • resizable arrays (i.e. Mojo's List type)
  • tagged unions (i.e. Mojo's Variant type)

Pointers to the items of a dynamic container need to be treated carefully, because if the container is mutated, those items may no longer reside at their original locations. There are several reasons why the items might have gone missing:

  • The items were deleted. (e.g. a List was cleared.)
  • The items were moved somewhere else. (e.g. a List's buffer was reallocated.)
  • The item has changed type. (e.g. a Variant was reassigned to a different payload type.)

In all cases, the right action to take is to invalidate the pointer, i.e. prevent its further use. Here's an example:

names: List[String] = List("a", "b", "c")
selected_names: List[Pointer[{names*}]] = List(Pointer(names[0]), Pointer(...))

for name in selected_names:
    print(name[])   # OK

names.pop()

for name in selected_names:
    print(name[])   # Error: cannot dereference invalid pointer

In the above program, the names* region denotes the set of all items in the container. (We'll discuss how this works later on.) The main thing to notice is that after the operation names.pop(), the pointers targeting the names* region are invalidated. This invalidation is triggered as a result of names being passed as inout (or mut) to the function String.pop.

In short, we've prevented dangling pointers by modelling the concept of a dynamic container item. This is very different from how Rust prevents dangling pointers: we haven't imposed any restrictions on aliasing, and we don't have any "unique references". Instead, we're just modelling where a pointer points to, and we're ensuring that if a pointer's target has potentially become invalid, the pointer can no longer be dereferenced.

The next section covers another major feature of this proposal. After that, we'll discuss the fine details of how dynamic containers work.

Aliasing guarantees for function arguments

If Mojo didn't have the concept of a "function", the model that I've presented thus far would be sufficient. More precisely: if a program is just a big set of named variables, and you can describe which pointers point to which variables, then you can guarantee that pointers are used safely.

However, the moment you start factoring a program into functions, new challenges arise. For example, a function might receive a pointer to a variable whose name is not in scope within the function body. We need a way to describe that pointer's region.

Furthermore, in a language where arguments are passed "by reference"—which includes Mojo—we need to provide guarantees about which arguments can alias each other. We don't need to restrict aliasing per se, we just need the programmer and the compiler to know whether two arguments can alias. Or more precisely: we need the programmer and the compiler to know whether a mutation made through one argument can be observed through another argument. By default, we shouldn't allow mutations to be observed through other arguments, because this makes it easier to reason about a function's behaviour, and about data race freedom. It also enables functions to be compiled more efficiently. But there's no reason to deny a programmer the ability to opt into mutable aliasing. As we saw earlier, it's possible to offer mutable aliasing without risk of memory errors.

The question we need to answer is: how do we model aliasing amongst function arguments, and amongst pointers appearing in function types? Rust's concept of a "unique mutable reference" is definitely a non-starter, since by definition, it doesn't allow for programmers to opt into mutable aliasing. (Unless they start wrapping their variables in Cell and RefCell etc, but these types have major drawbacks.)

To achieve our goals, we'll need to reason about the disjointness of region parameters. In particular, we need to enforce the following restriction:

  • If a region is mutated, then it must be disjoint from other region parameters.

This should sound familiar: it's essentially "aliasing XOR mutability", but applied to regions rather than pointers. Let's look at an example:

fn set_names
    [name_rgn: Region[String], value_rgn: Region[String]]
    (names: List[Pointer[name_rgn]], value: Pointer[value_rgn]) mut name_rgn:
        for name in names:
            name[] = value[]

To simplify this example, I've modelled each argument in terms of explicit pointers. The function accepts two region parameters, name_rgn and value_rgn, which describe the variables targeted by the pointers. The variables in name_rgn are being mutated, which is why we've written mut name_rgn in the signature. (The exact syntax is not important here—what's important is that we've declared that the function can mutate the region's variables.)

The function overwrites all of the strings in the names list, by copying the string that value points to. This means that name_rgn is mutated, and value_rgn is read.

What would happen if the two regions overlapped—e.g. if one contained the variables {x, y} and the other contained {y, z}? That would enable names to hold a pointer to y, and for value to also point to y. The statement name[] = value[] requires the left-hand side to be uninitialized prior to performing the copy, which means it would invoke y's destructor. This would leave value[] pointing to invalid memory, and the copy would read from that invalid memory. Clearly, the compiler must somehow prevent this from happening.

There are a few ways we could achieve this. Let's consider two alternatives:

  1. The compiler could allow region parameters to freely overlap, which would result in the statement name[] = value[] being rejected with the error "cannot copy value[]; it might be uninitialized".
  2. The compiler could require that region parameters that are mutated are disjoint from other region parameters.

The second alternative allows for a much larger number of functions to compile successfully. Furthermore, it makes it easier for function authors to reason about how a function will behave: only pointers parameterized by the same region parameter can "cross-communicate". Given all of this, alternative #2 is clearly the right choice.

Notice how this model differs from Rust: pointers parameterized by the same region are allowed to mutably alias. For example:

fn foo[r: Region[String]](names: List[Pointer[r]]) mut r:
    p1 = names[0]
    p2 = names[1]
    p1[] = p2[]     # Error: cannot copy p2[]; it might be uninitialized.

As before, we're trying to copy the value of one string into another string. But this time, the two pointers are parameterized by the same region, so they might alias one another. The compiler knows this, and therefore can reject the program.

There are a lot of valid use-cases for allowing arguments to mutably alias. For example, it allows us to define a swap function that can swap two elements of the same collection, and the targets of two potentially-aliasing pointers:

fn swap[T: Movable, r: Region[T]](ref[r] x: T, ref[r] y: T) mut r:
    if __address_of(x) == __address_of(y):
        return
        
    # Now that we know the pointers don't alias, we can use unsafe
    # operations to swap the targets. The exact code isn't important.
    unsafe_x = UnsafePointer.address_of(x)
    unsafe_y = UnsafePointer.address_of(y)
    
    ...swap x and y by MOVING their values...

# Now we can swap list elements!
names = List[String](...)
swap(names[i], names[j])

# And pointer targets:
a: Pointer[{x, y}] = ...
b: Pointer[{x, y}] = ...
swap(a[], b[])

In Rust, it's not possible to define such a swap function, because it is incompatible with the "aliasing XOR mutability" rule. As a workaround, some collection types offer methods for swapping elements by index. This isn't a general solution, so it would be good if we can avoid going down that path.

In conclusion: "aliasing and mutability" is our friend, not our enemy. By embracing it, rather than trying to ban it, we can compile dramatically more programs, and impose less burden on programmers. For a language such as Mojo, which aims to be as approachable as Python, I strongly believe this is the right path forward.

Dynamic containers

As mentioned earlier, dynamic containers are the core concept that we should be focusing on. Many of the memory safety issues that Rust prevents, such as "iterator invalidation", boil down to the issue of mutating a dynamic container while holding a pointer to one of its items.

I defined a "dynamic container" as a container that stores a dynamically changing set of items. In general, the quantity, the type, and the location of items could all change, and any such change could result in a pointer becoming unsafe to dereference.

We need to model dynamic containers in Mojo's type system, so that the compiler can reason about them. As is usual in Mojo, we can boil everything down to an MLIR type. For this discussion, I'll focus specifically on an MLIR type that models a buffer. Containers such as List and Dict might be defined in terms of this buffer type.

Here are some key operations that the buffer type needs to support:

  • allocation
  • deallocation
  • item addition
  • item removal

All of these operations change the shape of the buffer. This makes sense, because a buffer's sole purpose is to hold items. Also note: a buffer cannot be moved. If you need to move a buffer, you must allocate a new one, and deallocate the old one.

Pointers to buffer items only need to be invalidated when the buffer is deallocated or an item is removed. Notably, pointers can be retained across item addition. That said, in order to keep things simple, I'll present a model wherein all buffer operations invalidate pointers.

Let's explore what it might look like to use this buffer. Below is a definition of List, containing some hypothetical MLIR operations. I'm not an MLIR expert, but thankfully the fine details of these operations are not important. What's important is that each of these operations requires self.buffer (and thus self) to be mutable. As I'll explain shortly, we can use mutation as a signal that pointers to buffer items must be invalidated.

Here's the definition of List:

struct List[T: CollectionElement]:
    var buffer: ...some mlir type...
    var length: Int
    var capacity: Int

    fn __init__(out self):
        self.buffer = __mlir_op.`lit.buffer.alloc`[type=T](0)
        self.length = 0
        self.capacity = 0

    fn __del__(owned self):
        # Delete each item
        for i in range(self.length):
            # This operation produces an owned value, which we discard.
            # (This trick is also used by UnsafePointer.destroy_pointee.)
            _ = __mlir_op.`lit.buffer.removeitem`(self.buffer, i)

        # Delete the buffer
        __mlir_op.`lit.buffer.dealloc`(self.buffer^)

    fn append(mut self, owned value: T):
        if self.length >= self.capacity:
            # Create a new buffer that is twice the size
            new_capacity = max(1, self.capacity * 2)
            new_buffer = __mlir_op.`lit.buffer.alloc`[type=T](new_capacity)

            # Switch to using the new buffer
            move_items(new_buffer, self.buffer) # definition omitted for brevity
            __mlir_op.`lit.buffer.dealloc`(self.buffer^)
            self.buffer = new_buffer
            self.capacity = new_capacity

        # Insert the new item. (The MLIR op returns an uninitialized l-value.)
        __mlir_op.`lit.buffer.newitem`(self.buffer, self.length) = value^
        self.length += 1

    fn pop(mut self) raises -> T:
        if self.length == 0:
            raise "nothing to pop"

        self.length -= 1
        return __mlir_op.`lit.buffer.removeitem`(self.buffer, self.length)^

This example is simplified in a bunch of ways. Please excuse that. The important thing is that we've captured the full lifecycle of List, including adding and removing items.

Each of the above methods requires self.buffer to be mutable:

  • __init__ initializes self.buffer
  • __del__ deinitializes self.buffer
  • append deinitializes and reinitializes self.buffer
  • pop requires self.buffer to be mutable in order to invoke removeitem

We're using the mutation of self.buffer as a signal that pointers to the buffer's items should be invalidated. More precisely: every operation that changes the shape of the buffer requires it to be mutable, and every operation that requires the buffer to be mutable triggers the invalidation of item pointers. In other words, we're preventing dangling pointers from the bottom-up. The concept of invalidation is built into the MLIR type.

To complete this design, we need to model the concept of a "buffer item" in the type system. Recall that a pointer is parameterized by a region, and a region denotes a set of items. Obviously then, we need a region for the buffer's items.

This region can be built directly into Mojo. Here's what it might look like:

struct List[T: CollectionElement]:
    var buffer: ...some mlir type...
    var length: Int
    var capacity: Int

    items: Region[T] = __item_region_of(buffer)

    fn __getitem__(self, i: Int) -> ref[self.items]:
        return __mlir_op.`lit.buffer.getitem`(self.buffer, i)

Again, the syntax isn't important. What's important is that we're making a region available as an attribute of a List instance. The statement that binds items should be understood as binding an alias, i.e. a secondary name for the region. (This has nothing to do with Mojo's alias keyword.) Like a field, the region can only be accessed through an instance, but unlike a field, no storage is allocated for it. It should be understood as a kind of "field alias". Field aliases are a feature that Mojo might eventually support more broadly, e.g. so that traits can declare fields.

Notice that __getitem__ returns a reference parameterized by the item region. This allows us to construct pointers to the items:

people: List[Person] = List(...)
selected_person: Pointer[people.items] = Pointer(people[2])

# Delete the list. Notably, this requires `people.buffer` to be mutable!
people^.__del__()

# The pointer can no longer be used, because `people.buffer` was mutated.
print(selected_person[].name)   # Error: cannot dereference invalid pointer

This example is similar to what we've seen earlier in this proposal, so we've almost come full-circle. There's one discrepancy: we're using this special .items field for referring to the item region, rather than the * symbol. With this syntax, here's what "mapping" an item region to another region might look like:

pet_name: Pointer[{people.items*.pets.items*.name}] =
    Pointer(people[2].pets[0].name)

The expression {people.items*.pets.items*.name} is meant to denote the set of all names of all of the pets of all of the people in the list. The * symbol is meant to be a built-in syntax that indicates that we are "mapping" a region to another region, by selecting all instances of a field. This is a highly accurate description of what the pointer points to, but it's quite verbose.

For the sake of readability, it would be good if we could write {people*.pets*.name} as a shorthand for the above expression. But for this syntax to make sense, we need to associate people* with a region. We can achieve this by giving the region a special name:

struct List[T: CollectionElement]:
    ...
    __items__: Region[T] = __item_region_of(buffer)

Now the compiler can reason that people* is shorthand for people.__items__*. This improves readability quite a bit:

pet_name: Pointer[{people*.pets*.name}] = Pointer(people[2].pets[0].name)

The exact syntax that we end up using is not important. For example, maybe the curly braces are a bad idea. (It's definitely a work-in-progress.) The main takeaway is that by modelling dynamic containers and item regions, we get the following benefits:

  • We can prevent dangling pointers while still allowing mutable memory to be aliased.
  • We can offer a really simple and concise syntax for describing the variables that a pointer-into-a-collection can point to.

This model is expressive enough for the compiler to be able to reason about the disjointness of pointers into collections. For example:

name_ptr: Pointer[{people*.name}] = ...
address_ptr: Pointer[{people*.address}] = ...

The region parameters of name_ptr and address_ptr are clearly disjoint, because .name and .address are distinct fields. The compiler can therefore reason that the pointers don't alias. This allows the pointers to be safely passed to functions that expect them to be non-aliasing.

Mutating items without mutating the entire buffer

In the previous section, we saw that a function that takes a buffer as mut should trigger invalidation of the buffer's items. But what if an operation mutates just the items of a buffer, without mutating the whole buffer variable? For example, that's what swap does!

people: List[Person] = List(...)
selected_person: Pointer[{people*}] = Pointer(people[2])

swap(people[i], people[j])

print(selected_person[].name)

Ideally this program would compile, because there's no risk of selected_person dangling. If i or j has the value 2, the pointer's target might be mutated, but it will never be deleted. It would be a shame to needlessly invalidate pointers in this case.

In fact, the most insidious kind of mutation is the mutation performed through the pointer itself! The statement selected_person[].name = "Bob" mutates a buffer item, but we don't want this to trigger the invalidation of pointers into the buffer.

To address this concern, we need to ensure that only functions that take the whole buffer as mut trigger the invalidation of pointers. If a function mutates only the buffer's item region, pointers must be preserved. With this in mind, the above examples behave as desired.

It would be nice if functions that operate on collections could declare whether they mutate the structure of the collection, or whether they only only mutate the collection's items. Here is a plausible syntax for the latter kind of function:

struct List[T: CollectionElement]:
    ...
    fn sort(self) mut self*:
        ...
    fn reverse(self) mut self*:
        ...

Here, we're declaring that List.sort and List.reverse only mutate the items of the list. The shape of the list is untouched, and therefore pointers are preserved.

Abstracting over collections

With the above design for item regions, we have everything we need to abstract over collection types:

trait Iterator[T: AnyType, //, items: Region[T]]:
    fn __next__(mut self) raises -> ref[items]: ...

trait Iterable[T: CollectionElement]:
    __items__: Region[T]

    fn __iter__(self) -> Iterator[__items__]: ...

Notice that the Iterable trait declares an associated region __items__. Given that this is a "field alias", it should be understood as referring to some part of self. Furthermore, for the sake of generality, __items__ should be assumed to be the item region of a dynamic container, i.e. operations that mutate self should invalidate pointers into self.__items__. (It's easy to imagine having an explicit syntax for declaring that a trait's associated region belongs to a dynamic container. I'm keeping things simple, by assuming that this is always the case.)

Traits can also declare associated regions that don't overlap self. You just need to use the alias keyword, to clarify that the region is associated with the type, rather than a particular instance:

trait Thing[T: AnyType]:
    alias external_items: Region[T]
    
    fn __getitem__(self, i: Int) -> ref[external_items]: ...

Unresolved questions about dynamic containers

This concludes the tour of dynamic containers.

There are a few questions that I haven't had the time to resolve:

  1. How can a tagged union (e.g. Variant) be modelled as a dynamic container? It's obviously a container of one item. The tricky part is modelling the item's region. I suspect we'll want to follow in the footsteps of C unions, and expose a different region for each payload type.
  2. For collection types where the buffer is never reallocated—such as InlineFixedVector—can we model the fact that append doesn't invalidate pointers?
  3. What does invalidating a pointer actually do? Is the pointer destroyed? If I "fix" the pointer by updating its address, can I use it again? Maybe invalidation should be a fatal error: you must proactively avoid performing an operation that would trigger an invalidation.

This is future work.

Goodbye "mutable references"

The last major aspect of this proposal is about how "mutable references" are modelled.

In Rust, and in today's Mojo, references and pointers have mutability integrated into their types. In the case of Mojo, a pointer inherits its mutability from its origin parameter. This design enables Mojo functions to be "parametric over mutability", which avoids an annoying issue that occurs in Rust and C++, where a method foo sometimes needs to be split into two versions: one that returns a mutable reference, and one that returns an immutable reference.

Parametric mutability works well for functions, but is problematic for structs. For example, here's a struct written using the features of today's Mojo:

@value
struct StringWrapper[origin: MutableOrigin](Stringable):
    var target: Pointer[String, origin]
    
    fn capitalize(self):
        self.target[] = self.target[].upper()

    fn __str__(self) -> String:
        return self.target[]

StringWrapper is a type that references a string, and offers various methods for reading and mutating it. This is just a toy example, but practical types such as StringSlice and Span have a similar structure. The origin parameter has been defined as a MutableOrigin because capitalize needs to mutate the referenced string.

Unfortunately, the MutableOrigin bound causes all methods of StringWrapper to be treated as potentially mutating the origin, even though __str__ doesn't need to. This is a major problem, because when a StringWrapper is passed to a function that expects a Stringable, __str__ is the ONLY method that should be taken into consideration:

# This function takes two Stringables
fn print_args[X: Stringable, Y: Stringable](x: X, y: Y):
    print(str(x), str(y))

s = String("test")
x = StringWrapper(Pointer.address_of(s))
print_args(x, x)    # This is safe, but how do we convince the compiler?

Notice the statement print_args(x, x). The variable x is aliased during the call to print_args. This should be fine, because it is only being used as a Stringable, and StringWrapper.__str__ doesn't mutate the origin. However, we have failed to express this fact in Mojo's type system. In today's Mojo, this program is rejected.

One can imagine various workarounds to fix this issue. For example, maybe for each method of StringWrapper, we could write an explicit type annotation for self that clarifies whether its origin parameter needs to be mutable. This is being done in today's Mojo (see Span.fill), but it's verbose, and it bloats the signatures appearing in API documentation.

I strongly suggest that we don't put mutability in data types at all. If we take a few steps back: mutation is something that a function does; it's not something that a data type is. This suggests that we should be modelling mutation as an effect that a function performs. If we combine this model with "regions", here's what StringWrapper would look like:

@value
struct StringWrapper[region: Region[String]](Stringable):
    var target: Pointer[region]
    
    fn capitalize(self) mut region:
        self.target[] = self.target[].upper()

    fn __str__(self) -> String:
        return self.target[]

In the above program, mutation is an effect that capitalize performs. By virtue of __str__ not declaring that effect, we know that it is safe for a StringWrapper to be aliased in contexts where it is being treated as a Stringable.

In fact, we can go a step further: we can forbid types that implement Stringable from adding mutation effects to __str__. The logic is: the signature of Stringable.__str__ doesn't declare any mutation effects, therefore the users of Stringable instances should be able to assume (for everyone's sanity) that no mutation effects are going to occur. Or at least no "raw" mutations that can lead to data races, incorrect optimizations, etc.

In contrast, it's perfectly fine for types to add extra read effects to a trait conformance, because reading aliased state is always safe, as long as nobody else is mutating it. And in fact, if StringWrapper.__str__ wasn't allowed to read from the string that it wraps, it would be pretty useless! The same is true of slices and spans.

  • Note: We may also want to require read effects to be declared explicitly, e.g. fn __str__(self) read region -> String. This would eliminate even more aliasing false-positives.

In summary: we saw earlier that Mojo has the opportunity to allow "aliasing and mutability". This eliminates the need to treat "mutable references" differently to "immutable references". In turn, this gives us the opportunity to model mutation as an effect, rather than a property of types. As seen above, this simplifies the definition of structs parameterized by regions, and helps more programs compile successfully. Ultimately, this would make it easier to write production software in Mojo.

Design summary

The major elements of the design are as follows:

  • Every reference/pointer is parameterized by a region, which denotes the set of variables that the pointer can target.
  • Aliasing restrictions only apply to region parameters. Apart from that, references/pointers are allowed to freely alias each other, irrespective of their mutability.
  • To prevent dangling pointers to container items, we introduce the concept of a dynamic container, that exposes an item region. Pointers into the item region are invalidated whenever the container its mutated.
  • Mutation is an effect that a function performs—it is not a property of a type. When implementing a trait method, you can't declare extra mutation effects.

Benefits

The major benefits of this design are as follows:

  • By eliminating the "aliasing XOR mutability" restriction amongst pointers, a much larger number of programs can be accepted by the compiler, without sacrificing memory safety. Ultimately, this would make it easier to write practical, high-performance software in Mojo.
  • The proposed model of invalidation ensures that whenever the compiler rejects a program, we can provide a simple, no-bullshit explanation about why the program is rejected. More precisely, what we are rejecting is the existence (or use) of potentially-invalid pointers, and it is obvious why this is important. In contrast, the Rust compiler rejects programs that have "aliased mutable references", and it is not immediately obvious why this is important. This has a major influence on the quality of error messages. Errors of the form "this pointer might be dangling" are far easier to understand than errors of the form "cannot create an aliased mutable reference" or "this variable is already borrowed".
  • By modelling a region as a set of variables, with a simple syntax such as {people*.name}, it becomes easier for programmers to understand what a pointer can actually point to. This makes programs self-documenting, and makes it easier to understand why a function call violates an aliasing rule, or why it triggers an invalidation. Furthermore, this model also helps the compiler reason more carefully about aliasing. For example, we can reason that {people*.name} and {people*.address} are disjoint. This is not possible in Rust or in today's Mojo.
  • By modelling mutation as an effect, we enable a struct's author to specify whether each method needs to read and/or mutate the struct's region parameter(s). This has many advantages, for example:
    • This simplifies function signatures, compared to today's trick of explicitly annotating self with a refined type.
    • This model allows types such as Span and StringSlice to outlive the data they point to, because we can now express the fact that their destructors don't need to read the data. This allows unused resources to be reclaimed sooner, which can improve performance and reduce memory usage.

Limitations

While this design is much more flexible than Rust-style lifetimes, it does not cover 100% of the features of Rust lifetimes.

In Rust, a list of mutable references—which are parameterized by the same lifetime—are guaranteed to never alias each other, i.e. they are guaranteed to be "unique" references. The design I've presented is unable to model this phenomenon.

I consider this to be a worthwhile trade-off. After all, the proposed design doesn't prevent programmers from working with unique references, it just won't provide any guarantees at compile-time. If the programmer needs to "convince" the compiler that a pair of references taken from the same list are unique, they can do this using unsafe code, as was demonstrated with the swap method.

If anyone suspects there would be major downsides to not modelling unique references in Mojo's type system, I'd be interested in hearing about this.

Next steps

This document was meant to be a high-level overview, to gauge whether the Mojo community is interested in seeing this design through to completion. I am not a Modular employee, so I am not in charge of whether this design gets adopted. However, if you have any feedback (good or bad), please let me know.

I have omitted a lot of details. Some details were omitted for the sake of brevity, others were omitted because I haven't had time to investigate them yet. If there is sufficient interest in this design, I can write some follow-up documents to fill in the gaps.

If you'd like to see me spend more time on this design, you can sponsor me on Github. Fleshing out this design will take several more months of work at least, and I doubt I'm going to be able to finish this design without financial support.

@nmsmith
Copy link
Author

nmsmith commented Nov 12, 2024

Why do regions have to be disjoint?

Disclaimer: In Mojo the term "argument" is used for function inputs declared within parentheses, and "parameter" is used for the inputs declared within square brackets. I imagine anyone who doesn't know this would be confused by how I've been using these words.

There are actually two disjointness rules that I believe need to be enforced. I haven't made this clear enough, sorry. The rules are:

  1. Given a pair of variables in a region, e.g. {x, y}, neither variable is allowed to be a container that stores the other variable as an element. Or equivalently: neither variable can contain the other variable in its ownership tree. This ensures that given two pointers of type Pointer[{x, y}], mutating one pointee cannot trigger the deallocation of the other pointee.
  2. Given a pair of region parameters in a function signature, if either of the regions are mutated by the function, the memory owned by the variables in region A must not overlap the memory owned by the variables in region B. This is just Rust's "aliasing XOR mutability" rule, but applied at the level of regions. This is valuable for all the same reasons as in Rust.

Attempting to compare my model to Rust's NLL model is perhaps more confusing than it is illuminating, because my design isn't derived from NLL; it's a clean sheet design. A lot of the fundamentals have changed. For example, the code snippet in your example above uses a single lifetime parameter for the argument types and the result type, but in my model, you actually need two distinct parameters:

fn func[rx: Region[Int32], ry: Region[Int32]](ref[rx] x: Int32, ref[ry] y: Int32) -> ref[{rx*, ry*}] Int32:

This is important when it comes to enforcing the disjointness rules. If the arguments were parameterized by the same region, they would be allowed to alias—even mutably alias—and because there is only one region parameter, there would be no disjointness rules to enforce. But once we add a second region parameter, we need to enforce that if one of the regions is mutated (this doesn't occur in your example), then it needs to be disjoint from other region parameters. This allows the programmer and the compiler to reason about the independence of mutations through arguments.

@nmsmith
Copy link
Author

nmsmith commented Nov 12, 2024

I think I made a bad choice by using the term "region", because it sounds like a region should be a range of memory addresses, but a Region is actually just a compile-time value that denotes a set of variables. These variables obviously reside in memory, but a Region should not be conflated for a range of addresses. For example, the regions {thing} and {thing.x} are disjoint as sets, but not disjoint as objects in memory. When I'm talking about disjointness, I'm talking about objects in memory.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment