@nihalpasham
Last active April 18, 2026 14:11
How Pointers Are Really Represented in MIR


This note covers how pointers are represented in MIR, both at rest (in the Allocation memory model) and in motion (through strict provenance cast kinds).

Table of Contents

  1. The Code
  2. Dumping the MIR
  3. The Allocation Memory Model
  4. Anatomy of the MIR — Pointer Constants
  5. Anatomy of the MIR — Strict Provenance Casts
  6. How a Backend Dispatches on Cast Kinds
  7. Key Takeaways

The Code

// ── Layer 1: Pointer constants — how the Allocation model stores them ──

fn pointer_to_static_data() -> bool {
    let data: &[u32; 4] = &[10, 20, 30, 40];
    data[2] == 30
}

fn nested_pointer() -> u32 {
    let inner: &u32 = &42;
    let outer: &(&u32, u32) = &(inner, 7);
    *outer.0 + outer.1
}

// ── Layer 2: Strict provenance casts — each gets a distinct CastKind in MIR ──

fn expose_address(ptr: *const u32) -> usize {
    ptr as usize
}

fn from_exposed_addr(addr: usize) -> *const u32 {
    addr as *const u32
}

fn reinterpret_ptr(ptr: *const u32) -> *const f32 {
    ptr as *const f32
}

fn round_trip(ptr: *const u32) -> *const u32 {
    let addr: usize = ptr as usize;
    let reconstructed: *const u32 = addr as *const u32;
    reconstructed
}

The functions fall into two groups:

  • The first group creates pointer-typed constants that the compiler promotes to static memory — this is where the Allocation model reveals itself.
  • The second group performs pointer casts — this is where MIR's strict provenance cast kinds appear.

Dumping the MIR

cargo +nightly rustc -- -Z unpretty=mir

The Allocation Memory Model

Before looking at the MIR, we need to understand how the compiler stores compile-time values internally. Every constant — a promoted &[u32; 4], a byte string literal, a static — lives in an Allocation.

Here are the actual types from rustc_public (stable MIR):

// rustc_public/src/ty.rs

pub type Bytes = Vec<Option<u8>>;

pub struct Prov(pub AllocId);

pub struct ProvenanceMap {
    /// Provenance in this map applies from the given offset for an
    /// entire pointer-size worth of bytes. Two entries in this map
    /// are always at least a pointer size apart.
    pub ptrs: Vec<(Size, Prov)>,
}

pub struct Allocation {
    pub bytes:      Bytes,
    pub provenance: ProvenanceMap,
    pub align:      Align,
    pub mutability: Mutability,
}

// rustc_public/src/mir/alloc.rs

/// A unique identification number for each provenance
pub struct AllocId(usize);

Five things to notice:

  1. Bytes = Vec<Option<u8>> — the raw byte content of the entire allocation, whatever it holds. Each byte is Option<u8>: Some(v) means initialized to v, None means uninitialized (padding bytes, MaybeUninit, etc.). The meaning of the bytes depends on whether a provenance entry covers them:

    • Bytes covered by a provenance entry — these are the bytes that make up a pointer. Their value is the offset into the target allocation (not the final address — that's determined at link time).
    • Bytes not covered by provenance — plain data. A u32 with value 42 is just [42, 0, 0, 0].
  2. AllocId — a unique numeric ID assigned to each allocation the compiler creates. Think of it as a pointer's birth certificate — it records which allocation this pointer was born from. A bare address like 0x7fff5e4a3b2c tells you where in memory, but not which allocation granted access. Two allocations could sit at the same address at different times (after a free and realloc), or a pointer could be arithmetically constructed to hit an address it was never meant to reach. The AllocId is what disambiguates these cases. When you follow an AllocId you get a GlobalAlloc, which is usually GlobalAlloc::Memory(Allocation) — another Allocation with its own bytes and provenance.

  3. ProvenanceMap — a side table that marks which byte ranges within this allocation are pointers. Each entry (offset, Prov(AllocId)) means: "starting at this offset (within this allocation), the next 8 bytes (on 64-bit) are a pointer into the allocation identified by AllocId." There are two different offsets at play:

    • ProvenanceMap offset — where in this allocation the pointer sits (e.g., offset 0 means bytes 0–7 are a pointer, offset 8 means bytes 8–15 are a pointer).
    • Bytes value at that range — the offset into the target allocation that the pointer points to (e.g., all zeros = points to the start of the target).

    Concrete example — a struct (&u32, &u32) holding pointers to two different values:

    Allocation for the struct (16 bytes on 64-bit):
        bytes:      [0,0,0,0,0,0,0,0,  0,0,0,0,0,0,0,0]
        provenance: [(offset=0, AllocId(A)), (offset=8, AllocId(B))]
    
    AllocId(A):  bytes: [42, 0, 0, 0],  provenance: []   ← the first u32
    AllocId(B):  bytes: [99, 0, 0, 0],  provenance: []   ← the second u32
    

    Bytes 0–7 of the struct are a pointer to AllocId(A). Bytes 8–15 are a pointer to AllocId(B). Both byte ranges are all zeros because both pointers point to offset 0 within their respective targets. This is why the doc comment says "two entries in this map are always at least a pointer size apart" — they can't overlap.

  4. Const evaluation uses Miri — the compiler's const evaluator (which evaluates const fn, const blocks, static initializers, and promoted constants) is the same Miri interpreter engine that cargo miri uses. Every Allocation is produced by Miri executing a MIR body at compile time. The standalone cargo miri tool is a superset that can interpret entire programs and adds extra UB-detection checks (Stacked/Tree Borrows, data race detection, etc.).

  5. is_null() reveals the model — look at how the compiler checks for null:

// rustc_public/src/ty.rs — Allocation::is_null()

pub fn is_null(&self) -> Result<bool, Error> {
    let len = self.bytes.len();
    let ptr_len = MachineInfo::target_pointer_width().bytes();
    if len != ptr_len {
        return Err(error!("Expected width of pointer (`{ptr_len}`), but found: `{len}`"));
    }
    Ok(self.read_uint()? == 0 && self.provenance.ptrs.is_empty())
}

A pointer is null only if the bytes are zero AND there's no provenance. 0x0000000000000000 with a provenance entry is not null — it's a valid pointer whose address will be determined at link time.
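To make the "bytes plus provenance" rule concrete, here is a minimal sketch of the model in plain Rust. The types are toy stand-ins written for this note (they only borrow names from rustc_public), the helper `pointer_alloc` is hypothetical, and a 64-bit little-endian target is assumed.

```rust
// Toy stand-ins for the rustc_public types above; not the real API.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct AllocId(pub usize);

pub struct Allocation {
    pub bytes: Vec<Option<u8>>,
    /// (offset within this allocation, target allocation)
    pub provenance: Vec<(usize, AllocId)>,
}

/// Assumption: 64-bit target, so a pointer occupies 8 bytes.
pub const PTR_SIZE: usize = 8;

impl Allocation {
    /// Read a pointer-sized little-endian integer at `offset`.
    /// Returns None if the range is out of bounds or a byte is uninitialized.
    pub fn read_ptr_sized(&self, offset: usize) -> Option<u64> {
        let range = self.bytes.get(offset..offset + PTR_SIZE)?;
        let mut buf = [0u8; PTR_SIZE];
        for (dst, src) in buf.iter_mut().zip(range) {
            *dst = (*src)?;
        }
        Some(u64::from_le_bytes(buf))
    }

    /// Same rule as the quoted `is_null`: zero bytes AND no provenance.
    /// Returns None if the allocation is not exactly pointer-sized.
    pub fn is_null(&self) -> Option<bool> {
        if self.bytes.len() != PTR_SIZE {
            return None;
        }
        Some(self.read_ptr_sized(0)? == 0 && self.provenance.is_empty())
    }
}

/// Hypothetical helper: build a pointer-sized allocation whose bytes hold
/// `offset_in_target` and whose provenance (if any) names `target`.
pub fn pointer_alloc(offset_in_target: u64, target: Option<AllocId>) -> Allocation {
    Allocation {
        bytes: offset_in_target.to_le_bytes().iter().map(|b| Some(*b)).collect(),
        provenance: target.map(|id| (0, id)).into_iter().collect(),
    }
}
```

In this sketch, `pointer_alloc(0, None)` is null, while `pointer_alloc(0, Some(AllocId(7)))` is not, even though both hold eight zero bytes.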

What Provenance Points To

When you follow an AllocId, you get a GlobalAlloc:

// rustc_public/src/mir/alloc.rs

pub enum GlobalAlloc {
    Function(Instance),                              // Function pointer (no bytes, just an ID)
    VTable(Ty, Option<Binder<ExistentialTraitRef>>), // Vtable for trait object
    Static(StaticDef),                               // Named static (lazy)
    Memory(Allocation),                              // Real allocation with bytes
    TypeId { ty: Ty },                               // Type ID segment (for TypeId::of::<T>)
}

Most of the time you get GlobalAlloc::Memory(Allocation) — another Allocation with its own bytes and its own provenance map. Pointers can point to structs that contain more pointers. Each nested pointer is another provenance entry pointing to another AllocId.
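The nesting described above amounts to a graph walk over allocation IDs. The sketch below uses a toy `GlobalAlloc` with only two variants and reduces each allocation to the IDs its provenance map mentions. The names `reachable` and `example_table` are made up for illustration and are not part of rustc_public.

```rust
use std::collections::{BTreeSet, HashMap};

// Toy model, not the rustc_public API: an allocation is reduced to the
// list of allocation IDs that its provenance map mentions.
pub enum GlobalAlloc {
    /// Real memory; `targets` are the AllocIds in its provenance map.
    Memory { targets: Vec<usize> },
    /// Function pointer target: no bytes to descend into.
    Function(&'static str),
}

/// Collect every allocation reachable from `root` by following provenance.
pub fn reachable(root: usize, table: &HashMap<usize, GlobalAlloc>) -> BTreeSet<usize> {
    let mut seen = BTreeSet::new();
    let mut stack = vec![root];
    while let Some(id) = stack.pop() {
        if !seen.insert(id) {
            continue; // already visited; this also guards against cycles
        }
        if let Some(GlobalAlloc::Memory { targets }) = table.get(&id) {
            stack.extend(targets.iter().copied());
        }
    }
    seen
}

/// Example table: 0 points to 1 and 2, 1 points to 3,
/// 2 is a function, 3 is plain data.
pub fn example_table() -> HashMap<usize, GlobalAlloc> {
    HashMap::from([
        (0, GlobalAlloc::Memory { targets: vec![1, 2] }),
        (1, GlobalAlloc::Memory { targets: vec![3] }),
        (2, GlobalAlloc::Function("callee")),
        (3, GlobalAlloc::Memory { targets: vec![] }),
    ])
}
```

A backend that emits constants would typically do a walk of this shape so that every allocation a pointer constant depends on is also emitted.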


Anatomy of the MIR — Pointer Constants

pointer_to_static_data

Rust source:

fn pointer_to_static_data() -> bool {
    let data: &[u32; 4] = &[10, 20, 30, 40];
    data[2] == 30
}

MIR output:

fn pointer_to_static_data() -> bool {
    let mut _0: bool;
    let _1: &[u32; 4];
    let mut _2: u32;
    let _3: usize;
    let mut _4: bool;
    scope 1 {
        debug data => _1;
    }

    bb0: {
        _1 = const pointer_to_static_data::promoted[0];
        _3 = const 2_usize;
        _4 = Lt(copy _3, const 4_usize);
        assert(move _4, "index out of bounds: the length is {} but the index is {}",
               const 4_usize, copy _3) -> [success: bb1, unwind continue];
    }

    bb1: {
        _2 = copy (*_1)[_3];
        _0 = Eq(move _2, const 30_u32);
        return;
    }
}

const pointer_to_static_data::promoted[0]: &[u32; 4] = {
    let mut _0: &[u32; 4];
    let mut _1: [u32; 4];

    bb0: {
        _1 = [const 10_u32, const 20_u32, const 30_u32, const 40_u32];
        _0 = &_1;
        return;
    }
}

Walking through it:

  • _1: &[u32; 4] is data in our source. It's a reference to an array, not the array itself.
  • _1 = const pointer_to_static_data::promoted[0] — the array literal [10, 20, 30, 40] has been promoted out of the function body into a separate constant. The function receives a &'static [u32; 4].
  • promoted[0] has its own MIR body: it constructs [10, 20, 30, 40] into _1, takes a reference _0 = &_1, and returns the reference. This looks like it returns a reference to a local — which would be use-after-free in runtime code. But this MIR body is const evaluation syntax, not code that ever runs on the machine. The const evaluator (Miri) executes it at compile time: it creates an Allocation for the array, and _0 = &_1 produces a pointer-typed Allocation with provenance pointing to that array. The array becomes a permanent static allocation — there's no stack frame being torn down. The return means "this is the result of const evaluation."
  • Back in bb0, the compiler inserts a bounds check: Lt(copy _3, const 4_usize) — is 2 < 4? If not, panic with "index out of bounds".
  • _2 = copy (*_1)[_3] — dereference the pointer, index by _3 (= 2), copy the element.
  • _0 = Eq(move _2, const 30_u32) — compare with 30.
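Promotion is also observable from surface Rust: because the array literal is moved into static memory, a borrow of it can be returned with a 'static lifetime. The function name `promoted_array` below is made up for this sketch.

```rust
/// The array literal is promoted to static memory, so the borrow can be
/// returned with a 'static lifetime instead of dangling.
pub fn promoted_array() -> &'static [u32; 4] {
    &[10, 20, 30, 40]
}

/// Same shape as `pointer_to_static_data` in the source.
pub fn pointer_to_static_data() -> bool {
    let data: &[u32; 4] = promoted_array();
    data[2] == 30
}
```

If the literal were not promotable (for example, if it called a non-const function), `promoted_array` would fail to compile with a "returns a reference to data owned by the current function" error.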

What the compiler actually stores for promoted[0]

This is the part that isn't visible in the MIR text output. When you query promoted[0] through rustc_public, its type is &[u32; 4] — a pointer. The compiler represents this as:

promoted[0] allocation (the pointer):
    bytes: [0, 0, 0, 0, 0, 0, 0, 0]     ← offset 0 into target (points to start of array)
    provenance: [(offset=0, AllocId(N))]  ← bytes 0–7 of THIS allocation are a pointer to AllocId(N)

AllocId(N) (the actual array data):
    bytes: [10,0,0,0, 20,0,0,0, 30,0,0,0, 40,0,0,0]  ← four u32s, little-endian
    provenance: []                                      ← no pointers here, just data

Two allocations. The first holds the pointer itself — 8 bytes + a provenance entry. The bytes ([0,0,0,0,0,0,0,0]) are the offset into the target allocation (zero means "points to the start"). These zeros are not uninitialized and not placeholder — they are Some(0), a meaningful offset. At link time, the linker places AllocId(N) at a concrete base address (say 0x402000) and patches the pointer to base + offset = 0x402000 + 0. The second allocation holds the data.

A backend that reads raw_bytes() from the pointer allocation gets [0,0,0,0,0,0,0,0]. If it interprets those bytes as the final address, it ends up with a null pointer. You must check the provenance map — if it's non-empty, follow the AllocId to reach the actual data.

┌──────────────────────────────────────────────────────────────────────┐
│  promoted[0]: &[u32; 4]                                              │
│                                                                      │
│  ┌─────────────────────────────────────────────┐                     │
│  │ Allocation (the pointer itself)             │                     │
│  │ bytes: [0,0,0,0,0,0,0,0]  ← offset 0        │──── provenance ──┐  │
│  │ provenance: [(offset=0, AllocId(N))]        │                  │  │
│  │              ↑ where in THIS alloc          │                  │  │
│  └─────────────────────────────────────────────┘                  │  │
│                                                                   │  │
│                     bytes value = 0 (offset into TARGET) ─────┐   │  │
│                                                               │   │  │
│                                                               ▼   ▼  │
│  ┌───────────────────────────────────────────────────────┐           │
│  │ AllocId(N) — The actual array (plain data)            │           │
│  │ bytes: [10,0,0,0, 20,0,0,0, 30,0,0,0, 40,0,0,0]       │           │
│  │        └──10──┘  └──20──┘  └──30──┘  └──40──┘         │           │
│  │ provenance: []  ← no pointers, just data              │           │
│  └───────────────────────────────────────────────────────┘           │
│                                                                      │
│  At link time: final address = base_of(AllocId(N)) + 0               │
└──────────────────────────────────────────────────────────────────────┘
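The difference between reading the bytes naively and resolving them through provenance can be sketched as follows. `PtrConst`, `naive_address`, and `resolved_address` are hypothetical names for this note, and the base-address table stands in for whatever the linker or backend decides.

```rust
use std::collections::HashMap;

/// Toy pointer constant: 8 little-endian offset bytes plus an optional
/// provenance target. Hypothetical type, not from rustc_public.
pub struct PtrConst {
    pub offset_bytes: [u8; 8],
    /// AllocId named by the provenance entry, if there is one.
    pub target: Option<usize>,
}

/// Wrong for pointers with provenance: treats the stored bytes as the
/// final address.
pub fn naive_address(p: &PtrConst) -> u64 {
    u64::from_le_bytes(p.offset_bytes)
}

/// Resolve the way a relocation would: base_of(target) + offset.
/// Returns None when the target allocation has not been placed yet.
pub fn resolved_address(p: &PtrConst, bases: &HashMap<usize, u64>) -> Option<u64> {
    let offset = u64::from_le_bytes(p.offset_bytes);
    match p.target {
        Some(id) => bases.get(&id).map(|base| base + offset),
        // No provenance: the bytes are just an integer address.
        None => Some(offset),
    }
}
```

With AllocId 1 placed at 0x402000, a pointer constant holding eight zero bytes resolves to 0x402000, while the naive read yields 0.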

nested_pointer

Rust source:

fn nested_pointer() -> u32 {
    let inner: &u32 = &42;
    let outer: &(&u32, u32) = &(inner, 7);
    *outer.0 + outer.1
}

MIR output:

fn nested_pointer() -> u32 {
    let mut _0: u32;
    let _1: &u32;
    let _2: &(&u32, u32);
    let _3: (&u32, u32);
    let mut _4: u32;
    let mut _5: u32;
    let mut _6: (u32, bool);
    let mut _7: &u32;
    scope 1 {
        debug inner => _1;
        scope 2 {
            debug outer => _2;
        }
    }

    bb0: {
        _1 = const nested_pointer::promoted[0];
        _3 = (copy _1, const 7_u32);
        _2 = &_3;
        _7 = copy ((*_2).0: &u32);
        _4 = copy (*_7);
        _5 = copy ((*_2).1: u32);
        _6 = AddWithOverflow(copy _4, copy _5);
        assert(!move (_6.1: bool), "attempt to compute `{} + {}`, which would overflow",
               move _4, move _5) -> [success: bb1, unwind continue];
    }

    bb1: {
        _0 = move (_6.0: u32);
        return;
    }
}

const nested_pointer::promoted[0]: &u32 = {
    let mut _0: &u32;
    let mut _1: u32;

    bb0: {
        _1 = const 42_u32;
        _0 = &_1;
        return;
    }
}

Walking through bb0:

  • _1 = const nested_pointer::promoted[0] is inner in our source. The &42 is promoted: a &'static u32 pointing to an allocation holding [42, 0, 0, 0].
  • _3 = (copy _1, const 7_u32) — construct the tuple (&u32, u32) on the stack. The first field is a copy of the promoted pointer. The second is an inline constant 7.
  • _2 = &_3 — take a reference to the tuple, giving us &(&u32, u32).
  • _7 = copy ((*_2).0: &u32) — dereference _2 to get the tuple, then read field 0 (the &u32). The : &u32 annotation is MIR's way of showing the type of the field being projected.
  • _4 = copy (*_7) — dereference that pointer to get the 42. This is the second pointer chase.
  • _5 = copy ((*_2).1: u32) — read field 1 of the tuple directly. No pointer chase — it's a u32, not a reference.
  • _6 = AddWithOverflow(copy _4, copy _5) computes 42 + 7, returning (u32, bool). The bool is the overflow flag.
  • assert(!move (_6.1: bool), ...) — panic if overflow occurred.
  • In bb1: _0 = move (_6.0: u32) — extract the sum and return it.

The key point: the tuple on the stack has two fields with different representations. Field 0 (&u32) is 8 bytes that make up a pointer — they store the offset into the target allocation, and a provenance entry marks them as a pointer to the promoted 42. Field 1 (u32) is 4 bytes of plain data (7) with no provenance. A backend materializing this tuple must distinguish the two: follow provenance for field 0 to get the actual address, read the bytes directly for field 1.
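The AddWithOverflow plus assert pair has a close surface-level analogue in `u32::overflowing_add`. The sketch below spells out the same sequence by hand; `checked_sum` is a name invented for this example, and the panic message only imitates the one in the MIR.

```rust
/// Mirrors the MIR sequence: AddWithOverflow produces (sum, overflowed),
/// then an assert on the flag guards the use of the sum.
pub fn checked_sum(a: u32, b: u32) -> u32 {
    // Like `_6 = AddWithOverflow(copy _4, copy _5)`.
    let (sum, overflowed) = a.overflowing_add(b);
    // Like `assert(!move (_6.1: bool), ...)`.
    assert!(!overflowed, "attempt to compute `{a} + {b}`, which would overflow");
    // Like `_0 = move (_6.0: u32)`.
    sum
}

pub fn nested_pointer() -> u32 {
    let inner: &u32 = &42;
    let outer: &(&u32, u32) = &(inner, 7);
    // Field 0 needs a second dereference; field 1 is read directly.
    checked_sum(*outer.0, outer.1)
}
```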


Anatomy of the MIR — Strict Provenance Casts

When you write ptr as usize in Rust, MIR doesn't generate a generic "cast." It generates a specific cast kind that records the semantic intent.

The CastKind enum from rustc_public:

// rustc_public/src/mir/body.rs

pub enum CastKind {
    PointerExposeProvenance,           // ptr as usize
    PointerWithExposedProvenance,      // usize as *const T
    PointerCoercion(PointerCoercion),
    IntToInt,
    FloatToInt,
    FloatToFloat,
    IntToFloat,
    PtrToPtr,
    FnPtrToPtr,
    Transmute,
}

Three of these are the pointer casts covered here: PointerExposeProvenance, PointerWithExposedProvenance, and PtrToPtr. They look similar in surface Rust but have different semantics.

expose_address — ptr → integer

Rust source:

fn expose_address(ptr: *const u32) -> usize {
    ptr as usize
}

MIR output:

fn expose_address(_1: *const u32) -> usize {
    debug ptr => _1;
    let mut _0: usize;

    bb0: {
        _0 = copy _1 as usize (PointerExposeProvenance);
        return;
    }
}

The cast kind in parentheses is PointerExposeProvenance. This tells the compiler: "I am extracting the address AND exposing this pointer's provenance." In other words, ptr as usize strips the pointer's birth certificate off (turning it into a bare number), while simultaneously registering that birth certificate in a global "exposed" set so it can be recovered later.

In the MIR text, nightly currently prints PointerExposeProvenance, since the rename from PointerExposeAddress has landed in the compiler. In stable MIR's CastKind enum, the old name PointerExposeAddress still appears with a FIXME comment; the listing above shows the new name.

from_exposed_addr — integer → ptr

Rust source:

fn from_exposed_addr(addr: usize) -> *const u32 {
    addr as *const u32
}

MIR output:

fn from_exposed_addr(_1: usize) -> *const u32 {
    debug addr => _1;
    let mut _0: *const u32;

    bb0: {
        _0 = copy _1 as *const u32 (PointerWithExposedProvenance);
        return;
    }
}

PointerWithExposedProvenance. This tells the compiler: "I am constructing a pointer from a bare integer, and it should pick up whatever provenance was previously exposed for this address." In other words, addr as *const T is asking: "give me back some birth certificate that was previously registered for this address."

This is the inverse of PointerExposeProvenance, but they are not symmetric. Exposing is always safe to add (it just widens what's exposed). Reconstructing is where the model gets strict — the resulting pointer's validity depends on whether matching provenance was exposed earlier. If the integer was never an address with exposed provenance (e.g., a random 12345), the cast itself still succeeds — you get a pointer-shaped value — but dereferencing it is UB because no birth certificate exists to reclaim.

reinterpret_ptr — ptr → ptr

Rust source:

fn reinterpret_ptr(ptr: *const u32) -> *const f32 {
    ptr as *const f32
}

MIR output:

fn reinterpret_ptr(_1: *const u32) -> *const f32 {
    debug ptr => _1;
    let mut _0: *const f32;

    bb0: {
        _0 = copy _1 as *const f32 (PtrToPtr);
        return;
    }
}

PtrToPtr. No provenance is exposed or reconstructed. The pointer keeps pointing to the same allocation with the same permissions; only the pointee type is reinterpreted. In LLVM this lowers to at most a bitcast, and with opaque pointers (the default since LLVM 15) a thin-pointer cast needs no instruction at all.
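The standard library's `cast` method on raw pointers is the explicit spelling of this kind of cast. As a small sketch (the function names are made up for illustration), reading a u32's bits back as an f32 through a cast pointer shows that only the pointee type changes:

```rust
/// `cast` changes only the pointee type; address and provenance are kept.
pub fn reinterpret(p: *const u32) -> *const f32 {
    p.cast::<f32>()
}

/// Read the bits of a u32 as an f32 through the reinterpreted pointer.
pub fn bits_as_f32(bits: u32) -> f32 {
    let p = reinterpret(&bits as *const u32);
    // SAFETY: u32 and f32 have the same size and alignment, every bit
    // pattern is a valid f32, and `bits` is live for the read.
    unsafe { *p }
}
```

In real code `f32::from_bits` is the right tool for this conversion; the pointer version is here only to exercise the cast.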

round_trip — both cast kinds in sequence

Rust source:

fn round_trip(ptr: *const u32) -> *const u32 {
    let addr: usize = ptr as usize;
    let reconstructed: *const u32 = addr as *const u32;
    reconstructed
}

MIR output:

fn round_trip(_1: *const u32) -> *const u32 {
    debug ptr => _1;
    let mut _0: *const u32;
    let _2: usize;
    scope 1 {
        debug addr => _2;
        scope 2 {
            debug reconstructed => _0;
        }
    }

    bb0: {
        _2 = copy _1 as usize (PointerExposeProvenance);
        _0 = copy _2 as *const u32 (PointerWithExposedProvenance);
        return;
    }
}

Two lines, two different cast kinds:

  • _2 = copy _1 as usize (PointerExposeProvenance) — expose: ptrtoint.
  • _0 = copy _2 as *const u32 (PointerWithExposedProvenance) — reconstruct: inttoptr.

Notice the scoping: MIR records that _2 corresponds to addr and _0 to reconstructed, even though both assignments sit in one basic block.
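The same round trip can be written with the explicit standard-library functions `expose_provenance` and `std::ptr::with_exposed_provenance`, which name the two halves directly. This sketch assumes Rust 1.84 or later, where those functions are stable; the wrapper names are invented for illustration.

```rust
use std::ptr;

/// Explicit spelling of `ptr as usize`: returns the address and marks the
/// pointer's provenance as exposed.
pub fn expose(p: *const u32) -> usize {
    p.expose_provenance()
}

/// Explicit spelling of `addr as *const u32`.
pub fn rebuild(addr: usize) -> *const u32 {
    ptr::with_exposed_provenance::<u32>(addr)
}

/// Expose a pointer to a local, rebuild it from the integer, and read it.
pub fn round_trip_value(value: u32) -> u32 {
    let local = value;
    let addr = expose(&local as *const u32);
    let p = rebuild(addr);
    // SAFETY: `addr` came from a live, aligned u32 whose provenance was
    // exposed just above, so the rebuilt pointer may read it.
    unsafe { *p }
}
```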

Why these distinctions exist

This reflects Rust's provenance model (RFC 3559, "Rust Has Provenance") and the strict provenance APIs that go with it. A pointer is not just an address: it carries permission to access a particular allocation. The three cast kinds encode three different relationships to that permission:

| Cast Kind | What happens to provenance |
|---|---|
| PointerExposeProvenance | Provenance is "leaked": the optimizer can no longer assume it's private |
| PointerWithExposedProvenance | New pointer inherits whatever provenance was exposed for that address |
| PtrToPtr | Provenance is unchanged: same allocation, same permissions |

A backend that conflates these, for example by lowering every pointer↔integer cast as a bitcast, can miscompile. The LLVM optimizer is allowed to make different aliasing assumptions for ptrtoint/inttoptr than for bitcast.
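For contrast, the strict provenance methods `addr` and `map_addr` (stable since Rust 1.84) let code manipulate a pointer's address without exposing its provenance at all. The sketch below tags the low bit of an aligned pointer; the function names are hypothetical, and it relies on u32 being 4-byte aligned so that bit is free.

```rust
/// Set the low bit of a pointer as a tag without exposing provenance.
pub fn tag(p: *const u32) -> *const u32 {
    p.map_addr(|a| a | 1)
}

/// Clear the tag bit, restoring the original address.
pub fn untag(p: *const u32) -> *const u32 {
    p.map_addr(|a| a & !1)
}

/// Inspect the address only; no provenance is exposed by `addr`.
pub fn is_tagged(p: *const u32) -> bool {
    p.addr() & 1 == 1
}

pub fn read_through_tag(value: u32) -> u32 {
    let local = value;
    let tagged = tag(&local as *const u32);
    assert!(is_tagged(tagged));
    // SAFETY: untagging restores the original aligned address, and the
    // provenance of `&local` was carried along by `map_addr`.
    unsafe { *untag(tagged) }
}
```

Because the provenance travels with the pointer here, there is no integer-to-pointer reconstruction step for the optimizer to be conservative about.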


How a Backend Dispatches on Cast Kinds

In my custom backend, I preserve the cast kind from MIR as an attribute and dispatch on it when lowering to LLVM:

// crates/mir-lower/src/convert/ops/cast.rs

let llvm_op = match &cast_kind {
    MirCastKindAttr::PointerExposeAddress => {
        llvm::PtrToIntOp::new(ctx, val, llvm_ty).get_operation()
    }
    MirCastKindAttr::PointerWithExposedProvenance => {
        llvm::IntToPtrOp::new(ctx, val, llvm_ty).get_operation()
    }
    MirCastKindAttr::PtrToPtr | MirCastKindAttr::FnPtrToPtr | ... => {
        emit_pointer_cast(ctx, rewriter, val, val_ty, llvm_ty)?
    }
    MirCastKindAttr::IntToInt => { ... }  // sext / zext / trunc
    MirCastKindAttr::IntToFloat => { ... }  // sitofp / uitofp
    // ... 15+ cast kinds total
};

The full dispatch table:

| MirCastKindAttr | LLVM Operation |
|---|---|
| PointerExposeAddress | ptrtoint |
| PointerWithExposedProvenance | inttoptr |
| PtrToPtr | bitcast (or addrspacecast) |
| IntToInt (wider, signed) | sext |
| IntToInt (wider, unsigned) | zext |
| IntToInt (narrower) | trunc |
| IntToFloat | sitofp or uitofp |
| FloatToInt | llvm.fptosi.sat / llvm.fptoui.sat |
| FloatToFloat | fpext or fptrunc |
| Transmute | bitcast or memory |
| PointerCoercionUnsize | insertvalue (fat pointer) |
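As a self-contained illustration of the table, the following sketch maps a handful of cast kinds to LLVM instruction names. The `Cast` enum and `llvm_opcode` function are hypothetical and are not the backend's actual types. Returning `None` for same-width casts is an assumption of this sketch, since the table does not cover that case.

```rust
/// Hypothetical mirror of a few MIR cast kinds; not the rustc_public enum.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Cast {
    PointerExpose,
    PointerWithExposed,
    PtrToPtr,
    IntToInt { src_bits: u32, dst_bits: u32, src_signed: bool },
    IntToFloat { src_signed: bool },
    FloatToFloat { src_bits: u32, dst_bits: u32 },
}

/// Pick an LLVM instruction name for each cast, following the table above.
/// `None` means no instruction is needed (assumption: same-width casts).
pub fn llvm_opcode(cast: Cast) -> Option<&'static str> {
    match cast {
        Cast::PointerExpose => Some("ptrtoint"),
        Cast::PointerWithExposed => Some("inttoptr"),
        Cast::PtrToPtr => Some("bitcast"),
        Cast::IntToInt { src_bits, dst_bits, src_signed } => {
            if dst_bits < src_bits {
                Some("trunc")
            } else if dst_bits == src_bits {
                None
            } else if src_signed {
                Some("sext") // widening keeps the sign of the source
            } else {
                Some("zext")
            }
        }
        Cast::IntToFloat { src_signed } => {
            Some(if src_signed { "sitofp" } else { "uitofp" })
        }
        Cast::FloatToFloat { src_bits, dst_bits } => {
            if dst_bits > src_bits {
                Some("fpext")
            } else if dst_bits < src_bits {
                Some("fptrunc")
            } else {
                None
            }
        }
    }
}
```

The point of the exhaustive `match` is that adding a new cast kind forces a compile error until its lowering is decided, rather than silently falling through to a default.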

I also verify that types match the cast kind. For example, PointerExposeAddress requires a pointer operand and an integer result:

// crates/dialect-mir/src/ops/cast.rs — MirCastOp::verify()

MirCastKindAttr::PointerExposeAddress => {
    if opd_ty_obj.downcast_ref::<MirPtrType>().is_none() {
        return verify_err!(loc, "PointerExposeAddress cast requires pointer operand type");
    }
    if res_ty_obj.downcast_ref::<IntegerType>().is_none() {
        return verify_err!(loc, "PointerExposeAddress cast requires integer result type");
    }
}
MirCastKindAttr::PointerWithExposedProvenance => {
    if opd_ty_obj.downcast_ref::<IntegerType>().is_none() {
        return verify_err!(loc, "PointerWithExposedProvenance cast requires integer operand type");
    }
    if res_ty_obj.downcast_ref::<MirPtrType>().is_none() {
        return verify_err!(loc, "PointerWithExposedProvenance cast requires pointer result type");
    }
}
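The same check can be reduced to a table of expected operand and result type classes. The sketch below is a standalone toy: `Ty`, `PtrCast`, and `verify` are invented names, and it does not use the dialect's real type objects or the `verify_err!` macro.

```rust
/// Coarse type classes, enough to check pointer casts in this toy.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Ty {
    Ptr,
    Int,
    Float,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PtrCast {
    Expose,
    WithExposed,
    PtrToPtr,
}

/// Check that operand and result types fit the cast kind.
pub fn verify(kind: PtrCast, operand: Ty, result: Ty) -> Result<(), String> {
    // (expected operand type, expected result type) for each cast kind
    let (want_opd, want_res) = match kind {
        PtrCast::Expose => (Ty::Ptr, Ty::Int),
        PtrCast::WithExposed => (Ty::Int, Ty::Ptr),
        PtrCast::PtrToPtr => (Ty::Ptr, Ty::Ptr),
    };
    if operand != want_opd {
        return Err(format!(
            "{kind:?} cast requires {want_opd:?} operand type, found {operand:?}"
        ));
    }
    if result != want_res {
        return Err(format!(
            "{kind:?} cast requires {want_res:?} result type, found {result:?}"
        ));
    }
    Ok(())
}
```

Catching a mismatched cast at verification time gives an error that names the cast kind, which is easier to act on than a malformed `ptrtoint` discovered during LLVM lowering.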

Key Takeaways

  1. A pointer in MIR is bytes + provenance — the Allocation struct stores raw bytes and a separate ProvenanceMap side table. The provenance map says which byte ranges within the allocation are pointers (and which AllocId they point to). The bytes at those ranges are the offset into the target allocation — the final address is base_of(target) + offset, resolved at link time
  2. Bytes = Vec<Option<u8>> — the raw byte content of the entire allocation. Some(v) = initialized, None = uninitialized. Bytes covered by a provenance entry are a pointer's offset; bytes not covered are plain data
  3. is_null() checks both — zero bytes AND empty provenance. Zero-byte pointers with provenance are valid, not null — they point to offset 0 of a real allocation
  4. Follow provenance for real data: a backend that reads raw_bytes() without checking provenance sees zeros and gets a null pointer. Check the ProvenanceMap; if it is non-empty, follow AllocId → GlobalAlloc::Memory(target_alloc) to reach the actual data
  5. Const evaluation uses Miri — promoted constants, const fn, and static initializers are all evaluated by the same Miri interpreter engine at compile time. The MIR bodies for promoted constants (e.g., promoted[0]) are const-eval recipes, not runtime code
  6. MIR has three distinct pointer casts: PointerExposeProvenance (ptrtoint), PointerWithExposedProvenance (inttoptr), PtrToPtr (bitcast). A backend must dispatch on each separately

Reproducing

cargo +nightly rustc -- -Z unpretty=mir

