@nihalpasham
Created April 7, 2026 07:50
ZSTs occupy zero bytes at runtime; the compiler strips them before generating machine code. But in MIR, they get real operands, real types, and if you're building a backend that consumes MIR… real bugs.

Zero-Sized Types Actually Exist in MIR


Intro

PhantomData has no runtime representation. It occupies zero bytes of memory. The compiler erases it completely before generating machine code.

That's the story, right? And it's true — at runtime.

But between the Rust you write and the machine code you get, there's MIR — Mid-level Intermediate Representation — and in MIR, PhantomData gets real operands, real types, and if you're building a backend that consumes MIR... real bugs.

Let's talk about exactly where Zero-Sized Types show up in MIR, why the compiler keeps them around, and the surprisingly subtle type-precision problem that bit me when I was building a MIR backend.


What Are Zero-Sized Types?

Let's start with the basics. A Zero-Sized Type — a ZST — is a Rust type that carries meaning at the type level but occupies exactly zero bytes at runtime.

| Pattern | Example | Purpose |
|---|---|---|
| PhantomData<T> | PhantomData<&'a T> | Lifetime / variance tracking |
| Unit structs | struct Locked; | Type-level tags |
| Empty structs | struct Marker {} | Typestate markers |
| Unit type | () | "No value" / void return |
| Never type | ! | Impossible values |

The most important one for today is PhantomData. It's everywhere in the standard library.

Here's a simplified version of what the standard library's slice iterator looks like:

pub struct Iter<'a, T: 'a> {
    ptr: *const T,
    end: *const T,
    _marker: PhantomData<&'a T>,  // ← zero bytes
}

Three fields. Two pointers that the iterator actually uses, and one PhantomData that exists purely so the compiler knows this iterator borrows something with lifetime 'a.

At runtime, Iter is just 16 bytes on a 64-bit target: two pointers. PhantomData contributes nothing.
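You can check those size claims yourself with std::mem::size_of. Here's a minimal sketch: the helper names zst_sizes and iter_is_two_pointers are mine, and std::convert::Infallible (an empty enum) stands in for !, which can't be named as a type argument on stable Rust.

```rust
use std::convert::Infallible;
use std::marker::PhantomData;
use std::mem::size_of;

#[allow(dead_code)]
struct Locked; // unit struct

#[allow(dead_code)]
struct Marker {} // empty struct

// Same shape as the simplified slice iterator above.
#[allow(dead_code)]
struct Iter<'a, T: 'a> {
    ptr: *const T,
    end: *const T,
    _marker: PhantomData<&'a T>,
}

/// Sizes in bytes of one example type per row of the ZST table.
fn zst_sizes() -> [usize; 5] {
    [
        size_of::<PhantomData<&'static u32>>(),
        size_of::<Locked>(),
        size_of::<Marker>(),
        size_of::<()>(),
        // Stand-in for `!` (assumption: an empty enum behaves the same for sizing).
        size_of::<Infallible>(),
    ]
}

/// True when Iter occupies exactly two pointers, i.e. the marker adds nothing.
fn iter_is_two_pointers() -> bool {
    size_of::<Iter<'static, u32>>() == 2 * size_of::<*const u32>()
}
```

Comparing against 2 * size_of::<*const u32>() rather than a literal 16 keeps the check valid on 32-bit targets too.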

So naturally, when you lower Rust to any IR, you'd expect PhantomData to just... not be there. Right?

Let's look.


PhantomData Is a Real Operand in MIR

I have a small example here. We've got our own Iter struct, same shape as the standard library's, and a function that constructs one:

use std::marker::PhantomData;

struct Iter<'a, T: 'a> {
    ptr: *const T,
    end: *const T,
    _marker: PhantomData<&'a T>,
}

fn create_iter<'a>(slice: &'a [u32]) -> Iter<'a, u32> {
    let ptr = slice.as_ptr();
    let end = unsafe { ptr.add(slice.len()) };
    Iter { ptr, end, _marker: PhantomData }
}

Now let's dump the MIR. You can do this yourself on a nightly toolchain (-Z flags are unstable) with:

cargo rustc -- -Zunpretty=mir

Here's what the compiler produces:

fn create_iter(_1: &[u32]) -> Iter<'_, u32> {
    let mut _0: Iter<'_, u32>;
    let _2: *const u32;
    let _3: *const u32;
    let mut _4: usize;

    bb0: {
        _2 = core::slice::<impl [u32]>::as_ptr(copy _1) -> [return: bb1, unwind continue];
    }

    bb1: {
        _4 = PtrMetadata(copy _1);
        _3 = std::ptr::const_ptr::<impl *const u32>::add(copy _2, move _4) -> [return: bb2, unwind continue];
    }

    bb2: {
        _0 = Iter::<'_, u32> { ptr: copy _2, end: copy _3, _marker: const PhantomData::<&u32> };
        return;
    }
}

Look at bb2. The aggregate construction.

_0 = Iter::<'_, u32> { ptr: copy _2, end: copy _3, _marker: const PhantomData::<&u32> };

Three fields, three operands. PhantomData is right there as a real Constant operand in the MIR. It's not erased. It's not optimized away. It's a concrete value that the compiler hands to any backend consuming this MIR.

And this isn't a one-off. Let me show you a struct with two PhantomData fields:

struct Multi<A, B> {
    data: u64,
    _a: PhantomData<A>,
    _b: PhantomData<B>,
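}

With Locked and Unlocked (two unit structs) as the type parameters, the MIR for a function that constructs one looks like this:

fn create_multi() -> Multi<Locked, Unlocked> {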
    bb0: {
        _0 = Multi::<Locked, Unlocked> {
            data: const 42_u64,
            _a: const PhantomData::<Locked>,
            _b: const PhantomData::<Unlocked>
        };
        return;
    }
}

Three fields, three operands. Every PhantomData field gets its own constant.
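For reference, here's Rust source that would plausibly lower to that MIR. Treat it as a reconstruction from the dump rather than the exact original: the function body is inferred from the aggregate operands.

```rust
use std::marker::PhantomData;

struct Locked;
struct Unlocked;

struct Multi<A, B> {
    data: u64,
    _a: PhantomData<A>,
    _b: PhantomData<B>,
}

// Body inferred from the MIR: one u64 constant plus two PhantomData constants.
fn create_multi() -> Multi<Locked, Unlocked> {
    Multi {
        data: 42,
        _a: PhantomData,
        _b: PhantomData,
    }
}
```

Each PhantomData literal in the source is type-inferred separately, which is why the MIR shows PhantomData::<Locked> and PhantomData::<Unlocked> as two distinct constants.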

Now look at what happens when PhantomData is a function parameter and a return value:

fn accept_phantom(_1: PhantomData<u32>) -> PhantomData<u32> {
    debug _marker => const PhantomData::<u32>;
    let mut _0: std::marker::PhantomData<u32>;

    bb0: {
        return;
    }
}

The function signature says it takes a PhantomData<u32> and returns a PhantomData<u32>. But the body? Just return. No loads, no stores, no operations. The value is const-evaluated to nothing — but the type is still there in the signature.

Same thing for a unit struct:

fn pass_unit_struct(_1: Locked) -> Locked {
    debug _tag => const Locked;
    let mut _0: Locked;

    bb0: {
        return;
    }
}

Locked is a ZST. The parameter _1 exists in the MIR, it has a type, but notice the debug line: debug _tag => const Locked. The compiler knows the value is just... the constant Locked. There's nothing to load.
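A minimal sketch of source that matches those two signatures. The parameter names come from the debug lines in the MIR; the bodies are my assumption, since a dump that contains only return doesn't tell you how the value was produced.

```rust
use std::marker::PhantomData;

struct Locked;

// Assumed body: any expression of type PhantomData<u32> yields the same MIR shape.
fn accept_phantom(_marker: PhantomData<u32>) -> PhantomData<u32> {
    PhantomData
}

// Assumed body: there is only one possible value of type Locked.
fn pass_unit_struct(_tag: Locked) -> Locked {
    Locked
}
```

Both functions are still callable with real arguments at the source level; the arguments just carry no bytes.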


Where It Actually Matters: for Loops and Slice Iterators

If these were just toy examples, you might not care. But PhantomData shows up in every for loop over a slice. Here's a simple sum:

fn sum_slice(data: &[u32]) -> u32 {
    let mut total = 0u32;
    for &val in data.iter() {
        total += val;
    }
    total
}

Three lines of logic. Let's look at the MIR:

fn sum_slice(_1: &[u32]) -> u32 {
    let mut _0: u32;
    let mut _2: u32;
    let mut _3: std::slice::Iter<'_, u32>;   // ← contains PhantomData internally
    let mut _4: std::slice::Iter<'_, u32>;
    let mut _6: std::option::Option<&u32>;
    ...

    bb0: {
        _2 = const 0_u32;
        _4 = core::slice::<impl [u32]>::iter(copy _1) -> [return: bb1, unwind continue];
    }

    bb1: {
        _3 = <std::slice::Iter<'_, u32> as IntoIterator>::into_iter(move _4) -> [return: bb2, ...];
    }

    bb3: {
        _7 = &mut _5;
        _6 = <std::slice::Iter<'_, u32> as Iterator>::next(copy _7) -> [return: bb4, ...];
    }

    bb4: {
        _8 = discriminant(_6);
        switchInt(move _8) -> [0: bb7, 1: bb6, otherwise: bb5];
    }
    ...
}

Locals _3, _4, and _5 (the last is declared in the elided part of the listing) all have type std::slice::Iter<'_, u32>. That type, from the standard library, contains PhantomData<&'a u32> as a real field.

When the iterator machinery calls .iter(), constructs an Iter, calls .next(), pattern-matches on Option<&u32> — all of that code is operating on a struct that carries a PhantomData field. Any backend consuming this MIR has to handle that field during construction, destruction, and field access.

This isn't obscure. This is what happens behind every for val in slice loop. PhantomData doesn't disappear at the MIR level — it's threaded through the entire iterator pipeline.
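To connect the MIR back to the source, here's roughly what the for loop desugars to, written out by hand. It's an approximation of the compiler's desugaring, not its exact output, and the function name sum_slice_desugared is mine.

```rust
fn sum_slice_desugared(data: &[u32]) -> u32 {
    let mut total = 0u32;
    // data.iter() builds a std::slice::Iter (bb0), then the loop header
    // passes it through IntoIterator::into_iter (bb1).
    let mut iter = IntoIterator::into_iter(data.iter());
    loop {
        // Each iteration calls Iterator::next through a mutable borrow (bb3)
        // and switches on the Option discriminant (bb4).
        match Iterator::next(&mut iter) {
            Some(&val) => total += val,
            None => break,
        }
    }
    total
}
```

Every value named iter here is a struct with a PhantomData field, even though nothing in the loop body mentions it.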


The Type Precision Trap

Here's where it gets subtle, and where I actually hit a bug.

When you're building a backend, you need to translate MIR types into your target IR. For ZSTs, the natural representation is "empty" — no fields, no bytes. But there are two kinds of empty:

                  ZST Types
                 /          \
        Empty Tuple       Empty Struct
        mir.tuple <>      mir.struct <PhantomData, [], []>
        (aka ())          (aka PhantomData<T>)

An empty tuple and an empty struct both have zero fields and zero bytes. But they are different types. And this matters.

Here's why. When MIR constructs an Iter, it expects three operands:

_0 = Iter::<'_, u32> {
    ptr:     copy _2,            // field 0: pointer type
    end:     copy _3,            // field 1: pointer type
    _marker: const PhantomData   // field 2: struct type (PhantomData)
};

Field 2 has type PhantomData<&u32>. That's a struct type. If your backend translates PhantomData as an empty tuple — because hey, it's zero-sized, who cares — you get a type mismatch.

Field 2 expects a struct. You gave it a tuple. If your IR has any kind of type verification — and it should — this will fail.

In my case, the error looked like this:

Verification error: field 2 of struct Iter expects type
  mir.struct <PhantomData, [], []>
but got
  mir.tuple <>

The fix is straightforward: when you encounter PhantomData (a struct ZST), construct an empty struct with the correct type name. When you encounter () (a tuple ZST), construct an empty tuple. Don't conflate them.

if is_struct_zst {
    // PhantomData<T> → construct empty struct with correct type
    emit_construct_struct(phantom_data_type, fields: [])
} else {
    // () → construct empty tuple
    emit_construct_tuple(fields: [])
}

This is a general lesson about compiler IRs: two types can be semantically identical (zero size, zero fields) but structurally different. And if your IR cares about structural types — which most do — you have to preserve that distinction.
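Here's a small self-contained sketch of that distinction. Everything in it is hypothetical (IrType, ZstKind, lower_zst, and verify_aggregate are toy stand-ins, not my backend's API or rustc's); the point is only that a structural equality check tells an empty struct from an empty tuple.

```rust
#[derive(Debug, Clone, PartialEq)]
enum IrType {
    Ptr,
    Tuple(Vec<IrType>),
    Struct { name: String, fields: Vec<IrType> },
}

/// Hypothetical classification of a zero-sized MIR constant.
enum ZstKind<'a> {
    Unit,            // ()
    Struct(&'a str), // PhantomData<T>, unit structs, ...
}

/// Lower a ZST constant to an IR type, keeping the tuple/struct distinction.
fn lower_zst(kind: &ZstKind<'_>) -> IrType {
    match kind {
        ZstKind::Unit => IrType::Tuple(vec![]),
        ZstKind::Struct(name) => IrType::Struct {
            name: (*name).to_string(),
            fields: vec![],
        },
    }
}

/// Check aggregate operands against the declared field types, one by one.
fn verify_aggregate(expected: &[IrType], operands: &[IrType]) -> Result<(), String> {
    if expected.len() != operands.len() {
        return Err(format!(
            "expected {} operands, got {}",
            expected.len(),
            operands.len()
        ));
    }
    for (i, (want, got)) in expected.iter().zip(operands).enumerate() {
        if want != got {
            return Err(format!("field {i} expects {want:?} but got {got:?}"));
        }
    }
    Ok(())
}
```

Lowering the marker with ZstKind::Unit makes verify_aggregate fail on field 2 of Iter, while ZstKind::Struct("PhantomData") passes.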


The LLVM Problem: "Empty Parameter Types Are Not Supported"

Now here's the second problem. Even if you handle ZSTs perfectly in your own IR, you eventually have to lower to LLVM. And LLVM doesn't like empty structs in certain positions.

Let me show you what rustc's own LLVM output looks like for our examples:

| Function | MIR params → return | LLVM params → return | What changed |
|---|---|---|---|
| create_iter | (&[u32]) → Iter { ptr, end, PhantomData } | (ptr, i64) → { ptr, ptr } | PhantomData field gone |
| create_multi | () → Multi { u64, PhantomData, PhantomData } | () → i64 | only the u64 remains |
| accept_phantom | (PhantomData<u32>) → PhantomData<u32> | () → void | entire function erased |
| pass_unit_struct | (Locked) → Locked | () → void | same erasure |
| Resource::lock | (Resource<Unlocked>) → Resource<Locked> | (i64) → i64 | just the u64 data |
| create_row_matrix | (ptr, usize, usize) → Matrix { ptr, rows, cols, PhantomData } | (sret([24 x i8]), ptr, i64, i64) → void | 24 bytes, not 32 |
| read_marker | (&Iter) → PhantomData<&T> | (ptr) → void | return erased |

Look at create_iter. MIR says the return type is Iter with three fields: ptr, end, PhantomData. LLVM IR says the return type is { ptr, ptr } — two fields. PhantomData is stripped.

create_multi is even more dramatic. MIR says Multi has three fields. LLVM says... i64. Just the data. Both PhantomData fields gone.

And accept_phantom — a function that takes and returns PhantomData — becomes void () in LLVM. The entire function compiles to: do nothing, return nothing.

But here's the thing: this stripping happens inside rustc's own codegen. When you're building a custom MIR backend, you're bypassing rustc's codegen. You get the MIR — with all the PhantomData intact — and you have to do the stripping yourself.

And some LLVM backends are particularly strict about this. If you emit an empty struct {} as a function parameter, certain backends will reject it outright:

LLVM ERROR: Empty parameter types are not supported

So you must strip ZSTs before emitting LLVM IR. You can't just pass them through.
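As a rough illustration of what that stripping means for a function signature, here's a sketch over a toy type model. Ty, is_zst, and lower_signature are hypothetical names for this example only; a real backend would work on its own type representation and on LLVM's.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int(u32),        // integer of the given bit width
    Ptr,
    Struct(Vec<Ty>), // fields in declaration order
}

/// A type is zero-sized if it has no bits, or if all of its fields are zero-sized.
fn is_zst(ty: &Ty) -> bool {
    match ty {
        Ty::Int(bits) => *bits == 0,
        Ty::Ptr => false,
        Ty::Struct(fields) => fields.iter().all(is_zst),
    }
}

/// Drop ZST parameters; a ZST return type becomes None, standing in for `void`.
fn lower_signature(params: &[Ty], ret: &Ty) -> (Vec<Ty>, Option<Ty>) {
    let kept: Vec<Ty> = params.iter().filter(|p| !is_zst(p)).cloned().collect();
    let ret = if is_zst(ret) { None } else { Some(ret.clone()) };
    (kept, ret)
}
```

Feeding it the accept_phantom signature (one empty-struct parameter, empty-struct return) gives no parameters and no return value, matching the () → void row.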


The Two-Layer Architecture

So here's the design that falls out of all this. You want two layers:

┌───────────────────────────────────────────────────────┐
│                  Your MIR Dialect                     │
│  • PhantomData is a real type with a real name        │
│  • Struct fields include ZST fields                   │
│  • Aggregate construction has ZST operands            │
│  • Type info available for analysis passes            │
├───────────────────────────────────────────────────────┤
│                  Analysis Passes                      │
│  • Can query: "is field 2 a PhantomData?"             │
│  • Can query: "what's the marker type of this struct?"│
│  • Full type information for optimization decisions   │
├───────────────────────────────────────────────────────┤
│                  LLVM Dialect                         │
│  • ZSTs stripped during type conversion               │
│  • Struct { ptr, PhantomData } → { ptr }              │
│  • fn(PhantomData) → fn()                             │
│  • Only runtime-relevant types survive                │
└───────────────────────────────────────────────────────┘

In the high-level MIR layer, keep everything. PhantomData exists. Struct fields are complete. Type information is preserved. This is where you'd run any analysis that cares about what type something is — not just how many bytes it occupies.

In the low-level LLVM layer, strip ZSTs. Filter empty struct fields out of struct types. Drop ZST parameters from function signatures. Turn ZST-only return types into void.

The key insight is that stripping should happen at the type conversion boundary — when you're translating your MIR types to LLVM types. That's one location, it's clean, and it means your MIR-level passes always see the full picture.
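One detail worth sketching at that boundary: once you filter ZST fields out of a struct, field indices shift, so any field access lowered afterwards needs an old-to-new index map. The following is a minimal sketch with hypothetical names (Ty, LoweredStruct, lower_struct), and it only handles one level of nesting.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int(u32),        // integer of the given bit width
    Ptr,
    Struct(Vec<Ty>), // fields in declaration order
}

fn is_zst(ty: &Ty) -> bool {
    match ty {
        Ty::Int(bits) => *bits == 0,
        Ty::Ptr => false,
        Ty::Struct(fields) => fields.iter().all(is_zst),
    }
}

/// Result of converting one struct type at the boundary.
struct LoweredStruct {
    /// Surviving (non-ZST) fields, in order.
    fields: Vec<Ty>,
    /// index_map[mir_index] is the lowered index, or None if the field was stripped.
    index_map: Vec<Option<usize>>,
}

fn lower_struct(fields: &[Ty]) -> LoweredStruct {
    let mut kept = Vec::new();
    let mut index_map = Vec::with_capacity(fields.len());
    for field in fields {
        if is_zst(field) {
            index_map.push(None);
        } else {
            index_map.push(Some(kept.len()));
            kept.push(field.clone());
        }
    }
    LoweredStruct { fields: kept, index_map }
}
```

A field projection in MIR then looks up index_map first: Some(i) becomes an access at index i in the lowered struct, and None means the access produces no value at all.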

This is actually what rustc does internally. Rustc's own codegen strips ZSTs when lowering MIR to LLVM IR. If you're building a custom backend, you're just recreating that same boundary.


Why This Matters Beyond PhantomData

I want to close with why this isn't just a PhantomData curiosity.

The typestate pattern, where you use ZSTs to encode state at the type level, is common in Rust. Here's a real pattern:

struct Locked;
struct Unlocked;

struct Resource<State> {
    value: u64,
    _state: PhantomData<State>,
}

impl Resource<Unlocked> {
    fn lock(self) -> Resource<Locked> { ... }
}

impl Resource<Locked> {
    fn read(&self) -> u64 { ... }
    fn unlock(self) -> Resource<Unlocked> { ... }
}

The type system prevents you from calling read() on an unlocked resource. That's compile-time safety with zero runtime cost.
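Here's a compilable version of that pattern with the elided bodies filled in. The bodies and the new constructor are my assumptions for illustration; only the signatures come from the snippet above.

```rust
use std::marker::PhantomData;

struct Locked;
struct Unlocked;

struct Resource<State> {
    value: u64,
    _state: PhantomData<State>,
}

impl Resource<Unlocked> {
    // Hypothetical constructor: resources start out unlocked.
    fn new(value: u64) -> Self {
        Resource { value, _state: PhantomData }
    }

    // Consumes the unlocked resource and rebuilds it with a different marker.
    fn lock(self) -> Resource<Locked> {
        Resource { value: self.value, _state: PhantomData }
    }
}

impl Resource<Locked> {
    fn read(&self) -> u64 {
        self.value
    }

    fn unlock(self) -> Resource<Unlocked> {
        Resource { value: self.value, _state: PhantomData }
    }
}
```

Calling Resource::new(7).read() doesn't compile, because read only exists on Resource<Locked>; Resource::new(7).lock().read() does.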

But in MIR, every lock() and unlock() constructs a new struct with a PhantomData operand. Every transition between states is visible as a PhantomData constant in the IR.

And if you look at the MIR for lock:

fn lock(_1: Resource<Unlocked>) -> Resource<Locked> {
    bb0: {
        _2 = copy (_1.0: u64);
        _0 = Resource::<Locked> { value: move _2, _state: const PhantomData::<Locked> };
        return;
    }
}

It copies the u64 out, constructs a new Resource<Locked> with a PhantomData operand. In LLVM IR, this becomes i64 (i64) — just copy the integer. But in MIR, the state transition is explicitly represented.

For anyone building analysis passes over MIR — say, checking that resources are always locked before use, or that state transitions follow a valid sequence — that PhantomData information is exactly what you'd want to inspect.
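As a very rough sketch of what such a pass could look like once the marker types have been pulled out of the MIR: assume each function has been summarized as a (from, to) pair of marker names, read from the parameter type and from the PhantomData operand of the constructed result. The Transition record and invalid_transitions helper are hypothetical; how you extract the names depends entirely on your own IR.

```rust
/// Hypothetical per-function summary extracted from MIR.
struct Transition<'a> {
    func: &'a str, // function name
    from: &'a str, // marker type of the input, e.g. "Unlocked"
    to: &'a str,   // marker type of the PhantomData operand in the result
}

/// Returns the names of functions whose state change is not in `allowed`.
fn invalid_transitions<'a>(
    allowed: &[(&str, &str)],
    seen: &[Transition<'a>],
) -> Vec<&'a str> {
    seen.iter()
        .filter(|t| !allowed.iter().any(|(from, to)| *from == t.from && *to == t.to))
        .map(|t| t.func)
        .collect()
}
```

None of this is possible if the marker has already been collapsed into an anonymous empty type, which is the practical argument for keeping the name around in the high-level layer.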

So: preserve ZST type info in your high-level IR. Strip it when lowering to LLVM. Two layers, clean boundary.


Outro

To summarize:

One — ZSTs are real operands in MIR. PhantomData appears in aggregate construction, field access, function signatures, and constant operands. It is not erased.

Two — Type precision matters. An empty struct and an empty tuple are both zero-sized, but they are different types. Conflating them breaks verification.

Three — LLVM backends can reject empty struct types in function signatures. You must strip ZSTs during LLVM lowering.

Four — The right architecture is two layers: MIR dialect preserves full type info for analysis; LLVM dialect strips ZSTs at the type conversion boundary.
