- rust-gpu (`rustc_codegen_spirv`) is a custom backend for `rustc` that compiles native Rust code (albeit a subset of it) to SPIR-V.
- MIR -> SPIR-V: to be precise, it takes in Rust's MIR and converts it to SPIR-V.
Stuff that happens at each stage in the front-end
- AST: macro-expansion, name resolution
- HIR (High-level IR): type-checking, type-inference and trait solving
- THIR (Typed High-level IR): pattern checking, exhaustiveness checking
- MIR (Mid-level IR): Simplified, control-flow-oriented representation. Closer to machine code than HIR.
- MIR is where borrow checking, optimization passes (`ConstProp`, `CopyProp`, `dse`) and monomorphization happen.
- Arenas: Efficient memory pools used for allocating compiler data structures (like HIR, AST nodes) with minimal overhead and no deallocation during compilation. Enables fast allocation and simplified lifetime management.
- Interning: Deduplicates and reuses identical values (like strings, types, symbols) via lookup tables to save memory and improve comparison speed.
- Query System: Demand-driven, memoized system where compiler data (like types, traits) is computed lazily as needed, ensuring incremental and modular compilation.
- Traditional compilers are organized as a fixed sequence of passes; `rustc` is instead built around function-like queries that compute information on demand, e.g. `type_of`.
- Queries are memoized: each one is computed once and its result is stored in a table.
- Incremental compilation builds on this: the query result table is saved to disk and can be reused on the next compilation.
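A minimal sketch of the memoization idea only. `QueryCtxt`, the `DefId` alias and the fake provider that returns `"i32"`/`"f32"` are hypothetical stand-ins, not rustc's implementation: the query is an ordinary method, the first call runs the provider and stores the result, and later calls just read the table.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

/// Hypothetical stand-in for an item identifier (rustc uses `DefId`).
type DefId = u32;

/// Toy query context: one memo table per query, filled on demand.
struct QueryCtxt {
    type_of_cache: RefCell<HashMap<DefId, String>>,
    /// Counts how often the provider actually ran (for demonstration).
    computations: Cell<u32>,
}

impl QueryCtxt {
    fn new() -> Self {
        QueryCtxt { type_of_cache: RefCell::new(HashMap::new()), computations: Cell::new(0) }
    }

    /// Demand-driven query: computed at most once per key, then memoized.
    fn type_of(&self, def_id: DefId) -> String {
        if let Some(ty) = self.type_of_cache.borrow().get(&def_id) {
            return ty.clone(); // cache hit
        }
        // Cache miss: run the (hypothetical) provider and record the result.
        self.computations.set(self.computations.get() + 1);
        let ty = if def_id % 2 == 0 { "i32".to_string() } else { "f32".to_string() };
        self.type_of_cache.borrow_mut().insert(def_id, ty.clone());
        ty
    }
}

fn main() {
    let tcx = QueryCtxt::new();
    println!("{}", tcx.type_of(2)); // computed
    println!("{}", tcx.type_of(2)); // read from the table
    println!("provider ran {} time(s)", tcx.computations.get());
}
```

The real query system additionally records dependencies between queries and persists results to disk for incremental compilation; this sketch shows only the in-memory table.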
- TyCtxt (Type Context): Central context object passed around compiler stages. Gives access to interning, arenas, queries, and type information.
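Interning, sketched in isolation: a hypothetical `Interner` hands out small `Symbol` handles, so each distinct string is stored once and comparing two interned strings is an integer comparison. rustc's real interners (for symbols and for types allocated in arenas) differ in detail; the names here are made up for illustration.

```rust
use std::collections::HashMap;

/// Cheap, copyable handle to an interned string (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct Symbol(u32);

#[derive(Default)]
struct Interner {
    map: HashMap<String, Symbol>,
    strings: Vec<String>, // index = Symbol id; entries are never removed
}

impl Interner {
    /// Returns the existing symbol for `s`, or stores `s` once and returns a new one.
    fn intern(&mut self, s: &str) -> Symbol {
        if let Some(&sym) = self.map.get(s) {
            return sym;
        }
        let sym = Symbol(self.strings.len() as u32);
        self.strings.push(s.to_string());
        self.map.insert(s.to_string(), sym);
        sym
    }

    /// Looks the string back up from its handle.
    fn resolve(&self, sym: Symbol) -> &str {
        &self.strings[sym.0 as usize]
    }
}

fn main() {
    let mut interner = Interner::default();
    let a = interner.intern("main_vs");
    let b = interner.intern("main_fs");
    let c = interner.intern("main_vs");
    assert_eq!(a, c); // same string -> same handle, compared as integers
    assert_ne!(a, b);
    println!("{} (distinct strings stored: {})", interner.resolve(a), interner.strings.len());
}
```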
- The Rust compiler team aimed to make it easier to support new codegen backends by introducing a backend-agnostic crate. At the time, the only backend was `rustc_codegen_llvm`, which was tightly coupled to LLVM-specific code.
- Their solution was to separate shared logic from backend-specific details using generics and traits, enabling pluggable backends without code duplication or performance loss.
- This involved two key changes:
- Replacing all LLVM-specific types with generics in function signatures and struct definitions.
- Encapsulating all LLVM FFI calls within traits that define the interface between backend-agnostic and backend-specific code.
- The core LLVM structs, `CodegenCx` and `Builder`, remain backend-specific because their internals are too specialized to generalize. Each backend is expected to define its own versions of these types:
- `CodegenCx`: manages code generation for a codegen unit (CGU).
- `Builder`: emits IR into individual basic blocks.
- These backend-defined types, together with the backend type itself, implement a set of common traits such as `CodegenBackend`, `BuilderMethods`, and others.
- `rustc_codegen_ssa` is this backend-agnostic crate. It defines a shared interface that all backends (LLVM, Cranelift, and GCC) can implement.
- `rustc_codegen_spirv`, for example, implements traits from this crate, including `CodegenBackend` and `ExtraBackendMethods`.
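To make the generics-and-traits split concrete, here is a heavily simplified sketch. `BuilderMethods` below is a hypothetical two-method trait, not the real `rustc_codegen_ssa::traits::BuilderMethods` (which has many more methods and associated types), and `TextBackend`/`FoldingBackend` are made-up backends. The point is that `codegen_add` is written once against the trait, while each backend supplies its own value type and instruction emission.

```rust
/// Hypothetical, minimal stand-in for a backend interface.
trait BuilderMethods {
    /// Backend-specific SSA value type (e.g. an LLVM value or a SPIR-V result id).
    type Value: Copy;
    fn const_i32(&mut self, v: i32) -> Self::Value;
    fn add(&mut self, a: Self::Value, b: Self::Value) -> Self::Value;
}

/// Backend-agnostic "codegen": written once, generic over any backend.
fn codegen_add<B: BuilderMethods>(bx: &mut B, x: i32, y: i32) -> B::Value {
    let a = bx.const_i32(x);
    let b = bx.const_i32(y);
    bx.add(a, b)
}

/// Backend 1: emits SPIR-V-like text; values are result ids.
#[derive(Default)]
struct TextBackend {
    next_id: u32,
    lines: Vec<String>,
}

impl BuilderMethods for TextBackend {
    type Value = u32;
    fn const_i32(&mut self, v: i32) -> u32 {
        self.next_id += 1;
        self.lines.push(format!("%{} = OpConstant %int {}", self.next_id, v));
        self.next_id
    }
    fn add(&mut self, a: u32, b: u32) -> u32 {
        self.next_id += 1;
        self.lines.push(format!("%{} = OpIAdd %int %{} %{}", self.next_id, a, b));
        self.next_id
    }
}

/// Backend 2: constant-folds instead of emitting anything; values are plain i32.
struct FoldingBackend;

impl BuilderMethods for FoldingBackend {
    type Value = i32;
    fn const_i32(&mut self, v: i32) -> i32 {
        v
    }
    fn add(&mut self, a: i32, b: i32) -> i32 {
        a.wrapping_add(b)
    }
}

fn main() {
    let mut text = TextBackend::default();
    let id = codegen_add(&mut text, 2, 3);
    println!("{}\nresult id: %{}", text.lines.join("\n"), id);
    println!("folded: {}", codegen_add(&mut FoldingBackend, 2, 3));
}
```

Because `codegen_add` is monomorphized per backend, the shared logic costs nothing at runtime, which is the "no performance loss" part of the design.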
```shell
# list all executable binaries (cargo prints them when --bin is given no name)
cargo run --bin

# example
cargo run --bin example-runner-wgpu-builder

# dump MIR for example-runner-wgpu-builder into DIR
RUSTGPU_CODEGEN_ARGS="--dump-mir DIR" cargo run --bin example-runner-wgpu-builder

# alternatively, dump the MIR control-flow graph with rustc (-Z flags need nightly)
cargo rustc -- -Z unpretty=mir-cfg

# or dump MIR before and after every single pass, for functions matching main_vs
cargo rustc -- -Z dump-mir=main_vs

# tests
cargo compiletest hello_world
```
- If you go through the examples, you'll see that they use a pre-compiled instance of `rustc_codegen_spirv` (stashed somewhere in the target directory), so you can't really breakpoint-debug the backend by feeding it some input.
- The core flow of the shader construction process can be understood by examining `rust-gpu/examples/runners/wgpu/builder/src/main.rs`:
- Instantiate a new `SpirvBuilder` with the desired target configuration.
- Call its `build()` method, which internally invokes `rustc` with the appropriate SPIR-V codegen backend and settings. `build()`:
- validates the target and prerequisites, like the crate path and SPIR-V environment.
- locates the `rustc_codegen_spirv` backend, either from the environment or Cargo's search path.
- sets up all build flags, including `RUSTFLAGS`, LLVM args (not sure why we need them), and encoded panic/validation options.
- runs a nested Cargo build using the SPIR-V backend with the configured settings.
- ensures the final SPIR-V output is built with the correct metadata, features, and abort strategy.
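A rough sketch of the "runs a nested Cargo build" step. It assumes nothing about `spirv-builder`'s actual code: `nested_build_command`, its argument list and the example paths are hypothetical. It only relies on real Cargo/rustc mechanisms: `-Zcodegen-backend=<path>` (a nightly rustc flag that loads a codegen backend dylib), `-Zbuild-std=core` (nightly Cargo), and the `CARGO_ENCODED_RUSTFLAGS` variable, whose flags are separated by the `0x1f` byte so that a single flag may contain spaces. The command is built but not run.

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical helper: builds (but does not run) a nested `cargo build`
/// that compiles `shader_crate` through a custom codegen backend.
fn nested_build_command(backend_dylib: &Path, shader_crate: &Path, target: &str) -> Command {
    // Flags passed to every rustc invocation in the nested build.
    let rustflags = [
        format!("-Zcodegen-backend={}", backend_dylib.display()),
        "-Cpanic=abort".to_string(), // no unwinding on the GPU
    ];

    let mut cmd = Command::new("cargo");
    cmd.arg("build")
        .arg("--release")
        .args(["--target", target])
        .arg("-Zbuild-std=core") // there is no prebuilt `core` for SPIR-V targets
        .current_dir(shader_crate)
        // Cargo splits this variable on 0x1f (ASCII unit separator).
        .env("CARGO_ENCODED_RUSTFLAGS", rustflags.join("\x1f"));
    cmd
}

fn main() {
    // Illustrative paths only.
    let cmd = nested_build_command(
        Path::new("target/debug/librustc_codegen_spirv.so"),
        Path::new("examples/shaders/sky-shader"),
        "spirv-unknown-vulkan1.1",
    );
    println!("{:?}", cmd);
}
```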
- Finally, let's take a look at the backend itself.
- The first thing to notice is that the build script makes a patched copy of the `rustc_codegen_ssa` crate, so that the backend can keep using the old, type-specific stack allocations (called typed allocas).
- `alloca` is an LLVM instruction that allocates memory on the stack.
- A typed alloca means the allocated memory has a specific type, e.g. `i32`.
- **Old rust-gpu code expects typed allocas:** it relies on knowing the type of the pointer at allocation time.
```llvm
%my_var = alloca i32                ; knows it's an i32
store i32 42, i32* %my_var
```
- **Newer rustc/LLVM use opaque pointers and untyped (byte-array) allocas:** the type is not specified, which breaks the old code.
```llvm
%my_var = alloca [4 x i8], align 4  ; just 4 bytes, the i32 type info is lost
store i32 42, ptr %my_var
```
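Why this matters for a SPIR-V backend, as a sketch: in SPIR-V, a local variable is declared with `OpVariable`, whose result type is an `OpTypePointer` naming both a storage class and a pointee type. An untyped alloca gives the backend only a size, not a type to declare. The `Alloca` enum and `lower_alloca` below are hypothetical models of that constraint, not rust-gpu's code.

```rust
/// Hypothetical model of what the backend receives for a stack allocation.
#[derive(Debug)]
enum Alloca {
    /// Old-style typed alloca: the element type is known.
    Typed(&'static str),
    /// Untyped alloca: only a size in bytes is known.
    Untyped { size: u64 },
}

/// Lowers an alloca to a (textual) SPIR-V `OpVariable`.
/// Fails for untyped allocas because there is no pointee type to declare.
fn lower_alloca(result_id: u32, alloca: &Alloca) -> Result<String, String> {
    match alloca {
        Alloca::Typed(ty) => Ok(format!(
            "%{result_id} = OpVariable %_ptr_Function_{ty} Function"
        )),
        Alloca::Untyped { size } => Err(format!(
            "cannot declare OpVariable for {size} untyped bytes: pointee type unknown"
        )),
    }
}

fn main() {
    for a in [Alloca::Typed("int"), Alloca::Untyped { size: 4 }] {
        match lower_alloca(1, &a) {
            Ok(inst) => println!("{a:?} -> {inst}"),
            Err(e) => println!("{a:?} -> error: {e}"),
        }
    }
}
```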
- `SpirvCodegenBackend::codegen_crate`: `SpirvCodegenBackend` implements `CodegenBackend`, which is how custom Rust codegen backends plug into the compiler.
- It in turn invokes the `rustc_codegen_ssa::base::codegen_crate` function. This function is backend-agnostic and orchestrates the full codegen pipeline:
- early exit for metadata-only builds
- ==🟡collect & partition monomorphized items==
- force queries for codegen units
- metadata module emission (optional)
- ==🟡start async codegen coordination==
- allocator shim codegen (optional)
- sort CGUs for throughput/memory tradeoff
- determine CGU reuse
- precompile first batch (parallel mode)
- ==🟣main compilation loop==
- ==finalize==
- Each codegen unit is either:
- compiled using the backend (`SpirvCodegenBackend::compile_codegen_unit`), or
- reused from the cache.
- ==🟣all of this happens in the main compilation loop==
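The "compiled or reused" decision in the main loop can be sketched as a toy model. This is not `rustc_codegen_ssa`'s code: the `Cgu` struct, the fingerprint-keyed `cache` and `compile_cgu` are hypothetical, and processing the largest units first is just one simple ordering (rustc's actual ordering heuristic may differ).

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical codegen unit: a name plus the monomorphized items in it.
struct Cgu {
    name: String,
    items: Vec<String>,
}

impl Cgu {
    /// Stand-in for change detection: hash the unit's contents.
    fn fingerprint(&self) -> u64 {
        let mut h = DefaultHasher::new();
        self.items.hash(&mut h);
        h.finish()
    }
}

/// Stand-in for `compile_codegen_unit`: "compiles" a unit into a module.
fn compile_cgu(cgu: &Cgu) -> String {
    format!("module {} ({} items)", cgu.name, cgu.items.len())
}

/// Toy main loop: reuse the cached module if the unit is unchanged, else compile it.
/// Returns the modules and how many units were actually compiled.
fn codegen_all(mut cgus: Vec<Cgu>, cache: &mut HashMap<String, (u64, String)>) -> (Vec<String>, usize) {
    // Largest units first, so long jobs start early (simplified ordering).
    cgus.sort_by_key(|c| std::cmp::Reverse(c.items.len()));
    let mut modules = Vec::new();
    let mut compiled = 0;
    for cgu in &cgus {
        let fp = cgu.fingerprint();
        let cached = cache
            .get(&cgu.name)
            .filter(|(old_fp, _)| *old_fp == fp)
            .map(|(_, module)| module.clone());
        match cached {
            Some(module) => modules.push(module), // reused
            None => {
                let module = compile_cgu(cgu);
                compiled += 1;
                cache.insert(cgu.name.clone(), (fp, module.clone()));
                modules.push(module);
            }
        }
    }
    (modules, compiled)
}

fn main() {
    let mut cache = HashMap::new();
    let make = || {
        vec![
            Cgu { name: "a".into(), items: vec!["main_vs".into(), "helper".into()] },
            Cgu { name: "b".into(), items: vec!["main_fs".into()] },
        ]
    };
    let (_, first) = codegen_all(make(), &mut cache);
    let (_, second) = codegen_all(make(), &mut cache);
    println!("compiled {first} unit(s), then {second} on the unchanged rerun");
}
```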
A Codegen Unit (CGU) is a collection of monomorphized items (functions, statics, etc.) that are grouped together to be compiled as a single unit by the backend.
- essentially a container for many functions/statics/constants. These are the things that actually get handed off to the backend in batches.
TLDR
- monomorphized item = one function, static, or const with concrete type substitutions (MIR exists at this level).
- codegen unit (CGU) = Group of such items compiled together as a chunk.
- you get MIR per item, but you compile batches of MIR together as CGUs.
- The compiler partitions items into Codegen Units, each of which might contain dozens or hundreds of items, depending on optimization level, inlining heuristics, LTO, and other factors.
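A simplified sketch of partitioning, assuming a rough two-step strategy: place each item in a CGU for its defining module, then merge the smallest CGUs until a target count is reached (similar in spirit to the `-C codegen-units` limit; the real partitioner also accounts for inlining, visibility and size estimates). `MonoItem`, `partition` and the module names are hypothetical.

```rust
use std::collections::BTreeMap;

/// Hypothetical monomorphized item: a concrete instance plus its defining module.
struct MonoItem {
    name: &'static str,   // e.g. "add::<f32>"
    module: &'static str, // used for initial placement
}

/// Groups items into at most `max_cgus` codegen units.
fn partition(items: &[MonoItem], max_cgus: usize) -> Vec<Vec<&'static str>> {
    // 1. Initial placement: one CGU per source module.
    let mut by_module: BTreeMap<&str, Vec<&'static str>> = BTreeMap::new();
    for item in items {
        by_module.entry(item.module).or_default().push(item.name);
    }
    let mut cgus: Vec<Vec<&'static str>> = by_module.into_values().collect();

    // 2. Merge the two smallest CGUs until we are within the limit.
    while cgus.len() > max_cgus.max(1) {
        cgus.sort_by_key(|c| std::cmp::Reverse(c.len())); // largest first
        let smallest = cgus.pop().unwrap();
        cgus.last_mut().unwrap().extend(smallest);
    }
    cgus
}

fn main() {
    let items = [
        MonoItem { name: "main_vs", module: "shaders" },
        MonoItem { name: "main_fs", module: "shaders" },
        MonoItem { name: "add::<f32>", module: "math" },
        MonoItem { name: "add::<u32>", module: "math" },
        MonoItem { name: "noise::<f32>", module: "util" },
    ];
    for (i, cgu) in partition(&items, 2).iter().enumerate() {
        println!("cgu{i}: {cgu:?}");
    }
}
```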
`SpirvCodegenBackend`: `CodegenCx`
- A `CodegenCx` is created once per CGU:
```rust
let cx = CodegenCx::new(tcx, cgu);
```
- It’s the context used for all MIR-to-SPIR-V lowering for that one CGU. It holds:
- The SPIR-V module under construction and its module builder, ==🟢`BuilderSpirv`==
- The symbol table (e.g., function handles, static variables),
- Backend-specific config like optimization level, dump flags,
- Possibly interning caches or string interner handles.
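A toy model of the per-CGU context, with hypothetical names throughout (this `CodegenCx` is not rust-gpu's struct, and the "module" is just a list of instruction strings): a fresh context is created for each CGU, owns the module being built, and keeps a symbol table so every reference to the same function resolves to one SPIR-V result id.

```rust
use std::collections::HashMap;

/// Hypothetical per-CGU codegen context.
struct CodegenCx {
    cgu_name: String,
    next_id: u32,                    // result ids are allocated per module
    instructions: Vec<String>,       // the module under construction
    functions: HashMap<String, u32>, // symbol table: function name -> result id
}

impl CodegenCx {
    fn new(cgu_name: &str) -> Self {
        CodegenCx {
            cgu_name: cgu_name.to_string(),
            next_id: 1,
            instructions: Vec::new(),
            functions: HashMap::new(),
        }
    }

    /// Returns the id for `name`, declaring it only on first use.
    fn get_or_declare_fn(&mut self, name: &str) -> u32 {
        if let Some(&id) = self.functions.get(name) {
            return id;
        }
        let id = self.next_id;
        self.next_id += 1;
        self.instructions.push(format!("OpName %{id} \"{name}\""));
        self.functions.insert(name.to_string(), id);
        id
    }
}

/// One fresh context per CGU; it is returned once the CGU has been lowered.
fn codegen_unit(cgu_name: &str, calls: &[&str]) -> CodegenCx {
    let mut cx = CodegenCx::new(cgu_name);
    for callee in calls {
        cx.get_or_declare_fn(callee);
    }
    cx
}

fn main() {
    let cx = codegen_unit("cgu.0", &["main_vs", "helper", "main_vs"]);
    println!("{}: {:?}", cx.cgu_name, cx.instructions);
}
```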
- `BuilderSpirv`:
- primarily wraps an `rspirv::dr::Builder` inside a `RefCell` to allow interior mutability.
- This lets code that only holds a shared reference (`&BuilderSpirv`) still mutate the underlying SPIR-V builder (`rspirv::dr::Builder`). `RefCell` checks borrows at runtime: either many shared borrows or one mutable borrow at a time, and a conflicting `borrow_mut()` panics.
- When you want to emit SPIR-V code in the backend, you'll typically:
- construct a local `Builder<'a, 'tcx>` with a cursor and a reference to `CodegenCx`.
- The local `Builder` uses `emit()` to push instructions to the `rspirv::dr::Builder`.
- The local `Builder` struct is really a convenient wrapper for cursor management that knows:
- where in the module you want to emit
- which function/block you're in
- Note: global statics and variables don't use the local `Builder` abstraction.
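The `RefCell`-plus-cursor pattern can be sketched with toy types. `ModuleBuilder` stands in for `rspirv::dr::Builder`, and `BuilderSpirv`, `Cursor` and the local `Builder` only mimic the roles described above; none of these method names are rust-gpu's or rspirv's actual API. Several local builders share one `&BuilderSpirv`, each `emit()` takes a short-lived `borrow_mut()`, and globals are emitted directly without a cursor.

```rust
use std::cell::RefCell;

/// Stand-in for `rspirv::dr::Builder`: globals plus functions made of basic blocks.
#[derive(Default)]
struct ModuleBuilder {
    globals: Vec<String>,
    functions: Vec<Vec<Vec<String>>>, // function -> block -> instructions
}

/// Shared wrapper: `&BuilderSpirv` is enough to mutate the module (interior mutability).
#[derive(Default)]
struct BuilderSpirv {
    builder: RefCell<ModuleBuilder>,
}

impl BuilderSpirv {
    /// Globals are emitted directly, without a local `Builder`/cursor.
    fn emit_global(&self, inst: &str) {
        self.builder.borrow_mut().globals.push(inst.to_string());
    }
}

/// Where the next instruction goes.
#[derive(Clone, Copy)]
struct Cursor {
    function: usize,
    block: usize,
}

/// Local builder: a cursor plus a shared reference to the module builder.
struct Builder<'a> {
    spirv: &'a BuilderSpirv,
    cursor: Cursor,
}

impl Builder<'_> {
    fn emit(&self, inst: &str) {
        let Cursor { function, block } = self.cursor;
        // Runtime-checked exclusive borrow, released at the end of this statement.
        self.spirv.builder.borrow_mut().functions[function][block].push(inst.to_string());
    }
}

fn main() {
    let spirv = BuilderSpirv::default();
    spirv.builder.borrow_mut().functions.push(vec![Vec::new(), Vec::new()]); // 1 function, 2 blocks

    let entry = Builder { spirv: &spirv, cursor: Cursor { function: 0, block: 0 } };
    let exit = Builder { spirv: &spirv, cursor: Cursor { function: 0, block: 1 } };
    entry.emit("OpBranch %exit");
    exit.emit("OpReturn");
    spirv.emit_global("%g = OpVariable %_ptr_Private_int Private");

    let module = spirv.builder.borrow();
    // While `module` is borrowed, a mutable borrow is refused at runtime.
    assert!(spirv.builder.try_borrow_mut().is_err());
    println!("{:?}\n{:?}", module.globals, module.functions);
}
```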