The Nova driver is a two-tier GPU driver stack written in Rust for NVIDIA GPUs. It consists of two main components that work together:
Nova-Core is the low-level hardware abstraction layer that directly interfaces with the GPU hardware.
Entry point:

```rust
kernel::module_pci_driver! {
    type: driver::NovaCore,
    name: "NovaCore",
    author: "Danilo Krummrich",
    description: "Nova Core GPU driver",
    license: "GPL v2",
    firmware: [],
}
```
Key Components:
- PCI Driver (`driver.rs`): Implements the PCI device driver interface
- GPU Hardware Abstraction (`gpu.rs`): Manages GPU chipset detection and architecture identification
- Register Management (`regs/`): Hardware register definitions and access macros
- Firmware Management (`firmware.rs`): Handles GPU (or GSP) firmware loading and management
- GSP stands for GPU System Processor.
- A GSP-based driver is one where the GPU runs part of its own driver stack, on internal firmware, and the host-side driver becomes a thin client that just sends commands to it.
- The host-side driver (like Nova-Core) becomes simpler. It talks to the GSP via a command queue and mailbox interface.
Things like:
- Power management
- Microcode loading
- Command submission handling
- Low-level hardware setup
- Error reporting and logging
Nova-DRM is the DRM (Direct Rendering Manager) layer that provides the userspace API.
Entry point:

```rust
kernel::module_auxiliary_driver! {
    type: NovaDriver,
    name: "Nova",
    author: "Danilo Krummrich",
    description: "Nova GPU driver",
    license: "GPL v2",
}
```

- NovaCore Struct - Main PCI driver structure:

```rust
#[pin_data]
pub(crate) struct NovaCore {
    #[pin]
    pub(crate) gpu: Gpu,
    _reg: auxiliary::Registration,
}
```
- GPU Hardware Abstraction - Manages chipset detection:

```rust
pub(crate) struct Gpu {
    spec: Spec,
    /// MMIO mapping of PCI BAR 0
    bar: Devres<Bar0>,
    fw: Firmware,
}
```
- Chipset Support - Supports multiple GPU generations:
- Turing (TU102, TU104, TU106, TU117, TU116)
- Ampere (GA100, GA102, GA103, GA104, GA106, GA107)
- Ada (AD102, AD103, AD104, AD106, AD107)
- DRM Driver Structure:

```rust
pub(crate) struct NovaDriver {
    #[expect(unused)]
    drm: ARef<drm::Device<Self>>,
}
```
- DRM IOCTLs - Userspace API:

```rust
kernel::declare_drm_ioctls! {
    (NOVA_GETPARAM, drm_nova_getparam, ioctl::RENDER_ALLOW, File::get_param),
    (NOVA_GEM_CREATE, drm_nova_gem_create, ioctl::AUTH | ioctl::RENDER_ALLOW, File::gem_create),
    (NOVA_GEM_INFO, drm_nova_gem_info, ioctl::AUTH | ioctl::RENDER_ALLOW, File::gem_info),
}
```
The two drivers communicate via the Linux Auxiliary Bus:
- Nova-Core registers an auxiliary device named `"nova-drm"`
- Nova-DRM binds to this auxiliary device as an auxiliary driver
- This creates a clean separation between hardware management and DRM functionality
- PCI Probe: Nova-Core detects NVIDIA GPU via PCI device table
- Hardware Init: Maps PCI BAR0, detects chipset, loads firmware
- Auxiliary Registration: Creates auxiliary device for DRM layer
- DRM Binding: Nova-DRM binds to auxiliary device
- DRM Registration: Registers DRM device with userspace API
This architecture provides a separation of concerns where Nova-Core handles all hardware-specific operations while Nova-DRM focuses purely on providing the standard Linux graphics API to userspace applications.
The first few lines of the macro expansion of `kernel::module_pci_driver!` are pure registration code:

```rust
type Ops<T> = ::kernel::pci::Adapter<T>;

struct DriverModule {
    _driver: ::kernel::driver::Registration<Ops<driver::NovaCore>>,
}
// ...
```
Key components:
- `DriverModule`: Container that holds the driver registration object
- `Registration<Ops<driver::NovaCore>>`: The actual registration object
- `Ops<T> = ::kernel::pci::Adapter<T>`: PCI-specific adapter that bridges the driver to the kernel's PCI subsystem
When this module loads, the `InPlaceModule::init()` function:
- Creates the registration object: `Registration::new(name, module)`
- Calls kernel PCI registration: under the hood this calls `__pci_register_driver()`
- Registers callbacks: the driver's probe/remove functions with the PCI subsystem
- Adds to the driver list: the kernel now knows "the NovaCore driver exists and handles these device IDs"

Note what does *not* happen at this point:
- No hardware interaction - it doesn't touch any GPU yet
- No device initialization - that happens later in `probe()`
- No memory mapping - that's probe's job
- No interrupt setup - also probe's responsibility
```rust
unsafe impl<T: Driver + 'static> driver::RegistrationOps for Adapter<T> {
    type RegType = bindings::pci_driver;
    // ...
}
```
- `T = driver::NovaCore` (your actual driver implementation)
- `Registration<T> = Registration<pci::Adapter<driver::NovaCore>>`
- `pci::Adapter<driver::NovaCore>` implements `RegistrationOps`
- `RegType = bindings::pci_driver` (the C struct `pci_driver`)

So `T` in `Registration<T>` is `pci::Adapter<driver::NovaCore>` - it's the driver wrapped in a PCI-specific adapter that knows how to talk to the kernel's PCI subsystem.
The adapter pattern allows the same `Registration<T>` type to work with different bus types (PCI, platform, USB, etc.) by changing only the adapter implementation.

This is essentially the driver saying to the kernel: "I exist, and here's how to contact me when you find my hardware." The actual work happens later, when `probe()` gets called for each matching device found on the PCI bus.

The pin-related code is Rust's solution to a fundamental kernel programming problem: safe initialization of self-referential or hardware-tied structures.
In kernel space, you often have structures that:
- Contain pointers to themselves or their fields
- Are tied to specific memory locations (like hardware mappings)
- Must never be moved once initialized
- Need complex, fallible initialization sequences
Traditional Rust initialization doesn't handle these cases well because:
- Rust normally allows moving values freely
- Standard constructors can't handle complex initialization dependencies
- Partial initialization failures need careful cleanup
The generated `__ThePinData` struct and its methods provide:
- Pinned initialization: Objects are initialized in-place and can never be moved
- Field-by-field initialization: Each field gets its own initialization method
- Failure handling: If initialization fails partway through, already-initialized fields are properly cleaned up
- Safety guarantees: The type system prevents use-after-move or double-initialization
The `__Unpin` struct and trait implementations ensure that your driver struct can only be used in pinned contexts, preventing accidental moves.
The driver implementation in drivers/gpu/nova_core/driver.rs contains the PCI device table and the probe function:

```rust
const PCI_TABLE: ... = // Maps the NVIDIA vendor ID to this driver

impl pci::Driver for NovaCore {
    fn probe(pdev: &pci::Device<Core>, _info: &Self::IdInfo) -> Result<Pin<KBox<Self>>> {
        // Initialize hardware
        // Use pin-init to safely construct the driver struct
    }
}
```
The probe function:
- Enables the PCI device and sets it as bus master
- Maps BAR0 (the first memory region) for hardware access
- Uses pin-initialization to construct the driver struct safely
- Returns a pinned, heap-allocated driver instance
The probe function uses `KBox::pin_init()` because:
- The `Gpu` object likely contains hardware mappings that can't be moved
- The initialization might fail at various steps (hardware access, memory allocation)
- The kernel needs the driver instance to live in a stable memory location
- Cleanup must happen automatically if initialization fails
The complex closure-based initialization ensures that if GPU initialization succeeds but auxiliary registration fails, the GPU object is properly cleaned up.
This macro transforms your simple driver declaration into a full kernel PCI driver with:
- Memory-safe hardware initialization via pin-init
- Automatic PCI device table generation
- Proper error handling and cleanup
- Type-safe driver registration
The pin-related complexity is Rust's way of providing the same safety guarantees in kernel space that you get in userspace, even when dealing with hardware resources that have strict lifetime and location requirements.
In a traditional (pre-GSP) driver, the host has full control:
- The host driver has full visibility into:
  - GPU registers
  - Memory layout
  - Command buffer formats
- It builds fully-formed command buffers in system memory or VRAM.
- It writes to MMIO registers (e.g., a "doorbell") to notify the GPU to execute them.
In a GSP-based driver, the host no longer builds full command buffers. Instead, it acts like a client making requests.
Typically, the host sends structured high-level messages like:
- "Submit Job" requests
  - A struct describing:
    - Type of job (graphics, compute, copy)
    - Pointers to user command buffers (built by the Vulkan/CUDA/ML stack)
    - Sync primitives (fences, semaphores)
    - Context handle / engine ID
  - These go into shared memory (e.g. a ring buffer in VRAM or host memory)
  - The host writes to a mailbox register to say "GSP, there's a new job"
- Memory Management Commands
  - "Allocate buffer"
  - "Map memory for DMA"
  - "Pin VRAM"
  - These commands pass info like buffer sizes, flags, and physical addresses
- Context/Channel Setup
  - GSP might receive a command like:
    - "Create a new graphics context for PID X"
    - "Set up compute ring buffer with base address Y"
  - These involve initializing per-process state
- Sync Commands
  - "Signal fence when done with job"
  - "Wait for semaphore before starting job"
  - Often passed as handles or addresses to memory-backed sync objects
- Error Handling / Heartbeat
  - GSP may periodically get:
    - "Query fault status"
    - "Ping"
    - "Get log buffer pointer"
Even though userspace (e.g. Vulkan, CUDA, ML frameworks) generates low-level commands, the GSP is now the one that validates, schedules, and submits them to hardware.
- Host just acts as a forwarder + memory manager
- GSP is a gatekeeper that sees raw command buffers and determines what’s allowed to run
The host-side driver no longer needs to:
- Parse or validate command buffers
- Schedule jobs
- Directly program graphics engines
- Handle hardware interrupts (GSP does first-level fault management)
```shell
# install rust with rustup
# get rustc nightly as well
# make menuconfig - enable the Rust option
# we may need to set RUST_LIB_SRC to use rustup's rust-srcs
make rustavailable
make rust/
make rust-analyzer
make drivers/gpu/nova_core.rsi
```
```make
# 1. Define the command (with 'cmd_' prefix)
quiet_cmd_rustc_library = ...
      cmd_rustc_library = ...

# 2. Define the rule (with 'rule_' prefix)
define rule_rustc_library
	$(call cmd_and_fixdep,rustc_library)   # referenced by just 'rustc_library'
endef

# 3. Use the rule (with just the base name)
$(obj)/core.o: deps FORCE
	+$(call if_changed_rule,rustc_library)  # again just 'rustc_library'
```
The reason `Unpin` is not automatically implemented for `__Unpin`, even though its fields contain plain values (not references or pins), comes down to how the Rust compiler determines `Unpin` and to the purpose of the generated struct in this context.
In Rust, a struct is automatically `Unpin` if:
- All its fields are `Unpin` (and it does not explicitly opt out)
- In practice, this means all type parameters used in its fields must be `Unpin` as well
- Nothing inside it is a pinning marker such as `PhantomPinned`

(Implementing `Drop` has no effect here; `Unpin` is a pure auto trait driven by field types.)
However, there are deliberate escape hatches used by code-generation tools and macros: marker fields (such as `PhantomPinned`, or phantom types that structurally mention the pinned field types) are inserted precisely so that the containing struct cannot become `Unpin` through automatic derivation alone.
Look at this field from the generated struct:

```rust
__phantom_pin: PhantomData<fn(&'__pin ()) -> &'__pin ()>,
```

On its own, this field does not block `Unpin` (function pointers are `Unpin`); it exists to thread the `'__pin` lifetime through the generated `__Unpin` struct. The blocking comes from the pattern as a whole: the macro also places the structurally-pinned field types in `__Unpin` and emits a conditional impl along the lines of `impl Unpin for NovaCore where __Unpin<'__pin>: Unpin`, so the compiler never gets to auto-derive an unconditional `Unpin` for the driver struct.
The use of this pattern is a known Rust idiom (see crates like pin-project and pin-init). It's a way to make sure that structs involved in pinning/projection macros can NEVER "silently" become `Unpin` through automatic derivation.
- Because the macro routes `Unpin` through the generated `__Unpin` struct and its conditional impl, the compiler will not hand your struct an unconditional `Unpin`, even if all the other fields are "plain data" types.
- This means that only you (or macro-generated code) can opt in to `Unpin`; by default, it's prevented.
- The generated marker struct and conditional impl block Rust's default derivation of `Unpin`, even if the data fields themselves could be moved.
- This is intentional and central to ensuring soundness for pin-projected or pin-initialized structs, so you (or the macro) maintain full control.
- As a result, unless you or the macro explicitly provide an unconditional `impl Unpin for __Unpin {}` (which isn't done here), the struct will not be `Unpin`, and neither will your driver struct.
This is one of the "magic tricks" of Rust's type system to enforce safety at compile time, especially for low-level scenarios like writing kernel drivers.
```rust
data.gpu(&raw mut (*slot).gpu, init)
```

This line is right at the heart of the pin-initialization system used in the Rust Linux kernel driver framework, specifically within the closure passed to `KBox::pin_init`.
- `data` is an instance of the autogenerated struct `__ThePinData`.
- `slot` is a raw pointer (`*mut NovaCore`) - the memory location where the driver struct is being constructed.
- `init` is an initializer for the `Gpu` field, for example returned by `Gpu::new(...)`.
```rust
data.gpu(&raw mut (*slot).gpu, init)
```

Let's take apart each piece:
| Expression | Meaning |
|---|---|
| `(*slot).gpu` | Dereferences the pointer `slot` (which points to the uninitialized `NovaCore`) and accesses its `gpu` field. |
| `&raw mut` | Takes a raw mutable pointer to that field. This is needed because the memory is not yet fully initialized; a normal reference (`&mut`) would violate safety rules. |
| `data.gpu(...)` | Calls the method `gpu(...)` on `__ThePinData`, which is auto-generated by the `#[pin_data]` macro and part of the pin-init framework. |
| `init` | A value or closure that knows how to initialize the `Gpu` type and satisfies the trait `PinInit<Gpu, E>`, i.e. an initializer that can return a result and constructs the value in-place. |
So this line is saying:

> "Initialize the uninitialized `Gpu` field of the `NovaCore` struct in-place at this raw pointer location using `init`, and track the pinned-ness correctly."
```rust
unsafe {
    data.gpu(&raw mut (*slot).gpu, init)?
}
```
- This is a safe interface to an unsafe operation: in-place, possibly fallible construction of a subfield of the struct.
- If `init` fails (returns `Err`), the entire driver struct construction fails safely. Already-initialized fields are correctly dropped.
- If it succeeds, the `Gpu` is initialized and can be safely used later, still in its pinned location.
Why is this machinery necessary? Because:
- `Gpu` likely cannot be moved (e.g., it may hold memory mappings or self-referential data).
- You're working with partially initialized memory.
- This system ensures proper drop semantics, pin guarantees, and memory safety even in kernel space.
The `init` that's passed must implement:

```rust
trait PinInit<T, E> {
    fn __pinned_init(self, ptr: *mut T) -> Result<(), E>;
}
```

So that `data.gpu(ptr, init)` can safely call `__pinned_init(init, ptr)` under the hood. This is part of the low-level pin-init framework.
So, this line:

```rust
data.gpu(&raw mut (*slot).gpu, init)
```

...is doing all of this:
- Calling into the low-level initialization machinery for the `Gpu` field.
- Ensuring the field is initialized in-place at a pinned memory location.
- Handling potential failure in a safe, tracked way.
- Participating in a macro-generated framework that ensures lifetime, pin-safety, drop-safety, and error handling, perfectly suited for kernel drivers.
This is the bindgen command that generates Rust FFI bindings from C headers for the Linux kernel. Let me break down what each part does.

Core command structure:

```shell
bindgen rust/bindings/bindings_helper.h [bindgen_options] -- [clang_options]
```

The `--` separates bindgen-specific options from the clang compiler options that bindgen passes to clang for parsing the C headers.
```shell
--blocklist-type __kernel_s?size_t --blocklist-type __kernel_ptrdiff_t
```

- Blocklists certain C types from being generated as Rust bindings
- These types have special handling or conflicts
```shell
--opaque-type xregs_state --opaque-type desc_struct --opaque-type arch_lbr_state
```

- Makes these types opaque in Rust (you can't see their fields)
- Used for complex kernel structures that shouldn't be directly accessed from Rust
```shell
--blocklist-function __list_.*_report
```

- Blocks functions matching this regex pattern
- Prevents problematic internal kernel functions from being exposed
```shell
--rust-target 1.68 --use-core --with-derive-default --ctypes-prefix ffi
```

- `--rust-target 1.68`: Target Rust 1.68 compatibility
- `--use-core`: Use `core::` instead of `std::` (for the no_std environment)
- `--with-derive-default`: Auto-generate `Default` implementations
- `--ctypes-prefix ffi`: Prefix C types with `ffi::`
These are standard clang compiler flags that bindgen passes to clang for parsing:
```shell
-I./arch/arm64/include -I./include -I./arch/arm64/include/uapi
```
- Tell clang where to find header files
- ARM64-specific and general kernel includes
```shell
-include ./include/linux/compiler-version.h -include ./include/linux/kconfig.h
```
- Force-include essential kernel headers before parsing
- Sets up kernel compilation environment
```shell
--target=aarch64-linux-gnu -mlittle-endian
```
- Target ARM64 architecture
- Little-endian byte order
```shell
-D__KERNEL__ -D__BINDGEN__ -DMODULE
```

- `__KERNEL__`: Tells headers we're in kernel space
- `__BINDGEN__`: Special flag for bindgen parsing
- `MODULE`: Building as a kernel module
```shell
-O2 -fstack-protector-strong -Wall -Wextra
```
- Standard kernel compilation flags
- Security and optimization settings
```shell
sed -Ei 's/pub const RUST_CONST_HELPER_([a-zA-Z0-9_]*)/pub const \1/g'
```

- Cleans up generated constant names
- Removes the `RUST_CONST_HELPER_` prefix from constants
The kernel environment is extremely specific:
- No standard library: must use `core::` only
- Architecture-specific: ARM64 in this case
- Kernel headers: complex macro systems and conditional compilation
- Safety: many kernel structures shouldn't be directly accessible
- ABI compatibility: must match exact kernel compilation flags
This produces rust/bindings/bindings_generated.rs containing:

```rust
// Generated bindings like:
pub const EINVAL: i32 = 22;
pub const GFP_KERNEL: u32 = 0x400;

#[repr(C)]
pub struct pci_device { /* ... */ }

extern "C" {
    pub fn printk(fmt: *const i8, ...) -> i32;
}
```
This allows Nova driver code to safely call kernel functions and use kernel types from Rust while maintaining memory safety and ABI compatibility.
The complexity ensures the generated bindings exactly match what the C kernel expects, which is critical for a kernel driver that must integrate seamlessly with the existing C kernel infrastructure.
Bindgen always uses clang to parse C headers - this is not specific to the Linux kernel at all. This is how bindgen works for any C library or codebase.
C Header Files → Clang Parser → AST → Bindgen → Rust FFI Bindings
- Clang Frontend: Bindgen uses clang's C parser to understand the C code
- AST Generation: Clang creates an Abstract Syntax Tree of the C declarations
- Bindgen Processing: Bindgen walks the AST and generates corresponding Rust code
- Output: Safe Rust FFI bindings
Bindgen chose clang because:
- Excellent C/C++ parsing - Industry-standard parser
- Stable API - libclang provides a stable C API for tooling
- Complete C support - Handles complex macros, preprocessor directives, etc.
- Cross-platform - Works consistently across different systems
SQLite:

```shell
bindgen sqlite3.h -o sqlite3_bindings.rs
```
OpenSSL, with allowlists:

```shell
bindgen openssl/ssl.h \
  --allowlist-function "SSL_.*" \
  --allowlist-type "SSL.*" \
  -o openssl_bindings.rs \
  -- -I/usr/include/openssl
```
libc stdio:

```shell
bindgen /usr/include/stdio.h \
  --use-core \
  --ctypes-prefix libc \
  -o stdio_bindings.rs
```
The complexity in the kernel bindgen command is due to the kernel's unique requirements, not bindgen itself:
A regular library:

```shell
bindgen mylib.h -o bindings.rs -- -I/usr/include
```
The Linux kernel:

```shell
bindgen bindings_helper.h \
  --blocklist-type __kernel_size_t \
  --opaque-type spinlock \
  --use-core \
  --no-layout-tests \
  -o bindings.rs \
  -- -D__KERNEL__ -nostdinc -I./include [many more flags...]
```
| Aspect | Regular Library | Linux Kernel |
|---|---|---|
| Standard library | Can use `std::` | Must use `core::` only |
| Headers | Standard system headers | Custom kernel headers |
| Macros | Simple | Complex conditional compilation |
| Types | Standard C types | Kernel-specific types |
| Environment | Userspace | Kernel space |
| Safety | Less critical | Critical (kernel crashes = system crashes) |
```rust
// Simplified bindgen internals
fn generate_bindings(header: &Path, clang_args: &[String]) -> Result<String> {
    // 1. Use clang to parse C headers
    let index = clang::Index::new();
    let tu = index
        .parser(header)
        .arguments(clang_args) // <- your clang flags go here
        .parse()?;

    // 2. Walk the AST and generate Rust
    let mut bindings = String::new();
    for entity in tu.get_entity().get_children() {
        match entity.get_kind() {
            EntityKind::FunctionDecl => generate_function(&mut bindings, entity),
            EntityKind::StructDecl => generate_struct(&mut bindings, entity),
            // ... etc
        }
    }
    Ok(bindings)
}
```
Vulkan:

```shell
bindgen vulkan/vulkan.h \
  --allowlist-function "vk.*" \
  --allowlist-type "Vk.*" \
  --default-enum-style rust \
  -o vulkan_bindings.rs
```
- Bindgen always uses clang - this is its fundamental architecture
- Works with any C library - not kernel-specific
- Kernel complexity comes from kernel's unique environment, not bindgen
- Clang flags tell clang how to parse the specific C codebase
- Bindgen flags control how the Rust bindings are generated
The Linux kernel just happens to be one of the most complex C codebases to generate bindings for, which is why you see so many flags!
Looking at the expansion, you can see:

```rust
#[allow(unreachable_code, clippy::diverging_sub_expression)]
let _ = || {
    unsafe {
        ::core::ptr::write(
            slot,
            Self {
                gpu: { panic_cold_explicit(); },  // ← these panic!
                _reg: { panic_cold_explicit(); },
            },
        );
    };
};
```
This closure is never called - it's purely a compile-time check! Here's why it exists:
The macro needs to ensure that:
- Every field is mentioned exactly once in the `try_pin_init!` block
- No fields are forgotten
- No fields are mentioned twice
The closure with the `panic!()` expressions serves as a type-checker trick:
- The compiler verifies that `Self { gpu: ..., _reg: ... }` is a valid struct construction
- If you forget a field, you get a compile error: "missing field `xyz`"
- If you mention a field twice in the macro, you get a compile error
- The `panic!()` expressions are just placeholders; the compiler only cares about the field names and types
The actual field initialization happens before this unreachable closure, in code like:

```rust
// Real initialization (this actually runs):
let init = Gpu::new(pdev, bar)?;
unsafe { data.gpu(::core::ptr::addr_of_mut!((*slot).gpu), init)? };

let _reg = auxiliary::Registration::new(...)?;
unsafe { ::core::ptr::write(::core::ptr::addr_of_mut!((*slot)._reg), _reg) };

// Then the unreachable type-check closure (never runs):
let _ = || { /* panic stuff */ };
```
It's a clever compile-time verification mechanism that ensures the macro user didn't make mistakes with field initialization, but the closure itself never executes.