Gist by @17twenty, last active February 18, 2026

GPUDirect Storage Setup Guide (Linux, Proxmox, Virtualised Environments)

Purpose

This guide walks through enabling NVIDIA GPUDirect Storage (GDS) so a system can perform direct DMA transfers between NVMe devices and GPU VRAM.

The minimum success criteria:

  • nvidia_fs.ko loads successfully
  • gdscheck reports GDS operational
  • Benchmark tools such as gdsio run successfully

This guide is written for operators managing:

  • Proxmox or other hypervisors
  • Ubuntu-based guests (recommended baseline)
  • Environments where GPU and NVMe are passed through to a VM

Architecture Overview

GPUDirect Storage requires:

  1. GPU accessible inside the OS
  2. NVMe accessible as a raw block device
  3. NVIDIA driver + CUDA installed
  4. nvidia-fs kernel module loaded
  5. Filesystem and mount options compatible with direct I/O

In virtualised environments, both GPU and NVMe must be PCI passthrough devices, not emulated disks.
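
A quick way to check this from inside the guest: a passed-through NVMe namespace appears as /dev/nvme*, while emulated disks appear as /dev/vd* or /dev/sd*. A minimal sketch of the check, using a hardcoded device list so it runs anywhere; on a real VM, substitute the output of ls /sys/block:

```shell
# Classify guest block devices: real NVMe namespaces vs emulated disks.
# 'blockdevs' is an illustrative stand-in for: ls /sys/block
blockdevs="nvme0n1 vda"

report=$(for d in $blockdevs; do
  case "$d" in
    nvme*)   echo "$d: NVMe namespace (GDS-capable if the controller is passed through)" ;;
    vd*|sd*) echo "$d: emulated/virtual disk (not usable for GDS)" ;;
    *)       echo "$d: other" ;;
  esac
done)
echo "$report"
```

If only vd*/sd* devices appear, the storage is an emulated disk and GDS will fall back to compatibility mode at best.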


Section 1 — Operator / Hypervisor Setup (Proxmox)

This section is the most important. Most failures originate here.

1.1 Enable IOMMU on the host

In BIOS:

  • Enable VT-d / AMD-Vi
  • Enable SR-IOV if available
  • Enable Above 4G decoding

On the host kernel:

For Intel:

intel_iommu=on iommu=pt

For AMD:

amd_iommu=on iommu=pt

Reboot after applying changes.
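
The flags belong in GRUB_CMDLINE_LINUX_DEFAULT. A sketch of the edit, applied here to a temporary copy of the file so it is safe to run anywhere; on a real Proxmox host, point the sed at /etc/default/grub, then run update-grub and reboot:

```shell
# Append IOMMU flags to GRUB_CMDLINE_LINUX_DEFAULT (Intel shown; swap in
# amd_iommu=on for AMD hosts). Demonstrated on a temp copy of the file.
grubfile=$(mktemp)
printf 'GRUB_CMDLINE_LINUX_DEFAULT="quiet"\n' > "$grubfile"

sed -i 's/^\(GRUB_CMDLINE_LINUX_DEFAULT="[^"]*\)"/\1 intel_iommu=on iommu=pt"/' "$grubfile"
cat "$grubfile"
# On the real host: update-grub && reboot
# After reboot, verify the flags took effect: grep -o 'iommu=pt' /proc/cmdline
```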


1.2 Identify GPUs and NVMe devices

On the Proxmox host:

lspci | grep -E "NVIDIA|Non-Volatile"

Record PCI addresses such as:

0000:65:00.0 GPU
0000:5e:00.0 NVMe
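
The addresses can be pulled out programmatically for use in later passthrough steps. A sketch using awk over embedded sample output (the device name strings are illustrative); on the host, feed it live lspci output instead:

```shell
# Extract and label GPU and NVMe PCI addresses.
# 'sample' stands in for: lspci | grep -E "NVIDIA|Non-Volatile"
sample='65:00.0 3D controller: NVIDIA Corporation Device 2330
5e:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd Device a80a'

devices=$(echo "$sample" | awk '
  /NVIDIA/       {print "0000:" $1 " GPU"}
  /Non-Volatile/ {print "0000:" $1 " NVMe"}')
echo "$devices"
```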

1.3 Map NVMe devices to GPUs (PCIe topology)

GPUDirect Storage performs best when each GPU and its NVMe device share:

  • the same PCIe root complex, or
  • the same PCIe switch

View the PCIe tree

lspci -tv

You want GPU and NVMe to appear under the same upstream bridge where possible.

Example of a good layout:

-[0000:5d]-+-00.0 PCI bridge
           +-00.1 NVIDIA GPU
           +-00.2 NVMe controller

Avoid layouts where:

  • GPU is under one root complex
  • NVMe is under another CPU socket

Those paths cross inter-socket links and reduce performance.
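
This can be checked from sysfs: every PCI device exposes a numa_node file. A sketch with hardcoded node values so it runs anywhere; on a real host, read the values from sysfs as shown in the comments (the PCI addresses are the example addresses from 1.2):

```shell
# Compare NUMA placement of the GPU and its NVMe device. On a real host:
#   gpu_node=$(cat /sys/bus/pci/devices/0000:65:00.0/numa_node)
#   nvme_node=$(cat /sys/bus/pci/devices/0000:5e:00.0/numa_node)
gpu_node=0
nvme_node=0

if [ "$gpu_node" -eq "$nvme_node" ]; then
  verdict="same NUMA node: good for GDS"
else
  verdict="different NUMA nodes: path crosses the inter-socket link"
fi
echo "$verdict"
```

A value of -1 in numa_node means the platform did not report affinity; treat that as inconclusive rather than bad.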


1.4 Validate GPU topology from NVIDIA

If the GPU driver is loaded on the host:

nvidia-smi topo -m

Key indicators:

  • PIX = same PCIe switch (ideal)
  • PXB = multiple switches (acceptable)
  • SYS = across CPU sockets (not ideal)

Aim for PIX or PXB relationships between GPU and NVMe.

Example:

root@greenthread-h100:~# nvidia-smi topo -m
	GPU0	GPU1	GPU2	GPU3	GPU4	GPU5	GPU6	GPU7	NIC0	CPU Affinity	NUMA Affinity	GPU NUMA ID
GPU0	 X 	NV12	PHB	PHB	PHB	PHB	PHB	PHB	PHB	0-251	0		N/A
GPU1	NV12	 X 	PHB	PHB	PHB	PHB	PHB	PHB	PHB	0-251	0		N/A
GPU2	PHB	PHB	 X 	NV12	PHB	PHB	PHB	PHB	PHB	0-251	0		N/A
GPU3	PHB	PHB	NV12	 X 	PHB	PHB	PHB	PHB	PHB	0-251	0		N/A
GPU4	PHB	PHB	PHB	PHB	 X 	NV12	PHB	PHB	PHB	0-251	0		N/A
GPU5	PHB	PHB	PHB	PHB	NV12	 X 	PHB	PHB	PHB	0-251	0		N/A
GPU6	PHB	PHB	PHB	PHB	PHB	PHB	 X 	NV12	PHB	0-251	0		N/A
GPU7	PHB	PHB	PHB	PHB	PHB	PHB	NV12	 X 	PHB	0-251	0		N/A
NIC0	PHB	PHB	PHB	PHB	PHB	PHB	PHB	PHB	 X

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

1.5 Ensure proper IOMMU grouping

Check IOMMU groups:

find /sys/kernel/iommu_groups/ -type l

Requirements:

  • GPU functions grouped correctly
  • NVMe controller isolated or safely passthrough-able

If devices share groups with critical host devices, motherboard slot placement may need adjustment.
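
The raw find output is easier to read grouped by group number. A sketch that prints one line per group; it builds a mock sysfs tree so it runs anywhere, but on a real host set root=/sys/kernel/iommu_groups (the group numbers and addresses below are illustrative):

```shell
# List devices per IOMMU group. The mock tree mirrors the layout of
# /sys/kernel/iommu_groups/<group>/devices/<pci-address>.
root=$(mktemp -d)
mkdir -p "$root/13/devices" "$root/14/devices"
touch "$root/13/devices/0000:65:00.0" "$root/14/devices/0000:5e:00.0"

groups_out=$(for g in "$root"/*/; do
  echo "group $(basename "$g"): $(ls "$g/devices" | tr '\n' ' ')"
done)
echo "$groups_out"
```

A group containing only the device (plus its own functions) is safe to pass through; a group that also contains host bridges or other host-critical devices is not.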


1.6 Configure passthrough to the VM

For each VM:

Pass through:

  • GPU
  • GPU audio function (if present)
  • NVMe controller (not a virtual disk)

Recommended VM settings:

  • Machine type: q35
  • CPU type: host
  • PCIe enabled
  • Ballooning disabled (for benchmarking)
  • Hugepages optional but recommended
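
Expressed as a Proxmox VM config, those settings look roughly like the fragment below. The hostpci slot numbers and PCI addresses are examples; substitute the addresses recorded in 1.2. The file lives at /etc/pve/qemu-server/<vmid>.conf:

```
machine: q35
cpu: host
balloon: 0
hostpci0: 0000:65:00.0,pcie=1
hostpci1: 0000:5e:00.0,pcie=1
```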

Section 2 — Guest OS Setup (Ubuntu Recommended)

Ubuntu 22.04 or 24.04 is a reliable baseline.

After booting the VM, confirm hardware visibility:

nvidia-smi
lsblk
lspci

Both GPU and NVMe should be visible.


2.1 Install NVIDIA Driver

Example:

apt update
apt install nvidia-driver-590

Reboot and verify:

nvidia-smi

2.2 Install CUDA

Example:

apt install cuda

Confirm:

nvcc --version

2.3 Install GPUDirect Storage packages

If available via repository (package names vary by CUDA release; the nvidia-gds metapackage from the NVIDIA CUDA repository is a common alternative):

apt install nvidia-fs gds-tools

Load module:

modprobe nvidia-fs

Verify:

lsmod | grep nvidia_fs

2.4 Building nvidia-fs manually (common with custom kernels)

If packages are unavailable, or the prebuilt module does not match the running kernel:

Install prerequisites:

apt install build-essential linux-headers-$(uname -r)

Build:

git clone https://github.com/NVIDIA/gds-nvidia-fs.git
cd gds-nvidia-fs/src
make
insmod nvidia-fs.ko

Verify:

lsmod | grep nvidia_fs

Section 3 — Filesystem and Storage Requirements

Recommended:

  • Local NVMe device
  • EXT4 or XFS
  • Mounted normally (no network filesystem during initial testing)

Create a test filesystem and mount it (this erases any data on the device):

mkdir -p /mnt/nvme
mkfs.ext4 /dev/nvme0n1
mount /dev/nvme0n1 /mnt/nvme
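
GDS reads and writes go through O_DIRECT, so it is worth confirming the mount accepts direct I/O before running gdsio. A sketch using dd; the target directory defaults to the current directory here so the probe is runnable anywhere, but on a real system point it at the NVMe mount (e.g. /mnt/nvme):

```shell
# Probe O_DIRECT support with a small direct-I/O write.
target="${1:-.}/odirect_probe"
if dd if=/dev/zero of="$target" bs=1M count=4 oflag=direct status=none 2>/dev/null; then
  msg="O_DIRECT write succeeded: filesystem is usable for GDS testing"
else
  msg="O_DIRECT write failed: check filesystem type and mount options"
fi
rm -f "$target"
echo "$msg"
```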

Section 4 — Verification

4.1 Run GDS validation

gdscheck -v

Depending on the CUDA release, the tool may instead be installed as /usr/local/cuda/gds/tools/gdscheck.py.

Expected indicators:

  • nvidia_fs loaded
  • GPU detected
  • Compatible filesystem detected

4.2 Run a benchmark

Example:

gdsio -f /mnt/nvme/testfile -d 0 -s 1G

This performs direct storage-to-GPU transfers.


Section 5 — Performance Validation

To confirm topology is correct:

nvidia-smi topo -m

Ensure:

  • GPU ↔ NVMe path is PIX or PXB
  • Not SYS if possible

If SYS appears, consider:

  • Moving NVMe to a different slot
  • Moving GPU to a different slot
  • Using a PCIe switch backplane

Section 6 — Common Failure Modes

nvidia_fs will not load

Check:

dmesg | grep nvidia

Typical causes:

  • Kernel headers missing
  • Driver mismatch
  • Secure Boot enabled
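
The three causes above can be triaged with a short script. A hedged sketch: mokutil may not be installed, and on machines without the NVIDIA stack the checks simply report what is missing:

```shell
# Triage common nvidia_fs load failures.

# 1. Kernel headers present for the running kernel?
if [ -d "/lib/modules/$(uname -r)/build" ]; then
  hdr_msg="headers: present for $(uname -r)"
else
  hdr_msg="headers: MISSING for $(uname -r) (apt install linux-headers-$(uname -r))"
fi
echo "$hdr_msg"

# 2. Secure Boot state (unsigned out-of-tree modules are rejected when enabled).
if command -v mokutil >/dev/null 2>&1; then
  sb_msg="secure boot: $(mokutil --sb-state 2>&1)"
else
  sb_msg="secure boot: mokutil not installed; check firmware settings"
fi
echo "$sb_msg"

# 3. Driver/module version mismatches show up in the kernel log.
dmesg 2>/dev/null | grep -iE 'nvidia[_-]fs' | tail -n 5
```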

gdscheck reports compatibility mode

Possible causes:

  • NVMe not passthrough
  • Unsupported filesystem
  • Module not loaded

Poor performance

Common causes:

  • GPU and NVMe on different NUMA nodes
  • Virtual disk instead of raw NVMe
  • Incorrect PCIe slot topology

Section 7 — Recommended Baseline Configuration

A configuration that consistently works:

Host:

  • Proxmox VE 8.x
  • IOMMU enabled
  • GPU and NVMe on same PCIe root complex

Guest:

  • Ubuntu 22.04 or 24.04
  • NVIDIA driver 535–590
  • CUDA 12.x
  • nvidia-fs loaded

Storage:

  • Local NVMe
  • EXT4

Quick Operator Checklist

Hypervisor:

  • IOMMU enabled
  • GPU passthrough working
  • NVMe passthrough working
  • GPU and NVMe share PCIe path

Guest:

  • NVIDIA driver installed
  • CUDA installed
  • nvidia_fs module loaded
  • gdscheck passes

Benchmark:

  • gdsio runs successfully
  • gt-benchy runs with GDS support