Gist by @17twenty, last active February 18, 2026

GPUDirect Storage Setup Guide (Linux, Proxmox, Virtualised Environments)

Purpose

This guide walks through enabling NVIDIA GPUDirect Storage (GDS) so a system can perform direct DMA transfers between NVMe devices and GPU VRAM.

The minimum success criteria:

  • nvidia_fs.ko loads successfully
  • gdscheck reports GDS operational
  • Benchmark tools such as gdsio run successfully

This guide is written for operators managing:

  • Proxmox or other hypervisors
  • Ubuntu-based guests (recommended baseline)
  • Environments where GPU and NVMe are passed through to a VM

Architecture Overview

GPUDirect Storage requires:

  1. GPU accessible inside the OS
  2. NVMe accessible as a raw block device
  3. NVIDIA driver + CUDA installed
  4. nvidia-fs kernel module loaded
  5. Filesystem and mount options compatible with direct I/O

In virtualised environments, both GPU and NVMe must be PCI passthrough devices, not emulated disks.
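
A quick way to check this from inside the guest: a passed-through NVMe namespace appears as /dev/nvme*, while emulated disks appear as /dev/vd* or /dev/sd*. A minimal sketch of the check, using a hardcoded device list so it runs anywhere; on a real VM, substitute the output of ls /sys/block:

```shell
# Classify guest block devices: real NVMe namespaces vs emulated disks.
# 'blockdevs' is an illustrative stand-in for: ls /sys/block
blockdevs="nvme0n1 vda"

report=$(for d in $blockdevs; do
  case "$d" in
    nvme*)   echo "$d: NVMe namespace (GDS-capable if the controller is passed through)" ;;
    vd*|sd*) echo "$d: emulated/virtual disk (not usable for GDS)" ;;
    *)       echo "$d: other" ;;
  esac
done)
echo "$report"
```

If only vd*/sd* devices appear, the storage is an emulated disk and GDS will fall back to compatibility mode at best.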


Section 1 — Operator / Hypervisor Setup (Proxmox)

This section is the most important. Most failures originate here.

1.1 Enable IOMMU on the host

In BIOS:

  • Enable VT-d / AMD-Vi
  • Enable SR-IOV if available
  • Enable Above 4G decoding

On the host kernel:

For Intel:

intel_iommu=on iommu=pt

For AMD:

amd_iommu=on iommu=pt

Reboot after applying changes.
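
The flags belong in GRUB_CMDLINE_LINUX_DEFAULT. A sketch of the edit, applied here to a temporary copy of the file so it is safe to run anywhere; on a real Proxmox host, point the sed at /etc/default/grub, then run update-grub and reboot:

```shell
# Append IOMMU flags to GRUB_CMDLINE_LINUX_DEFAULT (Intel shown; swap in
# amd_iommu=on for AMD hosts). Demonstrated on a temp copy of the file.
grubfile=$(mktemp)
printf 'GRUB_CMDLINE_LINUX_DEFAULT="quiet"\n' > "$grubfile"

sed -i 's/^\(GRUB_CMDLINE_LINUX_DEFAULT="[^"]*\)"/\1 intel_iommu=on iommu=pt"/' "$grubfile"
cat "$grubfile"
# On the real host: update-grub && reboot
# After reboot, verify the flags took effect: grep -o 'iommu=pt' /proc/cmdline
```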


1.2 Identify GPUs and NVMe devices

On the Proxmox host:

lspci | grep -E "NVIDIA|Non-Volatile"

Record PCI addresses such as:

0000:65:00.0 GPU
0000:5e:00.0 NVMe
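
The addresses can be pulled out programmatically for use in later passthrough steps. A sketch using awk over embedded sample output (the device name strings are illustrative); on the host, feed it live lspci output instead:

```shell
# Extract and label GPU and NVMe PCI addresses.
# 'sample' stands in for: lspci | grep -E "NVIDIA|Non-Volatile"
sample='65:00.0 3D controller: NVIDIA Corporation Device 2330
5e:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd Device a80a'

devices=$(echo "$sample" | awk '
  /NVIDIA/       {print "0000:" $1 " GPU"}
  /Non-Volatile/ {print "0000:" $1 " NVMe"}')
echo "$devices"
```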

1.3 Map NVMe devices to GPUs (PCIe topology)

GPUDirect Storage performs best when each GPU and its NVMe device share:

  • the same PCIe root complex, or
  • the same PCIe switch

View the PCIe tree

lspci -tv

You want GPU and NVMe to appear under the same upstream bridge where possible.

Example of a good layout:

-[0000:5d]-+-00.0 PCI bridge
           +-00.1 NVIDIA GPU
           +-00.2 NVMe controller

Avoid layouts where:

  • GPU is under one root complex
  • NVMe is under another CPU socket

Those paths cross inter-socket links and reduce performance.
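
This can be checked from sysfs: every PCI device exposes a numa_node file. A sketch with hardcoded node values so it runs anywhere; on a real host, read the values from sysfs as shown in the comments (the PCI addresses are the example addresses from 1.2):

```shell
# Compare NUMA placement of the GPU and its NVMe device. On a real host:
#   gpu_node=$(cat /sys/bus/pci/devices/0000:65:00.0/numa_node)
#   nvme_node=$(cat /sys/bus/pci/devices/0000:5e:00.0/numa_node)
gpu_node=0
nvme_node=0

if [ "$gpu_node" -eq "$nvme_node" ]; then
  verdict="same NUMA node: good for GDS"
else
  verdict="different NUMA nodes: path crosses the inter-socket link"
fi
echo "$verdict"
```

A value of -1 in numa_node means the platform did not report affinity; treat that as inconclusive rather than bad.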


1.4 Validate GPU topology from NVIDIA

If the GPU driver is loaded on the host:

nvidia-smi topo -m

Key indicators:

  • PIX = same PCIe switch (ideal)
  • PXB = multiple switches (acceptable)
  • SYS = across CPU sockets (not ideal)

Aim for PIX or PXB relationships between GPU and NVMe.

Example:

root@greenthread-h100:~# nvidia-smi topo -m
	GPU0	GPU1	GPU2	GPU3	GPU4	GPU5	GPU6	GPU7	NIC0	CPU Affinity	NUMA Affinity	GPU NUMA ID
GPU0	 X 	NV12	PHB	PHB	PHB	PHB	PHB	PHB	PHB	0-251	0		N/A
GPU1	NV12	 X 	PHB	PHB	PHB	PHB	PHB	PHB	PHB	0-251	0		N/A
GPU2	PHB	PHB	 X 	NV12	PHB	PHB	PHB	PHB	PHB	0-251	0		N/A
GPU3	PHB	PHB	NV12	 X 	PHB	PHB	PHB	PHB	PHB	0-251	0		N/A
GPU4	PHB	PHB	PHB	PHB	 X 	NV12	PHB	PHB	PHB	0-251	0		N/A
GPU5	PHB	PHB	PHB	PHB	NV12	 X 	PHB	PHB	PHB	0-251	0		N/A
GPU6	PHB	PHB	PHB	PHB	PHB	PHB	 X 	NV12	PHB	0-251	0		N/A
GPU7	PHB	PHB	PHB	PHB	PHB	PHB	NV12	 X 	PHB	0-251	0		N/A
NIC0	PHB	PHB	PHB	PHB	PHB	PHB	PHB	PHB	 X

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

1.5 Ensure proper IOMMU grouping

Check IOMMU groups:

find /sys/kernel/iommu_groups/ -type l

Requirements:

  • GPU functions grouped correctly
  • NVMe controller isolated or safely passthrough-able

If devices share groups with critical host devices, motherboard slot placement may need adjustment.
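
The raw find output is easier to read grouped by group number. A sketch that prints one line per group; it builds a mock sysfs tree so it runs anywhere, but on a real host set root=/sys/kernel/iommu_groups (the group numbers and addresses below are illustrative):

```shell
# List devices per IOMMU group. The mock tree mirrors the layout of
# /sys/kernel/iommu_groups/<group>/devices/<pci-address>.
root=$(mktemp -d)
mkdir -p "$root/13/devices" "$root/14/devices"
touch "$root/13/devices/0000:65:00.0" "$root/14/devices/0000:5e:00.0"

groups_out=$(for g in "$root"/*/; do
  echo "group $(basename "$g"): $(ls "$g/devices" | tr '\n' ' ')"
done)
echo "$groups_out"
```

A group containing only the device (plus its own functions) is safe to pass through; a group that also contains host bridges or other host-critical devices is not.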


1.6 Configure passthrough to the VM

For each VM:

Pass through:

  • GPU
  • GPU audio function (if present)
  • NVMe controller (not a virtual disk)

Recommended VM settings:

  • Machine type: q35
  • CPU type: host
  • PCIe enabled
  • Ballooning disabled (for benchmarking)
  • Hugepages optional but recommended
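
Expressed as a Proxmox VM config, those settings look roughly like the fragment below. The hostpci slot numbers and PCI addresses are examples; substitute the addresses recorded in 1.2. The file lives at /etc/pve/qemu-server/<vmid>.conf:

```
machine: q35
cpu: host
balloon: 0
hostpci0: 0000:65:00.0,pcie=1
hostpci1: 0000:5e:00.0,pcie=1
```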

Section 2 — Guest OS Setup (Ubuntu Recommended)

Ubuntu 22.04 or 24.04 is a reliable baseline.

After booting the VM, confirm hardware visibility:

nvidia-smi
lsblk
lspci

Both GPU and NVMe should be visible.


2.1 Install NVIDIA Driver

Example:

apt update
apt install nvidia-driver-590

Reboot and verify:

nvidia-smi

2.2 Install CUDA

Example:

apt install cuda

Confirm:

nvcc --version

2.3 Install GPUDirect Storage packages

If available via repository (package names vary by CUDA release; the nvidia-gds metapackage from the NVIDIA CUDA repository is a common alternative):

apt install nvidia-fs gds-tools

Load module:

modprobe nvidia-fs

Verify:

lsmod | grep nvidia_fs

2.4 Building nvidia-fs manually (common with custom kernels)

If packages are unavailable, or the prebuilt module does not match the running kernel:

Install prerequisites:

apt install build-essential linux-headers-$(uname -r)

Build:

git clone https://github.com/NVIDIA/gds-nvidia-fs.git
cd gds-nvidia-fs/src
make
insmod nvidia-fs.ko

Verify:

lsmod | grep nvidia_fs

Section 3 — Filesystem and Storage Requirements

Recommended:

  • Local NVMe device
  • EXT4 or XFS
  • Mounted normally (no network filesystem during initial testing)

Create a test filesystem and mount it (this erases any data on the device):

mkdir -p /mnt/nvme
mkfs.ext4 /dev/nvme0n1
mount /dev/nvme0n1 /mnt/nvme
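
GDS reads and writes go through O_DIRECT, so it is worth confirming the mount accepts direct I/O before running gdsio. A sketch using dd; the target directory defaults to the current directory here so the probe is runnable anywhere, but on a real system point it at the NVMe mount (e.g. /mnt/nvme):

```shell
# Probe O_DIRECT support with a small direct-I/O write.
target="${1:-.}/odirect_probe"
if dd if=/dev/zero of="$target" bs=1M count=4 oflag=direct status=none 2>/dev/null; then
  msg="O_DIRECT write succeeded: filesystem is usable for GDS testing"
else
  msg="O_DIRECT write failed: check filesystem type and mount options"
fi
rm -f "$target"
echo "$msg"
```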

Section 4 — Verification

4.1 Run GDS validation

gdscheck -v

Depending on the CUDA release, the tool may instead be installed as /usr/local/cuda/gds/tools/gdscheck.py.

Expected indicators:

  • nvidia_fs loaded
  • GPU detected
  • Compatible filesystem detected

4.2 Run a benchmark

Example:

gdsio -f /mnt/nvme/testfile -d 0 -s 1G

This performs direct storage-to-GPU transfers.


Section 5 — Performance Validation

To confirm topology is correct:

nvidia-smi topo -m

Ensure:

  • GPU ↔ NVMe path is PIX or PXB
  • Not SYS if possible

If SYS appears, consider:

  • Moving NVMe to a different slot
  • Moving GPU to a different slot
  • Using a PCIe switch backplane

Section 6 — Common Failure Modes

nvidia_fs will not load

Check:

dmesg | grep nvidia

Typical causes:

  • Kernel headers missing
  • Driver mismatch
  • Secure Boot enabled
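
The three causes above can be triaged with a short script. A hedged sketch: mokutil may not be installed, and on machines without the NVIDIA stack the checks simply report what is missing:

```shell
# Triage common nvidia_fs load failures.

# 1. Kernel headers present for the running kernel?
if [ -d "/lib/modules/$(uname -r)/build" ]; then
  hdr_msg="headers: present for $(uname -r)"
else
  hdr_msg="headers: MISSING for $(uname -r) (apt install linux-headers-$(uname -r))"
fi
echo "$hdr_msg"

# 2. Secure Boot state (unsigned out-of-tree modules are rejected when enabled).
if command -v mokutil >/dev/null 2>&1; then
  sb_msg="secure boot: $(mokutil --sb-state 2>&1)"
else
  sb_msg="secure boot: mokutil not installed; check firmware settings"
fi
echo "$sb_msg"

# 3. Driver/module version mismatches show up in the kernel log.
dmesg 2>/dev/null | grep -iE 'nvidia[_-]fs' | tail -n 5
```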

gdscheck reports compatibility mode

Possible causes:

  • NVMe not passthrough
  • Unsupported filesystem
  • Module not loaded

Poor performance

Common causes:

  • GPU and NVMe on different NUMA nodes
  • Virtual disk instead of raw NVMe
  • Incorrect PCIe slot topology

Section 7 — Recommended Baseline Configuration

A configuration that consistently works:

Host:

  • Proxmox VE 8.x
  • IOMMU enabled
  • GPU and NVMe on same PCIe root complex

Guest:

  • Ubuntu 22.04 or 24.04
  • NVIDIA driver 535–590
  • CUDA 12.x
  • nvidia-fs loaded

Storage:

  • Local NVMe
  • EXT4

Quick Operator Checklist

Hypervisor:

  • IOMMU enabled
  • GPU passthrough working
  • NVMe passthrough working
  • GPU and NVMe share PCIe path

Guest:

  • NVIDIA driver installed
  • CUDA installed
  • nvidia_fs module loaded
  • gdscheck passes

Benchmark:

  • gdsio runs successfully
  • gt-benchy runs with GDS support