Proxmox > LXC > Docker Plex Hardware Encoding (NVIDIA GPU)

Background and Prerequisites

I run Plex via Docker inside an LXC container on top of Proxmox 8. I recently wanted to add a GPU and HW transcoding support, and I found several brilliant existing guides that helped a ton (thank you Joachim and Matthieu) but I wanted to expand on these, cover some issues that I encountered, and archive my steps.

My LXC container is unprivileged with nesting enabled, and I'm using an NVIDIA GTX 1050 Ti.

For this to work, the drivers need to be installed both on the Proxmox host and inside the LXC container, but only the host installation requires the kernel modules. It is very important that both the host and LXC container have the same version of the drivers installed, so we'll install them manually to avoid accidentally updating with apt.
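Once both installs are done, a quick way to confirm the versions match is to run the same version query on the host and inside the container and compare the output (just a convenience check, not part of the setup):

nvidia-smi --query-gpu=driver_version --format=csv,noheader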

Proxmox Host

The first thing I did was update Proxmox. This isn't strictly required, but if you don't have an enterprise subscription you'll probably need to add the no-subscription repo anyway in order to get the PVE headers, so you may as well update at the same time.

Adding the No-Subscription repository

This is simple if the web GUI works for you - just select your node, go to Updates > Repositories and click the Add button. In the dropdown select "No-Subscription" and click Add. While you're here, you can also disable the enterprise repository by selecting it and clicking the "Disable" button.

If you have the same issue that I had, where clicking "Add" does nothing, you can add the repository manually via the shell. This is documented here, but in short: run nano /etc/apt/sources.list and add the lines below (bookworm matches Proxmox 8 / Debian 12; substitute your release if you're on an older version):

# PVE pve-no-subscription repository provided by proxmox.com,
# NOT recommended for production use
deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription

You should now be able to update (either from the shell using apt update and apt upgrade or through the Updates section of the GUI). This also fixed my "Add" button, woohoo!
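If you prefer the shell, the commands are the usual apt pair; note that Proxmox's own documentation leans towards dist-upgrade rather than plain upgrade for host updates, so I've shown that here:

apt update
apt dist-upgrade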

Disabling the Nouveau kernel drivers

In order to install the NVIDIA drivers we need to disable the Nouveau drivers first. This is documented in NVIDIA's installation guide, but the quick version is:

echo -e "blacklist nouveau\noptions nouveau modeset=0" > /etc/modprobe.d/blacklist-nouveau.conf
update-initramfs -u
reboot
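After the reboot, a quick way to confirm Nouveau is no longer loaded is to check lsmod - no output means it's gone:

lsmod | grep nouveau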

Compiling and installing the NVIDIA drivers

We need both the PVE headers and the build-essential package in order to compile the drivers.

apt install pve-headers-$(uname -r)
apt install build-essential

Next we need to download the NVIDIA driver and make it executable. I recommend using the latest production version, which can be found here. If you use a different version you'll just need to replace my version number 550.144.03 with yours. It's worth copying these commands somewhere for later, as we'll need them (with slight tweaks) to install the drivers inside the LXC container.

wget -O NVIDIA-Linux-x86_64-550.144.03.run https://uk.download.nvidia.com/XFree86/Linux-x86_64/550.144.03/NVIDIA-Linux-x86_64-550.144.03.run
chmod +x NVIDIA-Linux-x86_64-550.144.03.run

Then we simply run the installer. When asked if you want to install the 32-bit compatibility drivers select No, and select No again when asked if you'd like it to update your X configuration file.

./NVIDIA-Linux-x86_64-550.144.03.run

Updating udev rules

Usually the relevant device files (specifically the UVM devices) are created when something calls the GPU, but as we need to pass this through to the LXC container we need them to exist before the container is started. To ensure all the relevant device files are created on boot we will add some udev rules.

We need to add nvidia-drm and nvidia-uvm to /etc/modules-load.d/modules.conf

echo -e '\n# load nvidia modules\nnvidia-drm\nnvidia-uvm' >> /etc/modules-load.d/modules.conf

Then we need to add the below to /etc/udev/rules.d/70-nvidia.rules in order to create the relevant device files within /dev/ during boot.

KERNEL=="nvidia", RUN+="/bin/bash -c '/usr/bin/nvidia-smi -L && /bin/chmod 666 /dev/nvidia*'"
KERNEL=="nvidia_uvm", RUN+="/bin/bash -c '/usr/bin/nvidia-modprobe -c0 -u && /bin/chmod 0666 /dev/nvidia-uvm*'"
SUBSYSTEM=="module", ACTION=="add", DEVPATH=="/module/nvidia", RUN+="/usr/bin/nvidia-modprobe -m"

To do this quickly, just run this:

echo -e 'KERNEL=="nvidia", RUN+="/bin/bash -c '\''/usr/bin/nvidia-smi -L && /bin/chmod 666 /dev/nvidia*'\''"\nKERNEL=="nvidia_uvm", RUN+="/bin/bash -c '\''/usr/bin/nvidia-modprobe -c0 -u && /bin/chmod 0666 /dev/nvidia-uvm*'\''"\nSUBSYSTEM=="module", ACTION=="add", DEVPATH=="/module/nvidia", RUN+="/usr/bin/nvidia-modprobe -m"' >> /etc/udev/rules.d/70-nvidia.rules

Adding the NVIDIA persistence service

To prevent the driver/kernel module from being unloaded when the GPU is not in use, NVIDIA provides a persistence service, which we will install.

First we copy the service package to our working directory and extract it:

cp /usr/share/doc/NVIDIA_GLX-1.0/samples/nvidia-persistenced-init.tar.bz2 .
bunzip2 nvidia-persistenced-init.tar.bz2
tar -xf nvidia-persistenced-init.tar

Then we give it execute permissions, remove any previously installed version of the service, and run the install script.

chmod +x nvidia-persistenced-init/install.sh
rm /etc/systemd/system/nvidia-persistenced.service
./nvidia-persistenced-init/install.sh

To verify it's installed successfully we can run:

systemctl status nvidia-persistenced.service

Then we just need to clean up after ourselves:

rm -rf nvidia-persistenced-init*

Verifying everything is working

We should now be ready to reboot and check everything is working as expected.

After reboot, run nvidia-smi - you should expect an output like the below and your GPU should be detected:

root@proxmox:~# nvidia-smi
Tue Apr  3 23:11:20 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 550.144.03   Driver Version: 550.144.03   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:03:00.0 Off |                  N/A |
| 30%   35C    P8    N/A /  75W |      4MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

You should also check that the required files have been created with ls -alh /dev/nvidia* - if everything is okay you should see an output like the below:

root@proxmox:~# ls -alh /dev/nvidia*
crw-rw-rw- 1 root root 195,   0 Apr  4 13:20 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Apr  4 13:20 /dev/nvidiactl
crw-rw-rw- 1 root root 195, 254 Apr  4 13:20 /dev/nvidia-modeset
crw-rw-rw- 1 root root 509,   0 Apr  4 13:20 /dev/nvidia-uvm
crw-rw-rw- 1 root root 509,   1 Apr  4 13:20 /dev/nvidia-uvm-tools

/dev/nvidia-caps:
total 0
drw-rw-rw-  2 root root     80 Apr  4 13:20 .
drwxr-xr-x 20 root root   5.1K Apr  4 13:20 ..
cr--------  1 root root 234, 1 Apr  4 13:20 nvidia-cap1
cr--r--r--  1 root root 234, 2 Apr  4 13:20 nvidia-cap2

If any of the first five files are missing, something has gone wrong somewhere and the device files have not been created. I recommend you review the Updating udev rules step and verify everything has been configured correctly.

If everything looks good, take note of the numbers in the fifth column (the major device numbers - the first number after the group name) as we'll need them in the next step - in my case they are 195 and 509.
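If you want a second opinion on those major numbers, the kernel's character device list should show the same values next to its nvidia entries (the exact entry names can vary between driver versions):

grep -i nvidia /proc/devices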

Updating the LXC container configuration

To pass the devices through to our LXC container, we just need to add a few lines to the configuration file. The ID of my container is 105 but obviously yours will (probably) be different, so just substitute it with yours wherever relevant.

Shut down your container and edit its config file with nano /etc/pve/lxc/105.conf - we need to add the below lines to it. If your numbers from the previous step differ from mine you'll need to substitute them with yours.

lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 509:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file

If you have issues down the line, re-run ls -alh /dev/nvidia* and double-check the numbers in column five. I've heard that the uvm devices can sometimes flip between 509 and 511 seemingly at random, though I haven't experienced this myself. To make sure it always works, we can just add a third cgroup allow line covering the other number (see the example below) - allowing a major number that doesn't exist doesn't seem to bother LXC.
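For example, to also cover 511 (just an example - use whichever extra number applies to you), the additional line would be:

lxc.cgroup2.devices.allow: c 511:* rwm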

You should now be fine to start the container again.
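For reference, the container can also be stopped and started from the host shell (105 being my container ID):

pct shutdown 105
pct start 105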

LXC Container

We're now ready to install the NVIDIA driver inside the container.

Compiling and installing the NVIDIA drivers

This is much the same process as we followed on the host, but we won't be installing the kernel module, so we don't need the PVE headers or the build-essential package.

First we need to download the exact same version of the driver and make it executable - if you saved your commands from earlier you can re-download it inside the container, but the easier way is to push the file we already have from the Proxmox host into the container:

pct push 105 NVIDIA-Linux-x86_64-550.144.03.run /root/NVIDIA-Linux-x86_64-550.144.03.run
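The pushed copy may not keep its executable bit, so it's worth re-running chmod inside the container before continuing:

chmod +x /root/NVIDIA-Linux-x86_64-550.144.03.run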

Then we run the script, but this time with the --no-kernel-module argument. Again, answer No when asked if you want to install the 32-bit compatibility drivers or update your X configuration file.

./NVIDIA-Linux-x86_64-550.144.03.run --no-kernel-module

Verifying everything is working

That's it! You can now restart the container and check that everything looks as expected, just as we did on the host.

root@docker ~# nvidia-smi
Tue Apr  3 23:36:17 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 550.144.03   Driver Version: 550.144.03   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:03:00.0 Off |                  N/A |
| 30%   36C    P8    N/A /  75W |      4MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

root@docker ~# ls -alh /dev/nvidia*
crw-rw-rw- 1 root root 195, 254 Apr  4 12:20 /dev/nvidia-modeset
crw-rw-rw- 1 root root 509,   0 Apr  4 12:20 /dev/nvidia-uvm
crw-rw-rw- 1 root root 509,   1 Apr  4 12:20 /dev/nvidia-uvm-tools
crw-rw-rw- 1 root root 195,   0 Apr  4 12:20 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Apr  4 12:20 /dev/nvidiactl

/dev/nvidia-caps:
total 0
drwxr-xr-x 2 root root     80 Apr  4 13:36 .
drwxr-xr-x 9 root root    660 Apr  4 13:36 ..
cr-------- 1 root root 234, 1 Apr  4 13:36 nvidia-cap1
cr--r--r-- 1 root root 234, 2 Apr  4 13:36 nvidia-cap2

Docker container

Now that our LXC container has access to the GPU, we just need to tweak Docker so that containers can access it too.

Installing the NVIDIA Container Toolkit

First we need to add the repository in order to install the toolkit. Official docs can be found here and should probably be referred to, but these are the commands that I used:

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
  && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

apt update
apt install nvidia-container-toolkit

Important for unprivileged containers:

This is not necessary for privileged containers, but for unprivileged containers (like I'm running) you will need to edit /etc/nvidia-container-runtime/config.toml (e.g. with nano) and change #no-cgroups = false to no-cgroups = true.
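To make the same change non-interactively, a sed one-liner along these lines should work (double-check the file afterwards):

sed -i 's/^#no-cgroups = false/no-cgroups = true/' /etc/nvidia-container-runtime/config.toml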

Then reload systemd and restart Docker:

systemctl daemon-reload
systemctl restart docker
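Before touching the Plex compose file, you can sanity-check that Docker can see the GPU with a throwaway CUDA container - the image tag below is just an example, use whichever CUDA base image is current:

docker run --rm --gpus all nvidia/cuda:12.3.2-base-ubuntu22.04 nvidia-smi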

Updating docker-compose.yml

The last step is to update our Plex docker-compose.yml.

version: '3.7'

services:
  plex:
    container_name: plex
    hostname: plex
    image: linuxserver/plex:latest
    restart: unless-stopped
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
    environment:
      TZ: Europe/Paris
      PUID: 0
      PGID: 0
      VERSION: latest
      NVIDIA_VISIBLE_DEVICES: all
      NVIDIA_DRIVER_CAPABILITIES: compute,video,utility
    network_mode: host
    volumes:
      - /srv/config/plex:/config
      - /storage/media:/data/media
      - /storage/temp/plex/transcode:/transcode
      - /storage/temp/plex/tmp:/tmp

We can check that it's working in the dashboard; any active stream should have (hw) next to the transcoding information.
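You can also watch the GPU from a shell in the LXC container while something is streaming - during a hardware transcode a Plex Transcoder process should appear in the nvidia-smi process list:

watch -n 1 nvidia-smi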

Upgrading

Whenever you upgrade the kernel you need to install the new PVE Headers and re-install the NVIDIA driver on the Proxmox host. If you want to run the same NVIDIA driver version, just re-run the original driver install. There should be no need to do anything in the LXC container (as the version stays the same, and no kernel modules are involved).

apt install pve-headers-$(uname -r)
./NVIDIA-Linux-x86_64-550.144.03.run
reboot

If you want to upgrade the NVIDIA driver, there are a few extra steps. If you already have a working NVIDIA driver (i.e. you did not just update the kernel), you have to uninstall the old NVIDIA driver first (else it will complain that the kernel module is loaded, and it will instantly load the module again if you attempt to unload it).

./NVIDIA-Linux-x86_64-550.144.03.run --uninstall
reboot

Then run through the process of Compiling and installing the NVIDIA drivers again with the new version.

Lastly, you must upgrade the driver in the LXC container to the same version (see here) and reboot.
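The container-side upgrade is just a repeat of the earlier install with the new version number - roughly the below, with <new-version> standing in for whatever you installed on the host:

# on the Proxmox host - push the new installer into the container (105 is my container ID)
pct push 105 NVIDIA-Linux-x86_64-<new-version>.run /root/NVIDIA-Linux-x86_64-<new-version>.run

# inside the container
chmod +x NVIDIA-Linux-x86_64-<new-version>.run
./NVIDIA-Linux-x86_64-<new-version>.run --no-kernel-module
reboot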
