@morningreis
Created July 26, 2024 13:28
Proxmox NVIDIA Setup

Note

Rebuild the drivers with the installer's --dkms option if the Linux kernel is updated.
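One way to notice that a rebuild is due is to compare the kernel release the nvidia module was built for with the running kernel. The following is a minimal sketch, not part of the original guide: the helper name `module_matches_kernel` is hypothetical, the release strings are made-up samples, and it assumes the first word of the module's vermagic string is the kernel release.

```shell
# Hypothetical helper: succeeds when a module's vermagic string names the given kernel.
#   $1: vermagic string, e.g. the output of `modinfo -F vermagic nvidia`
#   $2: kernel release,  e.g. the output of `uname -r`
module_matches_kernel() {
    # vermagic looks like "<release> SMP preempt mod_unload modversions";
    # strip everything from the first space onward to keep only the release.
    [ "${1%% *}" = "$2" ]
}

# Example with sample strings (no driver required):
if module_matches_kernel "6.5.11-8-pve SMP preempt mod_unload modversions" "6.5.13-1-pve"; then
    echo "module matches running kernel"
else
    echo "module was built for another kernel: rebuild the driver"
fi
```

On the host this could be called as `module_matches_kernel "$(modinfo -F vermagic nvidia)" "$(uname -r)"`. If modinfo finds no module for the running kernel, the first argument is empty and the check fails, which also indicates that a rebuild is needed.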

Note

Do not install the drivers from Debian backports. That route works, but the packaged drivers are old and significantly more work to remove.

https://docs.fileflows.com/guides/linux/proxmox-lxc-nvidia

Proxmox Host

echo -e "blacklist nouveau\noptions nouveau modeset=0" > /etc/modprobe.d/blacklist-nouveau.conf
update-initramfs -u
reboot
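After the reboot it may be worth confirming that nouveau is no longer loaded before installing the NVIDIA driver. A minimal sketch, assuming a hypothetical helper `is_module_listed` that reads `lsmod`-style text (module name in the first column) on stdin:

```shell
# Hypothetical helper: succeeds if module NAME appears in the first column
# of the lsmod-style text supplied on stdin.
is_module_listed() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# After the reboot, on the host:
#   if lsmod | is_module_listed nouveau; then
#       echo "nouveau is still loaded"
#   fi
```

The same helper can later be pointed at `nvidia`, `nvidia_uvm`, or `nvidia_drm` to confirm that the new modules are loaded.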

Install PVE Headers

apt install pve-headers-$(uname -r)

Download Latest Drivers

https://download.nvidia.com/XFree86/

wget -O NVIDIA-Linux-x86_64-525.60.13.run https://download.nvidia.com/XFree86/Linux-x86_64/525.60.13/NVIDIA-Linux-x86_64-525.60.13.run

chmod +x NVIDIA-Linux-x86_64-525.60.13.run

./NVIDIA-Linux-x86_64-525.60.13.run --check

./NVIDIA-Linux-x86_64-525.60.13.run

Answer 'no' to installing the 32-bit compatibility libraries and to updating the X configuration.
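The driver version is repeated in every command above, so changing it means editing several lines. A minimal sketch that derives the file name and URL from one version string, following the pattern of the wget command in this guide; the helper names `nvidia_run_file` and `nvidia_run_url` and the `NVIDIA_VERSION` variable are hypothetical:

```shell
# Hypothetical helpers: build the installer file name and download URL
# from a driver version, mirroring the wget command used in this guide.
nvidia_run_file() {
    printf 'NVIDIA-Linux-x86_64-%s.run\n' "$1"
}

nvidia_run_url() {
    printf 'https://download.nvidia.com/XFree86/Linux-x86_64/%s/%s\n' \
        "$1" "$(nvidia_run_file "$1")"
}

NVIDIA_VERSION=525.60.13   # the version used in this guide
nvidia_run_url "$NVIDIA_VERSION"
# prints https://download.nvidia.com/XFree86/Linux-x86_64/525.60.13/NVIDIA-Linux-x86_64-525.60.13.run
```

With these in place the download step could be written as `wget -O "$(nvidia_run_file "$NVIDIA_VERSION")" "$(nvidia_run_url "$NVIDIA_VERSION")"`, and the same variable reused when installing inside the container.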

Add kernel module

echo -e '\n# load nvidia modules\nnvidia\nnvidia_uvm\nnvidia-drm' >> /etc/modules-load.d/modules.conf

Check with cat /etc/modules-load.d/modules.conf. The following entries should be present:

# /etc/modules: kernel modules to load at boot time.
#
# This file contains the names of kernel modules that should be loaded
# at boot time, one per line. Lines beginning with "#" are ignored.


# load nvidia modules
nvidia
nvidia_uvm
nvidia-drm
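The manual check above can also be scripted. A minimal sketch, assuming a hypothetical helper `missing_modules` that prints every requested module not named in the file; it ignores comments and treats "-" and "_" as the same character, since module names accept either:

```shell
# Hypothetical helper: print each module from the argument list that is not
# named in the given modules file.
#   $1: path to the modules file; remaining arguments: module names
missing_modules() (
    file=$1
    shift
    for mod in "$@"; do
        # Normalise hyphens to underscores on both sides of the comparison.
        want=$(printf '%s\n' "$mod" | sed 'y/-/_/')
        if ! sed -e 's/#.*//' -e 'y/-/_/' "$file" |
                grep -qx "[[:space:]]*${want}[[:space:]]*"; then
            printf '%s\n' "$mod"
        fi
    done
)

# On the host:
#   missing_modules /etc/modules-load.d/modules.conf nvidia nvidia_uvm nvidia_drm
```

No output means all listed modules are present in the file.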

Update Initramfs

update-initramfs -u -k all

Note

This step may produce some warnings. They are unrelated to the driver install.

Create udev rules so the relevant device nodes are created at boot

nano /etc/udev/rules.d/70-nvidia.rules
KERNEL=="nvidia", RUN+="/bin/bash -c '/usr/bin/nvidia-smi -L && /bin/chmod 666 /dev/nvidia*'"
KERNEL=="nvidia_uvm", RUN+="/bin/bash -c '/usr/bin/nvidia-modprobe -c0 -u && /bin/chmod 0666 /dev/nvidia-uvm*'"
SUBSYSTEM=="module", ACTION=="add", DEVPATH=="/module/nvidia", RUN+="/usr/bin/nvidia-modprobe -m"

Install the NVIDIA persistence service (nvidia-persistenced) so the driver is not unloaded when the GPU is idle

cp /usr/share/doc/NVIDIA_GLX-1.0/samples/nvidia-persistenced-init.tar.bz2 .
bunzip2 nvidia-persistenced-init.tar.bz2
tar -xf nvidia-persistenced-init.tar

Remove old service, if it exists

rm -f /etc/systemd/system/nvidia-persistenced.service

Install service

chmod +x nvidia-persistenced-init/install.sh
./nvidia-persistenced-init/install.sh

Check that it is OK

systemctl status nvidia-persistenced.service
rm -rf nvidia-persistenced-init*

Reboot

Check status

nvidia-smi

systemctl status nvidia-persistenced.service

ls -alh /dev/nvidia*
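The udev rules earlier in this guide set the device nodes to mode 666, and the listing above is where that gets confirmed by eye. A minimal sketch of the same check as a script, assuming GNU `stat` (as shipped with Debian and Proxmox) and a hypothetical helper name `nodes_not_mode`:

```shell
# Hypothetical helper: print each path whose permission bits differ from MODE.
#   $1: expected octal mode, e.g. 666; remaining arguments: paths to check
nodes_not_mode() (
    mode=$1
    shift
    for node in "$@"; do
        # GNU stat: %a prints the permission bits in octal
        [ "$(stat -c '%a' "$node")" = "$mode" ] || printf '%s\n' "$node"
    done
)

# On the host:
#   nodes_not_mode 666 /dev/nvidia*
```

Any path that is printed does not have the expected mode and is worth a closer look.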

Set up LXC Container

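Shut down the container. The device major numbers (195 and 508 here) come from the output of ls -alh /dev/nvidia* and ls -alh /dev/dri* on the host; use the values shown on your own system.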

nano /etc/pve/lxc/<vmid>.conf

lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 226:* rwm
lxc.cgroup2.devices.allow: c 237:* rwm
lxc.cgroup2.devices.allow: c 508:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
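The `lxc.cgroup2.devices.allow` lines can be derived from the `ls` output instead of being copied by hand. A minimal sketch, assuming a hypothetical helper `cgroup_allow_lines` that reads `ls -l` style output on stdin; the sample input is taken from the device listing in this guide:

```shell
# Hypothetical helper: read `ls -l` output on stdin and print one
# lxc.cgroup2.devices.allow line per distinct character-device major number.
cgroup_allow_lines() {
    # For character devices the 5th field is the major number followed by a comma.
    awk '$1 ~ /^c/ { major = $5; sub(/,$/, "", major); print major }' |
        sort -n -u |
        sed 's/.*/lxc.cgroup2.devices.allow: c &:* rwm/'
}

# Example using lines from the listing in this guide:
cgroup_allow_lines <<'EOF'
crw-rw-rw- 1 root root 195, 254 Dec 17 21:09 /dev/nvidia-modeset
crw-rw-rw- 1 root root 508,   0 Dec 17 21:09 /dev/nvidia-uvm
crw-rw-rw- 1 root root 195,   0 Dec 17 21:09 /dev/nvidia0
EOF
# prints:
#   lxc.cgroup2.devices.allow: c 195:* rwm
#   lxc.cgroup2.devices.allow: c 508:* rwm
```

On the host, `ls -alh /dev/nvidia* /dev/dri* | cgroup_allow_lines` would produce lines for whatever major numbers are present there.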

Install Drivers on container

wget -O NVIDIA-Linux-x86_64-525.60.13.run https://download.nvidia.com/XFree86/Linux-x86_64/525.60.13/NVIDIA-Linux-x86_64-525.60.13.run

chmod +x NVIDIA-Linux-x86_64-525.60.13.run

./NVIDIA-Linux-x86_64-525.60.13.run --check

./NVIDIA-Linux-x86_64-525.60.13.run --no-kernel-module

Answer 'no' to installing the 32-bit compatibility libraries and to updating the X configuration.

Reboot the container, then check that the device nodes are visible inside it:

root@UbuntuMEDIA:~# ls -alh /dev/nvidia*

crw-rw-rw- 1 root root 195, 254 Dec 17 21:09 /dev/nvidia-modeset
crw-rw-rw- 1 root root 508,   0 Dec 17 21:09 /dev/nvidia-uvm
crw-rw-rw- 1 root root 508,   1 Dec 17 21:09 /dev/nvidia-uvm-tools
crw-rw-rw- 1 root root 195,   0 Dec 17 21:09 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Dec 17 21:09 /dev/nvidiactl
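Because the mount entries use the `optional` flag, a node that is missing on the host is skipped rather than reported, so a quick check inside the container can help. A minimal sketch, assuming a hypothetical helper `missing_char_devices`:

```shell
# Hypothetical helper: print each path that is not an existing character device.
missing_char_devices() (
    for node in "$@"; do
        [ -c "$node" ] || printf '%s\n' "$node"
    done
)

# Inside the container:
#   missing_char_devices /dev/nvidia0 /dev/nvidiactl /dev/nvidia-modeset \
#       /dev/nvidia-uvm /dev/nvidia-uvm-tools
```

No output means every listed node was bind-mounted into the container as a character device.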

Install NVIDIA Container Toolkit

https://nvidia.github.io/libnvidia-container/

https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html

distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \
      && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
      && curl -s -L https://nvidia.github.io/libnvidia-container/experimental/$distribution/libnvidia-container.list | \
         sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
         sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list


Replace $distribution with ubuntu20.04 or the most current distribution.
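To see what `$distribution` resolves to before deciding whether to replace it, the same expression can be evaluated on its own. A minimal sketch, assuming a hypothetical helper `distribution_id` that takes the path of an os-release file:

```shell
# Hypothetical helper: print ID followed by VERSION_ID from an os-release file,
# which is how the command above builds $distribution.
distribution_id() (
    # The function body runs in a subshell, so ID and VERSION_ID do not leak
    # into the calling shell.
    . "$1"
    printf '%s%s\n' "$ID" "$VERSION_ID"
)

# Inside the container:
#   distribution_id /etc/os-release
```

If the printed value is not one the repository provides a list for, set `distribution` by hand as described above.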

apt update && apt install -y nvidia-docker2

Configure Docker

https://www.youtube.com/watch?v=Sypi9dfMLX0

In Portainer:

Container > Select Container > Duplicate/Edit > Runtime & Resources > Enable GPU

Remove Drivers

./NVIDIA-Linux-x86_64-525.60.13.run --uninstall

If installed with Debian Backports:

systemctl stop 'nvidia*'

dpkg -l | grep -i nvidia

apt remove --purge '^nvidia-.*'

apt autoremove
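After the purge it can help to confirm that nothing NVIDIA-related is left. A minimal sketch, assuming a hypothetical helper `remaining_nvidia_packages` that reads `dpkg -l` output on stdin and prints the status and name of every package whose name contains "nvidia":

```shell
# Hypothetical helper: from `dpkg -l` output on stdin, print "<status> <name>"
# for every package row whose name contains "nvidia" (case-insensitive).
remaining_nvidia_packages() {
    # Package rows start with a status such as "ii" or "rc"; the header rows
    # of dpkg -l do not, so they are skipped.
    awk '$1 ~ /^[a-z][a-zA-Z]/ && tolower($2) ~ /nvidia/ { print $1, $2 }'
}

# On the host:
#   dpkg -l | remaining_nvidia_packages
```

Rows with status "rc" are removed packages whose configuration files remain. The pattern '^nvidia-.*' used above only matches names that begin with "nvidia-", so packages with other prefixes would show up here and need to be removed separately.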