Harvester GPU Provisioning

Install Harvester, then SSH into the server.
Edit /boot/grub/grub.cfg as follows:

 set default=0
 set timeout=10
 
 set gfxmode=auto
 set gfxpayload=keep
 insmod all_video
 insmod gfxterm
 
 menuentry "Start Harvester" {
   search.fs_label HARVESTER_STATE root
   set sqfile=/k3os/system/kernel/current/kernel.squashfs
   loopback loop0 /$sqfile
   set root=($root)
-  linux (loop0)/vmlinuz printk.devkmsg=on console=tty1
+  linux (loop0)/vmlinuz printk.devkmsg=on intel_iommu=on modprobe.blacklist=nouveau pci=noaer
   initrd /k3os/system/kernel/current/initrd
 }

intel_iommu=on: enables Intel IOMMU support. For AMD, use amd_iommu=on.
modprobe.blacklist=nouveau: disables the nouveau driver. We will configure the vfio-pci driver later instead.
pci=noaer: disables PCIe Advanced Error Reporting, which prevents some issues related to USB device passthrough.

Reboot.
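
After the reboot, it can be worth confirming that the new parameters actually reached the kernel before going further. The helper below is a minimal sketch: the function name missing_params is made up for illustration, and it only inspects the command line text, not whether the IOMMU was really initialized.

```shell
#!/bin/bash
# Print every expected kernel parameter that is absent from a command line.
# $1: the kernel command line, remaining arguments: parameters to look for.
# (missing_params is a hypothetical helper name, not part of Harvester.)
missing_params() {
  local cmdline=" $1 "
  shift
  local param
  for param in "$@"; do
    case "$cmdline" in
      *" $param "*) ;;        # present as a whole word
      *) echo "$param" ;;     # absent: report it
    esac
  done
}

# Usage on the host, or in any pod (it shares the host kernel):
#   missing_params "$(cat /proc/cmdline)" intel_iommu=on modprobe.blacklist=nouveau pci=noaer
```

No output means all listed parameters are present.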

Find the PCI Device IDs for your GPU and any other devices that may be in the same IOMMU group.

$ kubectl run -it --privileged --image ubuntu <pod name>
=> $ apt update && apt install pciutils
=> $ lspci -nnk -d 10de: # colon at the end is required

Note that "10de" is NVIDIA's vendor ID. One or more devices may be shown. For example:

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU117GLM [Quadro T1000 Mobile] [10de:1fb9] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:10fa] (rev a1)

If multiple devices are shown, they are likely functions of the same card, and all of them will need to be configured for PCIe passthrough.

Get the IDs of the devices. In this case, 10de:1fb9,10de:10fa. Edit /boot/grub/grub.cfg as follows:

 set default=0
 set timeout=10
 
 set gfxmode=auto
 set gfxpayload=keep
 insmod all_video
 insmod gfxterm
 
 menuentry "Start Harvester" {
   search.fs_label HARVESTER_STATE root
   set sqfile=/k3os/system/kernel/current/kernel.squashfs
   loopback loop0 /$sqfile
   set root=($root)
-  linux (loop0)/vmlinuz printk.devkmsg=on intel_iommu=on modprobe.blacklist=nouveau pci=noaer
+  linux (loop0)/vmlinuz printk.devkmsg=on intel_iommu=on modprobe.blacklist=nouveau vfio-pci.ids=10de:1fb9,10de:10fa pci=noaer
   initrd /k3os/system/kernel/current/initrd
 }

This configuration tells the kernel to bind these devices to the vfio-pci driver.
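
The vfio-pci.ids value can be assembled from the lspci output rather than copied by hand. This is a small sketch, assuming lspci -nn style input (without -k, since the Subsystem line printed by -k can carry another [vendor:device] pair); vfio_ids is a made-up helper name.

```shell
#!/bin/bash
# Read `lspci -nn` output on stdin and print the vendor:device IDs of the
# given vendor as one comma-separated list, ready for vfio-pci.ids=...
# $1: vendor ID in lowercase hex, e.g. 10de
# (vfio_ids is a hypothetical helper name.)
vfio_ids() {
  local vendor="$1"
  grep -oE "\[${vendor}:[0-9a-f]{4}\]" |  # keep only the [vendor:device] tokens
    tr -d '[]' |                          # strip the brackets
    awk '!seen[$0]++' |                   # drop duplicates, keep first-seen order
    paste -sd, -                          # join the lines with commas
}

# Usage inside the privileged pod:
#   lspci -nn -d 10de: | vfio_ids 10de
```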

Reboot.

Verify the devices are using the correct driver:

$ kubectl run -it --privileged --image ubuntu <pod name>
=> $ apt update && apt install pciutils
=> $ lspci -nnk -d 10de:

If configured correctly, you should see "Kernel driver in use: vfio-pci" for every device:

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU117GLM [Quadro T1000 Mobile] [10de:1fb9] (rev a1)
	Kernel driver in use: vfio-pci
01:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:10fa] (rev a1)
	Kernel driver in use: vfio-pci
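
With several devices, the same check can be scripted. The sketch below assumes the lspci -nnk layout shown above (device lines at column 0, driver lines indented); not_vfio is a hypothetical name.

```shell
#!/bin/bash
# Read `lspci -nnk` output on stdin and print the address of every device
# whose driver in use is not vfio-pci (including devices with no driver line).
# (not_vfio is a hypothetical helper name.)
not_vfio() {
  awk '
    /^[0-9a-f]/ {                       # device lines start at column 0
      if (addr != "" && drv != "vfio-pci") print addr
      addr = $1; drv = ""
    }
    /Kernel driver in use:/ { drv = $NF }
    END { if (addr != "" && drv != "vfio-pci") print addr }
  '
}

# Usage inside the privileged pod:
#   lspci -nnk -d 10de: | not_vfio
```

An empty result means every listed device is bound to vfio-pci.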

Install the NVIDIA KubeVirt GPU device plugin:

$ kubectl create -f https://raw.githubusercontent.com/NVIDIA/kubevirt-gpu-device-plugin/master/manifests/nvidia-kubevirt-gpu-device-plugin.yaml

Check the log output: $ kubectl -n kube-system logs nvidia-kubevirt-gpu-dp-daemonset-xxxxx

You should see the following:

2021/07/19 15:52:28 Not a device, continuing
2021/07/19 15:52:28 Nvidia device  0000:01:00.0
2021/07/19 15:52:28 Iommu Group 1
2021/07/19 15:52:28 Device Id 1fb9
2021/07/19 15:52:28 Nvidia device  0000:01:00.1
2021/07/19 15:52:28 Iommu Group 1
2021/07/19 15:52:28 Error accessing file path "/sys/bus/mdev/devices": lstat /sys/bus/mdev/devices: no such file or directory
2021/07/19 15:52:28 Iommu Map map[1:[{0000:01:00.0} {0000:01:00.1}]]
2021/07/19 15:52:28 Device Map map[1fb9:[1]]
2021/07/19 15:52:28 vGPU Map  map[]
2021/07/19 15:52:28 GPU vGPU Map  map[]
2021/07/19 15:52:28 DP Name TU117GLM_Quadro_T1000_Mobile
2021/07/19 15:52:28 Devicename TU117GLM_Quadro_T1000_Mobile
2021/07/19 15:52:28 TU117GLM_Quadro_T1000_Mobile Device plugin server ready

Copy the device plugin name. In this example, it is TU117GLM_Quadro_T1000_Mobile.
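
If the log is long, the name can be pulled out with a filter. This sketch assumes the log format shown above, where the name is the third whitespace-separated field of the "Device plugin server ready" line; plugin_names is a made-up helper name.

```shell
#!/bin/bash
# Read the device plugin log on stdin and print each advertised plugin name.
# (plugin_names is a hypothetical helper name.)
plugin_names() {
  awk '/Device plugin server ready/ { print $3 }'
}

# Usage (pod name placeholder as in the log command above):
#   kubectl -n kube-system logs nvidia-kubevirt-gpu-dp-daemonset-xxxxx | plugin_names
```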

At the time of writing, Harvester does not have a UI for provisioning GPUs, so we need to edit the YAML of a virtual machine by hand. Create a VM instance and stop it. Then edit its YAML as follows:

...
    spec:
      domain:
        cpu:
          cores: 4
          sockets: 1
          threads: 1
        devices:
          disks:
          - disk:
              bus: virtio
            name: disk-0
          - bootOrder: 1
            disk:
              bus: virtio
            name: disk-1
+         gpus:
+         - deviceName: nvidia.com/TU117GLM_Quadro_T1000_Mobile
+           name: gpu1
...

(Replace the part after nvidia.com/ with your device name)
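
As an alternative to editing the YAML interactively, the same change could probably be applied as a JSON patch. This is an untested sketch: it assumes the VM is a KubeVirt VirtualMachine whose domain spec lives under /spec/template/spec, and gpu_patch is a made-up helper name. Note that the "add" operation replaces any existing gpus list.

```shell
#!/bin/bash
# Print a JSON patch that sets one passthrough GPU on a KubeVirt VM.
# $1: device plugin name without the nvidia.com/ prefix
# $2: name of the GPU inside the VM spec
# (gpu_patch is a hypothetical helper; the path assumes a VirtualMachine object.)
gpu_patch() {
  printf '[{"op":"add","path":"/spec/template/spec/domain/devices/gpus","value":[{"deviceName":"nvidia.com/%s","name":"%s"}]}]\n' "$1" "$2"
}

# Usage, with <vm name> as a placeholder:
#   kubectl patch vm <vm name> --type json -p "$(gpu_patch TU117GLM_Quadro_T1000_Mobile gpu1)"
```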

Start the VM. If configured correctly, you should see the following output from the kubevirt gpu plugin pod:

2021/07/19 15:53:08 In allocate
2021/07/19 15:53:08 Allocated devices [0000:01:00.0 0000:01:00.1]

If the VM fails to start, check the KubeVirt logs. An error such as "Please ensure all devices within the iommu_group are bound to their vfio bus driver." means that other devices in the same IOMMU group as your GPU also need to be bound to the vfio-pci driver. Add their IDs to vfio-pci.ids on the kernel command line and reboot.
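
To find out which devices those are, the IOMMU group can be read from sysfs. A minimal sketch, assuming it runs on the host or in a privileged pod; iommu_group_peers is a hypothetical name, and the optional second argument exists only so the function can be tried against a fake directory tree.

```shell
#!/bin/bash
# Print the PCI addresses that share an IOMMU group with the given device.
# $1: full PCI address, e.g. 0000:01:00.0
# $2: sysfs root, defaults to /sys
# (iommu_group_peers is a hypothetical helper name.)
iommu_group_peers() {
  local addr="$1" sysfs="${2:-/sys}"
  local group_dir="$sysfs/bus/pci/devices/$addr/iommu_group/devices"
  if [ ! -d "$group_dir" ]; then
    echo "no IOMMU group found for $addr (is the IOMMU enabled?)" >&2
    return 1
  fi
  ls "$group_dir"   # one entry per device in the group, named by address
}

# Usage:
#   iommu_group_peers 0000:01:00.0
# Then look up the ID of each extra address with: lspci -nn -s <address>
```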

Okay, so the solution I managed to find only works partially. The GPU would throw a Code 43 error every time I rebooted the VM, and I would have to reinstall the drivers. Since this was a time-constrained project, I moved back to Proxmox (for now) and decided to give this another shot with the next release in September, since their GitHub states they will add better support for hardware passthrough then.

That being said, what I had to do was follow this guide while also keeping a close eye on this reddit post. The trick is that the file system seems to be ephemeral, in the sense that it is regenerated on every boot, like a container. As such, you have to configure all your files and settings in the /oem/99-.... file. The format seems similar to cloud-init.

Truth be told, I didn't save my configuration from last time since I didn't get it working, but what I remember is that I had to manually add more entries to the write_files section to set up the driver blacklist and configure everything in those guides. Additionally, I had to add softdep entries for the drivers of some GPU functions, such as the audio controller, to make sure they bind to vfio-pci instead of the default audio driver, and manually override drivers with commands such as:

#!/bin/bash

# unbind from drivers
echo 0000:0a:00.0 > /sys/bus/pci/devices/0000\:0a\:00.0/driver/unbind
echo 0000:0a:00.1 > /sys/bus/pci/devices/0000\:0a\:00.1/driver/unbind

# bind to vfio
echo 0000:0a:00.0 > /sys/bus/pci/drivers/vfio-pci/bind
echo 0000:0a:00.1 > /sys/bus/pci/drivers/vfio-pci/bind
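
For reference, binding this way only succeeds if vfio-pci already knows the device ID. A variant that does not depend on that uses the kernel's driver_override attribute. This is a sketch rather than what I actually ran: it assumes the vfio-pci module is loaded, rebind_to_vfio is a made-up name, and the second argument is only there so it can be tried on a fake tree.

```shell
#!/bin/bash
# Rebind one PCI device to vfio-pci via driver_override.
# $1: full PCI address, e.g. 0000:0a:00.0
# $2: sysfs root, defaults to /sys
# (rebind_to_vfio is a hypothetical helper; assumes vfio-pci is loaded.)
rebind_to_vfio() {
  local addr="$1" sysfs="${2:-/sys}"
  local dev="$sysfs/bus/pci/devices/$addr"
  # Only vfio-pci may claim this device from now on.
  echo vfio-pci > "$dev/driver_override"
  # Detach the current driver, if one is bound.
  if [ -e "$dev/driver/unbind" ]; then
    echo "$addr" > "$dev/driver/unbind"
  fi
  # Ask the PCI core to probe the device again.
  echo "$addr" > "$sysfs/bus/pci/drivers_probe"
}

# Usage (as root on the host):
#   rebind_to_vfio 0000:0a:00.0
#   rebind_to_vfio 0000:0a:00.1
```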

Sorry I can't be of more help right now, but I hope I am at least pointing you in the right direction.

Do let me know if you manage to get it working though!

kralicky/harvester-gpu.md

mircea-pavel-anton commented Jun 30, 2022