- Install Harvester, then SSH into the server.
- Edit /boot/grub/grub.cfg as follows:
set default=0
set timeout=10
set gfxmode=auto
set gfxpayload=keep
insmod all_video
insmod gfxterm
menuentry "Start Harvester" {
search.fs_label HARVESTER_STATE root
set sqfile=/k3os/system/kernel/current/kernel.squashfs
loopback loop0 /$sqfile
set root=($root)
- linux (loop0)/vmlinuz printk.devkmsg=on console=tty1
+ linux (loop0)/vmlinuz printk.devkmsg=on intel_iommu=on modprobe.blacklist=nouveau pci=noaer
initrd /k3os/system/kernel/current/initrd
}
intel_iommu=on
: Enables Intel IOMMU support. For AMD, use amd_iommu=on.
modprobe.blacklist=nouveau
: Disables the nouveau driver. We will configure the vfio-pci driver instead later.
pci=noaer
: Prevents some issues related to USB device passthrough.
Reboot.
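After the reboot, it can help to confirm that the new parameters actually reached the kernel. The following is a minimal sketch, not part of the original procedure: `missing_params` is a hypothetical helper name, and it only checks that each parameter appears as a whole word in the command line string it is given.

```shell
#!/bin/sh
# Report which of the expected kernel parameters are missing from a
# command line string. The first argument is the command line (for
# example the contents of /proc/cmdline); the remaining arguments are
# the parameters to look for. Prints one missing parameter per line.
missing_params() {
    cmdline=$1
    shift
    for param in "$@"; do
        case " $cmdline " in
            *" $param "*) ;;                 # present as a whole word
            *) printf '%s\n' "$param" ;;     # absent: report it
        esac
    done
}

# Usage on the node after the reboot (prints nothing if all are set):
#   missing_params "$(cat /proc/cmdline)" \
#       intel_iommu=on modprobe.blacklist=nouveau pci=noaer
```

If anything is printed, the edit to grub.cfg most likely did not take effect for the entry that was booted.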
- Find the PCI Device IDs for your GPU and any other devices that may be in the same IOMMU group.
$ kubectl run -it --privileged --image ubuntu <pod name>
=> $ apt update && apt install pciutils
=> $ lspci -nnk -d 10de: # colon at the end is required
Note that "10de" is NVIDIA's vendor ID. One or more devices may be shown. For example:
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU117GLM [Quadro T1000 Mobile] [10de:1fb9] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:10fa] (rev a1)
If multiple devices are shown, they may be grouped together in the same card and will all need to be configured for PCIe passthrough.
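The vendor:device pairs in square brackets at the end of each device line are the IDs needed for the kernel command line. As a rough sketch (the function name `pci_ids_csv` is made up for illustration, and it assumes the lspci output format shown above), they can be collected into a comma-separated list like this:

```shell
#!/bin/sh
# Read `lspci -nn` (or `lspci -nnk`) output on stdin and print the
# vendor:device IDs of the listed devices as one comma-separated string.
pci_ids_csv() {
    # Device lines start at column 0; the Subsystem/Kernel lines that -k
    # adds are indented, so drop them. The greedy leading .* makes sed
    # capture the last [xxxx:xxxx] on each remaining line.
    grep -v '^[[:space:]]' |
        sed -n 's/.*\[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)\].*/\1/p' |
        paste -sd, -
}

# Usage inside the privileged pod:
#   lspci -nnk -d 10de: | pci_ids_csv
```

Review the result by eye before using it, since it includes every device the lspci filter matched.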
- Get the IDs of the devices. In this case, 10de:1fb9,10de:10fa. Edit /boot/grub/grub.cfg as follows:
set default=0
set timeout=10
set gfxmode=auto
set gfxpayload=keep
insmod all_video
insmod gfxterm
menuentry "Start Harvester" {
search.fs_label HARVESTER_STATE root
set sqfile=/k3os/system/kernel/current/kernel.squashfs
loopback loop0 /$sqfile
set root=($root)
- linux (loop0)/vmlinuz printk.devkmsg=on intel_iommu=on modprobe.blacklist=nouveau pci=noaer
+ linux (loop0)/vmlinuz printk.devkmsg=on intel_iommu=on modprobe.blacklist=nouveau vfio-pci.ids=10de:1fb9,10de:10fa pci=noaer
initrd /k3os/system/kernel/current/initrd
}
This configuration tells the kernel to use the vfio-pci drivers for these devices.
Reboot.
- Verify the devices are using the correct driver:
$ kubectl run -it --privileged --image ubuntu <pod name>
=> $ apt update && apt install pciutils
=> $ lspci -nnk -d 10de:
If configured correctly, you should see Kernel driver in use: vfio-pci for each device:
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU117GLM [Quadro T1000 Mobile] [10de:1fb9] (rev a1)
Kernel driver in use: vfio-pci
01:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:10fa] (rev a1)
Kernel driver in use: vfio-pci
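With several devices the check above is easy to get wrong by eye. The sketch below automates it under the assumption that the output has the `lspci -nnk` shape shown above; `not_on_vfio` is a hypothetical name, not a tool that ships with Harvester or pciutils.

```shell
#!/bin/sh
# Read `lspci -nnk` output on stdin and print the address of every device
# whose "Kernel driver in use" is not vfio-pci (or that has no driver line).
# Prints nothing when every listed device is bound to vfio-pci.
not_on_vfio() {
    awk '
        /Kernel driver in use:/ { drv = $NF; next }   # remember bound driver
        /^[0-9a-f]/ {                                 # a new device line
            if (addr != "" && drv != "vfio-pci") print addr
            addr = $1; drv = ""
        }
        END { if (addr != "" && drv != "vfio-pci") print addr }
    '
}

# Usage inside the privileged pod:
#   lspci -nnk -d 10de: | not_on_vfio
```

Any address it prints belongs to a device that still needs its ID added to vfio-pci.ids.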
- Install the NVIDIA KubeVirt GPU device plugin:
$ kubectl create -f https://raw.githubusercontent.com/NVIDIA/kubevirt-gpu-device-plugin/master/manifests/nvidia-kubevirt-gpu-device-plugin.yaml
- Check the log output:
$ kubectl -n kube-system logs nvidia-kubevirt-gpu-dp-daemonset-xxxxx
You should see the following:
2021/07/19 15:52:28 Not a device, continuing
2021/07/19 15:52:28 Nvidia device 0000:01:00.0
2021/07/19 15:52:28 Iommu Group 1
2021/07/19 15:52:28 Device Id 1fb9
2021/07/19 15:52:28 Nvidia device 0000:01:00.1
2021/07/19 15:52:28 Iommu Group 1
2021/07/19 15:52:28 Error accessing file path "/sys/bus/mdev/devices": lstat /sys/bus/mdev/devices: no such file or directory
2021/07/19 15:52:28 Iommu Map map[1:[{0000:01:00.0} {0000:01:00.1}]]
2021/07/19 15:52:28 Device Map map[1fb9:[1]]
2021/07/19 15:52:28 vGPU Map map[]
2021/07/19 15:52:28 GPU vGPU Map map[]
2021/07/19 15:52:28 DP Name TU117GLM_Quadro_T1000_Mobile
2021/07/19 15:52:28 Devicename TU117GLM_Quadro_T1000_Mobile
2021/07/19 15:52:28 TU117GLM_Quadro_T1000_Mobile Device plugin server ready
Copy the device plugin name. In this example, it is TU117GLM_Quadro_T1000_Mobile.
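Instead of copying the name by hand, it can be pulled out of the log. This is a small sketch that assumes the "DP Name" log line format shown above; `gpu_resource_names` is a hypothetical helper, and it prepends the nvidia.com/ prefix that the VM's deviceName field uses.

```shell
#!/bin/sh
# Read the device plugin log on stdin and print each distinct resource
# name it reports, prefixed with nvidia.com/.
gpu_resource_names() {
    awk '/ DP Name / { print "nvidia.com/" $NF }' | sort -u
}

# Usage (replace xxxxx with the real pod name suffix):
#   kubectl -n kube-system logs nvidia-kubevirt-gpu-dp-daemonset-xxxxx \
#       | gpu_resource_names
```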
- At the time of writing, Harvester does not have a UI for provisioning GPUs, so we will need to edit the YAML for a virtual machine. Create a VM instance and stop it. Then, edit its YAML as follows:
  ...
  spec:
    domain:
      cpu:
        cores: 4
        sockets: 1
        threads: 1
      devices:
        disks:
        - disk:
            bus: virtio
          name: disk-0
        - bootOrder: 1
          disk:
            bus: virtio
          name: disk-1
+       gpus:
+       - deviceName: nvidia.com/TU117GLM_Quadro_T1000_Mobile
+         name: gpu1
  ...
(Replace the part after nvidia.com/ with your device name.)
- Start the VM. If configured correctly, you should see the following output from the kubevirt gpu plugin pod:
2021/07/19 15:53:08 In allocate
2021/07/19 15:53:08 Allocated devices [0000:01:00.0 0000:01:00.1]
If the VM fails to start, check the kubevirt logs. If you see errors such as:
Please ensure all devices within the iommu_group are bound to their vfio bus driver.
this means there are other devices in the same IOMMU group as your GPU which also
need to be configured with the vfio-pci driver. Edit the kernel cmdline to include
these devices and reboot.
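To see which devices share an IOMMU group with the GPU, the groups can be read from sysfs on the node. The sketch below assumes the standard /sys/kernel/iommu_groups layout (one directory per group, each with a devices/ subdirectory); `list_iommu_groups` is a hypothetical helper name.

```shell
#!/bin/sh
# Print "<group> <pci address>" for every device found under an IOMMU
# groups directory. The argument defaults to the standard sysfs location.
list_iommu_groups() {
    root=${1:-/sys/kernel/iommu_groups}
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue          # glob matched nothing
        group=${dev%/devices/*}            # strip "/devices/<address>"
        printf '%s %s\n' "${group##*/}" "${dev##*/}"
    done
}

# Usage on the node:
#   list_iommu_groups
```

Look for the line containing the GPU's address (0000:01:00.0 in this example); the other lines with the same group number are the devices the error message refers to.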
@mirceanton This guide is pretty old (written before Harvester 1.0), and I'm not a maintainer of Harvester, so I can't guarantee that this still works. One thing I would try: run
modprobe vfio-pci
while the GPU is not bound to the nvidia driver, and see if it binds to vfio-pci.