This assumes you have GTX980 cards in your system (PCI IDs `10de:13c0` and `10de:0fbb` per card). Just add more IDs for other cards to make this more generic. This also assumes Nova uses qemu-kvm as the virtualization hypervisor (`qemu-system-x86_64`), which seems to be the default on OpenStack Newton when installed using openstack-ansible.
We assume OpenStack Newton is pre-installed and that we are working on a Nova compute node. This has been tested on an Ubuntu 16.04 system where I installed OpenStack AIO version 14.0.0 (different from the git tag used in the instructions!): http://docs.openstack.org/developer/openstack-ansible/developer-docs/quickstart-aio.html
Note: This is heavily based on information from https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF#Enabling_IOMMU and has been adapted for Ubuntu 16.04.

- Ensure SR-IOV and VT-d are enabled in your system BIOS.

- Add `intel_iommu=on` to the kernel command line (in `/etc/default/grub`).

- Run

  ```
  $ update-grub
  ```
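
  For reference, after the edit the relevant line in `/etc/default/grub` might look something like this (a sketch only; keep whatever options your system already has):

  ```
  # /etc/default/grub -- existing options kept, intel_iommu=on appended
  GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_iommu=on"
  ```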

- Blacklist `snd_hda_intel` (which might grab the audio portion of the GPU on the host). While we are at it, also blacklist all other potential GPU modules; `nouveau` is especially important here. Edit `/etc/modprobe.d/blacklist.conf`:

  ```
  blacklist snd_hda_intel
  blacklist amd76x_edac
  blacklist vga16fb
  blacklist nouveau
  blacklist rivafb
  blacklist nvidiafb
  blacklist rivatv
  ```
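
  Once the host has been rebooted (a few steps further down), you can double-check that none of these modules were loaded anyway; the following should produce no output (a quick sanity check, not part of the original guide):

  ```
  $ lsmod | grep -E "nouveau|snd_hda_intel"
  ```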

- Make the `vfio-pci` module hold on to the devices we might want to pass through (and the devices in the same IOMMU group). This mostly just means each GPU and its audio device (even though we don't pass through the audio device). In this case the PCI ID `10de:13c0` is the main GPU and `10de:0fbb` is its HDMI audio interface. Create `/etc/modprobe.d/vfio.conf`:

  ```
  # GTX980 and its audio controller
  options vfio-pci ids=10de:13c0,10de:0fbb
  ```

  Note: you can find all NVIDIA cards with their PCI IDs in your system using something like this:

  ```
  $ lspci -nn | grep NVIDIA
  ```
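
  To see which other devices share an IOMMU group with a GPU (and would therefore also need to be bound to `vfio-pci`), a small shell loop like the following can help once the system is running with `intel_iommu=on` (a sketch, not from the original guide):

  ```
  # list every PCI device per IOMMU group
  for g in /sys/kernel/iommu_groups/*; do
    echo "IOMMU group ${g##*/}:"
    for d in "$g"/devices/*; do lspci -nns "${d##*/}"; done
  done
  ```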

- Make sure `vfio-pci` gets loaded as early as possible by editing `/etc/modules-load.d/modules.conf` and adding `vfio-pci` to the list.
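
  The file simply lists one module name per line, so after the edit it might look like this (keep any modules that are already listed):

  ```
  # /etc/modules-load.d/modules.conf
  vfio-pci
  ```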

- Update the initrd to apply these changes at boot by running

  ```
  $ update-initramfs -u
  ```
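
  If you want to confirm that the vfio configuration made it into the new initramfs, something like this can be used (an optional check, not part of the original guide):

  ```
  $ lsinitramfs /boot/initrd.img-$(uname -r) | grep vfio
  ```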

- Reboot the system in order to activate the `intel_iommu=on` kernel option.
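
  After the reboot you can verify that the IOMMU actually came up, for example via the kernel log (a rough check, not from the original guide):

  ```
  $ dmesg | grep -e DMAR -e IOMMU
  ```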

- Now make sure the GPUs and their audio interfaces are "in use" by `vfio-pci` and not by any other module. Something like this should be what you see:

  ```
  root@stack:~# lspci -nnk -d 10de:13c0
  05:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204 [GeForce GTX 980] [10de:13c0] (rev a1)
          Subsystem: eVga.com. Corp. GM204 [GeForce GTX 980] [3842:2980]
          Kernel driver in use: vfio-pci
          Kernel modules: nvidiafb, nouveau
  84:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204 [GeForce GTX 980] [10de:13c0] (rev a1)
          Subsystem: eVga.com. Corp. GM204 [GeForce GTX 980] [3842:2980]
          Kernel driver in use: vfio-pci
          Kernel modules: nvidiafb, nouveau
  root@thunerstack:~# lspci -nnk -d 10de:0fbb
  05:00.1 Audio device [0403]: NVIDIA Corporation GM204 High Definition Audio Controller [10de:0fbb] (rev a1)
          Subsystem: eVga.com. Corp. GM204 High Definition Audio Controller [3842:2980]
          Kernel driver in use: vfio-pci
          Kernel modules: snd_hda_intel
  84:00.1 Audio device [0403]: NVIDIA Corporation GM204 High Definition Audio Controller [10de:0fbb] (rev a1)
          Subsystem: eVga.com. Corp. GM204 High Definition Audio Controller [3842:2980]
          Kernel driver in use: vfio-pci
          Kernel modules: snd_hda_intel
  ```

- Your system should now be ready for PCI passthrough of its GPUs.

Note: This is based on information from http://docs.openstack.org/admin-guide/compute-pci-passthrough.html

- Add this to `nova.conf` on the controller, API and compute hosts (create more aliases for other GPU models as needed). Edit `/etc/nova/nova.conf` on each of those systems (compute, controller and API):

  ```
  [DEFAULT]
  ...
  pci_alias = { "vendor_id":"10de", "product_id":"13c0", "device_type":"type-PCI", "name":"gtx980" }
  pci_passthrough_whitelist = { "vendor_id": "10de", "product_id": "13c0" }
  ...
  ```

  In the same file, append `,PciPassthroughFilter` to the `scheduler_default_filters` option:

  ```
  # add this to scheduler_default_filters in /etc/nova/nova.conf
  scheduler_default_filters = ..... ,PciPassthroughFilter
  ```
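
  If you have several different GPU models, both options can simply be repeated, one line per model (a sketch; the second vendor/product pair is only an illustrative example for a GTX980 Ti):

  ```
  pci_alias = { "vendor_id":"10de", "product_id":"13c0", "device_type":"type-PCI", "name":"gtx980" }
  pci_alias = { "vendor_id":"10de", "product_id":"17c8", "device_type":"type-PCI", "name":"gtx980ti" }
  pci_passthrough_whitelist = { "vendor_id": "10de", "product_id": "13c0" }
  pci_passthrough_whitelist = { "vendor_id": "10de", "product_id": "17c8" }
  ```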

- Restart `nova-compute`, `nova-api` and `nova-scheduler`, depending on the node:

  ```
  $ systemctl restart nova-api
  $ systemctl restart nova-scheduler
  $ systemctl restart nova-compute
  ```
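
  To confirm the services came back up, you can for example run the following from a node with admin credentials loaded (an optional check, not part of the original guide):

  ```
  $ openstack compute service list
  ```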

- Then configure a flavor as usual and finally add the GPU requirement to it (1x GTX980 in this example):

  ```
  $ openstack flavor set m1.large.1gtx980 --property "pci_passthrough:alias"="gtx980:1"
  ```

  In this example `gtx980` is the alias name chosen above and `:1` means the flavor requests one of this resource. So in order to make it a 2-GPU flavor it would be `gtx980:2`.
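
  For completeness, the "configure a flavor as usual" part might look something like this (a sketch; the sizes are arbitrary and the flavor name is just the one used above):

  ```
  $ openstack flavor create --ram 8192 --disk 80 --vcpus 8 m1.large.1gtx980
  ```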

- Now GPU passthrough should work. There is one last step to perform in order to make NVIDIA consumer-grade GPUs usable in VMs. Apparently the NVIDIA driver checks whether it is running inside a VM and refuses to start if it is. This seems to be a "bug" that NVIDIA probably does not intend to fix. In any case, KVM (in this case through qemu-kvm) can be configured to hide the fact that the VM is running on KVM. I do not think this can be changed directly in OpenStack/libvirtd, but one way of injecting the correct options is to install the following wrapper script around qemu:

- Rename `/usr/bin/qemu-system-x86_64` to `/usr/bin/qemu-system-x86_64.orig` and deploy this wrapper as `/usr/bin/qemu-system-x86_64` on the nova compute host:

  ```
  #!/usr/bin/python
  # Wrapper around the real qemu binary: rewrite the "-cpu" argument so the guest
  # cannot tell it is running on KVM, then hand over to the original binary.
  import os
  import sys

  new_args = []
  # only change the "-cpu" options (inject kvm=off and hv_vendor_id=MyFake_KVM)
  for i in range(len(sys.argv)):
      # keep argv[0] and argv[1] untouched (this also avoids argv[i-1] wrapping around)
      if i <= 1:
          new_args.append(sys.argv[i])
          continue
      # pass through every argument that does not directly follow "-cpu"
      if sys.argv[i - 1] != "-cpu":
          new_args.append(sys.argv[i])
          continue
      # this is the value of "-cpu": insert the hiding options right after the model name
      subargs = sys.argv[i].split(",")
      subargs.insert(1, "kvm=off")
      subargs.insert(2, "hv_vendor_id=MyFake_KVM")
      new_args.append(",".join(subargs))

  # replace this process with the real qemu, keeping the rewritten argument list
  os.execv('/usr/bin/qemu-system-x86_64.orig', new_args)
  ```
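
  Remember to make the wrapper executable (e.g. `chmod 755`); the original guide does not spell this out, but qemu is started via this path. Once an instance with the GPU flavor is running on the compute node, you can check that the injected options actually made it onto the qemu command line (a rough sanity check, not part of the original guide):

  ```
  # on the compute host
  $ ps -ef | grep qemu-system-x86_64
  # the instance's "-cpu" argument should now contain ...,kvm=off,hv_vendor_id=MyFake_KVM,...
  ```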

- Add `/usr/bin/qemu-system-x86_64.orig` to `/etc/apparmor.d/abstractions/libvirt-qemu` as

  ```
  /usr/bin/qemu-system-x86_64.orig rmix,
  ```

  and reload apparmor:

  ```
  $ systemctl reload apparmor
  ```

- This should be it. You should now be able to create GPU instances in your OpenStack cluster.
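
A quick end-to-end test could look like this (a sketch only; the image and network names are placeholders for whatever exists in your cloud):

```
$ openstack server create --flavor m1.large.1gtx980 --image ubuntu-16.04 --network private gpu-test
```

Inside the guest, the GPU should then show up in `lspci` and the NVIDIA driver should be able to initialize it (e.g. `nvidia-smi`).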

Comment: My sys.argv does not have the -cpu flag. If I manually add it, I get "qemu-system-x86_64: Unable to find CPU definition: kvm=off".
If I print out sys.argv, I see these parameters being passed to my Python script:
"-S -no-user-config -nodefaults -nographic -M none -qmp unix:/var/lib/libvirt/qemu/capabilities.monitor.sock,server,nowait -pidfile /var/lib/libvirt/qemu/capabilties.pidfile -daemonize"
I tried new_args.append('-cpu kvm=off') and had no luck.