This assumes you have GTX980 cards in your system (PCI IDs `10de:13c0` and `10de:0fbb` per card). Just add more IDs for other cards to make this more generic. This also assumes Nova uses qemu-kvm as the virtualization hypervisor (`qemu-system-x86_64`), which seems to be the default on OpenStack Newton when installed using openstack-ansible.
We assume OpenStack Newton is pre-installed and that we are working on a Nova compute node. This has been tested on an Ubuntu 16.04 system where I installed OpenStack AIO version 14.0.0 (different from the git tag used in the instructions!): http://docs.openstack.org/developer/openstack-ansible/developer-docs/quickstart-aio.html
Note: This is heavily based on information from https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF#Enabling_IOMMU and has been adapted for Ubuntu 16.04.

- Ensure SR-IOV and VT-d are enabled in your system BIOS.

- Add `intel_iommu=on` to the kernel command line (in `/etc/default/grub`).

- Run

  ```
  $ update-grub
  ```
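
  For reference, after the edit the relevant line in `/etc/default/grub` might look something like this (a sketch only; keep whatever options your system already has):

  ```
  # /etc/default/grub -- existing options kept, intel_iommu=on appended
  GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_iommu=on"
  ```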

- Blacklist `snd_hda_intel` (which might grab the audio portion of the GPU on the host). While we are at it, also blacklist all other potential GPU modules; `nouveau` is especially important here. Edit `/etc/modprobe.d/blacklist.conf`:

  ```
  blacklist snd_hda_intel
  blacklist amd76x_edac
  blacklist vga16fb
  blacklist nouveau
  blacklist rivafb
  blacklist nvidiafb
  blacklist rivatv
  ```
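
  Once the host has been rebooted (a few steps further down), you can double-check that none of these modules were loaded anyway; the following should produce no output (a quick sanity check, not part of the original guide):

  ```
  $ lsmod | grep -E "nouveau|snd_hda_intel"
  ```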

- Make the `vfio-pci` module hold on to the devices we might want to pass through (and the devices in the same IOMMU group). This mostly just means each GPU and its audio device (even though we don't pass through the audio device). In this case the PCI ID `10de:13c0` is the main GPU and `10de:0fbb` is its HDMI audio interface. Create `/etc/modprobe.d/vfio.conf`:

  ```
  # GTX980 and its audio controller
  options vfio-pci ids=10de:13c0,10de:0fbb
  ```

  Note: you can find all NVIDIA cards with their PCI IDs in your system using something like this:

  ```
  $ lspci -nn | grep NVIDIA
  ```
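
  To see which other devices share an IOMMU group with a GPU (and would therefore also need to be bound to `vfio-pci`), a small shell loop like the following can help once the system is running with `intel_iommu=on` (a sketch, not from the original guide):

  ```
  # list every PCI device per IOMMU group
  for g in /sys/kernel/iommu_groups/*; do
    echo "IOMMU group ${g##*/}:"
    for d in "$g"/devices/*; do lspci -nns "${d##*/}"; done
  done
  ```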

- Make sure `vfio-pci` gets loaded as early as possible by editing `/etc/modules-load.d/modules.conf` and adding `vfio-pci` to the list.
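
  The file simply lists one module name per line, so after the edit it might look like this (keep any modules that are already listed):

  ```
  # /etc/modules-load.d/modules.conf
  vfio-pci
  ```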

- Update the initrd to apply these changes at boot by running

  ```
  $ update-initramfs -u
  ```
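
  If you want to confirm that the vfio configuration made it into the new initramfs, something like this can be used (an optional check, not part of the original guide):

  ```
  $ lsinitramfs /boot/initrd.img-$(uname -r) | grep vfio
  ```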

- Reboot the system in order to activate the `intel_iommu=on` kernel option.
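
  After the reboot you can verify that the IOMMU actually came up, for example via the kernel log (a rough check, not from the original guide):

  ```
  $ dmesg | grep -e DMAR -e IOMMU
  ```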

- Now make sure the GPUs and their audio interfaces are "in use" by `vfio-pci` and not by any other module. Something like this should be what you see:

  ```
  root@stack:~# lspci -nnk -d 10de:13c0
  05:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204 [GeForce GTX 980] [10de:13c0] (rev a1)
          Subsystem: eVga.com. Corp. GM204 [GeForce GTX 980] [3842:2980]
          Kernel driver in use: vfio-pci
          Kernel modules: nvidiafb, nouveau
  84:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204 [GeForce GTX 980] [10de:13c0] (rev a1)
          Subsystem: eVga.com. Corp. GM204 [GeForce GTX 980] [3842:2980]
          Kernel driver in use: vfio-pci
          Kernel modules: nvidiafb, nouveau
  root@thunerstack:~# lspci -nnk -d 10de:0fbb
  05:00.1 Audio device [0403]: NVIDIA Corporation GM204 High Definition Audio Controller [10de:0fbb] (rev a1)
          Subsystem: eVga.com. Corp. GM204 High Definition Audio Controller [3842:2980]
          Kernel driver in use: vfio-pci
          Kernel modules: snd_hda_intel
  84:00.1 Audio device [0403]: NVIDIA Corporation GM204 High Definition Audio Controller [10de:0fbb] (rev a1)
          Subsystem: eVga.com. Corp. GM204 High Definition Audio Controller [3842:2980]
          Kernel driver in use: vfio-pci
          Kernel modules: snd_hda_intel
  ```

- Your system should now be ready for PCI passthrough of its GPUs.

Note: This is based on information from http://docs.openstack.org/admin-guide/compute-pci-passthrough.html

- Add this to `nova.conf` on the controller, API and compute hosts (create more aliases for other GPU models as needed). Edit `/etc/nova/nova.conf` on each of those systems (compute, controller and API):

  ```
  [DEFAULT]
  ...
  pci_alias = { "vendor_id":"10de", "product_id":"13c0", "device_type":"type-PCI", "name":"gtx980" }
  pci_passthrough_whitelist = { "vendor_id": "10de", "product_id": "13c0" }
  ...
  ```

  In the same file, append `,PciPassthroughFilter` to the `scheduler_default_filters` option:

  ```
  # add this to scheduler_default_filters in /etc/nova/nova.conf
  scheduler_default_filters = ..... ,PciPassthroughFilter
  ```
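
  If you have several different GPU models, both options can simply be repeated, one line per model (a sketch; the second vendor/product pair is only an illustrative example for a GTX980 Ti):

  ```
  pci_alias = { "vendor_id":"10de", "product_id":"13c0", "device_type":"type-PCI", "name":"gtx980" }
  pci_alias = { "vendor_id":"10de", "product_id":"17c8", "device_type":"type-PCI", "name":"gtx980ti" }
  pci_passthrough_whitelist = { "vendor_id": "10de", "product_id": "13c0" }
  pci_passthrough_whitelist = { "vendor_id": "10de", "product_id": "17c8" }
  ```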

- Restart `nova-compute`, `nova-api` and `nova-scheduler`, depending on the node:

  ```
  $ systemctl restart nova-api
  $ systemctl restart nova-scheduler
  $ systemctl restart nova-compute
  ```
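
  To confirm the services came back up, you can for example run the following from a node with admin credentials loaded (an optional check, not part of the original guide):

  ```
  $ openstack compute service list
  ```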

- Then configure a flavor as usual and finally add the GPU requirement to it (1x GTX980 in this example):

  ```
  $ openstack flavor set m1.large.1gtx980 --property "pci_passthrough:alias"="gtx980:1"
  ```

  In this example `gtx980` is the alias name chosen above and `:1` means the flavor requests one of this resource. So in order to make it a 2-GPU flavor it would be `gtx980:2`.
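
  For completeness, the "configure a flavor as usual" part might look something like this (a sketch; the sizes are arbitrary and the flavor name is just the one used above):

  ```
  $ openstack flavor create --ram 8192 --disk 80 --vcpus 8 m1.large.1gtx980
  ```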

- Now GPU passthrough should work. There is one last step to perform in order to make NVIDIA consumer-grade GPUs usable in VMs. Apparently the NVIDIA driver checks whether it is running inside a VM and refuses to start if it is. This seems to be a "bug" that NVIDIA probably does not intend to fix. In any case, KVM (in this case through qemu-kvm) can be configured to hide the fact that the VM is running on KVM. I do not think this can be changed directly in OpenStack/libvirtd, but one way of injecting the correct options is to install the following wrapper script around qemu:

- Rename `/usr/bin/qemu-system-x86_64` to `/usr/bin/qemu-system-x86_64.orig` and deploy this wrapper as `/usr/bin/qemu-system-x86_64` on the nova compute host:

  ```
  #!/usr/bin/python
  # Wrapper around the real qemu binary: rewrite the "-cpu" argument so the guest
  # cannot tell it is running on KVM, then hand over to the original binary.
  import os
  import sys

  new_args = []
  # only change the "-cpu" options (inject kvm=off and hv_vendor_id=MyFake_KVM)
  for i in range(len(sys.argv)):
      # keep argv[0] and argv[1] untouched (this also avoids argv[i-1] wrapping around)
      if i <= 1:
          new_args.append(sys.argv[i])
          continue
      # pass through every argument that does not directly follow "-cpu"
      if sys.argv[i - 1] != "-cpu":
          new_args.append(sys.argv[i])
          continue
      # this is the value of "-cpu": insert the hiding options right after the model name
      subargs = sys.argv[i].split(",")
      subargs.insert(1, "kvm=off")
      subargs.insert(2, "hv_vendor_id=MyFake_KVM")
      new_args.append(",".join(subargs))

  # replace this process with the real qemu, keeping the rewritten argument list
  os.execv('/usr/bin/qemu-system-x86_64.orig', new_args)
  ```
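
  Remember to make the wrapper executable (e.g. `chmod 755`); the original guide does not spell this out, but qemu is started via this path. Once an instance with the GPU flavor is running on the compute node, you can check that the injected options actually made it onto the qemu command line (a rough sanity check, not part of the original guide):

  ```
  # on the compute host
  $ ps -ef | grep qemu-system-x86_64
  # the instance's "-cpu" argument should now contain ...,kvm=off,hv_vendor_id=MyFake_KVM,...
  ```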

- Add `/usr/bin/qemu-system-x86_64.orig` to `/etc/apparmor.d/abstractions/libvirt-qemu` as

  ```
  /usr/bin/qemu-system-x86_64.orig rmix,
  ```

  and reload apparmor:

  ```
  $ systemctl reload apparmor
  ```

- This should be it. You should now be able to create GPU instances in your OpenStack cluster.
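
A quick end-to-end test could look like this (a sketch only; the image and network names are placeholders for whatever exists in your cloud):

```
$ openstack server create --flavor m1.large.1gtx980 --image ubuntu-16.04 --network private gpu-test
```

Inside the guest, the GPU should then show up in `lspci` and the NVIDIA driver should be able to initialize it (e.g. `nvidia-smi`).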

Comment: My sys.argv does not have the -cpu flag. If I manually add it, I get "qemu-system-x86_64: Unable to find CPU definition: kvm=off".
If I print out sys.argv, I see these parameters being passed to my Python script:
"-S -no-user-config -nodefaults -nographic -M none -qmp unix:/var/lib/libvirt/qemu/capabilities.monitor.sock,server,nowait -pidfile /var/lib/libvirt/qemu/capabilties.pidfile -daemonize"
I tried new_args.append('-cpu kvm=off') and had no luck.