@claudiok
Last active March 26, 2023

Consumer-grade GPU passthrough in an OpenStack system (NVIDIA GPUs)

Assumptions

This assumes you have GTX 980 cards in your system (PCI IDs 10de:13c0 and 10de:0fbb per card); just add more IDs for other cards in order to make this more generic. It also assumes nova uses qemu-kvm as the virtualization hypervisor (qemu-system-x86_64), which seems to be the default on OpenStack Newton when installed using openstack-ansible.

We assume OpenStack Newton is pre-installed and that we are working on a Nova compute node. This has been tested on an Ubuntu 16.04 system where I installed OpenStack AIO version 14.0.0 (different from the git tag used in the instructions!): http://docs.openstack.org/developer/openstack-ansible/developer-docs/quickstart-aio.html

Prepare the system for GPU passthrough (set up IOMMU/vfio/...)

Note: This is heavily based on information from https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF#Enabling_IOMMU adapted for Ubuntu 16.04

  1. Ensure SR-IOV and VT-d are enabled in your system BIOS.

  2. Add intel_iommu=on to the kernel command line (in /etc/default/grub), for example as sketched below.
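
    A minimal sketch of the relevant /etc/default/grub line (your existing options may differ):

    GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_iommu=on"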

  3. Run

    $ update-grub
    
  4. Blacklist snd_hda_intel, which might otherwise grab the audio portion of the GPU on the host. While we are at it, also blacklist all other modules that might claim the GPU; nouveau in particular is important here. Edit /etc/modprobe.d/blacklist.conf:

    blacklist snd_hda_intel
    blacklist amd76x_edac
    blacklist vga16fb
    blacklist nouveau
    blacklist rivafb
    blacklist nvidiafb
    blacklist rivatv
    
  5. Make the vfio-pci module hold on to the devices we might want to pass through (and devices in the same IOMMU group). This mostly just means each GPU and its audio device (even though we don't pass through the audio device). In this case 10de:13c0 is the vendor:device ID of the main GPU and 10de:0fbb is its HDMI audio interface. Create /etc/modprobe.d/vfio.conf:

    # (GTX980 and its audio controller)
    options vfio-pci ids=10de:13c0,10de:0fbb
    

    Note: you can find all NVIDIA cards with their PCI vendor:device IDs in your system using something like this:

    $ lspci -nn | grep NVIDIA
    
  6. Make sure vfio-pci gets loaded as early as possible by editing /etc/modules-load.d/modules.conf and adding vfio-pci to the list, for example:
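
    A sketch of what /etc/modules-load.d/modules.conf might end up looking like (keep any entries already present and append vfio-pci):

    # /etc/modules-load.d/modules.conf
    vfio-pci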

  7. Update the initrd to apply these changes at boot by running

    $ update-initramfs -u
    
  8. Reboot the system in order to activate the intel_iommu=on kernel option.
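
    After the reboot, you can check that the IOMMU actually came up with something like:

    $ dmesg | grep -i -e DMAR -e IOMMU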

  9. Now make sure the GPUs and their audio interfaces are "in use" by vfio-pci and not by any other module. The output should look something like this:

    root@stack:~# lspci -nnk -d 10de:13c0
    05:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204 [GeForce GTX 980] [10de:13c0] (rev a1)
      Subsystem: eVga.com. Corp. GM204 [GeForce GTX 980] [3842:2980]
      Kernel driver in use: vfio-pci
      Kernel modules: nvidiafb, nouveau
    84:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204 [GeForce GTX 980] [10de:13c0] (rev a1)
      Subsystem: eVga.com. Corp. GM204 [GeForce GTX 980] [3842:2980]
      Kernel driver in use: vfio-pci
      Kernel modules: nvidiafb, nouveau
    root@thunerstack:~# lspci -nnk -d 10de:0fbb
    05:00.1 Audio device [0403]: NVIDIA Corporation GM204 High Definition Audio Controller [10de:0fbb] (rev a1)
      Subsystem: eVga.com. Corp. GM204 High Definition Audio Controller [3842:2980]
      Kernel driver in use: vfio-pci
      Kernel modules: snd_hda_intel
    84:00.1 Audio device [0403]: NVIDIA Corporation GM204 High Definition Audio Controller [10de:0fbb] (rev a1)
      Subsystem: eVga.com. Corp. GM204 High Definition Audio Controller [3842:2980]
      Kernel driver in use: vfio-pci
      Kernel modules: snd_hda_intel
    
  10. Your system should now be ready for PCI passthrough of its GPUs.

Configure Nova on the compute node and controller

Note: This is based on information from http://docs.openstack.org/admin-guide/compute-pci-passthrough.html

  1. Add this to /etc/nova/nova.conf on the controller, API and compute hosts (create more aliases for other GPU models, as sketched below):

    [DEFAULT]
    ...
    pci_alias = { "vendor_id":"10de", "product_id":"13c0", "device_type":"type-PCI", "name":"gtx980" }
    pci_passthrough_whitelist = { "vendor_id": "10de", "product_id": "13c0" }
    ...
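
    To cover another GPU model, add a second alias/whitelist pair. For example (the 10de:1b81 / gtx1070 values here are illustrative; substitute the real IDs from lspci -nn):

    pci_alias = { "vendor_id":"10de", "product_id":"1b81", "device_type":"type-PCI", "name":"gtx1070" }
    pci_passthrough_whitelist = { "vendor_id": "10de", "product_id": "1b81" }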
    

    In the same file, append ,PciPassthroughFilter to the scheduler_default_filters option:

    # add this to scheduler_default_filters in /etc/nova/nova.conf
    scheduler_default_filters = ..... ,PciPassthroughFilter
    
  2. Restart nova-compute, nova-api and nova-scheduler, depending on the node:

    $ systemctl restart nova-api
    $ systemctl restart nova-scheduler
    $ systemctl restart nova-compute
    
  3. Then configure a flavor as usual and finally add the GPU requirement to it (one GTX 980 in this example):

    $ openstack flavor set m1.large.1gtx980 --property "pci_passthrough:alias"="gtx980:1"
    

    In this example, gtx980 is the alias name chosen above and :1 means the flavor requests one unit of this resource. To make a 2-GPU flavor, use gtx980:2, as in the sketch below.
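
    For example, a hypothetical 2-GPU flavor (the name and sizes here are illustrative):

    $ openstack flavor create m1.xlarge.2gtx980 --vcpus 8 --ram 16384 --disk 80
    $ openstack flavor set m1.xlarge.2gtx980 --property "pci_passthrough:alias"="gtx980:2"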

  4. Now GPU passthrough should work. There is one last step to perform in order to make NVIDIA consumer-grade GPUs usable in VMs. Apparently the NVIDIA driver checks whether it is running inside a VM and won't start up if it is. This seems to be a "bug" that NVIDIA probably does not intend to fix. In any case, KVM (in this case through qemu-kvm) can be configured to hide the fact that the guest is running on KVM. I do not think this can be changed directly in OpenStack/libvirt, but one way of injecting the correct options is to install a wrapper script around qemu:

    1. Rename /usr/bin/qemu-system-x86_64 to /usr/bin/qemu-system-x86_64.orig and deploy this wrapper as /usr/bin/qemu-system-x86_64 on the nova compute host.

      #!/usr/bin/python
      
      import os
      import sys
      
      new_args = []
      
      # only change the "-cpu" options (inject kvm=off and hv_vendor_id=MyFake_KVM)
      for i in range(len(sys.argv)):
          if i<=1: 
              new_args.append(sys.argv[i])
              continue
          if sys.argv[i-1] != "-cpu":
              new_args.append(sys.argv[i])
              continue
      
          subargs = sys.argv[i].split(",")
      
          subargs.insert(1,"kvm=off")
          subargs.insert(2,"hv_vendor_id=MyFake_KVM")
      
          new_arg = ",".join(subargs)
      
          new_args.append(new_arg)
      
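      # replace this process with the original qemu binary, passing along the modified arguments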
      os.execv('/usr/bin/qemu-system-x86_64.orig', new_args)
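
      A minimal sketch of the swap itself (paths as above; remember that the wrapper must be executable):

      $ mv /usr/bin/qemu-system-x86_64 /usr/bin/qemu-system-x86_64.orig
      $ vi /usr/bin/qemu-system-x86_64    # create the wrapper with the script above
      $ chmod +x /usr/bin/qemu-system-x86_64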
    2. Add /usr/bin/qemu-system-x86_64.orig to /etc/apparmor.d/abstractions/libvirt-qemu as

      /usr/bin/qemu-system-x86_64.orig rmix,
      

      and reload apparmor

      $ systemctl reload apparmor
      

This should be it. You should now be able to create GPU instances in your OpenStack cluster.

@frippe75 commented Mar 2, 2018

This document was so spot on, thanks! Used it for OpenStack/Newton on CentOS 7.4 (GTX 1050 Ti, el cheapo).
There were only a few expected differences in file locations, but now... nvidia-smi finally returns info. Thanks!

@hebimg commented May 7, 2018

Hi. Regarding "there is one last step to perform in order to make NVIDIA consumer-grade GPUs usable in VMs": I created /usr/bin/qemu-system-x86_64 on the nova compute host as follows.

    #!/usr/bin/python

    import os
    import sys

    new_args = []

    # only change the "-cpu" options (inject kvm=off and hv_vendor_id=MyFake_KVM)
    for i in range(len(sys.argv)):
        if i <= 1:
            new_args.append(sys.argv[i])
            continue
        if sys.argv[i-1] != "-cpu":
            new_args.append(sys.argv[i])
            continue

        subargs = sys.argv[i].split(",")

        subargs.insert(1, "kvm=off")
        subargs.insert(2, "hv_vendor_id=MyFake_KVM")

        new_arg = ",".join(subargs)

        new_args.append(new_arg)

    os.execv('/usr/bin/qemu-system-x86_64.orig', new_args)

But I found that /usr/bin/qemu-system-x86_64 cannot be run by OpenStack. How can I change it? Or is there a mistake in my configuration?

@mgariepy commented May 9, 2018

Just to let you know: if you are using OpenStack Pike or later, you can use the GPU directly with a stock installation.

You only need to set this metadata on your image:
img_hide_hypervisor_id='true'
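
For example (the image name is illustrative):

$ openstack image set --property img_hide_hypervisor_id=true my-gpu-image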

@schmilmo

Could someone tell me what changes are needed for RHEL/CentOS?

@hemmmapart

Hello, may I ask how you deployed OpenStack? With a Kolla deployment, I found it hard to locate these files.

@aymen19955

Thanks for this detailed guide.
I have a question regarding the audio interfaces of the GPU: is it possible to pass them through to the guest as well?
