QEMU VFIO in Nested VM vIOMMU

How to use VFIO to assign a device to a nested VM

  • Here the vfio-pci device is passed into the L1 VM
  • The L1 VM is set up with kernel_irqchip=split
  • The L0 exposes a virtual IOMMU to the L1 VM
qemu-system-x86_64 \
    -machine q35,accel=kvm,kernel_irqchip=split \
    -enable-kvm \
    -bios OVMF.fd \
    -smp sockets=1,cpus=4,cores=2 -cpu host \
    -m 1024 \
    -vga none -nographic \
    -drive file="$IMAGE",if=virtio,aio=threads,format=raw \
    -netdev user,id=mynet0,hostfwd=tcp::${VMN}0022-:22,hostfwd=tcp::${VMN}2375-:2375 \
    -device virtio-net-pci,netdev=mynet0 \
    -device virtio-rng-pci \
    -monitor telnet:127.0.0.1:55555,server,nowait \
    -debugcon file:debug.log -global isa-debugcon.iobase=0x402 $@ \
    -device intel-iommu,intremap=on,caching-mode=on \
    -device vfio-pci,host=b3:00.0
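
Before launching this, the device being assigned (b3:00.0 in the command above) has to be bound to vfio-pci on L0, and L0 itself needs the IOMMU enabled (e.g. intel_iommu=on on the host kernel command line). A minimal sketch, assuming the device is currently bound to some other host driver (skip the unbind if it is not):

modprobe vfio-pci
# Detach b3:00.0 from its current driver and hand it to vfio-pci
echo vfio-pci > /sys/bus/pci/devices/0000:b3:00.0/driver_override
echo 0000:b3:00.0 > /sys/bus/pci/devices/0000:b3:00.0/driver/unbind
echo 0000:b3:00.0 > /sys/bus/pci/drivers/vfio-pci/bind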

Within the L1 VM you will see:

root@clr-d8a5d96d9a844656bcab094780f420b2 ~ # dmesg | grep -e DMAR -e IOMMU
[    0.000000] ACPI: DMAR 0x000000003E86C000 000048 (v01 BOCHS  BXPCDMAR 00000001 BXPC 00000001)
[    0.000000] DMAR: IOMMU enabled
[    0.145746] DMAR: Host address width 39
[    0.145747] DMAR: DRHD base: 0x000000fed90000 flags: 0x1
[    0.145769] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap 12008c22260286 ecap f00f5a
[    0.145776] DMAR: No RMRR found
[    0.145776] DMAR: No ATSR found
[    0.145825] DMAR: dmar0: Using Queued invalidation
[    0.218192] DMAR: Setting RMRR:
[    0.218193] DMAR: Prepare 0-16MiB unity mapping for LPC
[    0.219038] DMAR: Setting identity map for device 0000:00:1f.0 [0x0 - 0xffffff]
[    0.257194] DMAR: Intel(R) Virtualization Technology for Directed I/O

You will also see IOMMU groups set up within the VM:

root@clr-d8a5d96d9a844656bcab094780f420b2 ~ # lspci -v -s 00:03.0
00:03.0 Serial controller: MosChip Semiconductor Technology Ltd. 4-Port PCIe Serial Adapter (prog-if 02 [16550])
        Subsystem: Device a000:1000
        Flags: bus master, fast devsel, latency 0, IRQ 23
        I/O ports at 60e0 [size=8]
        Memory at 90003000 (32-bit, non-prefetchable) [size=4K]
        Memory at 90002000 (32-bit, non-prefetchable) [size=4K]
        Capabilities: [50] MSI: Enable- Count=1/8 Maskable- 64bit+
        Capabilities: [78] Power Management version 3
        Kernel driver in use: serial
root@clr-d8a5d96d9a844656bcab094780f420b2 ~ # find /sys/kernel/iommu_groups/ -type l
/sys/kernel/iommu_groups/5/devices/0000:00:1f.2
/sys/kernel/iommu_groups/5/devices/0000:00:1f.0
/sys/kernel/iommu_groups/5/devices/0000:00:1f.3
/sys/kernel/iommu_groups/3/devices/0000:00:03.0
/sys/kernel/iommu_groups/1/devices/0000:00:01.0
/sys/kernel/iommu_groups/4/devices/0000:00:04.0
/sys/kernel/iommu_groups/2/devices/0000:00:02.0
/sys/kernel/iommu_groups/0/devices/0000:00:00.0
root@clr-d8a5d96d9a844656bcab094780f420b2 ~ # readlink /sys/kernel/iommu_groups/3/devices/0000:00:03.0
../../../../devices/pci0000:00/0000:00:03.0

The device we assigned through VFIO is now in its own IOMMU group and can be assigned from L1 to an L2 VM using VFIO.
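
For example, inside L1 the serial controller at 00:03.0 shown above could be rebound to vfio-pci and then handed to an L2 guest. A minimal sketch, assuming nested KVM is enabled on L0; the L2 image variable and the trimmed-down L2 command line are placeholders:

# Inside L1: move 00:03.0 from the serial driver to vfio-pci
modprobe vfio-pci
echo vfio-pci > /sys/bus/pci/devices/0000:00:03.0/driver_override
echo 0000:00:03.0 > /sys/bus/pci/devices/0000:00:03.0/driver/unbind
echo 0000:00:03.0 > /sys/bus/pci/drivers/vfio-pci/bind

# Launch the L2 VM with the device assigned (other options elided)
qemu-system-x86_64 \
    -machine q35,accel=kvm \
    -enable-kvm -cpu host -m 1024 -nographic \
    -drive file="$L2_IMAGE",if=virtio,format=raw \
    -device vfio-pci,host=00:03.0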

@amshinde

The L1 VM is booted with IOMMU support by passing intel_iommu=on on its kernel command line.
If a virtio device is to be assigned to vfio, then it needs to be passed as:

-device virtio-net-pci,netdev=mynet0,disable-legacy=on,disable-modern=off,iommu_platform=on,ats=on \
-device intel-iommu,intremap=on,caching-mode=on,device-iotlb=on

Documentation can be found at:
https://wiki.qemu.org/Features/VT-d#Command_Line_Example_2
Although the device is a virtio-net device, it is bound to the virtio-pci driver.
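
In other words, when rebinding the device inside L1 it is the virtio-pci driver you unbind from, not virtio-net. A rough sketch, assuming the virtio-net device shows up at 00:02.0 in the guest (the address is hypothetical):

echo vfio-pci > /sys/bus/pci/devices/0000:00:02.0/driver_override
echo 0000:00:02.0 > /sys/bus/pci/drivers/virtio-pci/unbind
echo 0000:00:02.0 > /sys/bus/pci/drivers/vfio-pci/bind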

@shlomopongratz

Is the above QEMU command for L0 or L1 QEMU or for both?

@ormergi

ormergi commented Jul 25, 2021

Very nice!
I tried to do that for an SR-IOV VF device (assigning it to an L2 VM), but I get these errors:

[   95.767571] iavf 0000:05:00.0: Failed to communicate with PF; waiting before retry
[   48.757624] iavf 0000:05:00.0: Admin queue command never completed

Any idea why?

@mcastelino
Author

Is the above QEMU command for L0 or L1 QEMU or for both?

That is for the L0, in order to pass the device on to the L1.

@mcastelino
Author

Very nice!
I tried to do that for an SR-IOV VF device (assigning it to an L2 VM), but I get these errors:

[   95.767571] iavf 0000:05:00.0: Failed to communicate with PF; waiting before retry
[   48.757624] iavf 0000:05:00.0: Admin queue command never completed

Any idea why?

Where do you see this error? Also, I am surprised if this happens inside the VM. Are you passing the PF into the VM? Ideally you should pass in a VF that was created.

@staysh

staysh commented Sep 11, 2021

Do you know of a way to pass those qemu command line options via virt-install?

Alternatively, could you post the XML for the devices created with those arguments?

@ormergi

ormergi commented Nov 22, 2021

Where do you see this error? Also, I am surprised if this happens inside the VM. Are you passing the PF into the VM? Ideally you should pass in a VF that was created.

I saw it inside the L2 guest VM's dmesg log; once I bumped up the VM's RAM it didn't occur again.

@mikeyo

mikeyo commented Feb 5, 2024

Very late to the party with this, but I managed to get this working on an Intel 11900K build running KVM/UNRAID/PROXMOX using the following QEMU args:

<qemu:arg value='-machine'/>
<qemu:arg value='kernel-irqchip=split'/>
<qemu:arg value='-device'/>
<qemu:arg value='intel-iommu,intremap=on,caching-mode=on'/>
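
For anyone going the virt-install/libvirt route asked about above, these qemu:arg entries would presumably sit inside a qemu:commandline block, with the QEMU namespace declared on the domain element, roughly like this:

<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  ...
  <qemu:commandline>
    <qemu:arg value='-machine'/>
    <qemu:arg value='kernel-irqchip=split'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='intel-iommu,intremap=on,caching-mode=on'/>
  </qemu:commandline>
</domain>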

However, the same args do not expose the devices for passthrough on my AMD 3950x build.
What do I need to change for AMD?
