
@skidunion
Last active August 7, 2024 06:18
5700 XT Single GPU passthrough

Single GPU passthrough

Based on the prepare and release scripts: https://gitlab.com/risingprismtv/single-gpu-passthrough

Configuration

  • CPU: AMD Ryzen 7 3700X
  • GPU: AMD Radeon RX 5700 XT
  • System: Manjaro Linux, Kernel 5.12.1

Problems that I've encountered

After installing drivers in the guest, the GPU fails to initialize with code 43

Fixed by installing the vendor-reset kernel module. Load it before the VM starts (e.g. on system boot).
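One way to load it on every boot is a modules-load.d entry (file name is an example; the module name matches the vendor-reset project):

```
# /etc/modules-load.d/vendor-reset.conf
vendor-reset
```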

The following option may also be required, since AMD drivers tend to detect that they are running in a VM:

<features>
    <hyperv>
        ...
        <vendor_id state="on" value="randomid"/>
    </hyperv>
</features>

VM disk performance is slow

Fixed by using VirtIO instead of QEMU SCSI/SATA emulation.

If you already have a Windows install and wish to switch to VirtIO, I recommend reinstalling the OS. Windows simply won't boot if its boot storage controller suddenly changes, even if drivers for it exist.

There is a workaround for this, but I couldn't get it to work: https://superuser.com/questions/1057959/windows-10-in-kvm-change-boot-disk-to-virtio
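The workaround described at the link above is to attach a small second disk on the VirtIO bus first, boot Windows so it installs the driver, and only then move the boot disk over to VirtIO. A sketch of the temporary disk, with a hypothetical image path (untested here):

```xml
<!-- temporary dummy disk; path and target dev are examples -->
<disk type="file" device="disk">
  <driver name="qemu" type="raw"/>
  <source file="/var/lib/libvirt/images/dummy.img"/>
  <target dev="vdb" bus="virtio"/>
</disk>
```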

My configuration:

<devices>
    ...
    <!-- configuration for the passed through SATA SSD -->
    <disk type="block" device="disk">
      <driver name="qemu" type="raw" cache="writeback" io="threads" discard="unmap"/>
      <source dev="/dev/disk/by-id/ata-Samsung_SSD_860_EVO_1TB_<serial-number>"/>
      <target dev="sdb" bus="scsi"/>
      <address type="drive" controller="0" bus="0" target="0" unit="1"/>
    </disk>
    <!-- configuration for the boot drive -->
    <disk type="file" device="disk">
      <driver name="qemu" type="raw" cache="none" io="native" discard="unmap" iothread="1" queues="8"/>
      <source file="/path/to/disk.img"/>
      <target dev="sda" bus="virtio"/>
      <boot order="1"/>
      <address type="pci" domain="0x0000" bus="0x08" slot="0x00" function="0x0"/>
    </disk>
</devices>
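If you still need to create the raw boot image itself, a sparse file works; this is a sketch, with the file name and 1G size as examples (the XML above uses /path/to/disk.img, and you'll want a much larger size for a real Windows install):

```shell
# Create a sparse raw disk image for the guest.
# "win10.img" and the 1G size are placeholders.
img="${IMG:-win10.img}"
fallocate -l 1G "$img"
```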

This configuration makes use of iothreads, so I left some CPU threads unused by the guest and pinned them to the iothreads and the emulator. Pinning CPU cores is a good idea because it can reduce lag in the guest.

The following configuration pins the first two host threads to the emulator and the last two to the iothread.

The vCPUs are pinned in sibling pairs, matching my CPU topology. (see: Arch Wiki - CPU topology)
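To find the sibling pairs on your own machine, you can read them straight from sysfs on the host:

```shell
# Each output line lists the logical CPUs sharing one physical core,
# e.g. "2,8" means CPUs 2 and 8 are SMT siblings of the same core.
sort -u /sys/devices/system/cpu/cpu*/topology/thread_siblings_list
```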

<cputune>
    <vcpupin vcpu="0" cpuset="2"/>
    <vcpupin vcpu="1" cpuset="8"/>
    <vcpupin vcpu="2" cpuset="3"/>
    <vcpupin vcpu="3" cpuset="9"/>
    <vcpupin vcpu="4" cpuset="4"/>
    <vcpupin vcpu="5" cpuset="10"/>
    <vcpupin vcpu="6" cpuset="5"/>
    <vcpupin vcpu="7" cpuset="11"/>
    <vcpupin vcpu="8" cpuset="6"/>
    <vcpupin vcpu="9" cpuset="12"/>
    <vcpupin vcpu="10" cpuset="7"/>
    <vcpupin vcpu="11" cpuset="13"/>
    <emulatorpin cpuset="0-1"/>
    <iothreadpin iothread="1" cpuset="14-15"/>
</cputune>

Games crash due to failed memory allocation / VM shuts down unexpectedly

The VM was fairly stable, until it wasn't: my games suddenly started crashing. Enabling huge pages solved it.

Add the following to /etc/fstab (source: Arch Wiki):

hugetlbfs       /dev/hugepages  hugetlbfs       mode=01770,gid=kvm        0 0

and the following to the root of your XML configuration:

<memoryBacking>
  <hugepages/>
</memoryBacking>

Scripts for dynamic hugepages allocation: https://rokups.github.io/#!pages/gaming-vm-performance.md
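As a sanity check, the number of hugepages a guest needs is its memory size divided by the hugepage size. 2048 kB is the typical x86_64 default; confirm yours with `grep Hugepagesize /proc/meminfo`:

```shell
page_kb=2048      # typical x86_64 hugepage size; check /proc/meminfo
vm_kib=12288000   # the <memory unit="KiB"> value from the domain XML below
echo $(( vm_kib / page_kb ))   # prints 6000
```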


Result

Apex Legends plays exactly like on my native Windows installation (144 fps, high textures, everything else low-to-mid).

#!/bin/bash
# Helpful to read output when debugging
set -x
long_delay=10
medium_delay=5
short_delay=1
hugepages_size=$(grep Hugepagesize /proc/meminfo | awk '{print $2}')
hugepages_size=$((hugepages_size * 1024))
hugepages_allocated=$(sysctl vm.nr_hugepages | awk '{print $3}')
# Source: https://gitlab.com/risingprismtv/single-gpu-passthrough
echo "Beginning of startup!"
function stop_display_manager_if_running {
    if systemctl is-active --quiet "$1"; then
        echo "$1" >> /tmp/vfio-store-display-manager
        systemctl stop "$1"
    fi
    while systemctl is-active --quiet "$1"; do
        sleep "${short_delay}"
    done
}
function unload_module_if_loaded {
    if lsmod | grep "$1" &> /dev/null; then
        modprobe -r "$1"
        echo "$1" >> /tmp/vfio-loaded-gpu-modules
    fi
    while lsmod | grep "$1" &> /dev/null; do
        sleep "${short_delay}"
    done
}
function get_virsh_id {
    python -c "print('pci_0000_'+'$1'.split(':')[0] + '_' + '$1'.split(':')[1].split('.')[0] + '_' + '$1'.split(':')[1].split('.')[1])"
}
function get_pci_id_from_device_id {
    lspci -nn | grep "$1" | awk '{print $1}'
}
# Set CPU frequency governor to "performance"
cpupower frequency-set -g performance
# Stop currently running display manager
if test -e "/tmp/vfio-store-display-manager"; then
    rm -f /tmp/vfio-store-display-manager
fi
stop_display_manager_if_running sddm.service
stop_display_manager_if_running gdm.service
stop_display_manager_if_running lightdm.service
stop_display_manager_if_running lxdm.service
stop_display_manager_if_running xdm.service
stop_display_manager_if_running mdm.service
stop_display_manager_if_running display-manager.service
# Unbind VTconsoles if currently bound (adapted from https://www.kernel.org/doc/Documentation/fb/fbcon.txt)
if test -e "/tmp/vfio-bound-consoles"; then
    rm -f /tmp/vfio-bound-consoles
fi
for (( i = 0; i < 16; i++ )); do
    if test -x /sys/class/vtconsole/vtcon${i}; then
        if [ "$(grep -c "frame buffer" /sys/class/vtconsole/vtcon${i}/name)" = 1 ]; then
            echo 0 > /sys/class/vtconsole/vtcon${i}/bind
            echo "Unbinding console ${i}"
            echo ${i} >> /tmp/vfio-bound-consoles
        fi
    fi
done
# Allocating hugepages after the display manager has shut down increases the
# likelihood of a successful allocation, since more memory is free at that point
# Automatic hugepages allocation
# Source: https://rokups.github.io/#!pages/gaming-vm-performance.md
# I only use 1 VM at the same time, so I don't account for multiple VMs
echo 3 > /proc/sys/vm/drop_caches
echo 1 > /proc/sys/vm/compact_memory
# change this to include the correct path & VM name
vm_hugepages_need=$(( $(python /etc/libvirt/hooks/qemu.d/win10/prepare/vm-mem-requirements "win10") / hugepages_size ))
vm_hugepages_total=$(($hugepages_allocated + $vm_hugepages_need))
sysctl vm.nr_hugepages=$vm_hugepages_total
# THP can allegedly result in jitter. Better keep it off.
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo Allocated hugepages: $(cat /proc/sys/vm/nr_hugepages)
# According to kernel documentation (https://www.kernel.org/doc/Documentation/fb/fbcon.txt),
# specifically unbinding efi-framebuffer is not necessary after all consoles
# are unbound (and often times harmful in my experience), so it was omitted here
# I leave it here for reference in case anyone needs it.
# Unbind EFI-Framebuffer if currently bound
# if test -e "/sys/bus/platform/drivers/efi-framebuffer/unbind"; then
#     echo efi-framebuffer.0 > /sys/bus/platform/drivers/efi-framebuffer/unbind
# else
#     echo "Could not find framebuffer to unload!"
# fi
sleep "${medium_delay}"
# Unload loaded GPU drivers
if test -e "/tmp/vfio-loaded-gpu-modules"; then
    rm -f /tmp/vfio-loaded-gpu-modules
fi
unload_module_if_loaded amdgpu-pro
unload_module_if_loaded amdgpu
#unload_module_if_loaded nvidia_drm
#unload_module_if_loaded nvidia_modeset
#unload_module_if_loaded nvidia_uvm
#unload_module_if_loaded nvidia
#unload_module_if_loaded ipmi_devintf
#unload_module_if_loaded nouveau
#unload_module_if_loaded i915
# Unbind the GPU from display driver
if test -e "/tmp/vfio-virsh-ids"; then
    rm -f /tmp/vfio-virsh-ids
fi
# It's not required for me to detach the GPU's devices via virsh, but let's keep it here anyway
# Not going to put these into the vfio-pci options =)
gpu_device_id="1002:731f"
gpu_audio_device_id="1002:ab38"
gpu_pci_id=$(get_pci_id_from_device_id ${gpu_device_id})
gpu_audio_pci_id=$(get_pci_id_from_device_id ${gpu_audio_device_id})
virsh_gpu_id=$(get_virsh_id ${gpu_pci_id})
virsh_gpu_audio_id=$(get_virsh_id ${gpu_audio_pci_id})
echo ${virsh_gpu_audio_id} >> /tmp/vfio-virsh-ids
echo ${virsh_gpu_id} >> /tmp/vfio-virsh-ids
virsh nodedev-detach "${virsh_gpu_id}"
virsh nodedev-detach "${virsh_gpu_audio_id}"
# Load VFIO kernel module
modprobe vfio-pci
echo "End of startup!"
#!/bin/bash
set -x
# Source: https://gitlab.com/risingprismtv/single-gpu-passthrough
echo "Beginning of teardown!"
# Unload VFIO-PCI Kernel Driver
modprobe -r vfio-pci
modprobe -r vfio_iommu_type1
modprobe -r vfio
# Re-Bind GPU to AMD Driver
input="/tmp/vfio-virsh-ids"
while read -r virshId; do
    virsh nodedev-reattach "$virshId"
done < "$input"
# Rebind VT consoles (adapted from https://www.kernel.org/doc/Documentation/fb/fbcon.txt)
input="/tmp/vfio-bound-consoles"
while read -r consoleNumber; do
    if test -x /sys/class/vtconsole/vtcon${consoleNumber}; then
        if [ "$(grep -c "frame buffer" /sys/class/vtconsole/vtcon${consoleNumber}/name)" = 1 ]; then
            echo "Rebinding console ${consoleNumber}"
            echo 1 > /sys/class/vtconsole/vtcon${consoleNumber}/bind
        fi
    fi
done < "$input"
# Hack that magically makes nvidia gpus work :)
#if command -v nvidia-xconfig ; then
# nvidia-xconfig --query-gpu-info > /dev/null 2>&1
#fi
# According to kernel documentation (https://www.kernel.org/doc/Documentation/fb/fbcon.txt),
# specifically unbinding efi-framebuffer is not necessary after all consoles
# are unbound (and often times harmful in my experience), so it was omitted here
# I leave it here for reference in case anyone needs it.
# Re-Bind EFI-Framebuffer
# if test -e "/sys/bus/platform/drivers/efi-framebuffer/bind" ; then
# echo "efi-framebuffer.0" > /sys/bus/platform/drivers/efi-framebuffer/bind
# else
# echo "Could not find framebuffer to bind!"
# fi
# A short delay could help prevent the GPU from failing to initialize its VRAM total
sleep 5
# Load AMD driver
#input="/tmp/vfio-loaded-gpu-modules"
#while read gpuModule; do
# modprobe "$gpuModule"
#done < "$input"
modprobe amdgpu
# Restart Display Manager
input="/tmp/vfio-store-display-manager"
while read -r displayManager; do
    systemctl start "$displayManager"
done < "$input"
# Set hugepages allocated count to 0
sysctl vm.nr_hugepages=0
echo always > /sys/kernel/mm/transparent_hugepage/enabled
# Set CPU frequency governor back to "schedutil"
cpupower frequency-set -g schedutil
echo "End of teardown!"
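For reference, the scripts above are wired up through libvirt's qemu hook. The layout from the risingprismtv guide looks like this (the win10 directory must match your VM name; the script file names are examples):

```
/etc/libvirt/hooks/
├── qemu                        # hook dispatcher script from the guide
└── qemu.d/win10/
    ├── prepare/begin/start.sh  # the startup script above
    └── release/end/stop.sh     # the teardown script above
```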
<domain type="kvm">
  <name>win10</name>
  <uuid>ce2b5c82-6bf5-487c-8470-f66d1762a43d</uuid>
  <metadata>
    <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
      <libosinfo:os id="http://microsoft.com/win/10"/>
    </libosinfo:libosinfo>
  </metadata>
  <memory unit="KiB">12288000</memory>
  <currentMemory unit="KiB">12288000</currentMemory>
  <memoryBacking>
    <hugepages/>
  </memoryBacking>
  <vcpu placement="static">12</vcpu>
  <iothreads>1</iothreads>
  <cputune>
    <vcpupin vcpu="0" cpuset="2"/>
    <vcpupin vcpu="1" cpuset="8"/>
    <vcpupin vcpu="2" cpuset="3"/>
    <vcpupin vcpu="3" cpuset="9"/>
    <vcpupin vcpu="4" cpuset="4"/>
    <vcpupin vcpu="5" cpuset="10"/>
    <vcpupin vcpu="6" cpuset="5"/>
    <vcpupin vcpu="7" cpuset="11"/>
    <vcpupin vcpu="8" cpuset="6"/>
    <vcpupin vcpu="9" cpuset="12"/>
    <vcpupin vcpu="10" cpuset="7"/>
    <vcpupin vcpu="11" cpuset="13"/>
    <emulatorpin cpuset="0-1"/>
    <iothreadpin iothread="1" cpuset="14-15"/>
  </cputune>
  <os>
    <type arch="x86_64" machine="pc-q35-5.2">hvm</type>
    <loader readonly="yes" type="pflash">/usr/share/edk2-ovmf/x64/OVMF_CODE.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/win10_VARS.fd</nvram>
    <bootmenu enable="yes"/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state="on"/>
      <vapic state="on"/>
      <spinlocks state="on" retries="8191"/>
      <vpindex state="on"/>
      <synic state="on"/>
      <stimer state="on"/>
      <reset state="on"/>
      <vendor_id state="on" value="randomid"/>
      <frequencies state="on"/>
    </hyperv>
    <kvm>
      <hidden state="on"/>
    </kvm>
    <vmport state="off"/>
  </features>
  <cpu mode="host-passthrough" check="none" migratable="on">
    <topology sockets="1" dies="1" cores="6" threads="2"/>
    <cache mode="passthrough"/>
    <feature policy="require" name="topoext"/>
  </cpu>
  <clock offset="localtime">
    <timer name="rtc" tickpolicy="catchup"/>
    <timer name="pit" tickpolicy="delay"/>
    <timer name="hpet" present="no"/>
    <timer name="hypervclock" present="yes"/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled="no"/>
    <suspend-to-disk enabled="no"/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type="block" device="disk">
      <driver name="qemu" type="raw" cache="writeback" io="threads" discard="unmap"/>
      <source dev="/dev/disk/by-id/ata-Samsung_SSD_860_EVO_1TB_<redacted>"/>
      <target dev="sdb" bus="scsi"/>
      <boot order="3"/>
      <address type="drive" controller="0" bus="0" target="0" unit="1"/>
    </disk>
    <disk type="file" device="disk">
      <driver name="qemu" type="raw" cache="none" io="native" discard="unmap" iothread="1" queues="8"/>
      <source file="/var/lib/libvirt/images/win10.img"/>
      <target dev="sda" bus="virtio"/>
      <boot order="2"/>
      <address type="pci" domain="0x0000" bus="0x08" slot="0x00" function="0x0"/>
    </disk>
    <controller type="usb" index="0" model="qemu-xhci" ports="15">
      <address type="pci" domain="0x0000" bus="0x02" slot="0x00" function="0x0"/>
    </controller>
    <controller type="sata" index="0">
      <address type="pci" domain="0x0000" bus="0x00" slot="0x1f" function="0x2"/>
    </controller>
    <controller type="pci" index="0" model="pcie-root"/>
    <controller type="pci" index="1" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="1" port="0x10"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x0" multifunction="on"/>
    </controller>
    <controller type="pci" index="2" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="2" port="0x11"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x1"/>
    </controller>
    <controller type="pci" index="3" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="3" port="0x12"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x2"/>
    </controller>
    <controller type="pci" index="4" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="4" port="0x13"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x3"/>
    </controller>
    <controller type="pci" index="5" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="5" port="0x14"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x4"/>
    </controller>
    <controller type="pci" index="6" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="6" port="0x8"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x0" multifunction="on"/>
    </controller>
    <controller type="pci" index="7" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="7" port="0x9"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x1"/>
    </controller>
    <controller type="pci" index="8" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="8" port="0xa"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x2"/>
    </controller>
    <controller type="virtio-serial" index="0">
      <address type="pci" domain="0x0000" bus="0x03" slot="0x00" function="0x0"/>
    </controller>
    <controller type="scsi" index="0" model="virtio-scsi">
      <driver queues="8" iothread="1"/>
      <address type="pci" domain="0x0000" bus="0x07" slot="0x00" function="0x0"/>
    </controller>
    <interface type="network">
      <mac address="52:54:00:05:e7:7d"/>
      <source network="default"/>
      <model type="e1000e"/>
      <address type="pci" domain="0x0000" bus="0x01" slot="0x00" function="0x0"/>
    </interface>
    <input type="mouse" bus="ps2"/>
    <input type="keyboard" bus="ps2"/>
    <sound model="ich9">
      <address type="pci" domain="0x0000" bus="0x00" slot="0x1b" function="0x0"/>
    </sound>
    <hostdev mode="subsystem" type="pci" managed="yes">
      <source>
        <address domain="0x0000" bus="0x2f" slot="0x00" function="0x0"/>
      </source>
      <address type="pci" domain="0x0000" bus="0x05" slot="0x00" function="0x0"/>
    </hostdev>
    <hostdev mode="subsystem" type="pci" managed="yes">
      <source>
        <address domain="0x0000" bus="0x2f" slot="0x00" function="0x1"/>
      </source>
      <address type="pci" domain="0x0000" bus="0x06" slot="0x00" function="0x0"/>
    </hostdev>
    <hostdev mode="subsystem" type="usb" managed="yes">
      <source>
        <vendor id="0x05ac"/>
        <product id="0x024f"/>
      </source>
      <address type="usb" bus="0" port="1"/>
    </hostdev>
    <hostdev mode="subsystem" type="usb" managed="yes">
      <source>
        <vendor id="0x046d"/>
        <product id="0xc08b"/>
      </source>
      <address type="usb" bus="0" port="4"/>
    </hostdev>
    <hostdev mode="subsystem" type="usb" managed="yes">
      <source>
        <vendor id="0x0d8c"/>
        <product id="0x0203"/>
      </source>
      <address type="usb" bus="0" port="6"/>
    </hostdev>
    <hostdev mode="subsystem" type="usb" managed="yes">
      <source>
        <vendor id="0x3142"/>
        <product id="0x0001"/>
      </source>
      <address type="usb" bus="0" port="5"/>
    </hostdev>
    <redirdev bus="usb" type="spicevmc">
      <address type="usb" bus="0" port="2"/>
    </redirdev>
    <redirdev bus="usb" type="spicevmc">
      <address type="usb" bus="0" port="3"/>
    </redirdev>
    <memballoon model="virtio">
      <address type="pci" domain="0x0000" bus="0x04" slot="0x00" function="0x0"/>
    </memballoon>
  </devices>
</domain>