PCIE passthrough stuttering/lag spikes

I’ve been trying to setup a Windows 10 virtual machine mainly for gaming. I had issues where vfio would give me BAR 0 error and it could not reserve memory. I used a rom file from TechPowerUp for my specific GPU and the VM finally booted.

After setting everything up i downloaded Cinebench and Unigine Heaven to test the performance of my system for comparison. Cinebench score was within a margin of error, around 1200 for both vm and bare metal with my Ryzen 5 2600. With Unigine Heaven it was little bit lower, scored 1767 for bare metal and 1576 on the vm with my RX 580 4 GB, also i noticed a little bit of stutter here and there but it wasn’t horrendous.

When i tried to play games on the vm things changed. First game that i tried was Minecraft. As the chunks loaded it started to stutter really bad and only after it was done the framerate started to stabilize. Figured maybe i should try another game just to make sure i tried Cyberpunk 2077. The framerate couter was relatively stable but it was stuttering really bad. In the general in-game area that i tested my framerate was about 17-20 fps compared to bare metal where it was around 47-50 fps.

Searching through the internet i tried just about anything i could find:

  • CPU pinning (according the the topology of my CPU)
  • isolating the cores (via kernel parameter) and assigning them exclusively to the vm
  • set scaling governor to performance
  • set scaling governor to ondemand and setting the threshold to as low as 20
  • disabling SMT in BIOS (gave about ~+5 fps but the stuttering was still there)
  • giving the vm only cores without hyperthreading
  • adding/removing audio devices (i came across a post saying that it can impact overall performance)
  • changing IO modes
  • enabled every setting related to virtualization in BIOS (IOMMU, SR-IOV, SVM)

I did come across problems when trying to get hugepages working. Running virsh command it said that it can’t reserve the memory and refused to boot even though i had plenty of free memory.

I also ran a program called LatencyMon to maybe get a better idea of what might be the case:


Relevant IOMMU Group that i isolated:

IOMMU Group 2:
    00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
    00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
    0a:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] [1002:67df] (rev e7)
    0a:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590] [1002:aaf0]

My /etc/modprobe.d/vfio.conf:
options vfio-pci ids=1022:1452,1022:1453,1002:67df,1002:aaf0

/etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="loglevel=3 quiet amd_iommu=on iommu=pt video=vesafb:off video=efifb:off pci=realloc"

My XML at it’s current state:

<domain type='kvm' xmlns:qemu='(link removed)'>
  <name>win10</name>
  <uuid>(excluded purposefully)</uuid>
  <metadata>
    <libosinfo:libosinfo xmlns:libosinfo="(link removed)">
      <libosinfo:os id="(link removed)"/>
    </libosinfo:libosinfo>
  </metadata>
  <memory unit='KiB'>12582912</memory>
  <currentMemory unit='KiB'>12582912</currentMemory>
  <vcpu placement='static'>12</vcpu>
  <iothreads>1</iothreads>
  <cputune>
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='6'/>
    <vcpupin vcpu='2' cpuset='1'/>
    <vcpupin vcpu='3' cpuset='7'/>
    <vcpupin vcpu='4' cpuset='2'/>
    <vcpupin vcpu='5' cpuset='8'/>
    <vcpupin vcpu='6' cpuset='3'/>
    <vcpupin vcpu='7' cpuset='9'/>
    <vcpupin vcpu='8' cpuset='4'/>
    <vcpupin vcpu='9' cpuset='10'/>
    <vcpupin vcpu='10' cpuset='5'/>
    <vcpupin vcpu='11' cpuset='11'/>
    <emulatorpin cpuset='0,6'/>
    <iothreadpin iothread='1' cpuset='0,6'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc-q35-5.2'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/edk2-ovmf/x64/OVMF_CODE.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/win10_VARS.fd</nvram>
    <bootmenu enable='no'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor_id state='on' value='a231f21'/>
    </hyperv>
    <kvm>
      <hidden state='on'/>
    </kvm>
    <vmport state='off'/>
  </features>
  <cpu mode='host-passthrough' check='none' migratable='on'>
    <topology sockets='1' dies='1' cores='6' threads='2'/>
  </cpu>
  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/home/user/NVME/win10.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <boot order='1'/>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/home/user/NVME/cyberpunk.qcow2'/>
      <target dev='vdb' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </disk>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x10'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-to-pci-bridge'>
      <model name='pcie-pci-bridge'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0x11'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0x12'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0x13'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0x14'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x4'/>
    </controller>
    <controller type='pci' index='7' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='7' port='0x15'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x5'/>
    </controller>
    <controller type='pci' index='8' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='8' port='0x8'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
    </controller>
    <controller type='usb' index='0' model='qemu-xhci' ports='15'>
      <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
    </controller>
    <interface type='network'>
      <mac address='52:54:00:8c:1c:28'/>
      <source network='default'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x0'/>
    </interface>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/>
      </source>
      <rom file='/home/user/Sapphire.RX580.4096.170323.rom'/>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x0a' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='yes'>
      <source>
        <vendor id='0x046d'/>
        <product id='0xc31c'/>
      </source>
      <address type='usb' bus='0' port='1'/>
    </hostdev>
    <hostdev mode='subsystem' type='usb' managed='yes'>
      <source>
        <vendor id='0x046d'/>
        <product id='0xc07e'/>
      </source>
      <address type='usb' bus='0' port='2'/>
    </hostdev>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </memballoon>
  </devices>
  <qemu:commandline>
    <qemu:arg value='-device'/>
    <qemu:arg value='ich9-intel-hda,bus=pcie.0,addr=0x1b'/>
    <qemu:arg value='-device'/>
    <qemu:arg value='hda-micro,audiodev=hda'/>
    <qemu:arg value='-audiodev'/>
    <qemu:arg value='pa,id=hda,server=unix:/run/user/1000/pulse/native'/>
    <qemu:arg value='-cpu'/>
    <qemu:arg value='host,hv_time,kvm=off,hv_vendor_id=null,-hypervisor'/>
  </qemu:commandline>
</domain>

I found the solution. I ended up removing -hypervisor flag from this line:

<qemu:arg value='host,hv_time,kvm=off,hv_vendor_id=null,-hypervisor'/>

stuttering is gone and performance i am getting is within a margin of error and almost at bare-metal speeds.

2 Likes