Dear Exalted Techies,
I’ve been experimenting with VFIO for some time but have been unable to achieve the fabled “bare-metal” performance described by others. I will use the game Squad as my primary example here. I am certain that I just need to tweak something, but I’m at my wits’ end trying to determine what that is. Sadly, VFIO is still a niche topic; information is sparse and often dated. Help me Level1Techs, you’re my only hope!
System Spec
CPU: i7-10700
GPU: GTX 1070TI
RAM: 32GB
Host: Debian 12 (Stable)
Linux Kernel ver: 6.1.0-28
QEMU emulator ver: 7.2.13
Looking Glass Ver: B7-rc1 (d12 capture)
Background
In demanding games (Squad) I see a 30-40% loss in performance versus bare metal. In older titles (CS:Source, World in Conflict) the performance drop is more modest, at maybe 10-20%. For example, on the same hardware, a bare-metal Windows 10 install gives me a very solid 60 FPS in Squad irrespective of what is happening in the game. If I move this same Windows 10 install into a VM, I get maybe 45 FPS in quieter scenes, with the frame rate absolutely tanking to the mid 30s during battles.
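To put rough numbers on the Squad figures above, here is a minimal sketch of the relative-loss arithmetic. The 60 and 45 FPS values are the ones quoted; 35 FPS is only an assumed stand-in for "mid 30s", and `fps_loss_percent` is a hypothetical helper name.

```python
def fps_loss_percent(bare_metal_fps: float, vm_fps: float) -> float:
    """Relative frame-rate loss of the VM versus bare metal, in percent."""
    return (bare_metal_fps - vm_fps) / bare_metal_fps * 100.0


BARE_METAL_FPS = 60.0  # quoted bare-metal figure for Squad

# 45 FPS is quoted above; 35 FPS is an assumed value for "mid 30s".
for label, vm_fps in [("quiet scenes", 45.0), ("battles", 35.0)]:
    loss = fps_loss_percent(BARE_METAL_FPS, vm_fps)
    print(f"{label}: {loss:.1f}% loss")
```

Under those assumptions the loss works out to 25% in quiet scenes and a little under 42% in battles.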
Now, I suspect something funky is going on because Squad is not using all of my VM’s resources. According to Windows Task Manager, I’m using ~40% of my allocated CPU resources and 50% of my GPU resources. Disk usage, according to Windows, is negligible. HWMonitor indicates that individual cores are hitting turbo frequency. At the same time, my host machine is essentially “idling”, running no applications other than looking-glass and virt-manager (excluding background processes etc).
It is frustrating that people say VFIO can achieve “bare-metal-like” performance without actually quantifying what that means, i.e. whether I should be expecting to lose 1% of my total performance or 10%.
VM Configuration
Grub kernel parameters:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt vfio-pci.ids=10de:1cb1,10de:0fb9"
Hugepages:
$ cat /proc/meminfo | grep Huge
AnonHugePages: 544768 kB
ShmemHugePages: 684032 kB
FileHugePages: 0 kB
HugePages_Total: 8192
HugePages_Free: 8192
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 16777216 kB
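As a sanity check on the numbers above, this is a small sketch that multiplies the page count by the page size and compares the result with the VM's memory allocation. `parse_meminfo` is a hypothetical helper, the sample text is copied from the output above, and `VM_MEMORY_KIB` is the `<memory unit="KiB">` value from the domain XML.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse 'Key: value [kB]' lines into {key: integer value}."""
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep or not rest.strip():
            continue  # skip blank or malformed lines
        fields[key.strip()] = int(rest.split()[0])
    return fields


# Relevant lines copied from the /proc/meminfo output above.
MEMINFO = """\
HugePages_Total: 8192
HugePages_Free: 8192
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
"""

VM_MEMORY_KIB = 16777216  # <memory unit="KiB"> in the domain XML

info = parse_meminfo(MEMINFO)
pool_kib = info["HugePages_Total"] * info["Hugepagesize"]
in_use = info["HugePages_Total"] - info["HugePages_Free"]

print(f"hugepage pool: {pool_kib} KiB")
print(f"covers VM memory: {pool_kib >= VM_MEMORY_KIB}")
print(f"pages in use at snapshot time: {in_use}")
```

The pool comes to 8192 × 2048 kB = 16777216 kB, exactly the VM's allocation. A free count equal to the total only says that no pages were allocated when the snapshot was taken, for instance because the VM was not running at that moment; comparing `HugePages_Free` while the guest is up would show whether the pool is actually being used.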
CPU Pinning:
NB: Many guides suggest using emulatorpin cpuset and iothreadpin; however, I’ve tested the VM with and without these explicitly defined and it doesn’t seem to have a substantial effect on performance. I’ve also tried upping iothreads to 2 (and using two iothreadpins) without any luck.
<vcpu placement="static">14</vcpu>
<iothreads>1</iothreads>
<cputune>
<vcpupin vcpu="0" cpuset="0"/>
<vcpupin vcpu="1" cpuset="8"/>
<vcpupin vcpu="2" cpuset="1"/>
<vcpupin vcpu="3" cpuset="9"/>
<vcpupin vcpu="4" cpuset="2"/>
<vcpupin vcpu="5" cpuset="10"/>
<vcpupin vcpu="6" cpuset="3"/>
<vcpupin vcpu="7" cpuset="11"/>
<vcpupin vcpu="8" cpuset="4"/>
<vcpupin vcpu="9" cpuset="12"/>
<vcpupin vcpu="10" cpuset="5"/>
<vcpupin vcpu="11" cpuset="13"/>
</cputune>
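One way to cross-check a pinning block like this is to compare the declared vCPU count with the `vcpupin` entries. The sketch below uses Python's standard `xml.etree.ElementTree`; the `<domain>` wrapper around the snippet and the `unpinned_vcpus` helper are assumptions added for illustration, not part of libvirt.

```python
import xml.etree.ElementTree as ET

# The pinning snippet above, wrapped in a <domain> root so it parses
# as a single XML document.
SNIPPET = """
<domain>
  <vcpu placement="static">14</vcpu>
  <iothreads>1</iothreads>
  <cputune>
    <vcpupin vcpu="0" cpuset="0"/>
    <vcpupin vcpu="1" cpuset="8"/>
    <vcpupin vcpu="2" cpuset="1"/>
    <vcpupin vcpu="3" cpuset="9"/>
    <vcpupin vcpu="4" cpuset="2"/>
    <vcpupin vcpu="5" cpuset="10"/>
    <vcpupin vcpu="6" cpuset="3"/>
    <vcpupin vcpu="7" cpuset="11"/>
    <vcpupin vcpu="8" cpuset="4"/>
    <vcpupin vcpu="9" cpuset="12"/>
    <vcpupin vcpu="10" cpuset="5"/>
    <vcpupin vcpu="11" cpuset="13"/>
  </cputune>
</domain>
"""


def unpinned_vcpus(domain_xml: str) -> list[int]:
    """Return the vCPU indices that have no <vcpupin> entry."""
    root = ET.fromstring(domain_xml)
    total = int(root.findtext("vcpu"))
    pinned = {int(p.get("vcpu")) for p in root.iterfind("cputune/vcpupin")}
    return [v for v in range(total) if v not in pinned]


print(unpinned_vcpus(SNIPPET))
```

For the snippet as posted, this reports vCPUs 12 and 13 as having no explicit pin: 14 vCPUs are declared but only 0 through 11 are pinned. The same function should also work on the output of `virsh dumpxml Gamer`, since a full domain definition already has `<domain>` as its root.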
Hyperv
NB: Most guides I’ve found specify using “related state=‘on’”, but virt-manager throws an “unsupported hyperv enlightenment feature” error if I try to use this.
<hyperv mode="custom">
<vapic state="on"/>
<spinlocks state="on" retries="8191"/>
<vpindex state="on"/>
<runtime state="on"/>
<synic state="on"/>
<stimer state="on"/>
<reset state="on"/>
<frequencies state="on"/>
</hyperv>
Virtual Disk Config
<disk type="file" device="disk">
<driver name="qemu" type="raw"/>
<source file="/mnt/SSD//VMs/Gamer/gamer.img"/>
<target dev="sda" bus="sata"/>
<boot order="1"/>
<address type="drive" controller="0" bus="0" target="0" unit="0"/>
</disk>
Full VM XML
<domain type="kvm">
<name>Gamer</name>
<metadata>
<libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
<libosinfo:os id="http://microsoft.com/win/10"/>
</libosinfo:libosinfo>
</metadata>
<memory unit="KiB">16777216</memory>
<currentMemory unit="KiB">16777216</currentMemory>
<memoryBacking>
<hugepages/>
</memoryBacking>
<vcpu placement="static">14</vcpu>
<iothreads>1</iothreads>
<cputune>
<vcpupin vcpu="0" cpuset="0"/>
<vcpupin vcpu="1" cpuset="8"/>
<vcpupin vcpu="2" cpuset="1"/>
<vcpupin vcpu="3" cpuset="9"/>
<vcpupin vcpu="4" cpuset="2"/>
<vcpupin vcpu="5" cpuset="10"/>
<vcpupin vcpu="6" cpuset="3"/>
<vcpupin vcpu="7" cpuset="11"/>
<vcpupin vcpu="8" cpuset="4"/>
<vcpupin vcpu="9" cpuset="12"/>
<vcpupin vcpu="10" cpuset="5"/>
<vcpupin vcpu="11" cpuset="13"/>
</cputune>
<os>
<type arch="x86_64" machine="pc-q35-7.2">hvm</type>
<loader readonly="yes" type="pflash">/usr/share/OVMF/OVMF_CODE_4M.ms.fd</loader>
<nvram>/var/lib/libvirt/qemu/nvram/Gamer_VARS.fd</nvram>
</os>
<features>
<acpi/>
<apic/>
<hyperv mode="custom">
<vapic state="on"/>
<spinlocks state="on" retries="8191"/>
<vpindex state="on"/>
<runtime state="on"/>
<synic state="on"/>
<stimer state="on"/>
<reset state="on"/>
<frequencies state="on"/>
</hyperv>
<vmport state="off"/>
</features>
<cpu mode="host-passthrough" check="none" migratable="off">
<topology sockets="1" dies="1" cores="7" threads="2"/>
</cpu>
<clock offset="localtime">
<timer name="rtc" present="no" tickpolicy="catchup"/>
<timer name="pit" present="no" tickpolicy="delay"/>
<timer name="hpet" present="no"/>
<timer name="kvmclock" present="no"/>
<timer name="hypervclock" present="yes"/>
<timer name="tsc" present="yes" mode="native"/>
</clock>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>destroy</on_crash>
<pm>
<suspend-to-mem enabled="no"/>
<suspend-to-disk enabled="no"/>
</pm>
<devices>
<emulator>/usr/bin/qemu-system-x86_64</emulator>
<disk type="file" device="disk">
<driver name="qemu" type="raw"/>
<source file="/mnt/SSD//VMs/Gamer/gamer.img"/>
<target dev="sda" bus="sata"/>
<boot order="1"/>
<address type="drive" controller="0" bus="0" target="0" unit="0"/>
</disk>
<controller type="usb" index="0" model="qemu-xhci" ports="15">
<address type="pci" domain="0x0000" bus="0x02" slot="0x00" function="0x0"/>
</controller>
<controller type="pci" index="0" model="pcie-root"/>
<controller type="pci" index="1" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="1" port="0x10"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x0" multifunction="on"/>
</controller>
<controller type="pci" index="2" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="2" port="0x11"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x1"/>
</controller>
<controller type="pci" index="3" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="3" port="0x12"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x2"/>
</controller>
<controller type="pci" index="4" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="4" port="0x13"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x3"/>
</controller>
<controller type="pci" index="5" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="5" port="0x14"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x4"/>
</controller>
<controller type="pci" index="6" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="6" port="0x15"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x5"/>
</controller>
<controller type="pci" index="7" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="7" port="0x16"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x6"/>
</controller>
<controller type="pci" index="8" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="8" port="0x17"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x7"/>
</controller>
<controller type="pci" index="9" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="9" port="0x18"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x0" multifunction="on"/>
</controller>
<controller type="pci" index="10" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="10" port="0x19"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x1"/>
</controller>
<controller type="pci" index="11" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="11" port="0x1a"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x2"/>
</controller>
<controller type="pci" index="12" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="12" port="0x1b"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x3"/>
</controller>
<controller type="pci" index="13" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="13" port="0x1c"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x4"/>
</controller>
<controller type="pci" index="14" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="14" port="0x1d"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x5"/>
</controller>
<controller type="pci" index="15" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="15" port="0x1e"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x6"/>
</controller>
<controller type="pci" index="16" model="pcie-to-pci-bridge">
<model name="pcie-pci-bridge"/>
<address type="pci" domain="0x0000" bus="0x07" slot="0x00" function="0x0"/>
</controller>
<controller type="sata" index="0">
<address type="pci" domain="0x0000" bus="0x00" slot="0x1f" function="0x2"/>
</controller>
<controller type="virtio-serial" index="0">
<address type="pci" domain="0x0000" bus="0x03" slot="0x00" function="0x0"/>
</controller>
<interface type="bridge">
<mac address="(redacted)"/>
<source bridge="virbr0"/>
<model type="e1000e"/>
<link state="up"/>
<address type="pci" domain="0x0000" bus="0x01" slot="0x00" function="0x0"/>
</interface>
<channel type="spicevmc">
<target type="virtio" name="com.redhat.spice.0"/>
<address type="virtio-serial" controller="0" bus="0" port="1"/>
</channel>
<input type="mouse" bus="ps2"/>
<input type="keyboard" bus="ps2"/>
<graphics type="spice" autoport="yes">
<listen type="address"/>
<image compression="off"/>
</graphics>
<sound model="ich9">
<address type="pci" domain="0x0000" bus="0x00" slot="0x1b" function="0x0"/>
</sound>
<audio id="1" type="spice"/>
<video>
<model type="none"/>
</video>
<hostdev mode="subsystem" type="pci" managed="yes">
<source>
<address domain="0x0000" bus="0x01" slot="0x00" function="0x0"/>
</source>
<address type="pci" domain="0x0000" bus="0x04" slot="0x00" function="0x0"/>
</hostdev>
<hostdev mode="subsystem" type="pci" managed="yes">
<source>
<address domain="0x0000" bus="0x01" slot="0x00" function="0x1"/>
</source>
<address type="pci" domain="0x0000" bus="0x05" slot="0x00" function="0x0"/>
</hostdev>
<memballoon model="none"/>
<shmem name="looking-glass">
<model type="ivshmem-plain"/>
<size unit="M">64</size>
<address type="pci" domain="0x0000" bus="0x10" slot="0x01" function="0x0"/>
</shmem>
</devices>
</domain>