Investigating a ~13% performance difference between Host and Guest in Cinebench R15

Overall, I don’t really have issues with gaming performance, but it is strange that there is such a big gap between Host and Guest in Cinebench R15. I can run Cinebench R20 and see if the gap is still that big.

Cinebench R15 single-thread score: Guest: 145, Host: 170

Update 1:
Tested on Cinebench R20: single-thread score: Guest: 342, Host: 421, so about an 18% difference. I’ll revert to older VM configurations and see if that makes any difference.
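For anyone who wants to check the percentages, here is a minimal sketch of the arithmetic. It assumes the gap is measured relative to the Host score; the helper name `gap_percent` is made up for illustration.

```python
def gap_percent(guest: float, host: float) -> float:
    """Return how far the guest score falls below the host score, as a percentage of the host score."""
    return (host - guest) / host * 100.0


# Single-thread scores quoted in the post: (guest, host)
scores = {
    "Cinebench R15": (145, 170),
    "Cinebench R20": (342, 421),
}

for name, (guest, host) in scores.items():
    print(f"{name}: guest is {gap_percent(guest, host):.1f}% below host")
```

Using the Guest score as the baseline instead would give a larger percentage, so it is worth stating which baseline is meant when comparing runs.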

Specs:

  • Guest: Windows 10 1903
  • Host: Fedora 30 @ kernel 5.3.7 with QEMU-KVM 4.1.0-4
  • 2700X overclocked to 4.1 GHz all-core
  • 3000 MHz 15-15-15-35 overclocked memory
  • Gigabyte Aorus AX-370 Gaming K7 @ BIOS version F31 (no PBO option, which is a topic for another time)
  • RX 5700 XT 50th Anniversary Edition

Grub settings:
amd_iommu=on,fullflush amd_iommu_intr=vapic kvm-amd.avic=1 rd.driver.pre=vfio-pci amdgpu.vm_fragment_size=9 transparent_hugepage=never isolcpus=2-7,10-15 nohz_full=2-7,10-15 rcu_nocbs=2-7,10-15 default_hugepagesz=1G hugepagesz=1G skew_tick=1

Libvirt XML:

<domain type='kvm'>
   <name>win10-3</name>
   <uuid>6a4b4c8c-b03f-42a8-9185-64ed78b7a161</uuid>
   <metadata>
     <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
       <libosinfo:os id="http://microsoft.com/win/10"/>
     </libosinfo:libosinfo>
   </metadata>
   <memory unit='KiB'>33554432</memory>
   <currentMemory unit='KiB'>33554432</currentMemory>
   <memoryBacking>
     <hugepages>
       <page size='1048576' unit='KiB'/>
     </hugepages>
     <nosharepages/>
     <discard/>
   </memoryBacking>
   <vcpu placement='static' current='12'>16</vcpu>
   <iothreads>1</iothreads>
   <cputune>
     <vcpupin vcpu='0' cpuset='7'/>
     <vcpupin vcpu='1' cpuset='15'/>
     <vcpupin vcpu='2' cpuset='6'/>
     <vcpupin vcpu='3' cpuset='14'/>
     <vcpupin vcpu='4' cpuset='5'/>
     <vcpupin vcpu='5' cpuset='13'/>
     <vcpupin vcpu='6' cpuset='4'/>
     <vcpupin vcpu='7' cpuset='12'/>
     <vcpupin vcpu='8' cpuset='3'/>
     <vcpupin vcpu='9' cpuset='11'/>
     <vcpupin vcpu='10' cpuset='2'/>
     <vcpupin vcpu='11' cpuset='10'/>
     <emulatorpin cpuset='4-7,12-15'/>
     <iothreadpin iothread='1' cpuset='0,8'/>
   </cputune>
   <os>
     <type arch='x86_64' machine='pc-q35-4.1'>hvm</type>
     <loader readonly='yes' type='pflash'>/usr/share/edk2/ovmf/OVMF_CODE.fd</loader>
     <nvram>/var/lib/libvirt/qemu/nvram/win10-3_VARS.fd</nvram>
     <bootmenu enable='no'/>
   </os>
   <features>
     <acpi/>
     <apic/>
     <hyperv>
       <relaxed state='on'/>
       <vapic state='on'/>
       <spinlocks state='on' retries='8191'/>
       <vpindex state='on'/>
       <runtime state='on'/>
       <synic state='on'/>
       <stimer state='on'>
         <direct state='on'/>
       </stimer>
       <reset state='off'/>
       <vendor_id state='on' value='Fuck'/>
       <frequencies state='on'/>
       <reenlightenment state='on'/>
       <tlbflush state='on'/>
       <ipi state='off'/>
       <evmcs state='off'/>
     </hyperv>
     <kvm>
       <hidden state='on'/>
     </kvm>
     <vmport state='off'/>
     <ioapic driver='kvm'/>
   </features>
   <cpu mode='custom' match='exact' check='partial'>
     <model fallback='allow'>EPYC-IBPB</model>
     <topology sockets='2' cores='4' threads='2'/>
     <cache level='3' mode='emulate'/>
     <feature policy='require' name='tsc-deadline'/>
     <feature policy='require' name='hypervisor'/>
     <feature policy='require' name='tsc_adjust'/>
     <feature policy='require' name='arch-capabilities'/>
     <feature policy='require' name='cmp_legacy'/>
     <feature policy='require' name='perfctr_core'/>
     <feature policy='require' name='virt-ssbd'/>
     <feature policy='require' name='skip-l1dfl-vmentry'/>
     <feature policy='disable' name='monitor'/>
     <feature policy='disable' name='x2apic'/>
     <feature policy='require' name='topoext'/>
     <feature policy='require' name='invtsc'/>
   </cpu>
   <clock offset='localtime'>
     <timer name='rtc' tickpolicy='catchup'/>
     <timer name='pit' tickpolicy='delay'/>
     <timer name='hpet' present='no'/>
     <timer name='kvmclock' present='no'/>
     <timer name='hypervclock' present='yes'/>
     <timer name='tsc' present='yes' mode='native'/>
   </clock>
   <on_poweroff>destroy</on_poweroff>
   <on_reboot>restart</on_reboot>
   <on_crash>destroy</on_crash>
   <pm>
     <suspend-to-mem enabled='no'/>
     <suspend-to-disk enabled='no'/>
   </pm>
   <devices>
     <emulator>/usr/bin/qemu-system-x86_64</emulator>
     <controller type='usb' index='0' model='qemu-xhci' ports='15'>
       <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
     </controller>
     <controller type='pci' index='0' model='pcie-root'/>
     <controller type='pci' index='1' model='pcie-root-port'>
       <model name='pcie-root-port'/>
       <target chassis='1' port='0x10'/>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
     </controller>
     <controller type='pci' index='2' model='pcie-root-port'>
       <model name='pcie-root-port'/>
       <target chassis='2' port='0x11'/>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
     </controller>
     <controller type='pci' index='3' model='pcie-root-port'>
       <model name='pcie-root-port'/>
       <target chassis='3' port='0x12'/>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
     </controller>
     <controller type='pci' index='4' model='pcie-root-port'>
       <model name='pcie-root-port'/>
       <target chassis='4' port='0x13'/>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
     </controller>
     <controller type='pci' index='5' model='pcie-root-port'>
       <model name='pcie-root-port'/>
       <target chassis='5' port='0x14'/>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x4'/>
     </controller>
     <controller type='pci' index='6' model='pcie-root-port'>
       <model name='pcie-root-port'/>
       <target chassis='6' port='0x15'/>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x5'/>
     </controller>
     <controller type='pci' index='7' model='pcie-root-port'>
       <model name='pcie-root-port'/>
       <target chassis='7' port='0x16'/>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x6'/>
     </controller>
     <controller type='pci' index='8' model='pcie-root-port'>
       <model name='pcie-root-port'/>
       <target chassis='8' port='0x8'/>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' multifunction='on'/>
     </controller>
     <controller type='pci' index='9' model='pcie-root-port'>
       <model name='pcie-root-port'/>
       <target chassis='9' port='0x9'/>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
     </controller>
     <controller type='sata' index='0'>
       <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
     </controller>
     <interface type='direct'>
       <mac address='52:54:00:2a:f2:ae'/>
       <source dev='enp5s0' mode='passthrough'/>
       <model type='virtio'/>
       <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
     </interface>
     <interface type='network'>
       <mac address='52:54:00:bd:3c:6a'/>
       <source network='isolated'/>
       <model type='virtio'/>
       <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
     </interface>
     <input type='mouse' bus='ps2'/>
     <input type='keyboard' bus='ps2'/>
     <hostdev mode='subsystem' type='pci' managed='yes'>
       <source>
         <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
       </source>
       <boot order='1'/>
       <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
     </hostdev>
     <hostdev mode='subsystem' type='pci' managed='yes'>
       <source>
         <address domain='0x0000' bus='0x0b' slot='0x00' function='0x0'/>
       </source>
       <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
     </hostdev>
     <hostdev mode='subsystem' type='pci' managed='yes'>
       <source>
         <address domain='0x0000' bus='0x0b' slot='0x00' function='0x1'/>
       </source>
       <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
     </hostdev>
     <hostdev mode='subsystem' type='pci' managed='yes'>
       <source>
         <address domain='0x0000' bus='0x0d' slot='0x00' function='0x3'/>
       </source>
       <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
     </hostdev>
     <hostdev mode='subsystem' type='pci' managed='yes'>
       <source>
         <address domain='0x0000' bus='0x0c' slot='0x00' function='0x3'/>
       </source>
       <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
     </hostdev>
     <memballoon model='none'/>
   </devices>
 </domain>
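
The pinning in the XML above can be cross-checked against the `isolcpus` list from the GRUB settings. This is only a sketch: the `<cputune>` block is copied from the post, the helper `expand_cpuset` is a made-up name, and it handles only plain comma and range syntax (not libvirt's `^` exclusions).

```python
import xml.etree.ElementTree as ET

# <cputune> block copied from the domain XML in the post.
CPUTUNE = """<cputune>
  <vcpupin vcpu='0' cpuset='7'/>
  <vcpupin vcpu='1' cpuset='15'/>
  <vcpupin vcpu='2' cpuset='6'/>
  <vcpupin vcpu='3' cpuset='14'/>
  <vcpupin vcpu='4' cpuset='5'/>
  <vcpupin vcpu='5' cpuset='13'/>
  <vcpupin vcpu='6' cpuset='4'/>
  <vcpupin vcpu='7' cpuset='12'/>
  <vcpupin vcpu='8' cpuset='3'/>
  <vcpupin vcpu='9' cpuset='11'/>
  <vcpupin vcpu='10' cpuset='2'/>
  <vcpupin vcpu='11' cpuset='10'/>
  <emulatorpin cpuset='4-7,12-15'/>
  <iothreadpin iothread='1' cpuset='0,8'/>
</cputune>"""


def expand_cpuset(spec: str) -> set:
    """Expand a list such as '2-7,10-15' into a set of CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


root = ET.fromstring(CPUTUNE)
isolated = expand_cpuset("2-7,10-15")  # isolcpus= value from the GRUB settings

vcpu_cpus = set()
for pin in root.findall("vcpupin"):
    vcpu_cpus |= expand_cpuset(pin.get("cpuset"))

emulator_cpus = expand_cpuset(root.find("emulatorpin").get("cpuset"))

iothread_cpus = set()
for pin in root.findall("iothreadpin"):
    iothread_cpus |= expand_cpuset(pin.get("cpuset"))

print("vCPU pins outside isolcpus:", sorted(vcpu_cpus - isolated))
print("emulator CPUs shared with vCPUs:", sorted(emulator_cpus & vcpu_cpus))
print("iothread CPUs shared with vCPUs:", sorted(iothread_cpus & vcpu_cpus))
```

For the posted values, every vCPU pin falls inside the isolated set, and the emulator threads share host CPUs 4-7 and 12-15 with pinned vCPUs. Whether that overlap has anything to do with the benchmark gap is not established in this thread.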

Try removing the manual OC?

Also try removing the CPU config from the XML and just let it use host-passthrough.
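
As a rough sketch of what that change could look like, the snippet below rewrites the `<cpu>` element of a trimmed-down domain XML to `mode='host-passthrough'` while keeping the existing `<topology>`. The function name `to_host_passthrough` and the choice of `check='none'` are assumptions for illustration; whether extra `<feature>` lines (such as `topoext`) need to stay is left open here.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample based on the domain XML in the post.
SAMPLE = """<domain type='kvm'>
  <name>win10-3</name>
  <cpu mode='custom' match='exact' check='partial'>
    <model fallback='allow'>EPYC-IBPB</model>
    <topology sockets='2' cores='4' threads='2'/>
    <feature policy='require' name='invtsc'/>
  </cpu>
</domain>"""


def to_host_passthrough(domain_xml: str) -> str:
    """Replace the <cpu> element with a host-passthrough one, preserving <topology>."""
    root = ET.fromstring(domain_xml)
    old_cpu = root.find("cpu")
    new_cpu = ET.Element("cpu", {"mode": "host-passthrough", "check": "none"})

    if old_cpu is None:
        # No CPU config present: just add the new element.
        root.append(new_cpu)
    else:
        # Keep the socket/core/thread layout the guest already sees.
        topology = old_cpu.find("topology")
        if topology is not None:
            new_cpu.append(topology)
        new_cpu.tail = old_cpu.tail
        position = list(root).index(old_cpu)
        root.remove(old_cpu)
        root.insert(position, new_cpu)

    return ET.tostring(root, encoding="unicode")


print(to_host_passthrough(SAMPLE))
```

The resulting `<cpu>` element would then be pasted in through `virsh edit` rather than by feeding the whole file through this script, since ElementTree rewrites namespace prefixes such as the `libosinfo` metadata.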

iommu=pt might work a bit better for balancing performance between guest and host, but it’s unclear what the hang-up might be.
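
To confirm that a parameter such as `iommu=pt` actually made it onto the kernel command line after editing GRUB, something like the sketch below can help. The parameter string is the one posted above with `iommu=pt` appended; `parse_cmdline` is a made-up helper name, and on a running host the input would come from `/proc/cmdline` instead.

```python
# Kernel parameters from the post, with iommu=pt appended for the test.
POSTED = (
    "amd_iommu=on,fullflush amd_iommu_intr=vapic kvm-amd.avic=1 "
    "rd.driver.pre=vfio-pci amdgpu.vm_fragment_size=9 transparent_hugepage=never "
    "isolcpus=2-7,10-15 nohz_full=2-7,10-15 rcu_nocbs=2-7,10-15 "
    "default_hugepagesz=1G hugepagesz=1G skew_tick=1"
)


def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


params = parse_cmdline(POSTED + " iommu=pt")
print("iommu =", params.get("iommu"))
print("isolcpus =", params.get("isolcpus"))

# On a live host, read the running kernel's parameters instead:
#     with open("/proc/cmdline") as f:
#         params = parse_cmdline(f.read())
```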

Will do, but the GRUB parameters nohz_full and rcu_nocbs prevent boost, so I’ll have to remove those on top of the manual OC.

As for host-passthrough: one thing I noticed is that the EPYC model gives higher L3 read, write, and copy performance in the AIDA64 memory benchmark. I don’t believe it makes a tangible difference compared with host-passthrough, other than that interesting quirk.

So, one at a time: I’ll add iommu=pt first, then remove the manual OC along with nohz_full and rcu_nocbs, and lastly switch to host-passthrough.

Thanks in advance!


Aaaaaallllrrrriiight, I am BACK with an update.
With iommu=pt enabled, the Cinebench R15 and R20 scores remained the same.
The same was true after removing the manual OC plus nohz_full and rcu_nocbs.

BUT! Disabling AMD Cool&Quiet on top of all of the above shot the Cinebench R15 single-thread score up to 160.

I am not qualified to say what Cool&Quiet does, other than what the name says, haha! But I do know that in Linux (GNU/Linux :wink: ) you can change the CPU governor, and that when Cool&Quiet is disabled you cannot.
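
A quick way to see which governor each core is using (or whether one is exposed at all) is to read the cpufreq files under sysfs. This is a minimal sketch; `read_governors` is a made-up helper name, and the path is the standard Linux cpufreq location.

```python
from pathlib import Path


def read_governors(base: str = "/sys/devices/system/cpu") -> dict:
    """Return {cpu_name: governor} for every CPU that exposes a cpufreq governor."""
    governors = {}
    for gov_file in sorted(Path(base).glob("cpu[0-9]*/cpufreq/scaling_governor")):
        cpu_name = gov_file.parent.parent.name  # e.g. "cpu0"
        governors[cpu_name] = gov_file.read_text().strip()
    return governors


governors = read_governors()
if governors:
    for cpu_name, governor in governors.items():
        print(f"{cpu_name}: {governor}")
else:
    print("No cpufreq governor exposed on this system")
```

An empty result would be consistent with the observation that the governor cannot be changed while Cool&Quiet is disabled, though the exact cause is an assumption and was not verified in this thread.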

How can one option in the BIOS cause such a deficit? I would think leaving the clock handling to the CPU would allow for higher clocks? But the clocks stay stuck at 4 GHz regardless of whether Cool&Quiet is on or off.

Weird, but also interesting!

Before I re-enable my old settings, I’ll try host-passthrough just as a sanity check.

Update on that test: no change.

I’ll mark this comment as the solution. However, each change could use more in-depth testing. For now, my recommendation is to disable Cool&Quiet on AMD CPUs if you do VFIO passthrough.
