Increasing VFIO VGA Performance

Actually, I am on an ioh3420 device, unless I’ve done something wrong in my config. I’ll post my XML file momentarily. Here it is:
https://privatebin.net/?2ac0a15b466275f6#gkoxDZrl1bILHpRJwcIMnS2IMIqIqp7G7E0+QIKBUT0=

I think I am getting Gen 3 speeds; GPU-z just reports it wrong. I think @FurryJackman is getting Gen 2 speeds.

A bus interface of “PCI” means that QEMU has hidden the PCIe configuration space extensions, as they are invalid when the device is on the root complex. You should be seeing at the very least “PCIe …” in there.

No, your benchmark is showing Gen2 speeds, or at least a link that is not 16 lanes wide.

Your XML has everything connected to bus 0, which is the root complex. You need to do the following:

  1. Remove the PCIe-PCI device (pcie-to-pci-bridge); you’re not using it.
  2. Change the bus to 0x01 for the VGA devices so they attach to the pcie-root-port rather than the pcie-root (see the sketch below).
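
A minimal sketch of what #2 looks like for the GPU’s hostdev entry (the host-side source address here is hypothetical, use whatever your card sits at; the guest-side bus is what changes, from 0x00 to 0x01, so the device hangs off the root port):

 <hostdev mode='subsystem' type='pci' managed='yes'>
   <source>
     <address domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/>
   </source>
   <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
 </hostdev>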

Note that I do not expect this to increase performance until QEMU is fixed to allow proper link negotiation.

Yeah, #2 did it. GPU-z reports PCIe x16 3.0 @ x16 3.0, and the NVIDIA control panel shows Bus: PCI Express x1, which still looks weird, but it’s not x0.

Also, the numbers on that test changed dramatically.

I’ll run a Fire Strike benchmark too to make sure nothing else changed.

OK, so that confirms it: when the GPU runs on the root complex as a PCI device (not PCIe), no link negotiation takes place and as such it transfers at a higher data rate, but the guest drivers don’t program the ASIC properly, yielding a drop from potential/bare-metal performance.

I am going to try to get in contact with Alex, the original author of the code I have hacked up to see if I can get him interested in working with me to make and upstream a patch to resolve this. The PCIe spec is huge and I need someone to hold my hand on this one to make sure I don’t create problems in the process.

Out of interest, is there any chance you would be willing to test out the patch I created and see what numbers you get? It should be fine on your VM since you do not attach any additional pass-through devices.

Unfortunately, I won’t be able to help with that. It’s also almost 6 AM here and I have to sleep.

But I’d like to add that Fire Strike actually reports a higher score than before, albeit not by much. I got 14754, whereas the previous score with the old configuration was 14266.

No problem! Thanks for your help thus far!

Yes, I saw the same effect when I first moved across: even though the reported bandwidth dropped, something certainly improved somewhere.

This is great news, as it means the root cause of the issue has been positively identified and empirically shown to improve things across the board.

I should be able to help test also.

I can try compiling QEMU with these patches too.
Any recommended version of the QEMU source?

What about affinity for the CUDA tool (die/cores) and the GPU (die/PCIe)?

BTW we have a small VFIO Discord server too if anyone would like to join for more immediate interaction.

 <controller type='pci' index='8' model='pcie-root-port'>
   <model name='ioh3420'/>
   <target chassis='1' port='0x1'/>
   <address type='pci' domain='0x0000' bus='0x00' slot='0x1c' function='0x0' multifunction='on'/>
 </controller>

Small note: for the model name you can simply specify

 <model name='pcie-root-port'/>

No need for ioh3420 anymore.
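
So the stanza above simply becomes:

 <controller type='pci' index='8' model='pcie-root-port'>
   <model name='pcie-root-port'/>
   <target chassis='1' port='0x1'/>
   <address type='pci' domain='0x0000' bus='0x00' slot='0x1c' function='0x0' multifunction='on'/>
 </controller>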

That aside, I’ve done some testing with my RX580 and I’m getting near enough to bare-metal performance: about 200 points off from a 1411MHz RX580 on Fire Strike Ultra. Mine is clocked at 1400MHz by default.

EDIT: actually it’s a lot closer.
https://www.3dmark.com/compare/fs/16618922/fs/14602552/fs/14452836
The difference is just due to the CPU (I’m pinning one Ryzen CCX).
It’s basically 99% of bare-metal performance.

Graphics Scores on Fire Strike Ultra

 3302     AMD Radeon RX 580 (1x Off) (1,400 MHz)  (ME)
 3338     AMD Radeon RX 580 (1x Off) (1,411 MHz) 
 3322     AMD Radeon RX 580 (1x Off) (1,425 MHz) 

Fire Strike Extreme
https://www.3dmark.com/3dm/29187093?

Fire Strike Ultra
https://www.3dmark.com/3dm/29186411?

For Fire Strike Extreme it’s a tiny bit different, perhaps because I’m held back by the CPU:
https://www.3dmark.com/compare/fs/16619222/fs/16539748/fs/14114813/fs/15958796

Graphics Scores on Fire Strike Extreme

6641       AMD Radeon RX 580 (1x Off) (1,400 MHz) (ME)
7070       AMD Radeon RX 580 (1x Off) (1,430 MHz) 
6993       AMD Radeon RX 580 (1x Off) (1,425 MHz) 
7047       AMD Radeon RX 580 (1x Off) (1,410 MHz) 

GPU-z also always reports my GPU as running at x8 3.0, as expected for an X370 mainboard, and PCIe link scaling on the host is taking place:

                LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L1, Exit Latency L1 <1us
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                LnkSta: Speed 8GT/s (ok), Width x8 (downgraded)
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
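
For anyone wanting to check the same thing on their own host, output like the above comes from lspci in verbose mode (substituting your own GPU or bridge address):

 lspci -vv -s 27:00.0 | grep Lnk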

However, I did recently get quite an interesting dmesg flood when booting the VM:

[Mon Oct  8 12:22:19 2018] vfio-pci 0000:27:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x000000a6b3957300 flags=0x0010]
[Mon Oct  8 12:22:19 2018] vfio-pci 0000:27:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x000000a6b3957280 flags=0x0010]
[Mon Oct  8 12:22:19 2018] vfio-pci 0000:27:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x000000a6b3957b80 flags=0x0010]
[Mon Oct  8 12:22:19 2018] vfio-pci 0000:27:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x000000a6b3957780 flags=0x0010]
[Mon Oct  8 12:22:19 2018] vfio-pci 0000:27:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x000000a6b3957680 flags=0x0010]
[Mon Oct  8 12:22:19 2018] vfio-pci 0000:27:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x000000a6b3957080 flags=0x0010]
[Mon Oct  8 12:22:19 2018] vfio-pci 0000:27:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x000000a6b3957180 flags=0x0010]
[Mon Oct  8 12:22:19 2018] vfio-pci 0000:27:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x000000a6b3957880 flags=0x0010]
[Mon Oct  8 12:22:19 2018] vfio-pci 0000:27:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x000000a6b3957980 flags=0x0010]
[Mon Oct  8 12:22:19 2018] vfio-pci 0000:27:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x000000a6b3957580 flags=0x0010]
[Mon Oct  8 12:22:19 2018] AMD-Vi: Event logged [IO_PAGE_FAULT device=27:00.0 domain=0x0000 address=0x000000a6b3957480 flags=0x0010]
[Mon Oct  8 12:22:19 2018] AMD-Vi: Event logged [IO_PAGE_FAULT device=27:00.0 domain=0x0000 address=0x000000a6b3957d80 flags=0x0010]
[Mon Oct  8 12:22:19 2018] AMD-Vi: Event logged [IO_PAGE_FAULT device=27:00.0 domain=0x0000 address=0x000000a6b3957c80 flags=0x0010]
[Mon Oct  8 12:22:19 2018] AMD-Vi: Event logged [IO_PAGE_FAULT device=27:00.0 domain=0x0000 address=0x000000a6b3957a80 flags=0x0010]
[Mon Oct  8 12:22:19 2018] AMD-Vi: Event logged [IO_PAGE_FAULT device=27:00.0 domain=0x0000 address=0x000000a6b3957f80 flags=0x0010]
[Mon Oct  8 12:22:19 2018] AMD-Vi: Event logged [IO_PAGE_FAULT device=27:00.0 domain=0x0000 address=0x000000a6b3957e80 flags=0x0010]
[Mon Oct  8 12:22:19 2018] AMD-Vi: Event logged [IO_PAGE_FAULT device=27:00.0 domain=0x0000 address=0x000000a6b3957000 flags=0x0010]
[Mon Oct  8 12:22:19 2018] AMD-Vi: Event logged [IO_PAGE_FAULT device=27:00.0 domain=0x0000 address=0x000000a6b3957100 flags=0x0010]
[Mon Oct  8 12:22:19 2018] AMD-Vi: Event logged [IO_PAGE_FAULT device=27:00.0 domain=0x0000 address=0x000000a6b39571c0 flags=0x0010]
[Mon Oct  8 12:22:19 2018] AMD-Vi: Event logged [IO_PAGE_FAULT device=27:00.0 domain=0x0000 address=0x000000a6b39572c0 flags=0x0010]
[Mon Oct  8 12:22:29 2018] amd_iommu_report_page_fault: 195 callbacks suppressed
[Mon Oct  8 12:22:29 2018] vfio-pci 0000:27:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x000000a90b4ff200 flags=0x0010]
[Mon Oct  8 12:22:29 2018] vfio-pci 0000:27:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x000000a90b4ff3c0 flags=0x0010]
[Mon Oct  8 12:22:29 2018] vfio-pci 0000:27:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x000000a90b4ffac0 flags=0x0010]
[Mon Oct  8 12:22:29 2018] vfio-pci 0000:27:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x000000a90b4ffbc0 flags=0x0010]
[Mon Oct  8 12:22:29 2018] vfio-pci 0000:27:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x000000a90b4ff7c0 flags=0x0010]
[Mon Oct  8 12:22:29 2018] vfio-pci 0000:27:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x000000a90b4ff6c0 flags=0x0010]
[Mon Oct  8 12:22:29 2018] vfio-pci 0000:27:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x000000a90b4ff0c0 flags=0x0010]
[Mon Oct  8 12:22:29 2018] vfio-pci 0000:27:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x000000a90b4ff1c0 flags=0x0010]
[Mon Oct  8 12:22:29 2018] vfio-pci 0000:27:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x000000a90b4ff8c0 flags=0x0010]
[Mon Oct  8 12:22:29 2018] vfio-pci 0000:27:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x000000a90b4ff9c0 flags=0x0010]
[Mon Oct  8 12:22:29 2018] amd_iommu_report_page_fault: 185 callbacks suppressed
[Mon Oct  8 12:22:29 2018] AMD-Vi: Event logged [IO_PAGE_FAULT device=27:00.0 domain=0x0000 address=0x000000a90b4ff5c0 flags=0x0010]
[Mon Oct  8 12:22:29 2018] AMD-Vi: Event logged [IO_PAGE_FAULT device=27:00.0 domain=0x0000 address=0x000000a90b4ff4c0 flags=0x0010]
[Mon Oct  8 12:22:29 2018] AMD-Vi: Event logged [IO_PAGE_FAULT device=27:00.0 domain=0x0000 address=0x000000a90b4ffdc0 flags=0x0010]
[Mon Oct  8 12:22:29 2018] AMD-Vi: Event logged [IO_PAGE_FAULT device=27:00.0 domain=0x0000 address=0x000000a90b4ffcc0 flags=0x0010]
[Mon Oct  8 12:22:29 2018] AMD-Vi: Event logged [IO_PAGE_FAULT device=27:00.0 domain=0x0000 address=0x000000a90b4fffc0 flags=0x0010]
[Mon Oct  8 12:22:29 2018] AMD-Vi: Event logged [IO_PAGE_FAULT device=27:00.0 domain=0x0000 address=0x000000a90b4ffec0 flags=0x0010]
[Mon Oct  8 12:22:29 2018] AMD-Vi: Event logged [IO_PAGE_FAULT device=27:00.0 domain=0x0000 address=0x000000a90b4ff000 flags=0x0010]
[Mon Oct  8 12:22:29 2018] AMD-Vi: Event logged [IO_PAGE_FAULT device=27:00.0 domain=0x0000 address=0x000000a90b4ff180 flags=0x0010]
[Mon Oct  8 12:22:29 2018] AMD-Vi: Event logged [IO_PAGE_FAULT device=27:00.0 domain=0x0000 address=0x000000a90b4ff280 flags=0x0010]
[Mon Oct  8 12:22:29 2018] AMD-Vi: Event logged [IO_PAGE_FAULT device=27:00.0 domain=0x0000 address=0x000000a90b4ff380 flags=0x0010]
[Mon Oct  8 12:22:29 2018] iommu ivhd0: AMD-Vi: Event logged [
[Mon Oct  8 12:22:29 2018] iommu ivhd0: INVALID_DEVICE_REQUEST device=27:00.0 pasid=0x00000 address=0x000000fd0b47f200 flags=0x0000]
[Mon Oct  8 12:22:29 2018] iommu ivhd0: AMD-Vi: Event logged [
[Mon Oct  8 12:22:29 2018] iommu ivhd0: INVALID_DEVICE_REQUEST device=27:00.0 pasid=0x00000 address=0x000000fd0b47fa40 flags=0x0000]
[Mon Oct  8 12:22:29 2018] iommu ivhd0: AMD-Vi: Event logged [
[Mon Oct  8 12:22:29 2018] iommu ivhd0: INVALID_DEVICE_REQUEST device=27:00.0 pasid=0x00000 address=0x000000fd0b47f040 flags=0x0000]
[Mon Oct  8 12:22:29 2018] iommu ivhd0: AMD-Vi: Event logged [
[Mon Oct  8 12:22:29 2018] iommu ivhd0: INVALID_DEVICE_REQUEST device=27:00.0 pasid=0x00000 address=0x000000fd0b47f140 flags=0x0000]
[Mon Oct  8 12:22:29 2018] iommu ivhd0: AMD-Vi: Event logged [
[Mon Oct  8 12:22:29 2018] iommu ivhd0: INVALID_DEVICE_REQUEST device=27:00.0 pasid=0x00000 address=0x000000fd0b47f840 flags=0x0000]
[Mon Oct  8 12:22:29 2018] iommu ivhd0: AMD-Vi: Event logged [
[Mon Oct  8 12:22:29 2018] iommu ivhd0: INVALID_DEVICE_REQUEST device=27:00.0 pasid=0x00000 address=0x000000fd0b47f940 flags=0x0000]
[Mon Oct  8 12:22:29 2018] iommu ivhd0: AMD-Vi: Event logged [
[Mon Oct  8 12:22:29 2018] iommu ivhd0: INVALID_DEVICE_REQUEST device=27:00.0 pasid=0x00000 address=0x000000fd0b47f340 flags=0x0000]
[Mon Oct  8 12:22:29 2018] iommu ivhd0: AMD-Vi: Event logged [
[Mon Oct  8 12:22:29 2018] iommu ivhd0: INVALID_DEVICE_REQUEST device=27:00.0 pasid=0x00000 address=0x000000fd0b47fb40 flags=0x0000]
[Mon Oct  8 12:22:29 2018] iommu ivhd0: AMD-Vi: Event logged [
[Mon Oct  8 12:22:29 2018] iommu ivhd0: INVALID_DEVICE_REQUEST device=27:00.0 pasid=0x00000 address=0x000000fd0b47f640 flags=0x0000]
[Mon Oct  8 12:22:29 2018] iommu ivhd0: AMD-Vi: Event logged [
[Mon Oct  8 12:22:29 2018] iommu ivhd0: INVALID_DEVICE_REQUEST device=27:00.0 pasid=0x00000 address=0x000000fd0b47f740 flags=0x0000]
[Mon Oct  8 12:22:29 2018] iommu ivhd0: AMD-Vi: Event logged [
[Mon Oct  8 12:22:29 2018] iommu ivhd0: INVALID_DEVICE_REQUEST device=27:00.0 pasid=0x00000 address=0x000000fd0b47fe40 flags=0x0000]
[Mon Oct  8 12:22:29 2018] iommu ivhd0: AMD-Vi: Event logged [
[Mon Oct  8 12:22:29 2018] iommu ivhd0: INVALID_DEVICE_REQUEST device=27:00.0 pasid=0x00000 address=0x000000fd0b47f440 flags=0x0000]
[Mon Oct  8 12:22:29 2018] iommu ivhd0: AMD-Vi: Event logged [
[Mon Oct  8 12:22:29 2018] iommu ivhd0: INVALID_DEVICE_REQUEST device=27:00.0 pasid=0x00000 address=0x000000fd0b47f540 flags=0x0000]
[Mon Oct  8 12:22:29 2018] iommu ivhd0: AMD-Vi: Event logged [
[Mon Oct  8 12:22:29 2018] iommu ivhd0: INVALID_DEVICE_REQUEST device=27:00.0 pasid=0x00000 address=0x000000fd0b47fc40 flags=0x0000]
[Mon Oct  8 12:22:29 2018] iommu ivhd0: AMD-Vi: Event logged [
[Mon Oct  8 12:22:29 2018] iommu ivhd0: INVALID_DEVICE_REQUEST device=27:00.0 pasid=0x00000 address=0x000000fd0b47fd40 flags=0x0000]
[Mon Oct  8 12:22:29 2018] iommu ivhd0: AMD-Vi: Event logged [
[Mon Oct  8 12:22:29 2018] iommu ivhd0: INVALID_DEVICE_REQUEST device=27:00.0 pasid=0x00000 address=0x000000fd0b47ff40 flags=0x0000]

This caused the driver in the guest to go completely wild and reset the GPU several times. I rebooted the VM and everything is fine…

Otherwise it’s been completely stable for months now; I’ve been working on it and playing games. AC Odyssey runs great, btw.

This is my VM XML. The PCI layout is probably not ideal, but it’s working, so I haven’t touched it since.

<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <name>win10_vfio</name>
  <uuid>7e3fcd41-a4a6-423d-98a2-9924b56164b6</uuid>
  <memory unit='KiB'>10485760</memory>
  <currentMemory unit='KiB'>10485760</currentMemory>
  <memoryBacking>
    <hugepages/>
  </memoryBacking>
  <vcpu placement='static'>8</vcpu>
  <iothreads>4</iothreads>
  <cputune>
    <vcpupin vcpu='0' cpuset='8'/>
    <vcpupin vcpu='1' cpuset='9'/>
    <vcpupin vcpu='2' cpuset='10'/>
    <vcpupin vcpu='3' cpuset='11'/>
    <vcpupin vcpu='4' cpuset='12'/>
    <vcpupin vcpu='5' cpuset='13'/>
    <vcpupin vcpu='6' cpuset='14'/>
    <vcpupin vcpu='7' cpuset='15'/>
    <emulatorpin cpuset='6-7'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc-q35-3.0'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/ovmf/x64/OVMF_CODE.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/win10_vfio_VARS.fd</nvram>
    <bootmenu enable='yes'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
    </hyperv>
    <kvm>
      <hidden state='on'/>
    </kvm>
    <vmport state='off'/>
  </features>
  <cpu mode='host-passthrough' check='partial'>
    <topology sockets='1' cores='4' threads='2'/>
    <feature policy='require' name='topoext'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
    <timer name='hypervclock' present='yes'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='threads'/>
      <source dev='/dev/disk/by-id/wwn-0x500a0751e1396dd1'/>
      <target dev='sda' bus='scsi'/>
      <boot order='1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='threads'/>
      <source dev='/dev/disk/by-id/wwn-0x500a0751e147273c'/>
      <target dev='sdb' bus='scsi'/>
      <address type='drive' controller='1' bus='0' target='0' unit='1'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <target dev='sdc' bus='scsi'/>
      <readonly/>
      <address type='drive' controller='0' bus='0' target='0' unit='2'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/zvol/prime/virtualmachine/win_data'/>
      <target dev='sdd' bus='scsi'/>
      <address type='drive' controller='2' bus='0' target='0' unit='3'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/zvol/prime/virtualmachine/win_games'/>
      <target dev='sde' bus='scsi'/>
      <address type='drive' controller='2' bus='0' target='0' unit='4'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/zvol/prime/virtualmachine/win_share'/>
      <target dev='sdf' bus='scsi'/>
      <address type='drive' controller='2' bus='0' target='0' unit='5'/>
    </disk>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x10'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-to-pci-bridge'>
      <model name='pcie-pci-bridge'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0x11'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0x12'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0x13'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0x14'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x4'/>
    </controller>
    <controller type='pci' index='7' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='7' port='0x15'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x5'/>
    </controller>
    <controller type='pci' index='8' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='8' port='0x16'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x6'/>
    </controller>
    <controller type='pci' index='9' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='9' port='0x17'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x7'/>
    </controller>
    <controller type='pci' index='10' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='10' port='0x8'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='11' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='11' port='0x9'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='pci' index='12' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='12' port='0xa'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </controller>
    <controller type='usb' index='0' model='nec-xhci'>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
    </controller>
    <controller type='scsi' index='0' model='virtio-scsi'>
      <driver queues='4' iothread='1'/>
      <address type='pci' domain='0x0000' bus='0x09' slot='0x00' function='0x0'/>
    </controller>
    <controller type='scsi' index='1' model='virtio-scsi'>
      <driver queues='4' iothread='1'/>
      <address type='pci' domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/>
    </controller>
    <controller type='scsi' index='2' model='virtio-scsi'>
      <driver queues='4' iothread='2'/>
      <address type='pci' domain='0x0000' bus='0x0c' slot='0x00' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <interface type='bridge'>
      <mac address='52:54:00:f6:3e:e1'/>
      <source bridge='vt-bridge'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x27' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x27' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x1'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x29' slot='0x00' function='0x3'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x0b' slot='0x00' function='0x0'/>
    </hostdev>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </memballoon>
    <shmem name='looking-glass'>
      <model type='ivshmem-plain'/>
      <size unit='M'>32</size>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x0'/>
    </shmem>
  </devices>
</domain>

Let me know what you think, or if there’s something I should optimize.
Blame virt-manager for the excessive number of root ports. Several of those could probably be removed, but like I said, it’s been working so I was not quite willing to touch it.

These, then (excuse the long post), are the full resulting QEMU arguments:

qemu-system-x86_64 
    -name guest=win10_vfio,debug-threads=on 
    -S 
    -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-win10_vfio/master-key.aes 
    -machine pc-q35-3.0,accel=kvm,usb=off,vmport=off,dump-guest-core=off 
    -cpu host,topoext=on,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff,kvm=off 
    -drive file=/usr/share/ovmf/x64/OVMF_CODE.fd,if=pflash,format=raw,unit=0,readonly=on 
    -drive file=/var/lib/libvirt/qemu/nvram/win10_vfio_VARS.fd,if=pflash,format=raw,unit=1 
    -m 10240 
    -mem-prealloc 
    -mem-path /dev/hugepages/libvirt/qemu/1-win10_vfio 
    -realtime mlock=off 
    -smp 8,sockets=1,cores=4,threads=2 
    -object iothread,id=iothread1 
    -object iothread,id=iothread2 
    -object iothread,id=iothread3 
    -object iothread,id=iothread4 
    -uuid 7e3fcd41-a4a6-423d-98a2-9924b56164b6 
    -display none 
    -no-user-config 
    -nodefaults 
    -chardev socket,id=charmonitor,fd=25,server,nowait 
    -mon chardev=charmonitor,id=monitor,mode=control 
    -rtc base=localtime,driftfix=slew 
    -global kvm-pit.lost_tick_policy=delay 
    -no-hpet 
    -no-shutdown 
    -global ICH9-LPC.disable_s3=1 
    -global ICH9-LPC.disable_s4=1 
    -boot menu=on,strict=on 
    -device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 
    -device pcie-pci-bridge,id=pci.2,bus=pci.1,addr=0x0 
    -device pcie-root-port,port=0x11,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x1 
    -device pcie-root-port,port=0x12,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x2 
    -device pcie-root-port,port=0x13,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x3 
    -device pcie-root-port,port=0x14,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x4 
    -device pcie-root-port,port=0x15,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x5 
    -device pcie-root-port,port=0x16,chassis=8,id=pci.8,bus=pcie.0,addr=0x2.0x6 
    -device pcie-root-port,port=0x17,chassis=9,id=pci.9,bus=pcie.0,addr=0x2.0x7 
    -device pcie-root-port,port=0x8,chassis=10,id=pci.10,bus=pcie.0,multifunction=on,addr=0x1 
    -device pcie-root-port,port=0x9,chassis=11,id=pci.11,bus=pcie.0,addr=0x1.0x1 
    -device pcie-root-port,port=0xa,chassis=12,id=pci.12,bus=pcie.0,addr=0x1.0x2 
    -device nec-usb-xhci,id=usb,bus=pci.7,addr=0x0 
    -device virtio-scsi-pci,iothread=iothread1,id=scsi0,num_queues=4,bus=pci.9,addr=0x0 
    -device virtio-scsi-pci,iothread=iothread1,id=scsi1,num_queues=4,bus=pci.10,addr=0x0 
    -device virtio-scsi-pci,iothread=iothread2,id=scsi2,num_queues=4,bus=pci.12,addr=0x0 
    -device virtio-serial-pci,id=virtio-serial0,bus=pci.3,addr=0x0 
    -drive file=/dev/disk/by-id/wwn-0x500a0751e1396dd1,format=raw,if=none,id=drive-scsi0-0-0-0,cache=none,aio=threads 
    -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=1,write-cache=on 
    -drive file=/dev/disk/by-id/wwn-0x500a0751e147273c,format=raw,if=none,id=drive-scsi1-0-0-1,cache=none,aio=threads 
    -device scsi-hd,bus=scsi1.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi1-0-0-1,id=scsi1-0-0-1,write-cache=on 
    -drive if=none,id=drive-scsi0-0-0-2,readonly=on 
    -device scsi-cd,bus=scsi0.0,channel=0,scsi-id=0,lun=2,drive=drive-scsi0-0-0-2,id=scsi0-0-0-2 
    -drive file=/dev/zvol/prime/virtualmachine/win_data,format=raw,if=none,id=drive-scsi2-0-0-3,cache=none,aio=native 
    -device scsi-hd,bus=scsi2.0,channel=0,scsi-id=0,lun=3,drive=drive-scsi2-0-0-3,id=scsi2-0-0-3,write-cache=on 
    -drive file=/dev/zvol/prime/virtualmachine/win_games,format=raw,if=none,id=drive-scsi2-0-0-4,cache=none,aio=native 
    -device scsi-hd,bus=scsi2.0,channel=0,scsi-id=0,lun=4,drive=drive-scsi2-0-0-4,id=scsi2-0-0-4,write-cache=on 
    -drive file=/dev/zvol/prime/virtualmachine/win_share,format=raw,if=none,id=drive-scsi2-0-0-5,cache=none,aio=native 
    -device scsi-hd,bus=scsi2.0,channel=0,scsi-id=0,lun=5,drive=drive-scsi2-0-0-5,id=scsi2-0-0-5,write-cache=on 
    -netdev tap,fd=27,id=hostnet0,vhost=on,vhostfd=28 
    -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:f6:3e:e1,bus=pci.6,addr=0x0 
    -chardev pty,id=charserial0 
    -device isa-serial,chardev=charserial0,id=serial0 
    -device vfio-pci,host=27:00.0,id=hostdev0,bus=pci.8,multifunction=on,addr=0x0 
    -device vfio-pci,host=27:00.1,id=hostdev1,bus=pci.8,addr=0x0.0x1 
    -device vfio-pci,host=29:00.3,id=hostdev2,bus=pci.11,addr=0x0 
    -device virtio-balloon-pci,id=balloon0,bus=pci.5,addr=0x0 
    -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny 
    -object memory-backend-file,id=shmmem-shmem0,mem-path=/dev/shm/looking-glass,size=33554432,share=yes 
    -device ivshmem-plain,id=shmem0,memdev=shmmem-shmem0,bus=pci.2,addr=0x1 
    -msg timestamp=on

Latency-wise though, it’s an absolute necessity to use the USB 3 model in the QEMU config.
Something is wrong with the USB 2 model, which leads to horrific DPC latency in the guest.
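
In config terms that means keeping an xhci controller, as in the XML above. A minimal sketch (nec-xhci is what the config above uses; qemu-xhci should equally qualify as the USB 3 model):

 <controller type='usb' index='0' model='nec-xhci'/>

rather than an ehci/uhci (USB 2/1.1) controller.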

An update on this: Alex from Red Hat has agreed to work with me and try to get a proper patch upstreamed to, at the very least, correct the PCIe Gen3 link negotiation. No guarantees on the timeline at this point, however, due to his workload.

@gnif, in case you guys need some help, please let me know. I’m not a specialist on this topic, but I have some spare time right now.

Something I noticed from @gnif’s config: enabling L3 cache emulation gives a rather noticeable performance improvement.

l3-cache=on,host-cache-info=off

Or in XML

<cpu>
    ...
    <cache level='3' mode='emulate'/>
    ...
</cpu>
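
On the QEMU command line these are CPU properties, so combined with the flags from the full args posted earlier it would look roughly like this (a sketch, not a complete command):

 -cpu host,topoext=on,l3-cache=on,host-cache-info=off,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff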

There are also some other settings that have largely reduced the latency values I see in LatencyMon:

<hyperv>
      ...
      <vpindex state='on'/>
      <synic state='on'/>
      <stimer state='on'/>
      <reset state='on'/>
      <vendor_id state='on' value='KVM Hv'/>
      <frequencies state='on'/> <!-- This one only works without the `ignore_msrs=1` kernel param -->
      ...
</hyperv>
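
For reference, the rough QEMU-flag equivalents of those elements, in the same underscore style as the args above, would be (the ‘...’ stands for the rest of your CPU flags; note the vendor string needs shell quoting):

 -cpu host,...,hv_vpindex,hv_synic,hv_stimer,hv_reset,hv_vendor_id='KVM Hv',hv_frequencies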

What they do is documented here:
https://libvirt.org/formatdomain.html#elementsFeatures

I’ve done some testing as well.

https://www.3dmark.com/compare/fs/16633964/fs/16633164/fs/16633028/fs/16619222

Test 1 (6104)

With the guest reporting the PCIe link speed as 8x @ 8GT/s (3.0) and no CPU optimizations yet. I have noticed that when I start my VM automatically early at boot, the link speed gets automatically set to 8x @ 8GT/s (3.0) in the guest, potentially because the link speed hasn’t been downscaled to 2.5GT/s yet.

I will need to test this with some setpci tricks to see what happens if I boot the VM with the PCIe speed forced to 8GT/s on the host.

I suspect that once I test with the setpci link-scaling workaround, the 100-point graphics score difference will be quite noticeable.
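
For reference, the kind of setpci trick meant here would look roughly like the sketch below, run on the host before starting the VM (register offsets per the PCIe spec; 26:00.0 is a placeholder for the root port above the GPU, which lspci -t will show):

 # Target Link Speed: bits 3:0 of Link Control 2 (PCIe cap + 0x30); 0x3 = 8GT/s
 setpci -s 26:00.0 CAP_EXP+0x30.w=0x0003:0x000f
 # Retrain the link: bit 5 of Link Control (PCIe cap + 0x10)
 setpci -s 26:00.0 CAP_EXP+0x10.w=0x0020:0x0020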

Test 2 & Test 3 (6064 & 6060)

With the L3 cache and hyperv adjustments.

Test 4 (6022)

Without the L3 cache and hyperv adjustments.
Link speed reporting as 8x @ 2.5GT/s (1.1) in the guest, but link scaling taking place correctly on the host.

Now for the CPU-z (1.86.0) tests I don’t have screenshots attached, but with the L3 cache and hyperv tweaks I saw the single-core score go from an inconsistent 398-410 to a consistent 420-430 on a 3.8GHz Ryzen 7 1700X.

@gnif

Suspicions confirmed: when the guest is aware that the link is running at 8x @ 8GT/s (3.0), the graphics score is consistently higher.

All three tests report consistently higher GPU performance when the guest is aware that the link is running at the 3.0 spec’s 8GT/s.

GPU in Guest @ 3.0

Of note is that in this configuration the link speed in the guest is permanently reported as 3.0 (stuck), while the host is doing the scaling anyway.
https://www.3dmark.com/compare/fs/16634804/fs/16634741/fs/16634681

GPU in Guest @ 1.1

https://www.3dmark.com/compare/fs/16633964/fs/16633164/fs/16633028

This all translates to a consistent 100 points more on the graphics score. In 3DMark this amounts to only about 1-2 FPS, but I have noticed that in Assassin’s Creed Odyssey it can actually lead to much more consistent frame times; sometimes 4-7 FPS more if the guest is aware of the correct link speed.

This is probably more pronounced on other GPU hardware.

EDIT: better phrasing.

For me it works too, with a Windows and a Linux (Ubuntu) VM.
I remember having some problems at the start though; I’ll have to take a look again.

@gnif

It took me a while to see this thread. The problem I mentioned in the Looking Glass support thread is exactly this.

I was not using i440fx to emulate because I dual-boot my VM if/when I need SLI. I try to keep it under Q35 as the hardware is more in line with the real thing and easier to emulate. This also had an impact in Looking Glass: using too much GPU because of the wrong PCIe speed.

What I find interesting is that it only affects the GPU. All the other passed-through devices always come up at the correct speed. When the machine starts, the GPU always goes to a lower clock.

Really glad you looked into it and found the issue; you’ve got a sharp eye for troubleshooting.

Edit: Actually, after reading a little more, I was doing exactly what you started doing now: plugging the card into the proper place. That’s when I had the issue. For me it only works when I plug it into the root bus.

Thanks for all that testing. 3DMark is expected to show only a marginal increase, as you have noted, because the bus speed mainly affects texture upload performance. Since 3DMark and many titles upload their textures at load time behind a progress bar, that upload isn’t included in the benchmark numbers, and as such any differences are marginal.

However, games that continually stream textures into the graphics card during play, instead of using loading screens, will see great benefits in smoothness, as you have noted. I would expect titles like Tomb Raider, Fallout, GTA and ArmA3 to see the most benefit from this.

It should also help with loading times in general, and with the performance of low-memory cards that share some system RAM for texture storage.

Which QEMU version is convenient for testing the nasty patch?

I am working with git master.
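
For anyone else who wants to test, building from git master is roughly the usual routine (a sketch; the patch under test gets applied on top before building):

 git clone https://git.qemu.org/git/qemu.git
 cd qemu
 # apply the patch here, e.g. with git am or patch -p1
 ./configure --target-list=x86_64-softmmu --enable-kvm
 make -j$(nproc)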