Increasing VFIO VGA Performance

This way:
https://www.redhat.com/archives/vfio-users/2016-January/msg00301.html

Here's what I added. Note that multifunction and getting the function IDs right (0 for graphics, 1 for audio) seem to be important; still testing.

Controller section (I used index 8; use whatever makes sense for your system):

<controller type='pci' index='8' model='pcie-root-port'>
  <model name='ioh3420'/>
  <target chassis='1' port='0x1'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x1c' function='0x0' multifunction='on'/>
</controller>

and

<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x0b' slot='0x00' function='0x0'/>
  </source>
  <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0' multifunction='on'/>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x0b' slot='0x00' function='0x1'/>
  </source>
  <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x1'/>
</hostdev>

Still testing, so I might edit this more.

2 Likes

If the “index” in the controller definition corresponds to the “bus” used in the hostdev address, then it looks like virt-manager by default defines a separate pcie-root-port device for the GPU and GPU-audio PCI devices, rather than assigning both to a single pcie-root-port as functions 0x0 and 0x1.

I’ve tried with this default and also with a config similar to yours above, which places the GPU audio and GPU on the same pcie-root-port.

Two gaming benchmarks and a run of the Superposition benchmark showed no real difference between the two. GPU-Z reports in both cases that the GPU is on a PCIe 3.0 port running at “PCI-Express x8 v1.1”.

So unless I’ve missed something, it may be that out of the box virt-manager is already setting up the PCI passthrough for the GPU in a suitable way (albeit with audio and GPU on different root ports, which may not be desirable?).
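To see which layout virt-manager actually generated, something like this works from the host (the domain name is just an example, use yours):

 virsh dumpxml win10 | grep -E "pcie-root-port|<hostdev|<address type='pci'"

That shows the root ports plus the bus/slot/function each passed-through device was given.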

Still, not in vain; I learned more about the XML config format :stuck_out_tongue:

Benchmark-wise, if anyone happens to have similar hardware: an 8GB RX 580 Nitro+, a 2700X and 16GB of 3200 RAM gives me 70/84/107 min/avg/max in F1 2017 (ultra high settings, both AA options maxed). Superposition (1080p Extreme) gives a score of 2647, utilising 100% of the GPU. I'd be interested in seeing how that compares to others doing passthrough if you have similar hardware.

2 Likes

I am still having reset issues, I think, if I do an unclean shutdown of the guest. A clean shutdown is still OK?

I am yet to try an unclean shutdown, but for me even a clean shutdown wasn’t working at all before these changes.

My lab asplode, but I'm testing right now. May do a live stream for this later. I've got tons of friends' hardware right now because everyone is upgrading and letting me test their stuff.

2 Likes
 -M q35,accel=kvm \
 -smp 32,cores=32,threads=1,sockets=1 \
 -device ioh3420,bus=pcie.0,addr=1c.0,multifunction=on,port=1,chassis=1,id=root.1 \
 -device vfio-pci,host=42:00.0,bus=root.1,addr=00.0,multifunction=on,romfile=/home/pburic/GTX1080Ti_patched.rom \
 -device vfio-pci,host=42:00.1,bus=root.1,addr=00.1 \

I’ve been using that since day one (11/2017 … the ugly patch); it looks the same.
Or is there a difference?

This setup is mentioned in most HowTos (very old ones).
https://wiki.debian.org/VGAPassthrough
https://gist.github.com/ArseniyShestakov/2891f5c147298e0b8ffa#file-qemu-cmd

Looks fine, a few differences though.

x-vga doesn’t need to be specified anymore.
multifunction=on doesn’t make any sense on the root port if you are only passing a single device through.
multifunction=on only makes sense on the actual device if you are passing through a device with children, i.e. HDMI audio.

Other than that, if GPU-Z or lspci reports the video card has a link speed other than just “PCI”, you’re already running as you should be.
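For example, a quick way to check from the host (the PCI address is just a placeholder, substitute your GPU's):

 sudo lspci -s 01:00.0 -vv | grep -E 'LnkCap:|LnkSta:'

LnkCap shows what the device advertises, LnkSta shows the currently negotiated width and speed.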

That example is actually incomplete though; it skips a device which can cause NVidia cards to have a fit (although I can see you have included it anyway). It should be:

 -device ioh3420,bus=pcie.0,addr=1c.0,port=1,chassis=1,id=root.1 \
 -device vfio-pci,host=01:00.0,bus=root.1,addr=00.0,multifunction=on \
 -device vfio-pci,host=01:00.1,bus=root.1,addr=00.1

For reference, here are my complete QEMU launch args.

/usr/local/bin/qemu-system-x86_64 \
  -nographic \
  -machine q35,accel=kvm,usb=off,vmport=off,dump-guest-core=off,kernel-irqchip=on \
  -cpu host,hv_time,hv_vpindex,hv_reset,hv_runtime,hv_crash,hv_synic,hv_stimer,hv_spinlocks=0x1fff,hv_vendor_id=lakeuv283713,kvm=off,l3-cache=on,-hypervisor,migratable=no,+invtsc \
  -drive file=/opt/VM/Windows/ovmf/OVMF_CODE-pure-efi.fd,if=pflash,format=raw,unit=0,readonly=on \
  -drive file=/opt/VM/Windows/ovmf/vars.fd,if=pflash,format=raw,unit=1 \
  -realtime mlock=on \
  -pidfile /var/run/vm.Windows.pid \
  -monitor stdio \
  -runas geoff \
  -enable-kvm \
  -name guest=Windows,debug-threads=on \
  -smp 8,sockets=1,cores=4,threads=2 \
  -m 16384 \
  -mem-prealloc \
  -global ICH9-LPC.disable_s3=1 \
  -global ICH9-LPC.disable_s4=1 \
  -no-user-config \
  -nodefaults \
  -rtc base=localtime,driftfix=slew \
  -global kvm-pit.lost_tick_policy=discard \
  -boot strict=on \
  -no-hpet \
  -netdev tap,script=/opt/VM/bin/ovs-ifup,downscript=/opt/VM/bin/ovs-ifdown,ifname=windows.30,id=hostnet0,vhost=on \
  -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:06:99:25,bus=pcie.0 \
  -soundhw ac97 \
  -device ioh3420,id=root_port1,chassis=0,slot=0,bus=pcie.0 \
  -device vfio-pci,host=45:00.0,id=hostdev1,bus=root_port1,addr=0x00,multifunction=on,romfile=/opt/VM/Windows/1080Ti.rom \
  -device vfio-pci,host=45:00.1,id=hostdev2,bus=root_port1,addr=0x00.1 \
  -drive  id=disk,file=/dev/disk/by-id/md-uuid-18bc7433:b223acbf:a1cbb062:f0b030c9,format=raw,if=none,cache=none,aio=native,discard=unmap,detect-zeroes=unmap,copy-on-read=on \
  -device virtio-scsi-pci,id=scsi \
  -device scsi-hd,drive=disk,bus=scsi.0,rotation_rate=1 \
  -device ivshmem-plain,memdev=ivshmem,bus=pcie.0 \
  -object memory-backend-file,id=ivshmem,share=on,mem-path=/dev/shm/looking-glass,size=128M \
  -spice disable-ticketing,seamless-migration=off,port=5900,addr=192.168.10.50 \
  -device virtio-keyboard-pci,id=input2,bus=pcie.0,addr=0xc

I also run a script that pins the CPU threads to the correct cores and ensures all memory accesses are local to the CPU it is pinned to. For Ryzen this isn’t so critical but for ThreadRipper it is critical if you wish to obtain the same results I have as shown below.
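As a rough idea of what such a script does (a minimal sketch only, not my exact script; the core list, NUMA node and pidfile path are examples, and threads are assumed to be listed in vCPU order):

 #!/bin/bash
 # Pin each QEMU vCPU thread to a dedicated host core and keep guest memory on
 # the same NUMA node. Assumes -name guest=Windows,debug-threads=on so the vCPU
 # threads are named "CPU n/KVM".
 HOST_CORES=(0 1 2 3 16 17 18 19)   # example: four cores plus their SMT siblings
 QEMU_PID=$(cat /var/run/vm.Windows.pid)
 i=0
 for tid in $(ps -T -p "$QEMU_PID" -o spid=,comm= | awk '/CPU [0-9]+\/KVM/ {print $1}'); do
     taskset -pc "${HOST_CORES[$i]}" "$tid"
     i=$((i + 1))
 done
 # Move any guest pages that landed on other nodes over to node 0 (needs numactl tools).
 migratepages "$QEMU_PID" all 0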

3 Likes

I’m using the defaults generated by virt-manager and it seems to report the correct PCIe link speed.


That said, I cannot get the Vega to reset properly. Running 4.19-rc6, and even when doing a normal reboot it just locks up: the OVMF splash never shows and one of the CPU cores stays pegged at 100%.

What’s strange is I did have it working at one point. It would be good to know what breaks the reset on VEGA :confused: I’ll try your config changes just to see if that fixes it though…

2 Likes

@gnif:
I think the current PCIe speed of the GPU is related to GPU power management. Try running the render test in GPU-Z (the question mark button).

Thanks for the information, but that certainly was not the issue. This is an issue that others have reported: either a PCIe link speed of 0, or the card reporting as PCI instead of PCIe. The exact cause and resolution are as described in the original post.

PCIe link speed support is incomplete according to https://www.linux-kvm.org/page/PCITodo
under the heading “Support for different PCI express link width/speed settings”.

The code that causes this behaviour in QEMU is the hard-coded link capability in vfio_setup_pcie_cap() (visible in the diff below).

Hrmm, I just noticed I was still not seeing PCIe 3.0 speeds; this is due to a hard-coded value in QEMU. The following patch/hack changes these values, but please only apply it if you are actually running at PCIe x16 3.0, or it may cause undesirable behaviour.

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 6cbb8fa054..34cfa906cf 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1895,7 +1895,7 @@ static int vfio_setup_pcie_cap(VFIOPCIDevice *vdev, int pos, uint8_t size,
                                    PCI_EXP_TYPE_ENDPOINT << 4,
                                    PCI_EXP_FLAGS_TYPE);
             vfio_add_emulated_long(vdev, pos + PCI_EXP_LNKCAP,
-                                   PCI_EXP_LNK_MLW_1 | PCI_EXP_LNK_LS_25, ~0);
+                                   PCI_EXP_LNK_MLW_16 | PCI_EXP_LNK_LS_80, ~0);
             vfio_add_emulated_word(vdev, pos + PCI_EXP_LNKCTL, 0, ~0);
         }
 
diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h
index a95522a13b..902ace0a69 100644
--- a/include/hw/pci/pcie_regs.h
+++ b/include/hw/pci/pcie_regs.h
@@ -35,9 +35,16 @@
 /* PCI_EXP_LINK{CAP, STA} */
 /* link speed */
 #define PCI_EXP_LNK_LS_25               1
+#define PCI_EXP_LNK_LS_50               2
+#define PCI_EXP_LNK_LS_80               3
 
 #define PCI_EXP_LNK_MLW_SHIFT           ctz32(PCI_EXP_LNKCAP_MLW)
-#define PCI_EXP_LNK_MLW_1               (1 << PCI_EXP_LNK_MLW_SHIFT)
+#define PCI_EXP_LNK_MLW_1               (1  << PCI_EXP_LNK_MLW_SHIFT)
+#define PCI_EXP_LNK_MLW_2               (2  << PCI_EXP_LNK_MLW_SHIFT)
+#define PCI_EXP_LNK_MLW_4               (4  << PCI_EXP_LNK_MLW_SHIFT)
+#define PCI_EXP_LNK_MLW_8               (8  << PCI_EXP_LNK_MLW_SHIFT)
+#define PCI_EXP_LNK_MLW_12              (12 << PCI_EXP_LNK_MLW_SHIFT)
+#define PCI_EXP_LNK_MLW_16              (16 << PCI_EXP_LNK_MLW_SHIFT)
 
 /* PCI_EXP_LINKCAP */
 #define PCI_EXP_LNKCAP_ASPMS_SHIFT      ctz32(PCI_EXP_LNKCAP_ASPMS)

link
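For anyone wanting to try it, it's just the usual QEMU rebuild after applying the patch (the patch filename and source directory are placeholders):

 cd qemu
 git apply increase-pcie-link-speed.patch
 ./configure --target-list=x86_64-softmmu --enable-kvm
 make -j$(nproc)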

3 Likes

pretty please?

1 Like

+1
Subscribed

Ok, there is still a bit more to this to make it actually run at these speeds. While GPU-Z reports that it's at x16 3.0, it is in fact lying. This can be seen by checking the NVidia Control Panel: under “Help -> System Information” the guest is still reporting PCI Express x1.

I have another patch that fixes this; after some testing I will release it for people to try out.

4 Likes

I’m doing clean installs on test systems, but I haven’t finished yet.
Yes, I haven’t been able to replicate your results yet, but on my oldest system, which already had a root bridge, I don’t have performance issues that I can really notice. I’ve run across the PCIe x1 reporting issue before, but I don’t remember what the resolution was. I remember it was something silly that didn’t trip me up for very long. That was an F25/26 system IIRC.

Apparently QEMU on F28 is a bit weird compared to F27: dwm uses 100% CPU for no apparent reason, which is making it tricky to install the VFIO drivers. I may just boot bare metal, install them, and then boot back as a VM. I haven’t tripped over this before because I usually set stuff up for dual boot. I’m going to have to take a few months off work and go help the QEMU guys do some proper bug testing. Can’t do Linux gaming parts 3 and 4 with AGESA in its current state and this newb unfriendliness; how’d they ever get their VM up and running and performing well?

That mailing list link I sent before has some good nuggets of info about PCIe speeds, Q35 and that sort of thing. I think we’re rediscovering some things that are already known, but in different contexts than gaming/graphics-card passthrough. I have some reading to do.

1 Like

So I have a similar configuration to the one previously posted. I am running an i7-3930K with a GTX 1080 for the guest OS. NVidia Control Panel system info reports this. GPU-Z crashes my VM when I try to start it. There could be something wrong with my configuration, but everything else seems to be working fine and the performance is good.

Here’s my XML file for reference.

Confirmed that this does indeed improve performance.

My patch, however, is a non-PCI-compliant hacky mess that is likely to break things. On the host I can actually watch the guest negotiate the link speed as it resets back to 2.5GT/s and then works its way back up to 8GT/s at boot.
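If you want to watch it yourself, something along these lines works from the host (the device address is just an example):

 sudo watch -n 0.5 "lspci -s 45:00.0 -vv | grep -i lnksta"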

I found this old patch by Alex Williamson that was rejected due to its failure to handle link renegotiation, and hacked it up to get it working.

https://patchwork.ozlabs.org/patch/230993/

I also disabled the configuration space emulation completely for the VFIO device, allowing the guest to see and alter the true device config space, which lets the GPU perform link negotiation.

This patch is nasty; use it at your own risk! I have run out of time to work on this for now; when I get some more I will try to learn how to implement this correctly.

2 Likes

I gotta get you an X299 system maybe; I think it behaves somewhat differently re: link negotiation. I say that only because I’ve got some 40G NICs in VFIO that cannot possibly be running at anything less than PCIe 3.0, or there would be performance issues.

1 Like

Oh, I am certain that it has always been running at PCIe 3.0, but the AMDGPU & NVidia drivers program extra magic registers when they can see they’re running at PCIe 3.0, likely adjusting latency timeouts or even configuring how data is scheduled to be pushed out onto the bus.

For example:

Unfortunately, the same cannot be said for VEGA10 (SOC15) on Linux; it’s not implemented yet…

2 Likes