Increasing VFIO VGA Performance

I’m using the defaults generated by virt-manager and it seems to report the correct PCIe link speed.

That said, I cannot get the Vega to reset properly. I'm running 4.19-rc6, and even on a normal reboot it just locks up: the OVMF splash never shows and one of the CPU cores stays pegged at 100%.

What’s strange is I did have it working at one point. It would be good to know what breaks the reset on VEGA :confused: I’ll try your config changes just to see if that fixes it though…

gnif:
I think the current PCIe speed of the GPU is related to GPU power management. Try running the Render Test in GPU-Z (the question-mark button).

Thanks for the information, but that certainly was not the issue. This is a problem others have reported as well: either a PCIe link speed of 0, or the device reporting as PCI instead of PCIe. The exact cause and resolution are as described in the original post.

According to https://www.linux-kvm.org/page/PCITodo, under the heading "Support for different PCI express link width/speed settings", PCIe link speed support is incomplete.

The code that causes this behaviour in QEMU is as follows:

Hrmm, I just noticed I was still not seeing PCIe 3.0 speeds. This is due to a hard-coded value in QEMU; the following patch/hack changes these values, but please only apply it if you are actually running at PCIe 3.0 x16, or it may cause undesirable behaviour.

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 6cbb8fa054..34cfa906cf 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1895,7 +1895,7 @@ static int vfio_setup_pcie_cap(VFIOPCIDevice *vdev, int pos, uint8_t size,
                                    PCI_EXP_TYPE_ENDPOINT << 4,
                                    PCI_EXP_FLAGS_TYPE);
             vfio_add_emulated_long(vdev, pos + PCI_EXP_LNKCAP,
-                                   PCI_EXP_LNK_MLW_1 | PCI_EXP_LNK_LS_25, ~0);
+                                   PCI_EXP_LNK_MLW_16 | PCI_EXP_LNK_LS_80, ~0);
             vfio_add_emulated_word(vdev, pos + PCI_EXP_LNKCTL, 0, ~0);
         }
 
diff --git a/include/hw/pci/pcie_regs.h b/include/hw/pci/pcie_regs.h
index a95522a13b..902ace0a69 100644
--- a/include/hw/pci/pcie_regs.h
+++ b/include/hw/pci/pcie_regs.h
@@ -35,9 +35,16 @@
 /* PCI_EXP_LINK{CAP, STA} */
 /* link speed */
 #define PCI_EXP_LNK_LS_25               1
+#define PCI_EXP_LNK_LS_50               2
+#define PCI_EXP_LNK_LS_80               3
 
 #define PCI_EXP_LNK_MLW_SHIFT           ctz32(PCI_EXP_LNKCAP_MLW)
-#define PCI_EXP_LNK_MLW_1               (1 << PCI_EXP_LNK_MLW_SHIFT)
+#define PCI_EXP_LNK_MLW_1               (1  << PCI_EXP_LNK_MLW_SHIFT)
+#define PCI_EXP_LNK_MLW_2               (2  << PCI_EXP_LNK_MLW_SHIFT)
+#define PCI_EXP_LNK_MLW_4               (4  << PCI_EXP_LNK_MLW_SHIFT)
+#define PCI_EXP_LNK_MLW_8               (8  << PCI_EXP_LNK_MLW_SHIFT)
+#define PCI_EXP_LNK_MLW_12              (12 << PCI_EXP_LNK_MLW_SHIFT)
+#define PCI_EXP_LNK_MLW_16              (16 << PCI_EXP_LNK_MLW_SHIFT)
 
 /* PCI_EXP_LINKCAP */
 #define PCI_EXP_LNKCAP_ASPMS_SHIFT      ctz32(PCI_EXP_LNKCAP_ASPMS)
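
To make the patched values more concrete, here is a minimal stand-alone sketch (not part of the patch) that decodes a Link Capabilities dword using the same bit layout; the value QEMU emulated before corresponds to x1 at 2.5 GT/s, and the patched value to x16 at 8 GT/s:

/* decode_lnkcap.c - illustrative only, assumes the LNKCAP layout above */
#include <stdio.h>
#include <stdint.h>

#define PCI_EXP_LNKCAP_SLS  0x0000000f  /* supported link speeds */
#define PCI_EXP_LNKCAP_MLW  0x000003f0  /* maximum link width */

static void decode_lnkcap(uint32_t lnkcap)
{
    static const char *speed[] = { "?", "2.5GT/s", "5GT/s", "8GT/s" };
    uint32_t ls  = lnkcap & PCI_EXP_LNKCAP_SLS;
    uint32_t mlw = (lnkcap & PCI_EXP_LNKCAP_MLW) >> 4;

    printf("0x%08x -> x%u @ %s\n", lnkcap, mlw, ls < 4 ? speed[ls] : "?");
}

int main(void)
{
    decode_lnkcap(0x00000011);  /* old: PCI_EXP_LNK_MLW_1  | PCI_EXP_LNK_LS_25 */
    decode_lnkcap(0x00000103);  /* new: PCI_EXP_LNK_MLW_16 | PCI_EXP_LNK_LS_80 */
    return 0;
}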

pretty please?

+1
Subscribed

OK, there is still a bit more to this to make it actually run at these speeds. While GPU-Z reports that it's at x16 3.0, it is in fact lying. This can be seen by checking the NVIDIA Control Panel: under "Help -> System Information" the guest is still reporting PCI Express x1.

I have another patch that fixes this; after some testing I will release it for people to try out.

I’m doing clean installs on test systems, but I haven’t finished yet.
Yes, I haven't been able to replicate your results yet, but on my oldest system, which already had a root bridge, I don't have performance issues as far as I can tell. I've run across the PCIe x1 report issue before, but I don't remember what the resolution was; I remember it was something silly that didn't trip me up for very long. That was an F25/26 system IIRC.

Apparently qemu on F28 is a bit weird compared to F27: dwm uses 100% CPU for no apparent reason, which is making it tricky to install the VFIO drivers. I may just boot bare metal, install them, and then boot back as a VM. I haven't tripped over this before because I usually set stuff up for dual boot. I'm going to have to take a few months off work and go help the qemu guys do some proper bug testing. I can't do Linux gaming parts 3 and 4 with AGESA in its current state and this newbie unfriendliness; how did anyone ever get their VM up, running and performing well?

That mailing list link I sent before has some good nuggets of info about PCIe speeds, Q35 and that sort of thing. I think we're rediscovering some things that are already known, but in different contexts than gaming/graphics-card passthrough. I have some reading to do.

So I have a similar configuration to the one previously posted. I am running an i7-3930K with a GTX 1080 for the guest OS, and the NVIDIA Control Panel system info reports the same. GPU-Z crashes my VM when I try to start it. There could be something wrong with my configuration, but everything else seems to be working fine and the performance is good.

Here’s my XML file for reference.

Confirmed that this does indeed improve performance.

My patch, however, is a non-PCI-compliant hacky mess that is likely to break things. On the host I can actually watch the guest negotiate the link speed as it resets back to 2.5 GT/s and then works its way back up to 8 GT/s at boot.
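
One way to watch this from the host (not necessarily how it was done here) is to poll the current_link_speed / current_link_width sysfs attributes that recent kernels expose for the passed-through device. A rough sketch, with the PCI address as a placeholder for your GPU's:

/* watch_link.c - illustrative only; substitute your device's PCI address */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static void read_attr(const char *dev, const char *attr, char *out, size_t len)
{
    char path[256];
    FILE *f;

    snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/%s", dev, attr);
    f = fopen(path, "r");
    if (!f || !fgets(out, len, f))
        snprintf(out, len, "n/a");
    out[strcspn(out, "\n")] = '\0';
    if (f)
        fclose(f);
}

int main(void)
{
    const char *dev = "0000:0a:00.0";  /* placeholder address */
    char speed[64], width[64];

    for (;;) {  /* Ctrl-C to stop */
        read_attr(dev, "current_link_speed", speed, sizeof(speed));
        read_attr(dev, "current_link_width", width, sizeof(width));
        printf("%s: %s, x%s\n", dev, speed, width);
        sleep(1);
    }
}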

I found this old patch by Alex Williamson that was rejected due to its failure to handle link renegotiation, and hacked it up to get it working.

https://patchwork.ozlabs.org/patch/230993/

I also disabled the configuration space emulation completely for the VFIO device, allowing the guest to see the true device config space and alter it, which lets the GPU perform link negotiation.

This patch is nasty; use it at your own risk! I have run out of time to work on this for now; when I get some more I will try to learn how to implement this correctly.

I've gotta get you an X299 system maybe; I think it behaves somewhat differently re: link negotiation. I say that only because I've got some 40G NICs in VFIO that cannot possibly be running at anything less than PCIe 3.0, or there would be performance issues.

Oh, I am certain that it has always been running at PCIe 3.0, but the AMDGPU and NVIDIA drivers program extra magic registers when they can see they're running at PCIe 3.0, likely adjusting latency timeouts or even configuring how data is scheduled to be pushed out onto the bus.

For example:

Unfortunately the same cannot be said for Vega 10 (SOC15) on Linux; it's not implemented yet…
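
To give a purely illustrative idea of what such link-speed-dependent setup could look like, here is a sketch with hypothetical register names and values; it is not taken from amdgpu, radeon or the NVIDIA driver:

/* tune_for_link.c - hypothetical registers, illustrative only */
#include <stdint.h>

#define FAKE_REG_LATENCY  0x1000  /* hypothetical MMIO offsets */
#define FAKE_REG_SCHED    0x1004

enum pcie_gen { PCIE_GEN1 = 1, PCIE_GEN2, PCIE_GEN3 };

static void mmio_write(uint32_t reg, uint32_t val) { (void)reg; (void)val; /* stub */ }

static void tune_for_link(enum pcie_gen gen)
{
    if (gen >= PCIE_GEN3) {
        /* Tighter timeouts / more aggressive scheduling when the link
         * really can sustain 8 GT/s. */
        mmio_write(FAKE_REG_LATENCY, 0x20);
        mmio_write(FAKE_REG_SCHED, 0x8);
    } else {
        /* Conservative defaults: effectively what a guest driver falls back
         * to when QEMU advertises a x1 2.5 GT/s link. */
        mmio_write(FAKE_REG_LATENCY, 0x80);
        mmio_write(FAKE_REG_SCHED, 0x1);
    }
}

int main(void) { tune_for_link(PCIE_GEN3); return 0; }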

Ohhh OK, this is making a lot more sense now. Very interesting. I also disable PCIe ASPM, which, as part of power savings, dynamically adjusts the PCIe bus speed. I've noticed that a lot of things just report PCIe 3.0 all the time when that's the case.

That makes sense; I was wondering when those would be coming along. Guess I'll have to wait until you can make a tutorial that solves more problems than it creates.

Having just gotten Unigine Superposition working in a Kubuntu 18.04.1 VM, I tried to run the 4K optimized preset, but it was extremely laggy. Max GPU usage was only 46% on a GTX 1080. Do I need to switch to Q35? I've been using i440fx because Q35 froze the installation of Windows 8.1 on Fedora 27. Hugepages are on.

most likely, yes.

As stated above, i440fx doesn’t support PCIe and sees the device as connected to PCI. You’ll see significant performance increases when switching.

Also, any reason you’re using 8.1 instead of 10?

PCIe 3.0 x16 has ~16 GB/s in one direction.

x:>concBandwidthTest.exe 0
Device 0 took 520.901611 ms
Average HtoD bandwidth in MB/s: 12286.389331
Device 0 took 992.121826 ms
Average DtoH bandwidth in MB/s: 6450.820687
Device 0 took 1329.665039 ms
Average bidirectional bandwidth in MB/s: 9626.484584

My HtoD result is bigger than the maximum of PCIe 2.0 x16, so it is running at PCIe 3.0 x16, right? (See the rough calculation below.)

CUDA based Tool
https://forums.evga.com/PCIE-bandwidth-test-cuda-m1972266.aspx
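
A back-of-the-envelope check of those numbers, using the line rates and encodings from the PCIe spec (and ignoring protocol overhead beyond the encoding), is sketched below:

/* pcie_bw.c - theoretical per-direction maximums for an x16 link */
#include <stdio.h>

int main(void)
{
    const double gen2_lane = 5.0e9 * 8.0 / 10.0 / 8.0;    /* 5 GT/s, 8b/10b    -> bytes/s per lane */
    const double gen3_lane = 8.0e9 * 128.0 / 130.0 / 8.0; /* 8 GT/s, 128b/130b -> bytes/s per lane */
    const int lanes = 16;

    printf("PCIe 2.0 x16: %.2f GB/s\n", gen2_lane * lanes / 1e9);  /* ~8.00  */
    printf("PCIe 3.0 x16: %.2f GB/s\n", gen3_lane * lanes / 1e9);  /* ~15.75 */
    return 0;
}

An HtoD figure of ~12.3 GB/s is well above anything a 2.0 x16 link (~8 GB/s) can carry, so it does point to a 3.0 x16 link.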

I'm using Windows 8.1 with Q35 with no issues. Windows 10 is more of a pain, and I couldn't get the AMD drivers to install properly back when I was using my RX 460 about a year ago.

Shifting to Q35, even with a proper ROM image, it refuses to initialize video. I'm back to square one because Fedora 27+ hates Q35 for some reason. i440fx likes the card just fine on my Fedora 27 host, and both GPU-Z in Windows 8.1 and nvidia-settings in the Kubuntu 18.04 VM show PCI-E 3.0 x16, even though the root port is 2.0 x16.

I had no performance issues in Windows 8.1, but only just started to get performance issues in Kubuntu 18.04 and Superposition on i440fx.

I'm stumped. QXL doesn't work, but VirtIO does if I use Spice on Q35. No matter what I do, nothing brings up video on the card with Q35.

Before, I basically just said use i440fx and move on, but this brings up a legitimate case for using Q35, and it doesn't work on Fedora 27+. I think it's just my luck, because I never got Q35 working except on Fedora 26, which is ancient history now.

Is it because I'm running a PCI-E 2.0 interface on the host? My NIC and the USB 2.0 controller on the chipset pass through fine, so why would only the GPU not work?

@wendell, are you able to test both a Sandy Bridge-E and an Ivy Bridge-E CPU on an X79 board (with PCI-E 3.0 support) with Q35? I just want to confirm that if the Sandy Bridge-E one fails and the Ivy Bridge-E one succeeds, the PCI-E version is to blame for these Q35 errors.

If this is indeed the case, I'll have to buy an Ivy Bridge-EP E5-1680 v2. I'll lose some single-threaded performance, but I'll have more overclockable cores.