[i440FX or Q35 chipset for GPU passthru?] PCIe x16 3.0 on i440FX chipset? Is this right?

I decided to reinstall my GPU passthrough VM because I made the setup mistake of selecting the i440FX chipset. Later I ran across posts here mentioning that i440FX is PCI-only and that Q35 is the right choice for PCIe support (and required for PCIe gen 3), which should translate to better GPU passthrough.

Funny thing is, I happened to run GPU-Z on the i440FX VM and it’s reporting full PCIe x16 3.0 support.

Is my understanding correct that i440FX only supports PCI, or is GPU-Z reporting incorrect information?

Larger question: which is the recommended VM chipset choice for GPU passthrough in 2020?

Anecdote time, I was in the same boat when I reinstalled a year and a bit ago.

TL;DR

  • I could not get AMD drivers beyond Catalyst 18.5.1 working on i440fx-based Windows 10 VMs, but they worked fine under Q35.
  • Watch out on new installs if your Q35 machine type is version ‘4.0’ and you’re running Nvidia; use <ioapic driver='kvm'/> if you’re stuck (placement sketched just after this list), but this should be resolved in recent point releases of QEMU.
  • There’s evidence to suggest that drivers program the GPU differently depending on whether the card is detected as a legacy endpoint (PCI/i440fx) or a native PCIe endpoint (Q35).
  • You may need to manually define the PCIe link speed and width in your libvirt definition or QEMU command line if you’re getting undesirable results on Q35; this tweak is a new feature in QEMU 4.0+ (an example is further down, where I cover the bandwidth testing).
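For reference, this is roughly where that IOAPIC workaround lives in a libvirt domain definition. A minimal sketch of just the <features> block, assuming a libvirt-managed VM (everything else in your features block stays as it is):

    <features>
      <acpi/>
      <apic/>
      <!-- workaround from the bullet above: force the in-kernel (KVM) IOAPIC
           rather than whatever the machine-type default is -->
      <ioapic driver='kvm'/>
    </features>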

In my circumstance back then I was still passing through an R9 290 to the Windows 10 VM, and I had found that drivers beyond Catalyst 18.5.1 would no longer play nice with the ioh3420 bridge I was using to get them installed. In test VMs I’d discovered that later drivers worked fine with Q35, so that cemented the decision to switch the next time I reinstalled the VM.

Fast forward a few months (to when QEMU 4.0 came out): I acquired a GTX 1080 and decided to reinstall, sticking with the Q35 decision despite no longer having the affected card. I promptly came across this bug:

https://bugs.launchpad.net/qemu/+bug/1826422

This should be rectified now for new installs, but it’s something to keep in mind if you’re on an older distro.

Around the same time I found that bug report, I also came across @gnif’s thread here:

Post 125 onwards has the most relevant info with regard to QEMU 4.0.

As an aside, the benchmark I’ve been using to verify PCIe link speeds is the CUDA-based concBandwidthTest tool, which is linked in that thread by @PetebLazar:

https://forums.evga.com/PCIE-bandwidth-test-cuda-m1972266.aspx

What I’d found at the time was that ‘out of the box’ my QEMU 4.0 PCIe negotiated link speed was woeful: equivalent to Gen 3 x1. Applying the XML changes in the thread to force it to Gen 3 x8, which is what it is in hardware (the motherboard splits the lanes x8/x8), brought it back up to expected levels. This was cross-checked against a bare-metal Windows install with the GTX 1080 in both x8 and x16 configurations. While not identical, the results were on the order of a few hundred MB/s either side rather than GB/s off. The tool gives noisy results, but it is a general indicator. As a yardstick, Gen 3 x8 for my GTX 1080 is ~10-11 GB/s bidirectional bandwidth as reported by the tool.
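For anyone who doesn’t want to dig through the whole thread, the change I applied was along these lines. Treat it as a sketch from memory rather than a drop-in config: if I’m remembering correctly, the properties are the experimental x-speed/x-width options QEMU 4.0 added to pcie-root-port, the values should match what your slot actually runs at (x8 in my case), and the qemu XML namespace has to be declared on the <domain> element for libvirt to accept the block:

    <!-- the <domain> element needs: xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0' -->
    <qemu:commandline>
      <!-- advertise a Gen 3 (8 GT/s) x8 link on every pcie-root-port -->
      <qemu:arg value='-global'/>
      <qemu:arg value='pcie-root-port.x-speed=8'/>
      <qemu:arg value='-global'/>
      <qemu:arg value='pcie-root-port.x-width=8'/>
    </qemu:commandline>

After applying something like this, the negotiated link reported in the guest (and the concBandwidthTest numbers) should reflect the forced values rather than whatever the machine type defaults to.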

In effect, i440fx, bare metal and Q35 were close enough in measured bandwidth for me not to look into it any deeper, given that Q35 has otherwise been trouble-free for me.

A note on GPU-Z: the Bus Interface readout isn’t always correct. If you have an Nvidia card, check the Nvidia Control Panel instead, in particular the System Information section; scroll to the bottom to verify the bus parameters.

From my perspective, I’ll be sticking with Q35 from now on because of driver compatibility, and because I’ve not seen any regressions compared to i440fx since the link-negotiation fix.


Thanks for the super-helpful post.

Oddly, even though in this QEMU 4.0/Q35 VM passing through a GTX 1080 the Nvidia System Information panel reports ‘PCI Express x16 Gen3’, performance appears to have regressed from the i440FX VM I was previously using.

I thought the Q35 PCIe bridges looked right at the time I built the VM, but I’m going to closely re-read the posts you referenced and double-check that. (Thanks again for those links.)

I do software development that touches DX12 (Unreal Engine, etc.) and work in CAD apps, and for that use Q35 performance seems exactly the same as i440FX did (quite good UX). But this weekend I installed a few video games and am getting really bad tearing in the video output window, even on low-end DX9 games like Half-Life 2. I don’t recall seeing this problem back when I did some early tests many months ago, but this machine has seen a few configuration changes since then that might be an issue. I may have to dig that VM out of the backups to double-check, or somehow figure out a way to pin the issue down.

This config is using a 64M shm buffer because this Dell U3514W runs at 3440x1440. I wonder if this could be a bus/RAM bandwidth issue? I’m not sure whether the guest -> host frame transfer is compressed, or what the exact bandwidth requirements are versus DDR4-3200 with currently pretty lazy XMP timings. @gnif, can you share any insight here? Can you recommend a QEMU 4/Q35 PCIe bus configuration for new installs? Does virt-manager generate a good config?

Update: “concBandwidthTest.exe 0” returns:

Device 0 took 696.224792 ms
Average HtoD bandwidth in MB/s: 9192.433348
Device 0 took 721.469543 ms
Average DtoH bandwidth in MB/s: 8870.783331
Device 0 took 1079.758789 ms
Average bidirectional bandwidth in MB/s: 11854.499477

Which looks close to what PCIe 3.0 x16 should be delivering bare-metal. Sounds like it’s not a Q35 issue after all. I might have to start a new thread if Google doesn’t turn up something soon.