Increasing VFIO VGA Performance

I know that Q35 works just fine for me on my Hackintosh VM, and I am using Fedora 28. My Windows VM also works fine (I think I used Q35, but I don’t remember…).

I’m running an i7-3930K (motherboard: Gigabyte X79-UD5) with Q35 and a Windows 8.1 guest, and it’s working for me.

What? Weird, because I can’t get it to recognize the card at all.

Maybe you have some misconfiguration. I posted my XML file earlier in this thread. You can check that out and compare.

Keep in mind that my guest still crashes when I try to run GPU-Z, and the NVIDIA Control Panel reports Bus: PCI Express x0, but the performance is good and everything else works.

Okay, so apparently creating a VM from an ISO allows the card to get recognized, but creating a VM from an existing image doesn’t on Q35.

Must be some bug with virt-manager interacting with libvirt or QEMU.

Edit: Same latency issue with Superposition, but Valley runs fine for some reason in my Kubuntu VM.

@FurryJackman: PCIe transfer speed on 440FX?
https://forums.evga.com/PCIE-bandwidth-test-cuda-m1972266.aspx

I already deleted my i440FX XML for Kubuntu, but the issue with Superposition persisted, so I think moving chipsets did not change anything.

Also, I’m running a Kubuntu VM right now. My Windows 8.1 image is on a different Fedora 27 installation that I have to pull up.

So I fixed the issue with GPU-Z crashing; it needed kvm.ignore_msrs=1.
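For anyone hitting the same crash: this is a standard KVM module parameter, and a minimal sketch of setting it on a typical Linux host looks like this (the file name kvm.conf is just a convention):

```sh
# Apply immediately, lasts until next reboot
echo 1 | sudo tee /sys/module/kvm/parameters/ignore_msrs

# Make it persistent across reboots
echo "options kvm ignore_msrs=1" | sudo tee /etc/modprobe.d/kvm.conf
```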

With my current config, GPU-Z reports the Bus Interface as PCI even after running the render test, BUT the tool @PetebLazar posted reports a bandwidth of 15949 MB/s, which is PCIe Gen 3 x16 territory.


I hope this sheds some light.

We know it’s still PCIe Gen 3 speeds; the point is that the Windows driver doesn’t know, and as such doesn’t poke some registers in the ASIC to optimize for performance.

I think the bidirectional speed of PCIe 3.0 x16 is ~32 GB/s in total (8 GT/s per lane × 16 lanes ≈ 16 GB/s per direction, × 2 directions).

Now I finally understand! Thanks.

I see. Maybe there’s a similar issue with CPU clock speed since Windows reports the stock clock speeds and not the actual clock speeds.

That’s odd, because I assumed the 3930K was PCIe 2.0.

You are not wrong, but the mobo supports PCIe 3.0.

It’s based on the CPU though. I run a 3960X-equivalent Xeon and I don’t get PCIe 3.0 because the CPU doesn’t support it.

If I understand @GrandGamer correctly, he means that another processor could fix that?!?

No, he means that I shouldn’t be getting PCIe Gen 3 speeds on my GPU because the CPU doesn’t support it. But the test I ran and posted earlier clearly shows Gen 3 speeds.

Ok, now I get it, but I’m still a little bit confused. Your test above says it ran at 15949 MB/s, but if I’m not mistaken, the max bandwidth should be around 15754 MB/s, and that is a theoretical maximum; I’m not sure how close hardware really gets to that limit. So either the test is a little bit off in its measurements, or your hardware is faster than the theoretical maximum.
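For reference, the 15754 MB/s figure falls out of the PCIe 3.0 link math; here is a back-of-envelope derivation (the 128b/130b line encoding is the detail that drops it below a flat 16 GB/s):

```latex
% PCIe 3.0: 8 GT/s per lane with 128b/130b encoding, 16 lanes, one direction
\[
  8\,\mathrm{GT/s} \times \tfrac{128}{130} \approx 7.877\,\mathrm{Gbit/s\ per\ lane}
\]
\[
  \tfrac{7.877}{8} \approx 0.9846\,\mathrm{GB/s\ per\ lane}, \qquad
  0.9846\,\mathrm{GB/s} \times 16 \approx 15754\,\mathrm{MB/s}
\]
```

Doubling that for full-duplex traffic gives ~31.5 GB/s, which is the “~32 GB/s” figure quoted above.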

All synthetic testing is; there are many factors involved that are beyond the control of the software. Run a benchmark 10 times and every result will be different, with abnormalities both high and low.

The biggest problem with benchmarking inside a virtual machine is the timing source; if the system’s clock is running too fast or too slow, all benchmark results will be skewed.

Edit: After a further look into the tool in use, the bidirectional bandwidth is what “appears” to exceed the maximum, but in reality it doesn’t. PCIe can perform duplex communication, that is, upstream and downstream transfers in the same transaction; this is what the 3rd measurement is showing. It is not exceeding the theoretical maximum.

A good example of this, as mentioned above, is the CPU clock rate being reported as 0 MHz or completely wrong. I found that the best solution is to force the guest VM to use the TSC clock by disabling the HPET timer, telling QEMU I don’t care about being able to migrate the VM, and enabling the invariant TSC in the guest.

To force Windows to use the TSC instead of HPET or ACPI_PM, you must also turn off the hypervisor flag. The combination of switches to perform this is:

-cpu host,-hypervisor,migratable=no,+invtsc

The TSC clock is the fastest for KVM as it is simply passed through to the physical hardware. HPET, ACPI_PM, and PIT have to be emulated and as such are not entirely accurate, skewing benchmarks.
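For those driving this through libvirt/virt-manager rather than raw QEMU, a rough sketch of the equivalent domain XML follows (untested here, and assuming a reasonably recent libvirt; treat it as a starting point, not a drop-in config):

```xml
<!-- Rough libvirt equivalent of: -cpu host,-hypervisor,migratable=no,+invtsc
     plus disabling HPET. Element names are standard libvirt domain XML. -->
<cpu mode='host-passthrough'>
  <feature policy='disable' name='hypervisor'/>
  <feature policy='require' name='invtsc'/>
</cpu>
<clock offset='localtime'>
  <timer name='hpet' present='no'/>
</clock>
```

As far as I know, exposing invtsc blocks migration anyway, which lines up with migratable=no on the QEMU command line.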

Note that I am seeing 4 GHz clocks because I set the host governor to performance, as the host does other low-latency-demanding things, such as running an Asterisk SIP server.

For a Linux guest, if you do not specify migratable=no,+invtsc, you will see in dmesg that the TSC was disabled because it was determined to be unstable (clocksource tsc unstable (delta = xxxxxxxx ns)).
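A quick way to confirm which clock source a Linux guest actually ended up on (a standard sysfs path, nothing specific to this setup):

```sh
# Expect "tsc" here if the flags above took effect
cat /sys/devices/system/clocksource/clocksource0/current_clocksource

# Clock sources the guest kernel considers usable
cat /sys/devices/system/clocksource/clocksource0/available_clocksource
```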

All that said, I ran the same test in my patched QEMU build and I am certainly seeing higher throughput than the other reports here. I will do some more testing without the patch and post my findings shortly.

[screenshot: bw — bandwidth results with the patched QEMU build]

Ok… Time to eat my words… Gen3 is certainly NOT in use without this patch!

[screenshot: bw_unpatched — bandwidth results without the patch]

This is very interesting. @wendell, it may actually be affecting those 40G NICs. 40 Gbit/s is 5 GB/s, which is possible at Gen 2 speeds, which it seems @GrandGamer is getting with his i440FX VM.

Wait, I’m on Q35 :laughing: