Increasing VFIO VGA Performance

gnif · October 6, 2018, 6:18am

Hi All,

Last month I was spending considerable time pouring through the amdgpu sources in efforts to make it possible to reset the VEGA10 series of cards. While doing this I noted that the amdgpu driver checks to see if the GPU is connected via PCIe Gen3, and if so, it programs some registers in the GPU. This got me wondering if NVidia do the same, and since Windows doesn’t see the passthrough card as anything other then standard PCI, not even PCIe, is it suffering for it?

After a few days of hacking on qemu and learning more about how PCIe works I have found both why Windows sees the device as a legacy PCI device and how to fix it.

Firstly i440fx doesn’t support PCIe at all, the card is presented to the guest as a PCI device as there is no other option, this is kinda obvious so if you plan to try this out be aware that you will need to switch to the q35 platform.

Secondly q35 does support PCIe, but most if not all of us are simply connecting the device to the root bus directly. When this is done, Qemu changes the emulated PCI configuration space, setting the device to report it’s type as an Integrated Endpoint. In the physical world this means the device would be physicially integrated into the PCIe controller, not on a PCIe bus. Because of this it is invalid to provide any link speed configuration, and as such Qemu omits it.

The fix is simple, add a PCI Express Root bus device to the configuration and plug the video card into it instead. Here is how I accomplished this.

 -device ioh3420,id=root_port1,chassis=0,slot=0,bus=pcie.0 \ 
 -device vfio-pci,host=$NVIDIA.0,id=hostdev1,bus=root_port1,addr=0x00,multifunction=on,romfile=/opt/VM/Windows/1080Ti.rom \
 -device vfio-pci,host=$NVIDIA.1,id=hostdev2,bus=root_port1,addr=0x00.1 \

The difference this made is enormous and I am now getting bare metal performance out of my GPU in windows. GPU-Z now reports the GPU is on a PCIe link, as does the NVidia system information.

I have also noted that LatencyMon reports on average much lower latency then previous, where I was seeing up to 2000us, I now never see above 200us.

Dje4321 · October 6, 2018, 6:23am

Amazing work as always. Im gonna have to see about sending some money your way. Got anything where i can donate directly instead of patreon?

SgtAwesomesauce · October 6, 2018, 6:24am

This is awesome!

I think I was always missing the bus=root_port1 component of adding the ioh3420 component. All the guides I’ve ever seen never attached them to the root device, they attached them to pcie.0 and included the ioh3420 at the same logical level.

I’m starting to get the passthrough itch again, dammit!

SgtAwesomesauce · October 6, 2018, 6:32am

Till Monday

gnif · October 6, 2018, 6:38am

Thanks mate, I have just setup a paypal donation link for those that do not wish to donate via Patreon. Your support is much appreciated.

https://www.paypal.com/cgi-bin/webscr?cmd=_s-xclick&hosted_button_id=ESQ72XUPGKXRY

Edit: this link has been added to the LookingGlass website

gnif · October 6, 2018, 6:38am

Is that a bad thing? lol.

SgtAwesomesauce · October 6, 2018, 6:39am

It’s not so much a bad thing, but my life is pretty busy right now. Not sure I’d have time to relax.

gnif · October 6, 2018, 1:44pm

Umm, wow, this is new too. After updating my Linux workstation VM to use this same method, my VEGA10 resets properly! Need to perform further testing, but it seems to have fixed the problem!

@wendell can you please test this also and confirm?

Here is the lspci output in my Linux VM

00:00.0 Host bridge: Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
00:01.0 SCSI storage controller: Red Hat, Inc Virtio SCSI
00:02.0 Ethernet controller: Red Hat, Inc Virtio network device
00:03.0 PCI bridge: Intel Corporation 7500/5520/5500/X58 I/O Hub PCI Express Root Port 0 (rev 02)
00:04.0 PCI bridge: Intel Corporation 7500/5520/5500/X58 I/O Hub PCI Express Root Port 0 (rev 02)
00:05.0 PCI bridge: Intel Corporation 7500/5520/5500/X58 I/O Hub PCI Express Root Port 0 (rev 02)
00:06.0 RAM memory: Red Hat, Inc Inter-VM shared memory (rev 01)
00:1f.0 ISA bridge: Intel Corporation 82801IB (ICH9) LPC Interface Controller (rev 02)
00:1f.2 SATA controller: Advanced Micro Devices, Inc. [AMD] Device 43c8 (rev 02)
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02)
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XT [Radeon RX Vega 64] (rev c3)
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aaf8
02:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device 145a
02:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor
02:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) USB 3.0 Host Controller
03:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device 145a
03:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor
03:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) USB 3.0 Host Controller

wendell · October 6, 2018, 2:49pm

As far as Vega reset goes, this explains why it almost always worked for me, as I usually used the q35 chipset. I lost that accidental discovery when switching around hardware and was puzzled I too was now having the Vega reset problem

gnif · October 6, 2018, 2:51pm

I have always had it with q35, but putting the vega onto a root bus seems to have fixed it. Initial testing on the VEGA also show a very substantial performance increase now it can see it’s on a Gen3 bus.

geoff@aeryn:~$ sudo lspci -s 01:00.0 -vv | grep LnkSta:
		LnkSta:	Speed 8GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-

wendell · October 6, 2018, 2:53pm

I must have done that accidentally during my initial setup because it wasn’t fine after everyone was like “wait it works for you?” Not sure checking now

garyp · October 6, 2018, 3:25pm

Is there a way to do this via virt-manager or virsh edit [name]?

I currently do GPU passthrough by adding the gpu/audio via virt-manager which produces the following in the virsh edit xml file:

  <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x1d' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x1d' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
    </hostdev>

Can anything extra be specified to move this onto a pcie root?

I think your method could be replicated via qemu:commandline but I assume I’d lose the automatic rebinding virt-manager does? Currently it will unbind from amdgpu, bind to vfio-pci, run the vm then reverse the process on shutdown so I can continue to use the GPU in the host via DRI_PRIME.

wendell · October 6, 2018, 3:30pm

this way
https://www.redhat.com/archives/vfio-users/2016-January/msg00301.html

heres what I added. Note that multifunction and getting function id right (0 for gfx 1 for audio) seems to be important, still testing.

Controller section (I used bus id 8, you should use whatver makes sense for your system)

 <controller type='pci' index='8' model='pcie-root-port'>
      <model name='ioh3420'/>
      <target chassis='1' port='0x1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1c' function='0x0' multifunction='on'/>
    </controller>

and

 <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x0b' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0' multifunction='on'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x0b' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x1'/>
    </hostdev>

still testing, so I might edit this more.

garyp · October 6, 2018, 5:04pm

If “index” in the controller definition matches the “bus” used in the hostdev, then it looks like by default virt-manager defines a separate pcie-root-port device for gpu and gpu-audio pci devices rather than assigning both to a single pcie-root-port with function 0x0 and 0x1 accordingly.

I’ve tried with this default and also with a config similar to yours above which places the gpu-audio and gpu on the same pcie-root-port.

Two gaming benchmarks and a run of the superposition benchmark showed no real difference between the two. GPU-Z reports in both cases that the gpu is on a pcie3 port running at “pci-express x8 v1.1”.

So unless I’ve missed something, it may be that out the box virt-mananger is already setting up the pci passthrough for the gpu in a suitable way (albeit with audio and gpu on different root ports, which may not be desirable?)

Still, not in vain, learned more about the xml config format

Benchmark wise if anyone happens to have similar hw, 8GB RX580 Nitro+, 2700X and 16GB 3200 ram is giving me 70/84/107 min/avg/max in F1 2017 with ultra high setting and both AA options maxed). Superposition (1080p extreme) gives a score of 2647 utilising 100% of gpu. Be interested in seeing how that compares to others doing passthrough if you have similar hw.

wendell · October 6, 2018, 5:07pm

I am still having reset issues, I think, if I unclean shutdown the guest. clean shutdown is still ok?

gnif · October 6, 2018, 5:33pm

I am yet to try an unclean shutdown, but for me even a clean shutdown wasn’t working at all before these changes.

wendell · October 6, 2018, 5:46pm

My lab asplode but testing right now
.may do a live stream for this later. I’ve got tons of friends hardware right now because everyone is upgrading and letting me test their stuff

PetebLazar · October 6, 2018, 7:01pm

 -M q35,accel=kvm \
 -smp 32,cores=32,threads=1,sockets=1 \
 -device ioh3420,bus=pcie.0,addr=1c.0,multifunction=on,port=1,chassis=1,id=root.1 \
 -device vfio-pci,host=42:00.0,bus=root.1,addr=00.0,multifunction=on,romfile=/home/pburic/GTX1080Ti_patched.rom \
 -device vfio-pci,host=42:00.1,bus=root.1,addr=00.1 \

I’m using that from day one (11/2017 … Ugly patch), it looks same.
Or is there difference?

This setup is mentioned in most HowTos (very old one).
https://wiki.debian.org/VGAPassthrough
https://gist.github.com/ArseniyShestakov/2891f5c147298e0b8ffa#file-qemu-cmd

gnif · October 6, 2018, 7:04pm

Looks fine, a few differences though.

x-vga doesn’t need to be specified anymore.
multifunction=on doesn’t make any sense on the root port if you are only passing a single device through.
multifunction=on only makes sense on the actual device if you are passing through a device with children, ie HDMI audio.

Other then that, if GPU-Z or lspci reports the video card has a link speed other then just “PCI” you’re already running as you should be.

That example is actually incomplete though, it’s skipping a device which can cause NVidia cards to have a fit (although I can see you have included it anyway), it should be:

 -device ioh3420,bus=pcie.0,addr=1c.0,port=1,chassis=1,id=root.1 \
 -device vfio-pci,host=01:00.0,bus=root.1,addr=00.0,multifunction=on \
 -device vfio-pci,host=01:00.1,bus=root.1,addr=00.1

gnif · October 6, 2018, 7:27pm

For reference here is my complete qemu launch args.

/usr/local/bin/qemu-system-x86_64 \
  -nographic \
  -machine q35,accel=kvm,usb=off,vmport=off,dump-guest-core=off,kernel-irqchip=on \
  -cpu host,hv_time,hv_vpindex,hv_reset,hv_runtime,hv_crash,hv_synic,hv_stimer,hv_spinlocks=0x1fff,hv_vendor_id=lakeuv283713,kvm=off,l3-cache=on,-hypervisor,migratable=no,+invtsc \
  -drive file=/opt/VM/Windows/ovmf/OVMF_CODE-pure-efi.fd,if=pflash,format=raw,unit=0,readonly=on \
  -drive file=/opt/VM/Windows/ovmf/vars.fd,if=pflash,format=raw,unit=1 \
  -realtime mlock=on \
  -pidfile /var/run/vm.Windows.pid \
  -monitor stdio \
  -runas geoff \
  -enable-kvm \
  -name guest=Windows,debug-threads=on \
  -smp 8,sockets=1,cores=4,threads=2 \
  -m 16384 \
  -mem-prealloc \
  -global ICH9-LPC.disable_s3=1 \
  -global ICH9-LPC.disable_s4=1 \
  -no-user-config \
  -nodefaults \
  -rtc base=localtime,driftfix=slew \
  -global kvm-pit.lost_tick_policy=discard \
  -boot strict=on \
  -no-hpet \
  -netdev tap,script=/opt/VM/bin/ovs-ifup,downscript=/opt/VM/bin/ovs-ifdown,ifname=windows.30,id=hostnet0,vhost=on \
  -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:06:99:25,bus=pcie.0 \
  -soundhw ac97 \
  -device ioh3420,id=root_port1,chassis=0,slot=0,bus=pcie.0 \
  -device vfio-pci,host=45:00.0,id=hostdev1,bus=root_port1,addr=0x00,multifunction=on,romfile=/opt/VM/Windows/1080Ti.rom \
  -device vfio-pci,host=45:00.1,id=hostdev2,bus=root_port1,addr=0x00.1 \
  -drive  id=disk,file=/dev/disk/by-id/md-uuid-18bc7433:b223acbf:a1cbb062:f0b030c9,format=raw,if=none,cache=none,aio=native,discard=unmap,detect-zeroes=unmap,copy-on-read=on \
  -device virtio-scsi-pci,id=scsi \
  -device scsi-hd,drive=disk,bus=scsi.0,rotation_rate=1 \
  -device ivshmem-plain,memdev=ivshmem,bus=pcie.0 \
  -object memory-backend-file,id=ivshmem,share=on,mem-path=/dev/shm/looking-glass,size=128M \
  -spice disable-ticketing,seamless-migration=off,port=5900,addr=192.168.10.50 \
  -device virtio-keyboard-pci,id=input2,bus=pcie.0,addr=0xc

I also run a script that pins the CPU threads to the correct cores and ensures all memory accesses are local to the CPU it is pinned to. For Ryzen this isn’t so critical but for ThreadRipper it is critical if you wish to obtain the same results I have as shown below.