VFIO Poor DirectX Performance (But OpenGL is great)

I have been pulling my hair out trying to work out why I am getting random stutters in Rocket League. I have tried every config option I can find, with no real solution.

Basically, while using OpenGL life is good. If a game requires DirectX, life is hell. When using DirectX, including in benchmarks, GPU performance struggles to go over 30% while frame limited and stuttering. CPU usage looks great: no pegged cores, all at around 30%.

The system is a Ryzen 5900X and a Radeon 6900XT on an ASUS X570-ACE motherboard. I’m passing through my primary GPU, the onboard Intel network card, an onboard USB hub and an NVMe drive. I am using SSH to manage the host. The plan here is to have multiple VMs on a headless hypervisor.

Here is my QEMU command line (sorry, I’m not a libvirt guy; I like to know the inner workings of everything):

qemu-system-x86_64 \
  -overcommit cpu-pm=on \
  -global pcie-root-port.x-speed=16 \
  -global pcie-root-port.x-width=32 \
  -name win10,process=win10,debug-threads=on \
  -nodefaults \
  -machine type=pc,accel=kvm,hpet=off \
  -cpu host,+topoext,+aes,l3-cache=on,-svm,-amd-stibp,host-cache-info=on,kvm=off,hv_vendor_id=NvidiaSucks,hv-avic=on,hv_time,hv_spinlocks=0x1fff,hv_vapic,hv_reset,x2apic=off \
  -smp 12,sockets=1,cores=6,threads=2 \
  -global kvm-pit.lost_tick_policy=discard \
  -m 24G \
  -mem-path /dev/hugepages \
  -net none \
  -rtc clock=host,base=localtime \
  -serial none \
  -parallel none \
  -device ioh3420,id=root_port1,chassis=0,slot=0,bus=pci.0 \
  -device vfio-pci,host=0c:00.0,bus=root_port1,addr=00.0,multifunction=on \
  -device vfio-pci,host=0c:00.1,bus=root_port1,addr=00.1 \
  -device ioh3420,id=root_port2,chassis=0,slot=1,bus=pci.0 \
  -device vfio-pci,host=0000:05:00.0,bus=root_port2,addr=00.0 \
  -device ioh3420,id=root_port3,chassis=0,slot=2,bus=pci.0 \
  -device vfio-pci,host=0000:07:00.0,bus=root_port3,addr=00.0,multifunction=on \
  -device vfio-pci,host=0000:07:00.1,bus=root_port3,addr=00.1 \
  -device vfio-pci,host=0000:07:00.3,bus=root_port3,addr=00.3 \
  -device ioh3420,id=root_port4,chassis=0,slot=3,bus=pci.0 \
  -device vfio-pci,host=0000:01:00.0,bus=root_port4,addr=00.0 \
  -drive if=pflash,format=raw,readonly=on,file=/usr/share/edk2/x64/OVMF_CODE.fd \
  -drive if=pflash,format=raw,file=/root/.config/vms/win10/ovmf_vars.fd \
  -monitor unix:/tmp/monitor_win10.sock,server,nowait \
  -vga none \
  -nographic \
  -boot order=dc

1G hugepages are enabled and in use. CPU cores are pinned correctly.
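
For anyone who wants to double-check the hugepage side on their own host, here is a minimal sketch. It assumes the standard HugePages_Total and HugePages_Free fields in /proc/meminfo, and that 1G is the default huge page size (otherwise those fields describe the 2M pool instead). The function name hugepages_in_use is made up for illustration.

```shell
# Hypothetical helper: print how many huge pages from the default pool
# are currently allocated (total minus free).
# Takes an optional path to a meminfo-style file; defaults to /proc/meminfo.
hugepages_in_use() {
    meminfo="${1:-/proc/meminfo}"
    awk '
        /^HugePages_Total:/ { total = $2 }
        /^HugePages_Free:/  { free  = $2 }
        END { print total - free }
    ' "$meminfo"
}
```

Running hugepages_in_use with no arguments while the VM is up should print at least 24 for a guest started with -m 24G on 1G pages, since each page backs 1G of guest memory.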

The VM is running Windows 10 22H2 with the latest DirectX, installed using the official web installer, and the latest AMD driver, 23.10.2.

Currently, the issue is perfectly represented in Superposition Benchmark in 1080p medium.
DirectX: Scene 1 gives 39fps, Scene 2 gives 21fps
OpenGL: Scene 1 gives 180fps, Scene 2 gives 160fps
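
To put the gap in proportion, here is a quick sketch that expresses the DirectX numbers as a percentage of the OpenGL numbers for the same scene. The helper name pct_of is made up for illustration; the inputs are just the fps figures above.

```shell
# Hypothetical helper: print the first value as a percentage of the second,
# rounded to one decimal place.
pct_of() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a * 100 / b }'
}

pct_of 39 180   # Scene 1: DirectX fps as a percentage of OpenGL fps, prints 21.7
pct_of 21 160   # Scene 2: DirectX fps as a percentage of OpenGL fps, prints 13.1
```

So in the VM, DirectX is running at roughly a fifth of OpenGL in Scene 1 and about an eighth in Scene 2.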

I get the exact same fps whether I use the 720p preset or the 8K preset, despite the rendered graphics being completely different.

Booted on bare metal with no changes and automatic driver downloads disabled, DirectX and OpenGL perform almost identically at ~170fps.

So something is wrong with the way I am passing through the GPU via IOMMU, and it is only affecting DirectX. My voyage through the vast internet turned up a lot of people complaining about getting 40% of bare-metal performance, trying everything and then giving up. It makes me wonder whether they never discovered that OpenGL works great, or whether this is just me.