[Solved] GPU Passthrough suddenly stopped working on Windows 10 VM :(

Hello everyone!

This is my first time using GPU passthrough. I had a bit of trouble setting it up but got it working perfectly just a few days ago, or so I thought.

  • Gigabyte B450 AORUS M
  • AMD Ryzen 5 2600 Six-Core Processor
  • ADATA SX8100NP 512GB M.2 2280 NVMe
  • Host: Linux 5.11.14-1-MANJARO Ornara 21.0.2
    • GALAX GeForce GTX 1050 Ti EXOC 4GB GDDR5 128Bit
  • Guest: Windows 10 Pro x64 20H2 April
    • ASUS Geforce RTX 3060 Ti OC TUF Gaming 8GB GDDR6 256bit

I’m using the 3060 Ti for the Windows 10 VM. Everything worked fine through multiple startups, a driver update, and gaming. After an otherwise normal shutdown, I haven’t been able to get it working again. (Not sure if it’s related, but I noticed a Windows Update notification around then; I later uninstalled that update while trying to fix the problem, with no luck.) The GPU is detected normally in virt-manager, but when the VM starts the card fails to come up: no video output and no fan spin. If I use SPICE graphics instead, I can see that Windows 10 boots up just fine, though no GPU is detected. Any thoughts?


Diagnostics

$ lspci -nnk

00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Root Complex [1022:1450]
	Subsystem: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Root Complex [1022:1450]
00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) I/O Memory Management Unit [1022:1451]
	Subsystem: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) I/O Memory Management Unit [1022:1451]
00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
	Kernel driver in use: pcieport
00:01.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
	Kernel driver in use: pcieport
00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
	Kernel driver in use: pcieport
00:03.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
	Kernel driver in use: pcieport
00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
00:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
00:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]
	Kernel driver in use: pcieport
00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]
	Kernel driver in use: pcieport
00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 59)
	Subsystem: Gigabyte Technology Co., Ltd Device [1458:5001]
	Kernel modules: i2c_piix4, sp5100_tco
00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
	Subsystem: Gigabyte Technology Co., Ltd Device [1458:5001]
00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 0 [1022:1460]
00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 1 [1022:1461]
00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 2 [1022:1462]
00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 3 [1022:1463]
	Kernel driver in use: k10temp
	Kernel modules: k10temp
00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 4 [1022:1464]
00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 5 [1022:1465]
00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 6 [1022:1466]
00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7 [1022:1467]
01:00.0 Non-Volatile memory controller [0108]: Realtek Semiconductor Co., Ltd. RTS5763DL NVMe SSD Controller [10ec:5762] (rev 01)
	Subsystem: Realtek Semiconductor Co., Ltd. RTS5763DL NVMe SSD Controller [10ec:5762]
	Kernel driver in use: nvme
02:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset USB 3.1 XHCI Controller [1022:43d5] (rev 01)
	Subsystem: ASMedia Technology Inc. Device [1b21:1142]
	Kernel driver in use: xhci_hcd
	Kernel modules: xhci_pci
02:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset SATA Controller [1022:43c8] (rev 01)
	Subsystem: ASMedia Technology Inc. Device [1b21:1062]
	Kernel driver in use: ahci
02:00.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Bridge [1022:43c6] (rev 01)
	Kernel driver in use: pcieport
03:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
	Kernel driver in use: pcieport
03:01.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
	Kernel driver in use: pcieport
03:04.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
	Kernel driver in use: pcieport
04:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 16)
	DeviceName: Broadcom 5762
	Subsystem: Gigabyte Technology Co., Ltd Onboard Ethernet [1458:e000]
	Kernel driver in use: r8169
	Kernel modules: r8169
06:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] [10de:1c82] (rev a1)
	Subsystem: NVIDIA Corporation Device [10de:11bf]
	Kernel driver in use: nvidia
	Kernel modules: nouveau, nvidia_drm, nvidia
06:00.1 Audio device [0403]: NVIDIA Corporation GP107GL High Definition Audio Controller [10de:0fb9] (rev a1)
	Subsystem: NVIDIA Corporation Device [10de:11bf]
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel
07:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA104 [GeForce RTX 3060 Ti] [10de:2486] (rev a1)
	Subsystem: ASUSTeK Computer Inc. Device [1043:87c6]
	Kernel driver in use: vfio-pci
	Kernel modules: nouveau, nvidia_drm, nvidia
07:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:228b] (rev a1)
	Subsystem: ASUSTeK Computer Inc. Device [1043:87c6]
	Kernel driver in use: vfio-pci
	Kernel modules: snd_hda_intel
09:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Raven/Raven2 PCIe Dummy Function [1022:145a]
	Subsystem: Advanced Micro Devices, Inc. [AMD] Zeppelin/Raven/Raven2 PCIe Dummy Function [1022:145a]
09:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor [1022:1456]
	Subsystem: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor [1022:1456]
	Kernel driver in use: ccp
	Kernel modules: ccp
09:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Zeppelin USB 3.0 Host controller [1022:145f]
	Subsystem: Gigabyte Technology Co., Ltd Device [1458:5007]
	Kernel driver in use: xhci_hcd
	Kernel modules: xhci_pci
0a:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Renoir PCIe Dummy Function [1022:1455]
	Subsystem: Advanced Micro Devices, Inc. [AMD] Zeppelin/Renoir PCIe Dummy Function [1022:1455]
0a:00.2 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51)
	Subsystem: Gigabyte Technology Co., Ltd Device [1458:b002]
	Kernel driver in use: ahci
0a:00.3 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) HD Audio Controller [1022:1457]
	Subsystem: Gigabyte Technology Co., Ltd Device [1458:a182]
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel
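
The part of this listing that matters most for passthrough is the "Kernel driver in use" line under each GPU function. A minimal sketch for pulling out just that, assuming `lspci -nnk` output in the format shown above (the helper name `drivers_in_use` is made up for illustration):

```shell
#!/bin/sh
# Print "<slot> <driver>" for every PCI function that has a driver bound.
# Reads `lspci -nnk` output on stdin; drivers_in_use is a hypothetical helper name.
drivers_in_use() {
    awk '
        /^[0-9a-f]/ { slot = $1 }                    # device line: remember the slot, e.g. 07:00.0
        /Kernel driver in use:/ { print slot, $NF }  # indented line: last field is the driver
    '
}

# Usage: lspci -nnk | drivers_in_use
```

For the setup in this thread, 07:00.0 and 07:00.1 (the 3060 Ti and its audio function) should both report vfio-pci, while 06:00.0 (the host's 1050 Ti) should report nvidia.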

VM xml

<domain type='kvm'>
  <name>win10</name>
  <uuid>51062533-bfc9-4dc6-8d97-ef9ae3261783</uuid>
  <metadata>
    <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
      <libosinfo:os id="http://microsoft.com/win/10"/>
    </libosinfo:libosinfo>
  </metadata>
  <memory unit='KiB'>12582912</memory>
  <currentMemory unit='KiB'>12582912</currentMemory>
  <vcpu placement='static'>8</vcpu>
  <os>
    <type arch='x86_64' machine='pc-q35-5.2'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/edk2-ovmf/x64/OVMF_CODE.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/win10_VARS.fd</nvram>
    <bootmenu enable='yes'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor_id state='on' value='randomid'/>
    </hyperv>
    <kvm>
      <hidden state='on'/>
    </kvm>
    <vmport state='off'/>
  </features>
  <cpu mode='host-passthrough' check='partial' migratable='on'>
    <topology sockets='1' dies='1' cores='4' threads='2'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
    <timer name='hypervclock' present='yes'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/win10.qcow2'/>
      <target dev='sda' bus='sata'/>
      <boot order='2'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <controller type='usb' index='0' model='qemu-xhci' ports='15'>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x8'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x9'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0xa'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0xb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x3'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0xc'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x4'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0xd'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x5'/>
    </controller>
    <controller type='pci' index='7' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='7' port='0xe'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x6'/>
    </controller>
    <controller type='pci' index='8' model='pcie-to-pci-bridge'>
      <model name='pcie-pci-bridge'/>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </controller>
    <interface type='network'>
      <mac address='52:54:00:02:70:30'/>
      <source network='default'/>
      <model type='e1000e'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <hostdev mode='subsystem' type='usb' managed='yes'>
      <source>
        <vendor id='0x0d8c'/>
        <product id='0x0171'/>
      </source>
      <address type='usb' bus='0' port='3'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x07' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </hostdev>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </memballoon>
  </devices>
</domain>

QEMU Log

2021-04-27 03:41:21.985+0000: starting up libvirt version: 7.1.0, qemu version: 5.2.0, kernel: 5.11.14-1-MANJARO, hostname: thejournalist-b450aorusm
LC_ALL=C \
PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/var/lib/snapd/snap/bin \
HOME=/var/lib/libvirt/qemu/domain-1-win10 \
XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-1-win10/.local/share \
XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-1-win10/.cache \
XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-1-win10/.config \
QEMU_AUDIO_DRV=none \
/usr/bin/qemu-system-x86_64 \
-name guest=win10,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-win10/master-key.aes \
-blockdev '{"driver":"file","filename":"/usr/share/edk2-ovmf/x64/OVMF_CODE.fd","node-name":"libvirt-pflash0-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-pflash0-format","read-only":true,"driver":"raw","file":"libvirt-pflash0-storage"}' \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/qemu/nvram/win10_VARS.fd","node-name":"libvirt-pflash1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-pflash1-format","read-only":false,"driver":"raw","file":"libvirt-pflash1-storage"}' \
-machine pc-q35-5.2,accel=kvm,usb=off,vmport=off,dump-guest-core=off,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format,memory-backend=pc.ram \
-cpu host,migratable=on,hv-time,hv-relaxed,hv-vapic,hv-spinlocks=0x1fff,hv-vendor-id=randomid,kvm=off \
-m 12288 \
-object memory-backend-ram,id=pc.ram,size=12884901888 \
-overcommit mem-lock=off \
-smp 8,sockets=1,dies=1,cores=4,threads=2 \
-uuid 51062533-bfc9-4dc6-8d97-ef9ae3261783 \
-display none \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=32,server=on,wait=off \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=localtime,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-no-shutdown \
-global ICH9-LPC.disable_s3=1 \
-global ICH9-LPC.disable_s4=1 \
-boot menu=on,strict=on \
-device pcie-root-port,port=0x8,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x1 \
-device pcie-root-port,port=0x9,chassis=2,id=pci.2,bus=pcie.0,addr=0x1.0x1 \
-device pcie-root-port,port=0xa,chassis=3,id=pci.3,bus=pcie.0,addr=0x1.0x2 \
-device pcie-root-port,port=0xb,chassis=4,id=pci.4,bus=pcie.0,addr=0x1.0x3 \
-device pcie-root-port,port=0xc,chassis=5,id=pci.5,bus=pcie.0,addr=0x1.0x4 \
-device pcie-root-port,port=0xd,chassis=6,id=pci.6,bus=pcie.0,addr=0x1.0x5 \
-device pcie-root-port,port=0xe,chassis=7,id=pci.7,bus=pcie.0,addr=0x1.0x6 \
-device pcie-pci-bridge,id=pci.8,bus=pci.6,addr=0x0 \
-device qemu-xhci,p2=15,p3=15,id=usb,bus=pci.2,addr=0x0 \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/win10.qcow2","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"driver":"qcow2","file":"libvirt-1-storage","backing":null}' \
-device ide-hd,bus=ide.0,drive=libvirt-1-format,id=sata0-0-0,bootindex=2 \
-netdev tap,fd=34,id=hostnet0 \
-device e1000e,netdev=hostnet0,id=net0,mac=52:54:00:02:70:30,bus=pci.1,addr=0x0 \
-device usb-host,hostdevice=/dev/bus/usb/003/002,id=hostdev0,bus=usb.0,port=3 \
-device vfio-pci,host=0000:07:00.0,id=hostdev1,bus=pci.3,addr=0x0 \
-device vfio-pci,host=0000:07:00.1,id=hostdev2,bus=pci.4,addr=0x0 \
-device virtio-balloon-pci,id=balloon0,bus=pci.5,addr=0x0 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
2021-04-27T03:41:22.170148Z qemu-system-x86_64: warning: This family of AMD CPU doesn't support hyperthreading(2)
Please configure -smp options properly or try enabling topoext feature.
2021-04-27T03:41:25.453517Z qemu-system-x86_64: vfio: Unable to power on device, stuck in D3
2021-04-27T03:41:25.466836Z qemu-system-x86_64: vfio: Unable to power on device, stuck in D3
2021-04-27T03:42:37.032037Z qemu-system-x86_64: vfio-pci: Cannot read device rom at 0000:07:00.0
Device option ROM contents are probably invalid (check dmesg).
Skip option ROM probe with rombar=0, or load from file with romfile=
2021-04-27T03:44:20.242524Z qemu-system-x86_64: terminating on signal 15 from pid 11707 (/usr/bin/libvirtd)
2021-04-27 03:45:06.350+0000: shutting down, reason=shutdown
2021-04-27 03:45:36.536+0000: shutting down, reason=failed

$ dmesg | grep -i vfio

(after failed startup)

[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.11-x86_64 root=UUID=c35ba447-0aaf-4d75-8bd7-dfc406374195 rw quiet amd_iommu=on rd.driver.pre=vfio-pci kvm.ignore_msrs=1 cryptdevice=UUID=68f7c2b2-f971-406e-a046-1efe50988589:luks-68f7c2b2-f971-406e-a046-1efe50988589 root=/dev/mapper/luks-68f7c2b2-f971-406e-a046-1efe50988589 apparmor=1 security=apparmor resume=/dev/mapper/luks-9d65bdc0-7c91-42eb-a553-44531e694e3a udev.log_priority=3
[    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.11-x86_64 root=UUID=c35ba447-0aaf-4d75-8bd7-dfc406374195 rw quiet amd_iommu=on rd.driver.pre=vfio-pci kvm.ignore_msrs=1 cryptdevice=UUID=68f7c2b2-f971-406e-a046-1efe50988589:luks-68f7c2b2-f971-406e-a046-1efe50988589 root=/dev/mapper/luks-68f7c2b2-f971-406e-a046-1efe50988589 apparmor=1 security=apparmor resume=/dev/mapper/luks-9d65bdc0-7c91-42eb-a553-44531e694e3a udev.log_priority=3
[    1.896779] VFIO - User Level meta-driver version: 0.3
[    1.903808] vfio_pci: unknown parameter 'allow_unsafe_interrupts' ignored
[    1.903901] vfio-pci 0000:07:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[   62.983351] vfio-pci 0000:07:00.0: enabling device (0000 -> 0003)
[   63.093513] vfio-pci 0000:07:00.0: vfio_ecap_init: hiding ecap 0x1e@0x258
[   63.093536] vfio-pci 0000:07:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
[   63.093544] vfio-pci 0000:07:00.0: vfio_ecap_init: hiding ecap 0x26@0xc1c
[   63.093546] vfio-pci 0000:07:00.0: vfio_ecap_init: hiding ecap 0x27@0xd00
[   63.093548] vfio-pci 0000:07:00.0: vfio_ecap_init: hiding ecap 0x25@0xe00
[   63.116674] vfio-pci 0000:07:00.1: enabling device (0000 -> 0002)
[   63.116812] vfio-pci 0000:07:00.1: vfio_ecap_init: hiding ecap 0x25@0x160
[   64.326737] vfio-pci 0000:07:00.1: vfio_bar_restore: reset recovery - restoring BARs
[   64.353440] vfio-pci 0000:07:00.0: vfio_bar_restore: reset recovery - restoring BARs
[   65.096858] vfio-pci 0000:07:00.0: timed out waiting for pending transaction; performing function level reset anyway
[   66.323362] vfio-pci 0000:07:00.0: not ready 1023ms after FLR; waiting
[   67.363373] vfio-pci 0000:07:00.0: not ready 2047ms after FLR; waiting
[   69.576775] vfio-pci 0000:07:00.0: not ready 4095ms after FLR; waiting
[   73.846680] vfio-pci 0000:07:00.0: not ready 8191ms after FLR; waiting
[   82.164097] vfio-pci 0000:07:00.0: not ready 16383ms after FLR; waiting
[   98.591597] vfio-pci 0000:07:00.0: not ready 32767ms after FLR; waiting
[  132.725255] vfio-pci 0000:07:00.0: not ready 65535ms after FLR; giving up
[  135.296238] vfio-pci 0000:07:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  135.302310] vfio-pci 0000:07:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  135.326585] vfio-pci 0000:07:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  135.329261] vfio-pci 0000:07:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  135.345676] vfio-pci 0000:07:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  135.347582] vfio-pci 0000:07:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  135.372502] vfio-pci 0000:07:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  135.383839] vfio-pci 0000:07:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff
[  135.386194] vfio-pci 0000:07:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  135.394336] vfio-pci 0000:07:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  135.394632] vfio-pci 0000:07:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  135.394653] vfio-pci 0000:07:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  135.394819] vfio-pci 0000:07:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  135.395741] vfio-pci 0000:07:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  135.395921] vfio-pci 0000:07:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  135.395942] vfio-pci 0000:07:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  135.396094] vfio-pci 0000:07:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  135.430585] vfio-pci 0000:07:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  135.431024] vfio-pci 0000:07:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  135.445289] vfio-pci 0000:07:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  135.445598] vfio-pci 0000:07:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  135.446520] vfio-pci 0000:07:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  135.446715] vfio-pci 0000:07:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  135.458912] vfio-pci 0000:07:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  135.459759] vfio-pci 0000:07:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  447.922500] vfio-pci 0000:07:00.1: can't change power state from D0 to D3hot (config space inaccessible)
[  448.664261] vfio-pci 0000:07:00.0: timed out waiting for pending transaction; performing function level reset anyway
[  449.890875] vfio-pci 0000:07:00.0: not ready 1023ms after FLR; waiting
[  450.930892] vfio-pci 0000:07:00.0: not ready 2047ms after FLR; waiting
[  453.144154] vfio-pci 0000:07:00.0: not ready 4095ms after FLR; waiting
[  457.410766] vfio-pci 0000:07:00.0: not ready 8191ms after FLR; waiting
[  465.730780] vfio-pci 0000:07:00.0: not ready 16383ms after FLR; waiting
[  482.583753] vfio-pci 0000:07:00.0: not ready 32767ms after FLR; waiting
[  516.716938] vfio-pci 0000:07:00.0: not ready 65535ms after FLR; giving up
[  517.812063] vfio-pci 0000:07:00.1: can't change power state from D0 to D3hot (config space inaccessible)
[  517.812073] vfio-pci 0000:07:00.0: can't change power state from D0 to D3hot (config space inaccessible)
[  661.103793] vfio-pci 0000:07:00.0: vgaarb: changed VGA decodes: olddecodes=none,decodes=io+mem:owns=none

$ virsh start win10

(attempt to restart VM)

error: Failed to start domain 'win10'
error: internal error: Unknown PCI header type '127' for device '0000:07:00.0'

Hello and welcome back!
It’s been a while for me too, so please bear with me.
After some duckduckducking with the search terms

error: internal error: Unknown PCI header type '127' for device

I found this link unraid-pci-header-type-127
User testdasi says:

The error 127 is your GPU isn’t resetting itself properly. Motherboard BIOS could be a problem but I would say the most likely culprit is your vbios file.
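
As a side illustration of where the number 127 likely comes from: a PCI device that no longer responds returns all ones for config-space reads (compare the "got 0xffff" and "config space inaccessible" messages in dmesg). Assuming libvirt reads the header-type byte and masks off the multi-function bit (0x80) before checking it, which is an assumption about its internals rather than something shown in this thread, the arithmetic works out like this:

```shell
#!/bin/sh
# A device that has dropped off the bus answers config-space reads with all bits set.
raw_header_byte=0xff
# Bit 7 of the header-type byte is the multi-function flag; the low 7 bits are the type.
header_type=$(( $raw_header_byte & 0x7f ))
echo "header type: $header_type"
```

This prints "header type: 127". Valid header types are small values (0 for a normal endpoint such as a GPU), so 127 suggests the card is not answering at all rather than reporting a strange type.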

Your QEMU log and dmesg indicate that the GPU isn’t resetting itself properly. See the following lines:
QEMU LOG

vfio: Unable to power on device, stuck in D3
vfio-pci: Cannot read device rom at 0000:07:00.0

dmesg

Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff
can't change power state from D0 to D3hot (config space inaccessible)
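
The dmesg output is dominated by repeated vfio_bar_restore lines, while the messages that directly show the failed reset are far fewer. A small filter to surface them, as a sketch only: the helper name `vfio_reset_failures` is hypothetical, and the pattern list is simply taken from the messages quoted in this thread, not an exhaustive set.

```shell
#!/bin/sh
# Keep only log lines that point at a device failing to come back from reset.
# Reads dmesg or QEMU log text on stdin; vfio_reset_failures is a hypothetical helper name.
vfio_reset_failures() {
    grep -E 'not ready [0-9]+ms after FLR|stuck in D3|config space inaccessible|Invalid PCI ROM header signature|Unknown PCI header type'
}

# Usage: dmesg | vfio_reset_failures
```

Note that grep exits with a non-zero status when nothing matches, which is handy in scripts: an empty result means none of these particular symptoms were logged.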

I’d take a wild guess and assume you haven’t updated your BIOS or touched the vBIOS before the error occurred.

So I’ll just ask the usuals.
Have you rebooted the host system between attempts?
Does your 3060 Ti have an HDMI/DVI/DP cable connected to a monitor?

I can offer only one thing for you to try at this time:
Load a (newer) vBIOS for your 3060 Ti as described in archwiki - PCI_passthrough_via_OVMF
see

Updated VBIOS can be used in the VM without flashing.
And in libvirt:

Side note: kudos for providing such great information!


Thank you for your time! Really appreciate it.

Indeed, I haven’t touched the BIOS or vBIOS.

I have rebooted, a lot. As you mentioned, error 127 seems to be related to the GPU not resetting itself properly. This error appears after every attempt to start the VM: the GPU seems to get stuck, and I need to either restart the host or remove the PCI device (3060 Ti), rescan PCI, and modprobe nvidia.
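
The remove-and-rescan workaround mentioned above goes through sysfs. A rough sketch, assuming the standard /sys/bus/pci layout; the function name and the `SYSFS` override (there only so the function can be tried against a scratch directory) are hypothetical, and there is no guarantee this recovers a wedged card:

```shell
#!/bin/sh
# Remove one or more PCI functions from the kernel's device tree, then re-enumerate the bus.
# SYSFS is a hypothetical override for dry runs; it defaults to the real /sys.
pci_remove_and_rescan() {
    sysfs="${SYSFS:-/sys}"
    for dev in "$@"; do
        node="$sysfs/bus/pci/devices/$dev"
        if [ ! -e "$node/remove" ]; then
            echo "no such PCI device: $dev" >&2
            return 1
        fi
        echo 1 > "$node/remove"      # detach the function from the kernel
    done
    echo 1 > "$sysfs/bus/pci/rescan" # ask the kernel to rediscover devices
}

# Usage (as root), with the addresses from this thread:
#   pci_remove_and_rescan 0000:07:00.0 0000:07:00.1 && modprobe nvidia
```

Both functions of the card (VGA and audio) are passed in one call so the bus is rescanned only once, after everything has been removed.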

Yes, I verified the cable (DP), and the GPU for that matter: after reassigning the Initial PCI slot in the BIOS to the 3060 Ti, it works happily in Manjaro.


I followed your suggestion, but before assigning a new vBIOS I checked the current one on the GPU by dumping the ROM with nvflash. Here is the result:

$ nvflash --version my3060ti.rom
NVIDIA Firmware Update Utility (Version 5.692.0)
Copyright (C) 1993-2021, NVIDIA Corporation. All rights reserved.

Sign-On Message       : RTX3060TI VB Ver 94.04.27.80.AS09
Build GUID            : C84724A58246493588FECB6F459CFAD1
IFR Subsystem ID      : 1043-87C6
Subsystem Vendor ID   : 0x1043
Subsystem ID          : 0x87C6
Version               : 94.04.27.80.59
Image Hash            : N/A
Product Name          : GPU Board
Device Name(s)        : GeForce RTX 3060 Ti
Board ID              : 0x0232
Vendor ID             : 0x10DE
Device ID             : 0x2486
Hierarchy ID          : Normal Board
Chip SKU              : 200-0
Project               : G190-0010
Build Date            : 08/31/20
Modification Date     : 10/22/20
UEFI Version          : 0x60009
UEFI Variant ID       : 0x000000000000000A ( Unknown )
UEFI Signer(s)        : Microsoft Corporation UEFI CA 2011
XUSB-FW Version ID    : N/A
XUSB-FW Build Time    : N/A
InfoROM Version       : G001.0000.03.03
InfoROM Backup        : Present
License Placeholder   : Present
GPU Mode              : N/A
CEC OTA-signed Blob   : Not Present

So VB Ver 94.04.27.80.AS09 is my current vBIOS. Just to be thorough I used the rom-parser from the article you linked to check it.

$ ./rom-parser my3060ti.rom
Valid ROM signature found @9200h, PCIR offset 170h
	PCIR: type 0 (x86 PC-AT), vendor: 10de, device: 2486, class: 030000
	PCIR: revision 0, vendor revision: 1
Valid ROM signature found @19000h, PCIR offset 1ch
	PCIR: type 3 (EFI), vendor: 10de, device: 2486, class: 000000
	PCIR: revision 3, vendor revision: 0
		EFI: Signature Valid, Subsystem: Boot, Machine: X64
	Last image

Seems fine. I checked, and the vendor BIOS file gives the same output as well. Anyway…
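
The "ROM signature" that rom-parser reports is the same two-byte marker the kernel complained about in dmesg ("expecting 0xaa55"): the bytes 55 aa, which read as 0xaa55 when taken as a little-endian 16-bit value. A small sketch for checking those bytes by hand at a given offset, using only od and tr; the helper name `rom_sig_at` is made up for illustration:

```shell
#!/bin/sh
# Print the two bytes found at OFFSET in FILE as lowercase hex, e.g. "55aa".
# OFFSET may be decimal or 0x-prefixed hex; rom_sig_at is a hypothetical helper name.
rom_sig_at() {
    file="$1"
    offset=$(( $2 ))   # arithmetic expansion turns 0x9200 into 37376
    od -An -tx1 -j "$offset" -N2 "$file" | tr -d ' \n'
}

# Usage: rom_sig_at my3060ti.rom 0x9200
```

Going by the rom-parser output above ("Valid ROM signature found @9200h"), the usage line should print 55aa for this dump. A read from a dead device would give ffff instead, which matches the dmesg error.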

I checked the vendor website and they have newer versions, including one with the new-fangled Resizable BAR feature (but I’m not on AMD Zen 3, so I can’t use it). I went with 94.04.38.00.AS18 instead.

Let’s specify the vBIOS.
Just so you know, in the VM XML I tried:

<rom file='/usr/share/kvm/94.04.38.00.AS18.rom'/>

and in another attempt:

<rom bar='on' file='/usr/share/kvm/94.04.38.00.AS18.rom'/>

They seem to have the same effect?

new QEMU LOG

[...]
-device usb-host,hostdevice=/dev/bus/usb/003/002,id=hostdev0,bus=usb.0,port=3 \
-device vfio-pci,host=0000:07:00.0,id=hostdev1,bus=pci.3,addr=0x0,rombar=1,romfile=/home/thejournalist/Desktop/94.04.38.00.AS18.rom \
-device vfio-pci,host=0000:07:00.1,id=hostdev2,bus=pci.4,addr=0x0 \
-device virtio-balloon-pci,id=balloon0,bus=pci.5,addr=0x0 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
2021-04-27T16:41:20.585009Z qemu-system-x86_64: warning: This family of AMD CPU doesn't support hyperthreading(2)
Please configure -smp options properly or try enabling topoext feature.
2021-04-27T16:41:23.874749Z qemu-system-x86_64: vfio: Unable to power on device, stuck in D3
2021-04-27T16:41:23.887683Z qemu-system-x86_64: vfio: Unable to power on device, stuck in D3

The "vfio-pci: Cannot read device rom at 0000:07:00.0" message (along with its "Device option ROM contents are probably invalid" hint) is gone! The two "Unable to power on device, stuck in D3" lines are still there, though.

new dmesg

$ dmesg | grep -i vfio
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.11-x86_64 root=UUID=c35ba447-0aaf-4d75-8bd7-dfc406374195 rw quiet amd_iommu=on rd.driver.pre=vfio-pci kvm.ignore_msrs=1 cryptdevice=UUID=68f7c2b2-f971-406e-a046-1efe50988589:luks-68f7c2b2-f971-406e-a046-1efe50988589 root=/dev/mapper/luks-68f7c2b2-f971-406e-a046-1efe50988589 apparmor=1 security=apparmor resume=/dev/mapper/luks-9d65bdc0-7c91-42eb-a553-44531e694e3a udev.log_priority=3
[    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.11-x86_64 root=UUID=c35ba447-0aaf-4d75-8bd7-dfc406374195 rw quiet amd_iommu=on rd.driver.pre=vfio-pci kvm.ignore_msrs=1 cryptdevice=UUID=68f7c2b2-f971-406e-a046-1efe50988589:luks-68f7c2b2-f971-406e-a046-1efe50988589 root=/dev/mapper/luks-68f7c2b2-f971-406e-a046-1efe50988589 apparmor=1 security=apparmor resume=/dev/mapper/luks-9d65bdc0-7c91-42eb-a553-44531e694e3a udev.log_priority=3
[    1.724026] VFIO - User Level meta-driver version: 0.3
[    1.731312] vfio_pci: unknown parameter 'allow_unsafe_interrupts' ignored
[    1.731394] vfio-pci 0000:07:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[  234.449235] vfio-pci 0000:07:00.0: enabling device (0000 -> 0003)
[  234.556106] vfio-pci 0000:07:00.0: vfio_ecap_init: hiding ecap 0x1e@0x258
[  234.556135] vfio-pci 0000:07:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
[  234.556145] vfio-pci 0000:07:00.0: vfio_ecap_init: hiding ecap 0x26@0xc1c
[  234.556150] vfio-pci 0000:07:00.0: vfio_ecap_init: hiding ecap 0x27@0xd00
[  234.556154] vfio-pci 0000:07:00.0: vfio_ecap_init: hiding ecap 0x25@0xe00
[  234.585922] vfio-pci 0000:07:00.1: enabling device (0000 -> 0002)
[  234.586170] vfio-pci 0000:07:00.1: vfio_ecap_init: hiding ecap 0x25@0x160
[  235.895977] vfio-pci 0000:07:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  235.922651] vfio-pci 0000:07:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  236.665920] vfio-pci 0000:07:00.0: timed out waiting for pending transaction; performing function level reset anyway
[  237.892580] vfio-pci 0000:07:00.0: not ready 1023ms after FLR; waiting
[  238.932557] vfio-pci 0000:07:00.0: not ready 2047ms after FLR; waiting
[  241.092921] vfio-pci 0000:07:00.0: not ready 4095ms after FLR; waiting
[  245.359304] vfio-pci 0000:07:00.0: not ready 8191ms after FLR; waiting
[  253.679129] vfio-pci 0000:07:00.0: not ready 16383ms after FLR; waiting
[  270.958975] vfio-pci 0000:07:00.0: not ready 32767ms after FLR; waiting
[  305.092376] vfio-pci 0000:07:00.0: not ready 65535ms after FLR; giving up
[  307.615615] vfio-pci 0000:07:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  307.618085] vfio-pci 0000:07:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  307.638125] vfio-pci 0000:07:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  307.640985] vfio-pci 0000:07:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  307.658686] vfio-pci 0000:07:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  307.660825] vfio-pci 0000:07:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  307.679748] vfio-pci 0000:07:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  307.704195] vfio-pci 0000:07:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  307.733952] vfio-pci 0000:07:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  307.734131] vfio-pci 0000:07:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  307.734146] vfio-pci 0000:07:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  307.734253] vfio-pci 0000:07:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  307.734808] vfio-pci 0000:07:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  307.734924] vfio-pci 0000:07:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  307.734939] vfio-pci 0000:07:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  307.735048] vfio-pci 0000:07:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  307.760685] vfio-pci 0000:07:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  307.761122] vfio-pci 0000:07:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  307.770453] vfio-pci 0000:07:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  307.770613] vfio-pci 0000:07:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  307.771189] vfio-pci 0000:07:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  307.771354] vfio-pci 0000:07:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  307.782105] vfio-pci 0000:07:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  307.782746] vfio-pci 0000:07:00.1: vfio_bar_restore: reset recovery - restoring BARs

No more vfio-pci 0000:07:00.1: can't change power state from D0 to D3hot (config space inaccessible) !!!

No more errors, but still no output :confused:
I’ve tried different vBIOSes: the same version downloaded from the vendor’s website, my own externally dumped version. Nothing works.
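For anyone reading a log like the one above, the telling lines are the "not ready ... after FLR; giving up" ones: the card never came back from its function-level reset. A minimal sketch for pulling those out of a long dmesg, assuming the message format shown in this thread (`flr_giveups` is a made-up helper name, not an existing tool):

```shell
#!/bin/sh
# Sketch: list PCI functions whose function-level reset (FLR) timed out.
# Reads kernel log text on stdin. flr_giveups is a hypothetical helper name.
flr_giveups() {
    # Matches lines such as:
    #   vfio-pci 0000:07:00.0: not ready 65535ms after FLR; giving up
    # and prints only the PCI address, once per device.
    sed -n 's/.*vfio-pci \([0-9a-f:.]*\): not ready [0-9]*ms after FLR; giving up.*/\1/p' \
        | sort -u
}
```

Usage would be something like `dmesg | flr_giveups` (dmesg may need root on some distros). An empty result only means this particular message is absent, not that the reset succeeded.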


Oh no… As I was gathering this info for you this happened

– system reboot
– specifying newest vBIOS, VM starts with no video output through the gpu, as usual, shutdown through virtmanager
– Oh, I forgot to look at dmesg, let’s try to reinit the gpu and do this again, using something along the lines of (I don’t remember the exact order of operations):

$ systemctl stop libvirtd.service
$ echo 1 > /sys/bus/pci/devices/0000:07:00.0/remove
$ echo 1 > /sys/bus/pci/devices/0000:07:00.1/remove
$ echo 1 > /sys/bus/pci/rescan
$ modprobe -a nvidia_drm
$ modprobe -a nvidia_modeset
$ modprobe -a nvidia
$ systemctl start libvirtd.service
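The remove/rescan part of that sequence could be wrapped up roughly like this. It is only a sketch under assumptions: `pci_remove_rescan` and the `SYSFS_PCI` override are names invented here (the override exists so the function can be tried against a scratch directory), and nothing in it guarantees that a card stuck after a failed FLR will actually recover.

```shell
#!/bin/sh
# Sketch of the remove + rescan cycle. On a real system this must run as root.
# SYSFS_PCI is an assumption added for testing; the default is the real sysfs path.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci}"

# pci_remove_rescan is a hypothetical helper: pass full PCI addresses as arguments.
pci_remove_rescan() {
    for dev in "$@"; do
        node="$SYSFS_PCI/devices/$dev/remove"
        if [ -e "$node" ]; then
            # Ask the kernel to detach and forget this function.
            echo 1 > "$node" || return 1
        else
            echo "skip: $dev not present" >&2
        fi
    done
    # Re-enumerate the bus so the removed functions are rediscovered.
    echo 1 > "$SYSFS_PCI/rescan"
}
```

In a root shell that would be called as `pci_remove_rescan 0000:07:00.1 0000:07:00.0`. Note that `sudo echo 1 > /sys/...` does not work, because the redirection is performed by the unprivileged shell; that is why a root shell (or `sudo sh -c '...'`) is needed for the original commands too.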

– alright, let’s turn on the VM again and get logs
– the vm video output WORKED!? Why? Is it fixed?
– ok, shutdown guest, turn on guest again: NO OUTPUT
– No matter what I try I can’t reproduce it again, it worked exactly once! Is it working intermittently, was this at random? I don’t get it. I would have felt better if it hadn’t worked at all.

I’m seriously considering starting from scratch here. Am I missing something simple?

Again, thanks for your help!

Device Name(s) : GeForce RTX 3060 Ti

Sorry - I wrote 1060… can’t win them all :smiley:

Are you manually binding the vfio_pci/nvidia drivers? Is there a reason for this? I must have missed this the first time I took a look at your kernel parameters.
I saw amd_iommu=on and rd.driver.pre=vfio-pci, but I must have missed the vfio-pci.ids="your_3060_ti_device_ids"

Try adding the IDs (the 3060 Ti and its audio device) to your kernel parameters as described here:
archwiki - Binding_vfio-pci_via_device_ID

Reboot host and try to boot the guest.

What I’ve read/understood is that binding drivers on the fly can be finicky.
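If it helps, the vendor:device pairs can be read straight out of `lspci -nn`. A rough sketch, assuming the usual `lspci -nn` line layout where the last bracket on each device line is `[vendor:device]` (`vfio_ids_for_slot` is a hypothetical helper name):

```shell
#!/bin/sh
# Sketch: build a value for vfio-pci.ids= from `lspci -nn` text for one slot.
# vfio_ids_for_slot is a hypothetical helper; it reads lspci output on stdin.
vfio_ids_for_slot() {
    slot="$1"   # bus:device without the function number, e.g. 07:00
    # Keep only device lines for that slot (indented Subsystem/driver lines
    # from -k do not match), extract the [vvvv:dddd] pair, join with commas.
    # The class code bracket, e.g. [0300], has no colon and is not matched.
    grep "^$slot\." \
        | sed -n 's/.*\[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)\].*/\1/p' \
        | paste -sd, -
}
```

Called as `lspci -nn | vfio_ids_for_slot 07:00`, it should print one comma-separated list covering both the GPU and its audio function, ready to paste after `vfio-pci.ids=`.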


Good news everyone! As Todd Howard would put it, It Just Works!™
Bad news is I changed so many things I’m not exactly sure what fixed it…

Here are the relevant bits if anyone comes across the same situation:

Arch PCI passthrough guide

I followed the PCI passthrough via OVMF - ArchWiki step-by-step carefully again (this time I paid special attention to #Loading vfio-pci early)

Loading vfio-pci early

Since Arch’s linux has vfio-pci built as a module, we need to force it to load early before the graphics drivers have a chance to bind to the card.

at /etc/mkinitcpio.conf

 MODULES="vfio_pci vfio vfio_iommu_type1 vfio_virqfd nouveau nvidia_drm nvidia"
 HOOKS="base vfio udev autodetect modconf block keyboard keymap encrypt openswap resume filesystems fsck"

at /etc/modprobe.d/vfio.conf

options kvm ignore_msrs=1
options vfio-pci ids=10de:2486,10de:228b disable_vga=1 disable_idle_d3=1
softdep nouveau pre: vfio vfio_pci
softdep nvidia_drm pre: vfio vfio_pci
softdep nvidia pre: vfio vfio_pci

* Remember to regenerate the initramfs with $ mkinitcpio -P (run as root; -P rebuilds every preset, so no preset name is needed)
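After rebuilding the initramfs and rebooting, one way to confirm that vfio-pci really grabbed the card before the graphics drivers is to look at the driver symlink in sysfs. A small sketch; `bound_driver` and the `SYSFS_PCI_DEVICES` override are names made up for this example:

```shell
#!/bin/sh
# Sketch: print the kernel driver currently bound to a PCI function.
# SYSFS_PCI_DEVICES is an assumption for testing; default is the real sysfs path.
SYSFS_PCI_DEVICES="${SYSFS_PCI_DEVICES:-/sys/bus/pci/devices}"

# bound_driver is a hypothetical helper: pass a full PCI address.
bound_driver() {
    link="$SYSFS_PCI_DEVICES/$1/driver"
    if [ -L "$link" ]; then
        # The symlink target ends in the driver name, e.g. .../drivers/vfio-pci
        basename "$(readlink "$link")"
    else
        # No symlink means no driver is bound to this function.
        echo "none"
    fi
}
```

For the setup in this thread, `bound_driver 0000:07:00.0` and `bound_driver 0000:07:00.1` would both be expected to print `vfio-pci`. The same information shows up as "Kernel driver in use" in `lspci -nnk`.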

kernel parameters

As Pollomoolokki pointed out, there were some inconsistencies with my parameters. Here is the current working version:

GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt rd.driver.pre=vfio-pci kvm.ignore_msrs=1 vfio-pci.ids=10de:2486,10de:228b disable_vga=1 disable_idle_d3=1 [...]
(keep in mind I’m using an AMD cpu and your vfio-pci.ids will most likely be different than mine)
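Since the earlier dmesg output showed a command line without any vfio-pci.ids, it is worth checking after a reboot that the new parameter actually reached the kernel (GRUB changes only take effect once the GRUB config has been regenerated). A minimal sketch; `cmdline_vfio_ids` is a hypothetical helper, and the optional file argument is only there so it can be tried on a saved copy:

```shell
#!/bin/sh
# Sketch: print the vfio-pci.ids value from the running kernel's command line.
# cmdline_vfio_ids is a hypothetical helper; the file argument is optional.
cmdline_vfio_ids() {
    file="${1:-/proc/cmdline}"
    # One parameter per line, then strip the key. The kernel treats '-' and
    # '_' in parameter names as equivalent, so accept both spellings.
    tr ' ' '\n' < "$file" | sed -n 's/^vfio[-_]pci\.ids=//p'
}
```

With the parameters above, `cmdline_vfio_ids` should print `10de:2486,10de:228b`; no output means the parameter is missing from the booted command line.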

VM xml

Here are the important sections, including the NVIDIA Code 43 workaround. Other than that not much happening here, my GPU has a UEFI vBIOS so loading the rom wasn’t needed after all (YMMV, a lot).

[...]

  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor_id state='on' value='randomid'/>
    </hyperv>
    <kvm>
      <hidden state='on'/>
    </kvm>
    <vmport state='off'/>
  </features>
  <cpu mode='host-passthrough' check='partial' migratable='on'>
    <topology sockets='1' dies='1' cores='4' threads='2'/>
  </cpu>

[...]

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x07' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </hostdev>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </memballoon>
  </devices>
</domain>

[EOF]

Acts of desperation

I moved everything around hardware-wise: reseated GPUs and cables, tried different slots and combinations (which in hindsight doesn’t make much sense, since PCIe slot 2 is not isolated IOMMU-group-wise). Ultimately everything stayed the same, but who knows, maybe something wasn’t connected properly.
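Before shuffling cards between slots, it is quicker to list which devices share an IOMMU group. A sketch along the lines of the well-known sysfs loop; `list_iommu_groups` and the `IOMMU_ROOT` override are names chosen for this example:

```shell
#!/bin/sh
# Sketch: print every PCI device together with its IOMMU group number.
# IOMMU_ROOT is an assumption for testing; default is the real sysfs path.
IOMMU_ROOT="${IOMMU_ROOT:-/sys/kernel/iommu_groups}"

# list_iommu_groups is a hypothetical helper name.
list_iommu_groups() {
    for dev in "$IOMMU_ROOT"/*/devices/*; do
        # If the IOMMU is off the glob matches nothing; skip the literal pattern.
        [ -e "$dev" ] || continue
        group="${dev#"$IOMMU_ROOT"/}"   # strip the root prefix
        group="${group%%/*}"            # keep only the group number
        echo "group $group: ${dev##*/}"
    done
}
```

A GPU is a clean passthrough candidate when its group contains only its own functions (here 0000:07:00.0 and 0000:07:00.1). Groups are listed in glob order, so group 10 appears before group 2.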


Alright, thank you for your help! My generic HDMI switcher arrived just in time to be put to use. For now I’ll be using it and a sw kvm switch for keyb/mouse but I will definitely keep an eye out for the store.level1techs kvm I hear so much about :smiley:


Been struggling with a similar issue after upgrading to Fedora 36. My gtx1080 would always crash (error: Unknown PCI header type ‘127’) after starting my VM and sometimes even without starting my VM if I just left my host running for long enough. Seems like my system would just stop recognizing the card after a while or if I started my VM. Interestingly this would only happen when binding the 1080 to the vfio-pci drivers.

In the end I managed to fix it with a combination of updating my GPU’s vBIOS and going back to an older BIOS version on my motherboard (and resetting the motherboard’s BIOS settings). Can’t say for sure which thing exactly fixed it, but after doing these two things it works now.

This issue is definitely very weird considering that it started when switching to Fedora 36, but still persisted when I went back to my Fedora 35 install which is on a different drive.

Thought I would post my resolution to hopefully help someone else in the future.