RX 6800 XT passthrough card boot looping after installing Radeon drivers on guest

Edit: 7/23/22
Sorry, I’m not trying to necro/bump this thread, but I want to add this in case someone else comes across this issue.

Turns out it is not an issue with CSM. The real problem is a severe bug in the Gigabyte BIOS update process: sometimes the previous settings “stick” after a BIOS update. So when I thought I was disabling Above 4G Decoding and Resizable BAR, nothing was actually happening. Both settings stayed on even though they appeared disabled in the BIOS, which caused the GPU to hang after the driver initialized. That is the expected behavior, because SAM is not yet supported by QEMU/KVM.

Ironically, I noticed this after updating the BIOS again: SAM was stuck again, but this time disabled, and while trying to fix it I came across this bug. A solution seems to be to flash the same BIOS version again, which allowed me to make actual changes to the settings. Gigabyte really needs to fix this…
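
Since the BIOS screen cannot be trusted here, one way to cross-check from the Linux host is to look at the BAR sizes the kernel actually assigned to the card. The sketch below parses the text of a sysfs `resource` file (for example `/sys/bus/pci/devices/0000:0b:00.0/resource`) and reports the size of each BAR. The function names and the 256 MiB threshold are my own assumptions for illustration: a VRAM BAR much larger than 256 MiB is only a hint that Resizable BAR is in effect, not proof.

```python
from pathlib import Path

# Assumption: without Resizable BAR the VRAM aperture is typically 256 MiB.
LEGACY_BAR_LIMIT = 256 * 1024 * 1024


def bar_sizes(resource_text):
    """Return {bar_index: size_in_bytes} for the six BAR lines of a sysfs 'resource' file."""
    sizes = {}
    for index, line in enumerate(resource_text.splitlines()[:6]):
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        start, end, _flags = (int(field, 16) for field in fields)
        if end > start:  # unused BARs are all zeros
            sizes[index] = end - start + 1
    return sizes


def looks_resized(sizes, limit=LEGACY_BAR_LIMIT):
    """Heuristic only: True if any BAR is larger than the legacy 256 MiB aperture."""
    return any(size > limit for size in sizes.values())


def read_bar_sizes(address, sysfs_root="/sys/bus/pci/devices"):
    """Read and parse the resource file for a PCI address such as '0000:0b:00.0'."""
    return bar_sizes((Path(sysfs_root) / address / "resource").read_text())
```

On the host, `looks_resized(read_bar_sizes("0000:0b:00.0"))` returning `True` while the BIOS claims Resizable BAR is disabled would be consistent with the stuck-settings behavior described above.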

This seems to be a common problem. I’ve looked around for solutions here and on other websites, but nothing has worked so far.

As a sort of sanity check, I was able to pass through my 6800 XT to a Linux guest, and it had no problem accessing the GPU and running graphical applications.

However, when I try to use a Windows 10 guest, everything seems to work fine until I install the Radeon drivers.

After installing the drivers, the GPU enters some sort of boot loop. The attached monitor turns on for a couple of seconds, shows a no-signal warning, shuts off for a few seconds and turns back on, repeating this cycle until I reboot the host, even after shutting down the guest.
Once the drivers are installed, every boot of the guest sends the GPU into the same loop as soon as Windows finishes loading them.
Booting the guest with a Spice display, I can see the card reports error code 43.

I’ve tried:

  • Hiding KVM and spoofing the vendor ID.
  • Using several different VBIOS roms (GPU-Z, Linux dump tool, TechPowerUp).
  • A bunch of different kernel and module parameters.
  • Older Windows 10 versions and older Radeon Drivers.
  • Disabling Above 4G Decoding and SAM.
  • Changing ACS Enable, PCIe ARI Support and PCIe ARI Enumeration from a similar post here.
  • Virtualizing the Upstream and Downstream ports (although I’m pretty sure I didn’t do this right).

dmesg | grep -i vfio

[    3.283033] VFIO - User Level meta-driver version: 0.3
[  146.608161]  crypto_user drm agpgart fuse ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 usbhid crc32c_intel xhci_pci vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio
[  146.744670]  crypto_user drm agpgart fuse ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 usbhid crc32c_intel xhci_pci vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio
[  146.745034]  crypto_user drm agpgart fuse ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 usbhid crc32c_intel xhci_pci vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio
[  146.745377]  crypto_user drm agpgart fuse ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 usbhid crc32c_intel xhci_pci vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio
[  146.745698]  crypto_user drm agpgart fuse ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 usbhid crc32c_intel xhci_pci vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio
[  146.746244]  crypto_user drm agpgart fuse ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 usbhid crc32c_intel xhci_pci vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio
[  146.746694]  crypto_user drm agpgart fuse ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 usbhid crc32c_intel xhci_pci vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio
[  148.513304]  crypto_user drm agpgart fuse ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 usbhid crc32c_intel xhci_pci vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio
[  148.513759]  crypto_user drm agpgart fuse ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 usbhid crc32c_intel xhci_pci vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio
[  148.514189]  crypto_user drm agpgart fuse ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 usbhid crc32c_intel xhci_pci vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio
[  148.514528]  crypto_user drm agpgart fuse ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 usbhid crc32c_intel xhci_pci vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio
[  148.514931] vfio-pci 0000:0b:00.0: vgaarb: changed VGA decodes: olddecodes=none,decodes=io+mem:owns=none
[  149.142830]  crypto_user drm agpgart fuse ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 usbhid crc32c_intel xhci_pci vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio
[  149.142921]  crypto_user drm agpgart fuse ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 usbhid crc32c_intel xhci_pci vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio
[  151.133568] vfio-pci 0000:0b:00.0: vfio_ecap_init: hiding ecap 0x19@0x270
[  151.133575] vfio-pci 0000:0b:00.0: vfio_ecap_init: hiding ecap 0x1b@0x2d0
[  151.133579] vfio-pci 0000:0b:00.0: vfio_ecap_init: hiding ecap 0x26@0x410
[  151.133580] vfio-pci 0000:0b:00.0: vfio_ecap_init: hiding ecap 0x27@0x440
[  198.392112]  sg crypto_user drm agpgart fuse ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 usbhid crc32c_intel xhci_pci vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio
[  198.392247]  sg crypto_user drm agpgart fuse ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 usbhid crc32c_intel xhci_pci vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio
[  198.458337]  sg crypto_user drm agpgart fuse ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 usbhid crc32c_intel xhci_pci vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio
[  198.458461]  sg crypto_user drm agpgart fuse ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 usbhid crc32c_intel xhci_pci vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio
[  232.815375] vfio-pci 0000:0b:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
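
Most of the lines matched by `grep -i vfio` above look like kernel module lists that only match because they contain the vfio module names. A narrower filter makes the device messages easier to read. The sketch below is an assumption-laden helper (the function name and the regular expression are mine) that keeps only lines of the form `vfio-pci <PCI address>: ...` from text such as the output of `dmesg`.

```python
import re

# Matches "[  148.514931] vfio-pci 0000:0b:00.0: ..." style lines only.
DEVICE_MSG = re.compile(
    r"^\[\s*\d+\.\d+\]\s+vfio-pci\s+"
    r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]:"
)


def vfio_device_messages(dmesg_text):
    """Return only the lines that vfio-pci logged about a specific PCI device."""
    return [line for line in dmesg_text.splitlines() if DEVICE_MSG.match(line)]
```

Applied to the log above, this would leave just the `vgaarb` and `vfio_ecap_init` lines for `0000:0b:00.0`. The generic "VFIO - User Level meta-driver" banner is dropped too, since it is not tied to a device.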

Windows guest log.

2021-12-03 12:41:06.123+0000: starting up libvirt version: 7.9.0, qemu version: 6.1.0, kernel: 5.13.19-2-MANJARO, hostname: strix
LC_ALL=C \
PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/var/lib/snapd/snap/bin \
HOME=/var/lib/libvirt/qemu/domain-1-win10 \
XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-1-win10/.local/share \
XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-1-win10/.cache \
XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-1-win10/.config \
/usr/bin/qemu-system-x86_64 \
-name guest=win10,debug-threads=on \
-S \
-object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-1-win10/master-key.aes"}' \
-blockdev '{"driver":"file","filename":"/usr/share/edk2-ovmf/x64/OVMF_CODE.fd","node-name":"libvirt-pflash0-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-pflash0-format","read-only":true,"driver":"raw","file":"libvirt-pflash0-storage"}' \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/qemu/nvram/win10_VARS.fd","node-name":"libvirt-pflash1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-pflash1-format","read-only":false,"driver":"raw","file":"libvirt-pflash1-storage"}' \
-machine pc-q35-6.1,accel=kvm,usb=off,vmport=off,dump-guest-core=off,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format,memory-backend=pc.ram \
-cpu host,migratable=on,topoext=on,hv-time=on,hv-relaxed=on,hv-vapic=on,hv-spinlocks=0x1fff,hv-vendor-id=0x136238,kvm=off \
-m 12228 \
-object '{"qom-type":"memory-backend-ram","id":"pc.ram","size":12821987328}' \
-overcommit mem-lock=off \
-smp 12,sockets=1,dies=1,cores=6,threads=2 \
-uuid 560efc5a-19fb-4c7a-935e-d8b36f818518 \
-display none \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=32,server=on,wait=off \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=localtime,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-no-shutdown \
-global ICH9-LPC.disable_s3=1 \
-global ICH9-LPC.disable_s4=1 \
-boot menu=on,strict=on \
-device pcie-root-port,port=16,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 \
-device pcie-root-port,port=17,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 \
-device pcie-root-port,port=8,chassis=3,id=pci.3,bus=pcie.0,multifunction=on,addr=0x3 \
-device ioh3420,port=9,chassis=4,id=pci.4,bus=pcie.0,multifunction=on,addr=0x4 \
-device x3130-upstream,id=pci.5,bus=pci.4,addr=0x0 \
-device xio3130-downstream,port=17,chassis=6,id=pci.6,bus=pci.5,addr=0x0 \
-device pcie-root-port,port=22,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 \
-device pcie-root-port,port=23,chassis=8,id=pci.8,bus=pcie.0,addr=0x2.0x7 \
-device qemu-xhci,p2=15,p3=15,id=usb,bus=pci.2,addr=0x0 \
-device virtio-serial-pci,id=virtio-serial0,bus=pci.3,addr=0x0 \
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/win10.qcow2","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"driver":"qcow2","file":"libvirt-1-storage","backing":null}' \
-device ide-hd,bus=ide.0,drive=libvirt-1-format,id=sata0-0-0,bootindex=2 \
-netdev tap,fd=34,id=hostnet0 \
-device e1000e,netdev=hostnet0,id=net0,mac=52:54:00:8a:26:45,bus=pci.1,addr=0x0 \
-chardev pty,id=charserial0 \
-device isa-serial,chardev=charserial0,id=serial0 \
-chardev spicevmc,id=charchannel0,name=vdagent \
-device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=com.redhat.spice.0 \
-audiodev id=audio1,driver=spice \
-device ich9-intel-hda,id=sound0,bus=pcie.0,addr=0x1b \
-device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0,audiodev=audio1 \
-chardev spicevmc,id=charredir0,name=usbredir \
-device usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=1 \
-chardev spicevmc,id=charredir1,name=usbredir \
-device usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=2 \
-device vfio-pci,host=0000:0b:00.0,id=hostdev0,bus=pci.6,multifunction=on,addr=0x0,romfile=/etc/firmware/AMD.RX6800XT.16384.201029.rom \
-device vfio-pci,host=0000:0b:00.1,id=hostdev1,bus=pci.6,addr=0x0.0x1 \
-device vfio-pci,host=0000:0b:00.2,id=hostdev2,bus=pci.6,addr=0x0.0x2 \
-device vfio-pci,host=0000:0b:00.3,id=hostdev3,bus=pci.6,addr=0x0.0x3 \
-device usb-host,hostdevice=/dev/bus/usb/001/005,id=hostdev4,bus=usb.0,port=3 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
char device redirected to /dev/pts/3 (label charserial0)
audio: Could not init `spice' audio driver
audio: warning: Using timer based audio emulation
2021-12-03T12:43:02.526639Z qemu-system-x86_64: terminating on signal 15 from pid 1184 (/usr/bin/libvirtd)
2021-12-03 12:43:04.449+0000: shutting down, reason=shutdown

My somewhat mangled Windows guest configuration.
It has tweaks that try to virtualize the Upstream and Downstream ports, taken from another post here that suggested it as a way around SR-IOV.
I don’t think I can control whether SR-IOV is enabled on my motherboard. It’s possible it’s not even on, but I read that on Gigabyte boards it turns on automatically when SVM is enabled, or something like that.
That could be why it’s failing, but this is a little above my skill level. I would appreciate help setting this up correctly, though it is just a stab in the dark at this point.
Using the default PCI controller layout doesn’t work either.

<domain type="kvm">
  <name>win10</name>
  <uuid>560efc5a-19fb-4c7a-935e-d8b36f818518</uuid>
  <metadata>
    <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
      <libosinfo:os id="http://microsoft.com/win/10"/>
    </libosinfo:libosinfo>
  </metadata>
  <memory unit="KiB">12521472</memory>
  <currentMemory unit="KiB">12521472</currentMemory>
  <vcpu placement="static">12</vcpu>
  <os>
    <type arch="x86_64" machine="pc-q35-6.1">hvm</type>
    <loader readonly="yes" type="pflash">/usr/share/edk2-ovmf/x64/OVMF_CODE.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/win10_VARS.fd</nvram>
    <bootmenu enable="yes"/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state="on"/>
      <vapic state="on"/>
      <spinlocks state="on" retries="8191"/>
      <vendor_id state="on" value="0x136238"/>
    </hyperv>
    <kvm>
      <hidden state="on"/>
    </kvm>
    <vmport state="off"/>
  </features>
  <cpu mode="host-passthrough" check="none" migratable="on">
    <topology sockets="1" dies="1" cores="6" threads="2"/>
    <feature policy="require" name="topoext"/>
  </cpu>
  <clock offset="localtime">
    <timer name="rtc" tickpolicy="catchup"/>
    <timer name="pit" tickpolicy="delay"/>
    <timer name="hpet" present="no"/>
    <timer name="hypervclock" present="yes"/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled="no"/>
    <suspend-to-disk enabled="no"/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type="file" device="disk">
      <driver name="qemu" type="qcow2"/>
      <source file="/var/lib/libvirt/images/win10.qcow2"/>
      <target dev="sda" bus="sata"/>
      <boot order="2"/>
      <address type="drive" controller="0" bus="0" target="0" unit="0"/>
    </disk>
    <controller type="usb" index="0" model="qemu-xhci" ports="15">
      <address type="pci" domain="0x0000" bus="0x02" slot="0x00" function="0x0"/>
    </controller>
    <controller type="sata" index="0">
      <address type="pci" domain="0x0000" bus="0x00" slot="0x1f" function="0x2"/>
    </controller>
    <controller type="pci" index="0" model="pcie-root"/>
    <controller type="pci" index="1" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="1" port="0x10"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x0" multifunction="on"/>
    </controller>
    <controller type="pci" index="2" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="2" port="0x11"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x1"/>
    </controller>
    <controller type="pci" index="3" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="3" port="0x8"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x0" multifunction="on"/>
    </controller>
    <controller type="pci" index="4" model="pcie-root-port">
      <model name="ioh3420"/>
      <target chassis="4" port="0x9"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x04" function="0x0" multifunction="on"/>
    </controller>
    <controller type="pci" index="5" model="pcie-switch-upstream-port">
      <model name="x3130-upstream"/>
      <address type="pci" domain="0x0000" bus="0x04" slot="0x00" function="0x0"/>
    </controller>
    <controller type="pci" index="6" model="pcie-switch-downstream-port">
      <model name="xio3130-downstream"/>
      <target chassis="6" port="0x11"/>
      <address type="pci" domain="0x0000" bus="0x05" slot="0x00" function="0x0"/>
    </controller>
    <controller type="pci" index="7" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="7" port="0x16"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x6"/>
    </controller>
    <controller type="pci" index="8" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="8" port="0x17"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x7"/>
    </controller>
    <interface type="network">
      <mac address="52:54:00:8a:26:45"/>
      <source network="default"/>
      <model type="e1000e"/>
      <address type="pci" domain="0x0000" bus="0x01" slot="0x00" function="0x0"/>
    </interface>
    <input type="mouse" bus="ps2"/>
    <input type="keyboard" bus="ps2"/>
    <sound model="ich9">
      <address type="pci" domain="0x0000" bus="0x00" slot="0x1b" function="0x0"/>
    </sound>
    <audio id="1" type="spice"/>
    <hostdev mode="subsystem" type="pci" managed="yes">
      <source>
        <address domain="0x0000" bus="0x0b" slot="0x00" function="0x0"/>
      </source>
      <rom file="/etc/firmware/AMD.RX6800XT.16384.201029.rom"/>
      <address type="pci" domain="0x0000" bus="0x06" slot="0x00" function="0x0" multifunction="on"/>
    </hostdev>
    <hostdev mode="subsystem" type="pci" managed="yes">
      <source>
        <address domain="0x0000" bus="0x0b" slot="0x00" function="0x1"/>
      </source>
      <address type="pci" domain="0x0000" bus="0x06" slot="0x00" function="0x1"/>
    </hostdev>
    <hostdev mode="subsystem" type="pci" managed="yes">
      <source>
        <address domain="0x0000" bus="0x0b" slot="0x00" function="0x2"/>
      </source>
      <address type="pci" domain="0x0000" bus="0x06" slot="0x00" function="0x2"/>
    </hostdev>
    <hostdev mode="subsystem" type="pci" managed="yes">
      <source>
        <address domain="0x0000" bus="0x0b" slot="0x00" function="0x3"/>
      </source>
      <address type="pci" domain="0x0000" bus="0x06" slot="0x00" function="0x3"/>
    </hostdev>
    <hostdev mode="subsystem" type="usb" managed="yes">
      <source>
        <vendor id="0x1b1c"/>
        <product id="0x1b48"/>
      </source>
      <address type="usb" bus="0" port="3"/>
    </hostdev>
    <redirdev bus="usb" type="spicevmc">
      <address type="usb" bus="0" port="1"/>
    </redirdev>
    <redirdev bus="usb" type="spicevmc">
      <address type="usb" bus="0" port="2"/>
    </redirdev>
    <memballoon model="none"/>
  </devices>
</domain>
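
For anyone comparing their own domain XML against the one above, a quick way to confirm that the hiding and spoofing settings are really present is to parse the XML rather than eyeball it. This is a minimal sketch using Python’s standard `xml.etree.ElementTree`. The function name and the shape of the returned dictionary are my own choices, and it only inspects the elements that appear in the configuration above.

```python
import xml.etree.ElementTree as ET


def passthrough_summary(domain_xml):
    """Summarize the passthrough-related settings of a libvirt domain XML string."""
    root = ET.fromstring(domain_xml)
    hidden = root.find("./features/kvm/hidden")
    vendor = root.find("./features/hyperv/vendor_id")

    hostdevs = []
    for dev in root.findall("./devices/hostdev[@type='pci']"):
        src = dev.find("./source/address")
        rom = dev.find("./rom")
        hostdevs.append({
            # Raw attribute values, e.g. "0x0b:0x00.0x0"
            "host": "{}:{}.{}".format(
                src.get("bus"), src.get("slot"), src.get("function")),
            "rom": rom.get("file") if rom is not None else None,
        })

    vendor_on = vendor is not None and vendor.get("state") == "on"
    return {
        "kvm_hidden": hidden is not None and hidden.get("state") == "on",
        "vendor_id": vendor.get("value") if vendor_on else None,
        "hostdevs": hostdevs,
    }
```

The XML text can come from `virsh dumpxml win10` saved to a file. For the configuration above, the summary should show `kvm_hidden` as true, the spoofed vendor ID, and four PCI host devices with a ROM file on the first one only.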

IOMMU Groups

IOMMU Group 23 09:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch [1002:1478] (rev c1)
IOMMU Group 24 0a:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch [1002:1479]
IOMMU Group 25 0b:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] [1002:73bf] (rev c1)
IOMMU Group 26 0b:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 HDMI Audio [Radeon RX 6800/6800 XT / 6900 XT] [1002:ab28]
IOMMU Group 27 0b:00.2 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:73a6]
IOMMU Group 28 0b:00.3 Serial bus controller [0c80]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 USB [1002:73a4]
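
A listing like the one above can be regenerated from sysfs, where each IOMMU group is a directory under `/sys/kernel/iommu_groups` with a `devices` subdirectory. The following is a small sketch, not the script used for the listing above. The function names are illustrative, and it returns only PCI addresses rather than the full `lspci` descriptions.

```python
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group_number: [pci_address, ...]} sorted by group number."""
    groups = {}
    for group_dir in Path(root).iterdir():
        devices_dir = group_dir / "devices"
        if group_dir.name.isdigit() and devices_dir.is_dir():
            groups[int(group_dir.name)] = sorted(
                entry.name for entry in devices_dir.iterdir())
    return dict(sorted(groups.items()))


def group_of(groups, address):
    """Return the group number containing the given PCI address, or None."""
    for number, devices in groups.items():
        if address in devices:
            return number
    return None
```

Each function of the card sitting alone in its own group, as in the listing above, is the convenient case for passthrough. If `group_of` put the GPU in a group with unrelated devices, those would have to be handed to the guest together.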

Modules: vfio vfio_iommu_type1 vfio_pci vfio_virqfd

Kernel parameters I’ve tried: amd_iommu=force_isolation iommu=pt iommu=on amd_iommu=on kvm.ignore_msrs=1 video=efifb:off vfio-pci.ids=1002:73bf,1002:ab28,1002:73a6,1002:73a4 kvm_amd.npt=1 kvm_amd.avic=1
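
With this many parameters it is easy to lose track of whether every function of the card is actually listed in `vfio-pci.ids`. The sketch below parses a kernel command line string (for example the contents of `/proc/cmdline`) and reports which of the wanted vendor:device IDs are missing. The function names are hypothetical, and both the dash and underscore spellings of the module name are accepted, since the kernel treats them as equivalent in parameter names.

```python
def vfio_ids(cmdline):
    """Return the vendor:device IDs passed via vfio-pci.ids on a kernel command line."""
    ids = []
    for token in cmdline.split():
        key, _, value = token.partition("=")
        if key in ("vfio-pci.ids", "vfio_pci.ids"):
            ids.extend(part.lower() for part in value.split(",") if part)
    return ids


def missing_ids(cmdline, wanted):
    """Return the IDs from 'wanted' that are not listed in vfio-pci.ids."""
    listed = set(vfio_ids(cmdline))
    return [dev for dev in wanted if dev.lower() not in listed]
```

For the parameters above and the four IDs from the IOMMU listing (`1002:73bf`, `1002:ab28`, `1002:73a6`, `1002:73a4`), `missing_ids` returns an empty list, so the ID list itself was not the problem.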

Sorry for the massive wall of text.

Install @gnif’s Vendor-Reset and see if that helps.

You shouldn’t need the vendor reset for a 6800 XT.

I use the following kernel params for mine: "iommu=pt amd_iommu=on vfio-pci.ids=youridshere video=efifb:off"

And I have:
MODULES=(vfio_pci vfio vfio_iommu_type1 vfio_virqfd)

for early loading of the vfio modules in the initrd.

I’d double check your BIOS settings. I ended up having to reset mine to the defaults and reapply them to get rid of the error 43 in Windows.

I reset my BIOS and it did work.

Turns out that I need to have CSM enabled.

It’s something I had seen others report as a working solution, but I thought it was due to Resizable BAR, which is set to auto by default and turns on if CSM is disabled. So instead of enabling CSM, I disabled Resizable BAR and Above 4G Decoding, which did not work. I should have tried enabling CSM as well.

Vendor-reset doesn’t apply at all to the 6800 series; it won’t even be used.
