Tried to do single GPU passthrough with a single RX 5700XT, stuck on Code 43 or Radeon Software not recognizing the card

Hey, everyone.

So, I started experimenting with passing my RX 5700XT through to a VM on Manjaro Linux.

I followed this guide and got it partly working, but the guest errors out with a Code 43.

Then, I downloaded GPU-Z to better understand what could be going wrong.
[Screenshot: GPU-Z output]

From what I saw, GPU-Z doesn’t detect the PCIe lanes or the PCIe generation. Is it possible to fix this, and could it be causing the issue? I followed this thread to try to fix that, but it didn’t work.

This is the output of my lspci script, grouped by IOMMU group:

IOMMU Group 0:
00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers [8086:191f] (rev 07)
IOMMU Group 1:
00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 07)
01:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch [1002:1478] (rev c1)
02:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch [1002:1479]
03:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] [1002:731f] (rev c1)
03:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 HDMI Audio [1002:ab38]
IOMMU Group 10:
00:1f.0 ISA bridge [0601]: Intel Corporation Z170 Chipset LPC/eSPI Controller [8086:a145] (rev 31)
00:1f.2 Memory controller [0580]: Intel Corporation 100 Series/C230 Series Chipset Family Power Management Controller [8086:a121] (rev 31)
00:1f.3 Audio device [0403]: Intel Corporation 100 Series/C230 Series Chipset Family HD Audio Controller [8086:a170] (rev 31)
00:1f.4 SMBus [0c05]: Intel Corporation 100 Series/C230 Series Chipset Family SMBus [8086:a123] (rev 31)
IOMMU Group 11:
00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection (2) I219-V [8086:15b8] (rev 31)
IOMMU Group 12:
05:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 06)
IOMMU Group 13:
07:00.0 Multimedia controller [0480]: Philips Semiconductors SAA7162 [1131:7162]
IOMMU Group 14:
08:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808]
IOMMU Group 15:
09:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808]
IOMMU Group 2:
00:14.0 USB controller [0c03]: Intel Corporation 100 Series/C230 Series Chipset Family USB 3.0 xHCI Controller [8086:a12f] (rev 31)
IOMMU Group 3:
00:17.0 SATA controller [0106]: Intel Corporation Q170/Q150/B150/H170/H110/Z170/CM236 Chipset SATA Controller [AHCI Mode] [8086:a102] (rev 31)
IOMMU Group 4:
00:1b.0 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #17 [8086:a167] (rev f1)
IOMMU Group 5:
00:1b.2 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #19 [8086:a169] (rev f1)
IOMMU Group 6:
00:1c.0 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #1 [8086:a110] (rev f1)
IOMMU Group 7:
00:1c.2 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #3 [8086:a112] (rev f1)
IOMMU Group 8:
00:1c.4 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #5 [8086:a114] (rev f1)
IOMMU Group 9:
00:1d.0 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #9 [8086:a118] (rev f1)
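
The script that produced this listing is not shown in the post. A minimal sketch of one common way to get the same grouping, assuming the kernel exposes the groups under /sys/kernel/iommu_groups (the function name and the optional root argument are made up for illustration):

```shell
#!/bin/bash
# Print each IOMMU group followed by the PCI addresses it contains.
# The optional argument overrides the sysfs root, which is handy for testing.
list_iommu_groups() {
    local root="${1:-/sys/kernel/iommu_groups}"
    local group dev
    # Sort numerically so that Group 2 is listed before Group 10.
    for group in $(ls "$root" 2>/dev/null | sort -n); do
        echo "IOMMU Group $group:"
        for dev in "$root/$group"/devices/*; do
            [ -e "$dev" ] || continue      # skip the literal glob when a group is empty
            echo "  ${dev##*/}"            # strip the path, keep e.g. 0000:03:00.0
        done
    done
}
```

Calling `list_iommu_groups` with no argument reads the live system. Each printed address can then be passed to `lspci -nns <address>` to get descriptions like the ones in the listing.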

I changed the controller that was handling the GPU to look like this:

Then I changed the GPU hostdev section to look like this:

And finally, the audio hostdev sections:

This is the log from my last VM launch:

2020-10-18 00:13:25.744+0000: starting up libvirt version: 6.5.0, qemu version: 5.1.0, kernel: 5.8.11-1-MANJARO, hostname: JotaPedroMJ
LC_ALL=C
PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/var/lib/snapd/snap/bin
HOME=/var/lib/libvirt/qemu/domain-1-win10
XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-1-win10/.local/share
XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-1-win10/.cache
XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-1-win10/.config
QEMU_AUDIO_DRV=none
/usr/bin/qemu-system-x86_64
-name guest=win10,debug-threads=on
-S
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-win10/master-key.aes
-blockdev '{"driver":"file","filename":"/usr/share/edk2-ovmf/x64/OVMF_CODE.secboot.fd","node-name":"libvirt-pflash0-storage","auto-read-only":true,"discard":"unmap"}'
-blockdev '{"node-name":"libvirt-pflash0-format","read-only":true,"driver":"raw","file":"libvirt-pflash0-storage"}'
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/qemu/nvram/win10_VARS.fd","node-name":"libvirt-pflash1-storage","auto-read-only":true,"discard":"unmap"}'
-blockdev '{"node-name":"libvirt-pflash1-format","read-only":false,"driver":"raw","file":"libvirt-pflash1-storage"}'
-machine pc-q35-5.1,accel=kvm,usb=off,vmport=off,smm=on,dump-guest-core=off,kernel_irqchip=on,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format
-cpu host,migratable=on,hypervisor=off,invtsc=on,hv-time,hv-relaxed,hv-vapic,hv-spinlocks=0x1fff,hv-vpindex,hv-synic,hv-stimer,hv-reset,hv-vendor-id=Sinatra,hv-frequencies,kvm=off,host-cache-info=on,l3-cache=off
-global driver=cfi.pflash01,property=secure,value=on
-m 16384
-overcommit mem-lock=off
-smp 8,sockets=1,dies=1,cores=4,threads=2
-uuid 65042a22-79ce-49ec-98be-e6e69ff5ba4d
-display none
-no-user-config
-nodefaults
-chardev socket,id=charmonitor,fd=30,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control
-rtc base=utc,clock=vm,driftfix=slew
-global kvm-pit.lost_tick_policy=delay
-no-hpet
-no-shutdown
-global ICH9-LPC.disable_s3=1
-global ICH9-LPC.disable_s4=1
-boot menu=off,strict=on
-device pcie-root-port,port=0x8,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x1
-device ioh3420,port=0x9,chassis=2,id=pci.2,bus=pcie.0,addr=0x1.0x1
-device pcie-root-port,port=0xa,chassis=3,id=pci.3,bus=pcie.0,addr=0x1.0x2
-device pcie-root-port,port=0xb,chassis=4,id=pci.4,bus=pcie.0,addr=0x1.0x3
-device pcie-root-port,port=0xc,chassis=5,id=pci.5,bus=pcie.0,addr=0x1.0x4
-device pcie-pci-bridge,id=pci.6,bus=pci.1,addr=0x0
-device pcie-root-port,port=0xd,chassis=7,id=pci.7,bus=pcie.0,addr=0x1.0x5
-device pcie-root-port,port=0xe,chassis=8,id=pci.8,bus=pcie.0,addr=0x1.0x6
-device qemu-xhci,p2=15,p3=15,id=usb,bus=pci.3,addr=0x0
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/win10.qcow2","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}'
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"driver":"qcow2","file":"libvirt-1-storage","backing":null}'
-device ide-hd,bus=ide.0,drive=libvirt-1-format,id=sata0-0-0,bootindex=1
-device ich9-intel-hda,id=sound0,bus=pcie.0,addr=0x1b
-device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0
-device vfio-pci,host=0000:00:1f.6,id=hostdev0,bus=pci.6,addr=0x1
-device vfio-pci,host=0000:00:14.0,id=hostdev1,bus=pci.6,addr=0x2
-device vfio-pci,host=0000:07:00.0,id=hostdev2,bus=pci.4,addr=0x0
-device vfio-pci,host=0000:03:00.0,id=hostdev3,bus=pci.2,multifunction=on,addr=0x0
-device vfio-pci,host=0000:03:00.1,id=hostdev4,bus=pci.2,addr=0x0.0x1
-device virtio-balloon-pci,id=balloon0,bus=pci.7,addr=0x0
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny
-msg timestamp=on
2020-10-18 00:13:25.744+0000: Domain id=1 is tainted: host-cpu
2020-10-18T00:13:28.884505Z qemu-system-x86_64: vfio: Cannot reset device 0000:03:00.1, no available reset mechanism.
2020-10-18T00:13:28.911117Z qemu-system-x86_64: vfio: Cannot reset device 0000:00:14.0, no available reset mechanism.
2020-10-18T00:13:30.037786Z qemu-system-x86_64: vfio: Cannot reset device 0000:03:00.1, no available reset mechanism.
2020-10-18T00:13:31.071051Z qemu-system-x86_64: vfio: Cannot reset device 0000:00:14.0, no available reset mechanism.
2020-10-18T00:18:05.820937Z qemu-system-x86_64: vfio: Cannot reset device 0000:03:00.1, no available reset mechanism.
2020-10-18T00:18:05.837628Z qemu-system-x86_64: vfio: Unable to power on device, stuck in D3
2020-10-18T00:18:05.864716Z qemu-system-x86_64: vfio: Cannot reset device 0000:00:14.0, no available reset mechanism.
2020-10-18T00:18:05.877745Z qemu-system-x86_64: vfio: Unable to power on device, stuck in D3
2020-10-18T00:18:06.911070Z qemu-system-x86_64: vfio: Cannot reset device 0000:03:00.1, no available reset mechanism.
2020-10-18T00:18:08.070979Z qemu-system-x86_64: vfio: Cannot reset device 0000:00:14.0, no available reset mechanism.
2020-10-18T00:23:36.844048Z qemu-system-x86_64: terminating on signal 15 from pid 8437 (/usr/bin/libvirtd)
2020-10-18 00:24:26.877+0000: shutting down, reason=shutdown
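
For anyone reading a log like this, the repeated vfio warnings are easier to compare when counted per device. A rough sketch, assuming the log is fed on stdin (the function name is hypothetical, and the log path in the note below is only the usual libvirt default, not something stated in the post):

```shell
#!/bin/bash
# Read a libvirt/QEMU log on stdin and count the
# "Cannot reset device" warnings per PCI address.
summarize_vfio_resets() {
    grep -o 'Cannot reset device [0-9a-f:.]*' \
        | awk '{ count[$4]++ } END { for (dev in count) print dev, count[dev] }' \
        | sort
}
```

Something like `summarize_vfio_resets < /var/log/libvirt/qemu/win10.log` would be the typical call. On the log above it reports four warnings each for 0000:00:14.0 (the xHCI controller) and 0000:03:00.1 (the GPU's HDMI audio function); the video function 0000:03:00.0 does not appear in these warnings.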

Here’s the XML from Virsh

win10 65042a22-79ce-49ec-98be-e6e69ff5ba4d 16777216 16777216 8 hvm /usr/share/edk2-ovmf/x64/OVMF_CODE.secboot.fd /var/lib/libvirt/qemu/nvram/win10_VARS.fd destroy restart destroy /usr/bin/qemu-system-x86_64

My kvm.conf (under /etc/libvirt/hooks/) looks like this

VIRSH_RADEON_VIDEO=pci_0000_03_00_0
VIRSH_RADEON_AUDIO=pci_0000_03_00_1
VIRSH_RADEON_UPSTREAM=pci_0000_01_00_0
VIRSH_RADEON_DOWNSTREAM=pci_0000_02_00_0
VIRSH_INTEL_PCIe_CONTROLLER=pci_0000_00_01_0
VIRSH_INTEL_USB_xHCI_CONTROLLER=pci_0000_00_14_0
VIRSH_PINNACLE=pci_0000_06_00_0
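
One thing that may be worth double-checking here: VIRSH_PINNACLE points at 06:00.0, while the lspci listing and the QEMU command line above show the multimedia controller at 07:00.0. A small sketch for sanity-checking the names in kvm.conf against sysfs, assuming PCI devices appear under /sys/bus/pci/devices (both function names are invented for this example):

```shell
#!/bin/bash
# Convert a virsh nodedev name (pci_0000_03_00_0) to a PCI address (0000:03:00.0).
nodedev_to_pci_addr() {
    echo "$1" | sed -E 's/^pci_([0-9a-fA-F]{4})_([0-9a-fA-F]{2})_([0-9a-fA-F]{2})_([0-7])$/\1:\2:\3.\4/'
}

# Report whether each nodedev name exists under the given sysfs directory.
# Returns non-zero if at least one device is missing.
check_nodedevs() {
    local root="$1"; shift
    local name status=0
    for name in "$@"; do
        if [ -e "$root/$(nodedev_to_pci_addr "$name")" ]; then
            echo "ok      $name"
        else
            echo "missing $name"
            status=1
        fi
    done
    return "$status"
}
```

After sourcing kvm.conf, a call such as `check_nodedevs /sys/bus/pci/devices "$VIRSH_RADEON_VIDEO" "$VIRSH_RADEON_AUDIO" "$VIRSH_PINNACLE"` would flag any name that does not match a device on the host.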

My start script looks like this

#!/bin/bash

# Helpful to read output when debugging
set -x

# Load the config file with our environmental variables
source "/etc/libvirt/hooks/kvm.conf"

# Stop your display manager. If you're on kde it'll be sddm.service. Gnome users should use 'killall gdm-x-session' instead
systemctl stop lightdm.service

# Unbind VTconsoles
echo 0 > /sys/class/vtconsole/vtcon0/bind

# Some machines might have more than 1 virtual console. Add a line for each corresponding VTConsole
echo 0 > /sys/class/vtconsole/vtcon1/bind

# Unbind EFI-Framebuffer
echo efi-framebuffer.0 > /sys/bus/platform/drivers/efi-framebuffer/unbind

# Avoid a race condition by waiting a couple of seconds. This can be calibrated to be shorter or longer if required for your system
sleep 5

systemctl suspend

# Unload all Nvidia drivers
modprobe -r amdgpu
modprobe -r xhci_pci
modprobe -r drm_kms_helper
modprobe -r drm

# Unbind the GPU from display driver
virsh nodedev-detach $VIRSH_RADEON_VIDEO
virsh nodedev-detach $VIRSH_RADEON_AUDIO
virsh nodedev-detach $VIRSH_RADEON_UPSTREAM
virsh nodedev-detach $VIRSH_RADEON_DOWNSTREAM
virsh nodedev-detach $VIRSH_INTEL_PCIe_CONTROLLER
virsh nodedev-detach $VIRSH_INTEL_USB_xHCI_CONTROLLER
virsh nodedev-detach $VIRSH_PINNACLE

# Load VFIO kernel module
modprobe vfio
modprobe vfio_pci
modprobe vfio_iommu_type1
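
The script's own comment says to add one line per virtual console. A hedged alternative is to loop over whatever consoles exist, sketched here under the assumption that they all live under /sys/class/vtconsole (the function name and the optional root argument are illustrative only):

```shell
#!/bin/bash
# Unbind every virtual console found under the given sysfs directory
# instead of hard-coding vtcon0, vtcon1, and so on.
unbind_vtconsoles() {
    local root="${1:-/sys/class/vtconsole}"
    local vt
    for vt in "$root"/vtcon*; do
        [ -w "$vt/bind" ] || continue   # skip missing or read-only entries
        echo 0 > "$vt/bind"
    done
}
```

In a start hook this would replace the individual `echo 0 > .../vtconN/bind` lines; it needs to run as root, just like the original lines.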

My release script looks like this

#!/bin/bash
systemctl suspend
set -x

# Load the config file with our environmental variables
source "/etc/libvirt/hooks/kvm.conf"

# Unload VFIO-PCI Kernel Driver
modprobe -r vfio_pci
modprobe -r vfio_iommu_type1
modprobe -r vfio

# Re-Bind GPU to our display drivers
virsh nodedev-reattach $VIRSH_RADEON_VIDEO
virsh nodedev-reattach $VIRSH_RADEON_AUDIO
virsh nodedev-reattach $VIRSH_RADEON_UPSTREAM
virsh nodedev-reattach $VIRSH_RADEON_DOWNSTREAM
virsh nodedev-reattach $VIRSH_INTEL_PCIe_CONTROLLER
virsh nodedev-reattach $VIRSH_INTEL_USB_xHCI_CONTROLLER
virsh nodedev-reattach $VIRSH_PINNACLE

# Rebind VT consoles
echo 1 > /sys/class/vtconsole/vtcon0/bind

echo 0 > /sys/class/vtconsole/vtcon1/bind

# Re-Bind EFI-Framebuffer
echo "efi-framebuffer.0" > /sys/bus/platform/drivers/efi-framebuffer/bind

# Load Radeon drivers
modprobe amdgpu
modprobe xhci_pci
modprobe drm_kms_helper
modprobe drm

# Restart Display Manager
systemctl start lightdm.service
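
When debugging hooks like these, it can help to confirm which driver actually owns a device after the detach or reattach step. A minimal sketch, assuming the standard sysfs layout where each PCI device has a `driver` symlink while it is bound (the function name is hypothetical):

```shell
#!/bin/bash
# Print the name of the kernel driver bound to a PCI device, or "none".
# The second argument overrides the sysfs directory, mainly for testing.
pci_driver_of() {
    local addr="$1" root="${2:-/sys/bus/pci/devices}"
    local link="$root/$addr/driver"
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"   # e.g. vfio-pci or amdgpu
    else
        echo "none"
    fi
}
```

For example, `pci_driver_of 0000:03:00.0` would be expected to print `vfio-pci` after a successful detach and `amdgpu` after the release script has reloaded the driver.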

Some notes about the non-GPU devices I’m passing through:

  • There’s a PCTV card that strangely doesn’t seem to show up on the guest, but it’s not important; I’ll remove it ASAP.

  • I passed the xHCI controller through because I don’t need it on the host while I’m using the guest; it seems to re-attach correctly once I stop the VM.

  • The NIC was passed through due to overhead concerns. I have a cheap TP-Link NIC slotted that the host successfully switches to. (SSH works through the TP-Link, and the guest machine can use the Intel NIC with no issues.)

There are some stray systemctl suspend calls in the scripts because I wanted to try what I read in this post.

I’m sorry for the long mess of dumps, but I want to be as explicit as I can (if there’s something missing, let me know and I’ll gladly add it). How should I proceed? What can I try to fix this?

In my case, I fixed the Code 43 error by installing the reset bug kernel patch.

Once that was done, after a reboot I ended up on a frozen boot screen, which was fixed by this.

Sorry, but how would I apply the patch? I’m kind of a noob with the Linux kernel.

It’s not particularly straightforward.
In my case, to get the v2 patch, I went the lazy route and git cloned this AUR package. Then I replaced the patch in the package with the one here, and ran makepkg -si and sudo mkinitcpio -P. I had to disable signature checking to get makepkg to work, though.

It might be easiest to just attempt to install that AUR without making any changes to it.
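
The steps described above could be sketched roughly as follows. This is only an outline under stated assumptions: the AUR URL and target directory are left as parameters because the post does not name them, `run` and `build_aur_package` are made-up helper names, and `--skippgpcheck` is used as one way to skip makepkg's PGP signature verification.

```shell
#!/bin/bash
# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Clone an AUR package, build and install it, then regenerate the initramfs.
# The body runs in a subshell (note the parentheses) so the cd does not leak.
build_aur_package() (
    url="$1"
    dir="$2"
    run git clone "$url" "$dir" || exit 1
    # This sketch builds the package unmodified; swapping in another
    # patch would have to happen here, before makepkg runs.
    if [ "${DRY_RUN:-0}" != "1" ]; then
        cd "$dir" || exit 1
    fi
    # --skippgpcheck skips PGP signature verification of the sources.
    run makepkg -si --skippgpcheck || exit 1
    run sudo mkinitcpio -P
)
```

Setting `DRY_RUN=1` before calling `build_aur_package <url> <dir>` prints the commands without running them, which is a cheap way to review the sequence first.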

Thanks! It worked. Now I’ll just have to add yet another patch (the ACS one), because my sound card is in the same IOMMU group as the ACPI and is showing up as a system device (and throwing an error 10).
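
In the listing earlier in the thread, the onboard audio device (00:1f.3) shares IOMMU Group 10 with the ISA bridge, the power management controller, and the SMBus. A quick way to see such neighbours for any device is sketched below, assuming each PCI device in sysfs has an `iommu_group` link (the function name is invented for this example):

```shell
#!/bin/bash
# List the other PCI devices that share an IOMMU group with the given device.
# The second argument overrides the sysfs directory, mainly for testing.
iommu_group_peers() {
    local addr="$1" root="${2:-/sys/bus/pci/devices}"
    local dev
    for dev in "$root/$addr"/iommu_group/devices/*; do
        [ -e "$dev" ] || continue
        # Print every member of the group except the device itself.
        [ "${dev##*/}" = "$addr" ] || echo "${dev##*/}"
    done
}
```

Running `iommu_group_peers 0000:00:1f.3` on the host shows what else would have to be passed through together with the sound card unless the group is split.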

Does setting a vendor ID and hiding the VM not work like it does for the Nvidia cards?
I never did a KVM setup with a Navi card, but I was thinking of a W5700, because they are designed for passthrough. One of the pluses of workstation equipment is that the virtualization features are a lot better. My old T7610 workstation requires no ACS patches, for example. Also, with Quadros you can expose the MSRs to the OS, unlike GeForce cards…
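
On the vendor ID question: the QEMU command line in the log earlier in the thread already contains hv-vendor-id=Sinatra, kvm=off and hypervisor=off, so those settings appear to have been active when the Code 43 occurred. A small sketch for pulling these flags out of a libvirt log fed on stdin (the function name is hypothetical):

```shell
#!/bin/bash
# Read a libvirt/QEMU log on stdin and print the VM-hiding related
# flags found in the first "-cpu" argument, one per line.
vm_hiding_flags() {
    grep -m1 '^-cpu ' \
        | sed -e 's/^-cpu //' -e 's/ *\\$//' \
        | tr ',' '\n' \
        | grep -E '^(kvm=off|hypervisor=off|hv-vendor-id=.*)$'
}
```

The second sed expression strips a trailing line-continuation backslash, since libvirt logs usually end each argument line with one.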