Hey, everyone.
So, I started experimenting with passing my RX 5700XT through to a VM on Manjaro Linux.
I followed this guide and got it kinda working, but the card errored out with a Code 43.
Then, I downloaded GPU-Z to better understand what could be going wrong.
From what I saw, GPU-Z detects neither the PCIe lane count nor the PCIe generation. Is it possible to fix this, and could it be what’s causing the issues? I followed this thread to try to fix that, but it didn’t work.
This is my IOMMU group listing (lspci output):
IOMMU Group 0:
00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers [8086:191f] (rev 07)
IOMMU Group 1:
00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 07)
01:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch [1002:1478] (rev c1)
02:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch [1002:1479]
03:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] [1002:731f] (rev c1)
03:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 HDMI Audio [1002:ab38]
IOMMU Group 10:
00:1f.0 ISA bridge [0601]: Intel Corporation Z170 Chipset LPC/eSPI Controller [8086:a145] (rev 31)
00:1f.2 Memory controller [0580]: Intel Corporation 100 Series/C230 Series Chipset Family Power Management Controller [8086:a121] (rev 31)
00:1f.3 Audio device [0403]: Intel Corporation 100 Series/C230 Series Chipset Family HD Audio Controller [8086:a170] (rev 31)
00:1f.4 SMBus [0c05]: Intel Corporation 100 Series/C230 Series Chipset Family SMBus [8086:a123] (rev 31)
IOMMU Group 11:
00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection (2) I219-V [8086:15b8] (rev 31)
IOMMU Group 12:
05:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 06)
IOMMU Group 13:
07:00.0 Multimedia controller [0480]: Philips Semiconductors SAA7162 [1131:7162]
IOMMU Group 14:
08:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808]
IOMMU Group 15:
09:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808]
IOMMU Group 2:
00:14.0 USB controller [0c03]: Intel Corporation 100 Series/C230 Series Chipset Family USB 3.0 xHCI Controller [8086:a12f] (rev 31)
IOMMU Group 3:
00:17.0 SATA controller [0106]: Intel Corporation Q170/Q150/B150/H170/H110/Z170/CM236 Chipset SATA Controller [AHCI Mode] [8086:a102] (rev 31)
IOMMU Group 4:
00:1b.0 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #17 [8086:a167] (rev f1)
IOMMU Group 5:
00:1b.2 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #19 [8086:a169] (rev f1)
IOMMU Group 6:
00:1c.0 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #1 [8086:a110] (rev f1)
IOMMU Group 7:
00:1c.2 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #3 [8086:a112] (rev f1)
IOMMU Group 8:
00:1c.4 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #5 [8086:a114] (rev f1)
IOMMU Group 9:
00:1d.0 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #9 [8086:a118] (rev f1)
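The listing above has the shape produced by the usual loop over /sys/kernel/iommu_groups. A minimal sketch of such a loop follows, assuming the standard sysfs layout and that lspci is installed. The function name and the optional root-directory argument are made up for illustration. Groups come out in lexical order, which is why Group 10 follows Group 1.

```shell
#!/bin/bash
# Sketch: print every IOMMU group and the PCI devices inside it.
# Assumes the usual layout /sys/kernel/iommu_groups/<n>/devices/<addr>.
# The optional first argument (another root directory) exists only for testing.
list_iommu_groups() {
    local root="${1:-/sys/kernel/iommu_groups}"
    local group dev addr desc
    for group in "$root"/*; do
        [ -d "$group" ] || continue
        echo "IOMMU Group ${group##*/}:"
        for dev in "$group"/devices/*; do
            [ -e "$dev" ] || continue
            addr="${dev##*/}"
            # lspci -nns prints the class, name and [vendor:device] IDs.
            # Fall back to the bare address if lspci is missing or prints nothing.
            desc="$(lspci -nns "$addr" 2>/dev/null)"
            echo "${desc:-$addr}"
        done
    done
}

list_iommu_groups "$@"
```

Everything that shares a group with the GPU (here the 00:01.0 root port and the two Navi switch ports) has to be considered when deciding what gets detached from the host.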
I changed the controller that was handling the GPU to look like this:
Then I changed the GPU hostdev section to look like this:
And finally, the audio hostdev sections:
This is the log from my last vm launch
2020-10-18 00:13:25.744+0000: starting up libvirt version: 6.5.0, qemu version: 5.1.0, kernel: 5.8.11-1-MANJARO, hostname: JotaPedroMJ
LC_ALL=C
PATH=/usr/local/sbin:/usr/local/bin:/usr/bin:/var/lib/snapd/snap/bin
HOME=/var/lib/libvirt/qemu/domain-1-win10
XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-1-win10/.local/share
XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-1-win10/.cache
XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-1-win10/.config
QEMU_AUDIO_DRV=none
/usr/bin/qemu-system-x86_64
-name guest=win10,debug-threads=on
-S
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-win10/master-key.aes
-blockdev '{"driver":"file","filename":"/usr/share/edk2-ovmf/x64/OVMF_CODE.secboot.fd","node-name":"libvirt-pflash0-storage","auto-read-only":true,"discard":"unmap"}'
-blockdev '{"node-name":"libvirt-pflash0-format","read-only":true,"driver":"raw","file":"libvirt-pflash0-storage"}'
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/qemu/nvram/win10_VARS.fd","node-name":"libvirt-pflash1-storage","auto-read-only":true,"discard":"unmap"}'
-blockdev '{"node-name":"libvirt-pflash1-format","read-only":false,"driver":"raw","file":"libvirt-pflash1-storage"}'
-machine pc-q35-5.1,accel=kvm,usb=off,vmport=off,smm=on,dump-guest-core=off,kernel_irqchip=on,pflash0=libvirt-pflash0-format,pflash1=libvirt-pflash1-format
-cpu host,migratable=on,hypervisor=off,invtsc=on,hv-time,hv-relaxed,hv-vapic,hv-spinlocks=0x1fff,hv-vpindex,hv-synic,hv-stimer,hv-reset,hv-vendor-id=Sinatra,hv-frequencies,kvm=off,host-cache-info=on,l3-cache=off
-global driver=cfi.pflash01,property=secure,value=on
-m 16384
-overcommit mem-lock=off
-smp 8,sockets=1,dies=1,cores=4,threads=2
-uuid 65042a22-79ce-49ec-98be-e6e69ff5ba4d
-display none
-no-user-config
-nodefaults
-chardev socket,id=charmonitor,fd=30,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control
-rtc base=utc,clock=vm,driftfix=slew
-global kvm-pit.lost_tick_policy=delay
-no-hpet
-no-shutdown
-global ICH9-LPC.disable_s3=1
-global ICH9-LPC.disable_s4=1
-boot menu=off,strict=on
-device pcie-root-port,port=0x8,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x1
-device ioh3420,port=0x9,chassis=2,id=pci.2,bus=pcie.0,addr=0x1.0x1
-device pcie-root-port,port=0xa,chassis=3,id=pci.3,bus=pcie.0,addr=0x1.0x2
-device pcie-root-port,port=0xb,chassis=4,id=pci.4,bus=pcie.0,addr=0x1.0x3
-device pcie-root-port,port=0xc,chassis=5,id=pci.5,bus=pcie.0,addr=0x1.0x4
-device pcie-pci-bridge,id=pci.6,bus=pci.1,addr=0x0
-device pcie-root-port,port=0xd,chassis=7,id=pci.7,bus=pcie.0,addr=0x1.0x5
-device pcie-root-port,port=0xe,chassis=8,id=pci.8,bus=pcie.0,addr=0x1.0x6
-device qemu-xhci,p2=15,p3=15,id=usb,bus=pci.3,addr=0x0
-blockdev '{"driver":"file","filename":"/var/lib/libvirt/images/win10.qcow2","node-name":"libvirt-1-storage","auto-read-only":true,"discard":"unmap"}'
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"driver":"qcow2","file":"libvirt-1-storage","backing":null}'
-device ide-hd,bus=ide.0,drive=libvirt-1-format,id=sata0-0-0,bootindex=1
-device ich9-intel-hda,id=sound0,bus=pcie.0,addr=0x1b
-device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0
-device vfio-pci,host=0000:00:1f.6,id=hostdev0,bus=pci.6,addr=0x1
-device vfio-pci,host=0000:00:14.0,id=hostdev1,bus=pci.6,addr=0x2
-device vfio-pci,host=0000:07:00.0,id=hostdev2,bus=pci.4,addr=0x0
-device vfio-pci,host=0000:03:00.0,id=hostdev3,bus=pci.2,multifunction=on,addr=0x0
-device vfio-pci,host=0000:03:00.1,id=hostdev4,bus=pci.2,addr=0x0.0x1
-device virtio-balloon-pci,id=balloon0,bus=pci.7,addr=0x0
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny
-msg timestamp=on
2020-10-18 00:13:25.744+0000: Domain id=1 is tainted: host-cpu
2020-10-18T00:13:28.884505Z qemu-system-x86_64: vfio: Cannot reset device 0000:03:00.1, no available reset mechanism.
2020-10-18T00:13:28.911117Z qemu-system-x86_64: vfio: Cannot reset device 0000:00:14.0, no available reset mechanism.
2020-10-18T00:13:30.037786Z qemu-system-x86_64: vfio: Cannot reset device 0000:03:00.1, no available reset mechanism.
2020-10-18T00:13:31.071051Z qemu-system-x86_64: vfio: Cannot reset device 0000:00:14.0, no available reset mechanism.
2020-10-18T00:18:05.820937Z qemu-system-x86_64: vfio: Cannot reset device 0000:03:00.1, no available reset mechanism.
2020-10-18T00:18:05.837628Z qemu-system-x86_64: vfio: Unable to power on device, stuck in D3
2020-10-18T00:18:05.864716Z qemu-system-x86_64: vfio: Cannot reset device 0000:00:14.0, no available reset mechanism.
2020-10-18T00:18:05.877745Z qemu-system-x86_64: vfio: Unable to power on device, stuck in D3
2020-10-18T00:18:06.911070Z qemu-system-x86_64: vfio: Cannot reset device 0000:03:00.1, no available reset mechanism.
2020-10-18T00:18:08.070979Z qemu-system-x86_64: vfio: Cannot reset device 0000:00:14.0, no available reset mechanism.
2020-10-18T00:23:36.844048Z qemu-system-x86_64: terminating on signal 15 from pid 8437 (/usr/bin/libvirtd)
2020-10-18 00:24:26.877+0000: shutting down, reason=shutdown
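To make the vfio warnings in a log like this easier to read, they can be tallied per message. The sketch below assumes the log lines look like the ones above; the function name is hypothetical, and the path in the usage comment is the usual libvirt location for a domain called win10, which may differ on other setups.

```shell
#!/bin/bash
# Sketch: count vfio warnings per message in a libvirt QEMU log.
# Timestamps are stripped so identical messages collapse into one line.
# Matching on "vfio:" skips the "-device vfio-pci,..." command-line entries.
summarize_vfio_warnings() {
    local log="$1"
    grep -F 'vfio:' "$log" \
        | sed -E 's/^.*qemu-system-x86_64: vfio: //' \
        | sort | uniq -c | sort -rn
}

# Usage (path is an assumption):
#   summarize_vfio_warnings /var/log/libvirt/qemu/win10.log
```

On the log above this gives four "Cannot reset device" warnings each for 0000:03:00.1 and 0000:00:14.0, plus two "Unable to power on device, stuck in D3" warnings that do not name a device.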
Here’s the XML from Virsh
win10 65042a22-79ce-49ec-98be-e6e69ff5ba4d 16777216 16777216 8 hvm /usr/share/edk2-ovmf/x64/OVMF_CODE.secboot.fd /var/lib/libvirt/qemu/nvram/win10_VARS.fd destroy restart destroy /usr/bin/qemu-system-x86_64
My kvm.conf (under /etc/libvirt/hooks/) looks like this
VIRSH_RADEON_VIDEO=pci_0000_03_00_0
VIRSH_RADEON_AUDIO=pci_0000_03_00_1
VIRSH_RADEON_UPSTREAM=pci_0000_01_00_0
VIRSH_RADEON_DOWNSTREAM=pci_0000_02_00_0
VIRSH_INTEL_PCIe_CONTROLLER=pci_0000_00_01_0
VIRSH_INTEL_USB_xHCI_CONTROLLER=pci_0000_00_14_0
VIRSH_PINNACLE=pci_0000_06_00_0
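Each value in kvm.conf is a libvirt node-device name of the form pci_DDDD_BB_SS_F, which maps to the PCI address DDDD:BB:SS.F. Note that VIRSH_PINNACLE points at 06:00.0, while the only multimedia controller in the IOMMU listing sits at 07:00.0. A small check like the following would flag such a mismatch. It is only a sketch: the function names and the SYSFS_PCI override are made up for illustration.

```shell
#!/bin/bash
# Sketch: translate libvirt node-device names (pci_DDDD_BB_SS_F) into
# PCI addresses (DDDD:BB:SS.F) and check that each one exists on the host.
nodedev_to_bdf() {
    local name="${1#pci_}"          # drop the "pci_" prefix
    local domain bus slot func
    IFS=_ read -r domain bus slot func <<< "$name"
    echo "${domain}:${bus}:${slot}.${func}"
}

check_nodedevs() {
    # SYSFS_PCI is a hypothetical override, useful only for testing.
    local sysfs="${SYSFS_PCI:-/sys/bus/pci/devices}"
    local name bdf
    for name in "$@"; do
        bdf="$(nodedev_to_bdf "$name")"
        if [ -e "$sysfs/$bdf" ]; then
            echo "ok       $name -> $bdf"
        else
            echo "missing  $name -> $bdf"
        fi
    done
}

# Example: check_nodedevs pci_0000_03_00_0 pci_0000_06_00_0
```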
My start script looks like this
#!/bin/bash
# Helpful to read output when debugging
set -x
# Load the config file with our environmental variables
source "/etc/libvirt/hooks/kvm.conf"
# Stop your display manager. If you're on kde it'll be sddm.service. Gnome users should use 'killall gdm-x-session' instead
systemctl stop lightdm.service
# Unbind VTconsoles
echo 0 > /sys/class/vtconsole/vtcon0/bind
# Some machines might have more than 1 virtual console. Add a line for each corresponding VTConsole
echo 0 > /sys/class/vtconsole/vtcon1/bind
# Unbind EFI-Framebuffer
echo efi-framebuffer.0 > /sys/bus/platform/drivers/efi-framebuffer/unbind
# Avoid a race condition by waiting a couple of seconds. This can be calibrated to be shorter or longer if required for your system
sleep 5
systemctl suspend
# Unload all Nvidia drivers
modprobe -r amdgpu
modprobe -r xhci_pci
modprobe -r drm_kms_helper
modprobe -r drm
# Unbind the GPU from display driver
virsh nodedev-detach $VIRSH_RADEON_VIDEO
virsh nodedev-detach $VIRSH_RADEON_AUDIO
virsh nodedev-detach $VIRSH_RADEON_UPSTREAM
virsh nodedev-detach $VIRSH_RADEON_DOWNSTREAM
virsh nodedev-detach $VIRSH_INTEL_PCIe_CONTROLLER
virsh nodedev-detach $VIRSH_INTEL_USB_xHCI_CONTROLLER
virsh nodedev-detach $VIRSH_PINNACLE
# Load VFIO kernel module
modprobe vfio
modprobe vfio_pci
modprobe vfio_iommu_type1
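After the start script has run, one way to confirm the detach worked is to look at which kernel driver each device is bound to; after a successful virsh nodedev-detach it should normally be vfio-pci. A minimal sketch, assuming the standard sysfs layout (the function name and the SYSFS_PCI override are hypothetical):

```shell
#!/bin/bash
# Sketch: report the kernel driver currently bound to each PCI address.
bound_driver() {
    # SYSFS_PCI is a hypothetical override, useful only for testing.
    local sysfs="${SYSFS_PCI:-/sys/bus/pci/devices}"
    local link="$sysfs/$1/driver"
    if [ -L "$link" ]; then
        # The "driver" entry is a symlink whose last component is the driver name.
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}

# Print "<address> <driver>" for every address given on the command line,
# e.g. 0000:03:00.0 0000:03:00.1 0000:00:14.0
for bdf in "$@"; do
    printf '%s %s\n' "$bdf" "$(bound_driver "$bdf")"
done
```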
My release script looks like this
#!/bin/bash
systemctl suspend
set -x
# Load the config file with our environmental variables
source "/etc/libvirt/hooks/kvm.conf"
# Unload VFIO-PCI Kernel Driver
modprobe -r vfio_pci
modprobe -r vfio_iommu_type1
modprobe -r vfio
# Re-Bind GPU to our display drivers
virsh nodedev-reattach $VIRSH_RADEON_VIDEO
virsh nodedev-reattach $VIRSH_RADEON_AUDIO
virsh nodedev-reattach $VIRSH_RADEON_UPSTREAM
virsh nodedev-reattach $VIRSH_RADEON_DOWNSTREAM
virsh nodedev-reattach $VIRSH_INTEL_PCIe_CONTROLLER
virsh nodedev-reattach $VIRSH_INTEL_USB_xHCI_CONTROLLER
virsh nodedev-reattach $VIRSH_PINNACLE
# Rebind VT consoles
echo 1 > /sys/class/vtconsole/vtcon0/bind
echo 0 > /sys/class/vtconsole/vtcon1/bind
# Re-Bind EFI-Framebuffer
echo "efi-framebuffer.0" > /sys/bus/platform/drivers/efi-framebuffer/bind
# Load Radeon drivers
modprobe amdgpu
modprobe xhci_pci
modprobe drm_kms_helper
modprobe drm
# Restart Display Manager
systemctl start lightdm.service
Some notes about non-GPU passthroughs on this card:
- There’s a PCTV card that strangely doesn’t seem to show up on the guest, but it’s not important; I’ll remove it ASAP.
- I passed the xHCI controller through because I don’t need it on the host while I’m using the guest. It seems to re-attach correctly once I stop the VM.
- The NIC was passed through due to overhead concerns. I have a cheap TP-Link NIC slotted that the host successfully switches to. (SSH works through the TP-Link, and the guest machine can use the Intel NIC with no issues.)
There are some stray systemctl suspend calls in the scripts because I wanted to try what I read in this post.
I’m sorry for the long mess of dumps, but I want to be as explicit as I can (if something is missing, let me know and I’ll gladly add it). How should I proceed? What can I try to fix this?