[Solved] VFIO Linux host Win10 guest single passthrough RX 6800 black screen

Hello everyone!

I am trying to get my VFIO single-GPU passthrough setup up and running, but I get a black screen when booting the VM.

uname: Linux 5.15.78-1-MANJARO

Motherboard UEFI options are:

CSM = Disabled
Above 4G = Disabled
BAR = Disabled

CPU: Ryzen 5800x3D
GPU: Sapphire RX 6800 nitro+

I am using virt-manager (4.1.0), libvirt (8.9), and QEMU (7.1).
I am not using any hooks at the moment; I am running this manually:

sudo systemctl stop sddm
then switch to tty2 and do sudo virsh start nameofvm
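
For anyone who wants to script the two manual steps above, here is a rough sketch. The names `run`, `start_vm_manually` and the `DRY_RUN` switch are made up for illustration, and `nameofvm` is the placeholder from the post. It does nothing beyond those two commands.

```shell
#!/bin/sh
# Sketch only: wraps the two manual commands from the post.
# run, start_vm_manually and DRY_RUN are illustrative names, not part of any tool.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stop the display manager, then start the named libvirt domain.
start_vm_manually() {
    vm_name=$1
    run sudo systemctl stop sddm || return 1
    run sudo virsh start "$vm_name"
}
```

Setting `DRY_RUN=1` before calling `start_vm_manually nameofvm` only prints the commands. Without it, the function should be run from a TTY, since stopping sddm ends the graphical session.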

End result is a black screen :\

Host machine:

$ cat /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="amd_iommu=on iommu=pt vfio-pci.ids=1002:73bf,1002:ab28 video=efifb:off quiet apparmor=1 security=apparmor resume=UUID=535b3154-67da-4d84-9e46-9fafcecd8c70 udev.log_priority=3 amdgpu.ppfeaturemask=0xffffffff"
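
One way to confirm that the running kernel actually received the `vfio-pci.ids` parameter is to read it back from the kernel command line. This is a small sketch; `vfio_ids_from_cmdline` is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Sketch: print the device IDs passed via vfio-pci.ids, one per line.
# vfio_ids_from_cmdline is an illustrative name.
vfio_ids_from_cmdline() {
    # Split the command line into one parameter per line, keep only the
    # vfio-pci.ids (or vfio_pci.ids) value, then split that value on commas.
    printf '%s\n' "$1" \
        | tr ' ' '\n' \
        | sed -n 's/^vfio[-_]pci\.ids=//p' \
        | tr ',' '\n'
}
```

On a live system it can be fed the real command line: `vfio_ids_from_cmdline "$(cat /proc/cmdline)"`. Empty output would suggest the GRUB change has not taken effect yet.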

$ dmesg | grep -i -e IOMMU
[ 0.498130] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
[ 0.512611] AMD-Vi: AMD IOMMUv2 loaded and initialized

$ sh ./check_iommu_groups.sh
IOMMU Group 18:
08:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] [1002:73bf] (rev c3)
IOMMU Group 19:
08:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 HDMI/DP Audio Controller [1002:ab28]
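
The post does not include `check_iommu_groups.sh` itself. A minimal sketch of a script of that kind is shown here, assuming the usual sysfs layout under `/sys/kernel/iommu_groups`. The function name `list_iommu_groups` and the optional root argument are my own additions, and this version prints only PCI addresses rather than full `lspci` descriptions.

```shell
#!/bin/sh
# Sketch: list PCI addresses per IOMMU group from sysfs.
# list_iommu_groups is an illustrative name; the root directory is a
# parameter only so the function can be pointed at a test tree.
list_iommu_groups() {
    root=${1:-/sys/kernel/iommu_groups}
    # Group directories are named by number; sort them numerically.
    for group in $(ls "$root" 2>/dev/null | sort -n); do
        echo "IOMMU Group $group:"
        for dev in "$root/$group"/devices/*; do
            [ -e "$dev" ] || continue
            addr=${dev##*/}
            # Strip the PCI domain prefix (e.g. 0000:) to match lspci's short form.
            echo "${addr#*:}"
        done
    done
}
```

Calling `list_iommu_groups` with no argument reads the real sysfs tree. Each printed address can then be passed to `lspci -nns <address>` for the full device description.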

$ lspci -nnk
08:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] [1002:73bf] (rev c3)
Subsystem: Sapphire Technology Limited Device [1da2:e439]
Kernel driver in use: amdgpu
Kernel modules: amdgpu
08:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 HDMI/DP Audio Controller [1002:ab28]
Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 HDMI/DP Audio Controller [1002:ab28]
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel
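
In the output above both functions are still bound to host drivers (amdgpu and snd_hda_intel) rather than vfio-pci. A quick way to check a single slot is sketched here. `driver_in_use` is a hypothetical helper that reads `lspci -nnk` style text on stdin, and it assumes the short slot form without a PCI domain.

```shell
#!/bin/sh
# Sketch: print the "Kernel driver in use" value for one PCI slot.
# driver_in_use is an illustrative name; input is `lspci -nnk` text on stdin.
driver_in_use() {
    awk -v slot="$1" '
        # A device header line starts with the slot, e.g. "08:00.0 ".
        /^[0-9a-f]+:[0-9a-f]+\.[0-7] / { current = $1 }
        # Report the driver only for the requested slot.
        /Kernel driver in use:/ && current == slot { print $NF }
    '
}
```

Example: `lspci -nnk | driver_in_use 08:00.0`. Once the card has been handed over for passthrough, one would expect this to print `vfio-pci` instead of `amdgpu`.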

kvm xml:

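name: win10-gpup
uuid: ac7ba21e-c77d-4d66-b56a-8e2e895045eb
memory: 8392704
currentMemory: 8392704
vcpu: 6
os type: hvm
loader: /usr/share/edk2-ovmf/x64/OVMF_CODE.fd
nvram: /var/lib/libvirt/qemu/nvram/win10-gpup_VARS.fd
on_poweroff: destroy
on_reboot: restart
on_crash: destroy
emulator: /usr/bin/qemu-system-x86_64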

Hello everyone!

For future reference, I found a solution.

TLDR: switched to kernel 6.0.11-1

While investigating the dmesg output, I found the following:

[ 0.000000] Linux version 5.15.78-1-MANJARO ([email protected]) (gcc (GCC) 12.2.0, GNU ld (GNU Binutils) 2.39.0) #1 SMP PREEMPT Thu Nov 10 20:50:09 UTC 2022
(…)
[ 243.785783] VFIO - User Level meta-driver version: 0.3
[ 243.797925] vfio_pci: add [1002:73bf[ffffffff:ffffffff]] class 0x000000/00000000
[ 243.797929] vfio_pci: add [1002:ab28[ffffffff:ffffffff]] class 0x000000/00000000
[ 243.810868] amdgpu 0000:08:00.0: amdgpu: amdgpu: finishing device.
[ 243.918912] Console: switching to colour dummy device 80x25
[ 247.966331] amdgpu: cp queue pipe 4 queue 0 preemption failed
[ 247.968278] amdgpu 0000:08:00.0: amdgpu: Fail to disable thermal alert!
[ 247.968665] BUG: kernel NULL pointer dereference, address: 0000000000000134
[ 247.968670] #PF: supervisor read access in kernel mode
[ 247.968672] #PF: error_code(0x0000) - not-present page
[ 247.968675] PGD 0 P4D 0
[ 247.968678] Oops: 0000 [#1] PREEMPT SMP NOPTI
(…)

When I tested with a different kernel (currently 6.0.11-1), the issue was not present.
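
To repeat the same check after switching kernels, the kernel log can be filtered for the crash signature seen above. This is only a sketch: `kernel_oops_lines` is a made-up name, and the two patterns are taken from the log lines in this post, so other kinds of kernel failures would not be caught.

```shell
#!/bin/sh
# Sketch: keep only kernel log lines that match the crash seen in this thread.
# kernel_oops_lines is an illustrative name; input is dmesg text on stdin.
# Exit status follows grep: 0 if a match was found, non-zero otherwise.
kernel_oops_lines() {
    grep -E 'BUG: kernel NULL pointer dereference|Oops: '
}
```

Example: start the VM, then run `sudo dmesg | kernel_oops_lines`. No output and a non-zero exit status mean neither pattern showed up in the log.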

Thanks

Excellent news, and great to see that “Hurr durr I’m a ninja sloth aka 6.0+” is working out for you and made things play nice! It’s a shame no one was able to help you, but sometimes solving it yourself is way more satisfying, so go you! :cookie:
