VFIO setup: since kernel update to 6.8 I am unable to boot my VM, but it boots when blacklisting amdgpu?

Hello Forum,

I updated my kernel from 5.15 to 6.8, but now my VM will not boot when the PCI host device is added to it. I use QEMU/virt-manager, and it worked like a charm all this time, but with 6.8, when booting up my Windows 11 gaming VM, I get a black screen. The CPU performance indicator goes to 7% and then stays at 0%.

I have been troubled by this for a few days. What I have gathered is that, according to my lspci -nnk output, vfio-pci is correctly bound to my second GPU, but I still have issues booting up the VM.

When I blacklist my amdgpu driver, however, booting up the VM works perfectly fine, but my host PC has no proper output and my system's remaining GPU only drives one monitor instead of both. I am guessing that after blacklisting amdgpu, the signal from the iGPU goes through the video ports.
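For reference, by blacklisting I mean a modprobe entry along these lines (the file name is just what I happened to use):

# /etc/modprobe.d/blacklist-amdgpu.conf
blacklist amdgpu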

my grub:

GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt vfio-pci.ids=1002:744c,1002:ab30 splash"

my modprobe.d/vfio.conf:

pro-gamer@pro-gamer:/home/mokura$ cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=1002:744c,1002:ab30

my lspci -nnk:
for my host gpu:

0b:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:7480] (rev cf)
	Subsystem: Sapphire Technology Limited Device [1da2:e452]
	Kernel driver in use: amdgpu
	Kernel modules: amdgpu
0b:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:ab30]
	Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:ab30]
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel

for my vm:

03:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:744c] (rev cc)
	Subsystem: Sapphire Technology Limited Device [1da2:e471]
	Kernel driver in use: vfio-pci
	Kernel modules: amdgpu
03:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:ab30]
	Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:ab30]
	Kernel driver in use: vfio-pci
	Kernel modules: snd_hda_intel
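
As a sanity check on the IOMMU grouping, I walk sysfs with a standard snippet like this; the GPU and its audio function may share a group, but nothing else should be in it:

# List every IOMMU group and the devices in it
for g in /sys/kernel/iommu_groups/*; do
  echo "IOMMU group ${g##*/}:"
  for d in "$g"/devices/*; do
    lspci -nns "${d##*/}"
  done
done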

my System specs:
CPU: Intel i9-14900k
GPU host: RX 7600
GPU VM: RX 7900 XT

my inxi -Gx:

mokura@pro-gamer:~$ inxi -Gx
Graphics:
  Device-1: Intel vendor: Gigabyte driver: i915 v: kernel bus-ID: 00:02.0
  Device-2: AMD vendor: Sapphire driver: vfio-pci v: N/A bus-ID: 03:00.0
  Device-3: AMD vendor: Sapphire driver: amdgpu v: kernel bus-ID: 0b:00.0
  Display: x11 server: X.Org v: 1.21.1.4 driver: X:
    loaded: amdgpu,ati,modesetting unloaded: fbdev,radeon,vesa gpu: amdgpu
    resolution: 1: 1920x1080 2: 1920x1080~60Hz 3: 2560x1440~60Hz
  OpenGL:
    renderer: AMD Radeon RX 7600 (gfx1102 LLVM 15.0.7 DRM 3.57 6.8.0-39-generic)
    v: 4.6 Mesa 23.2.1-1ubuntu3.1~22.04.2 direct render: Yes

my modules in initramfs:

pro-gamer@pro-gamer:/home/mokura$ cat /etc/initramfs-tools/modules
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
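
To confirm these actually load on 6.8, I check with lsmod:

lsmod | grep vfio

As far as I can tell, vfio_virqfd was folded into the core vfio module in newer 6.x kernels, so it no longer showing up as a separate module is expected.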

I don't know what other information is needed. The fact of the matter is that when I blacklist amdgpu, my VM works fine and dandy, but I only have one output for the host instead of my multi-monitor setup. When I don't blacklist amdgpu, the VM is stuck at a black screen.

I use QEMU/virt-manager.
Virtualization is enabled, etc…

I hope someone has an idea what the issue could be and why my VM won't work.
Another funny thing: when I was on 5.15, I had a GPU reset script which I used to combat the VFIO reset bug that I am cursed with. Ever since upgrading the kernel to 6.8, when I run the script, the system doesn't "wake up". The script in question:

mokura@pro-gamer:~/Documents/Qemu VM$ cat reset_gpu.sh 
#!/bin/bash

# Remove the GPU devices
echo 1 > /sys/bus/pci/devices/0000:03:00.0/remove
echo 1 > /sys/bus/pci/devices/0000:03:00.1/remove

# Print "Suspending..." message
echo "Suspending..."

# Set the system to wake up after 4 seconds
rtcwake -m no -s 4

# Suspend the system
systemctl suspend

# Wait for 5 seconds to ensure system wakes up properly
sleep 5s

# Rescan the PCI bus
echo 1 > /sys/bus/pci/rescan

# Print "Reset done" message
echo "Reset done"

Thank you


Hey guys, this can be set to solved; I solved it.

I wrote up my findings in another post with someone who had the same issue: https://forums.linuxmint.com/viewtopic.php?p=2505517#p2505517

I found a guide that helped me out a ton.

What I did:

In a terminal: sudo nano /etc/initramfs-tools/scripts/init-top/vfio.sh

#!/bin/sh

PREREQ=""

prereqs()
{
   echo "$PREREQ"
}

case $1 in
prereqs)
   prereqs
   exit 0
   ;;
esac

# Bind the passthrough GPU and its audio function to vfio-pci
# before any other driver can claim them
for dev in 0000:0c:00.0 0000:0c:00.1
do
 echo "vfio-pci" > /sys/bus/pci/devices/$dev/driver_override
 echo "$dev" > /sys/bus/pci/drivers/vfio-pci/bind
done

exit 0

Change the "for dev in 0000:0c:00.0 0000:0c:00.1" line to match your own PCI bus addresses.
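
After saving the script, make it executable and rebuild the initramfs so the script actually gets included:

sudo chmod +x /etc/initramfs-tools/scripts/init-top/vfio.sh
sudo update-initramfs -u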

Also, here is what else I added (the softdep lines tell modprobe to load vfio-pci before the listed drivers, so amdgpu can't claim the card first):

mokura@pro-gamer:~$ cat /etc/modules-load.d/vfio-pci.conf
vfio-pci
mokura@pro-gamer:~$ cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=1002:744c
softdep radeon pre: vfio-pci
softdep amdgpu pre: vfio-pci
softdep efifb pre: vfio-pci
softdep drm pre: vfio-pci


mokura@pro-gamer:~$ cat /etc/modules
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
mokura@pro-gamer:~$ 

Your /etc/default/grub should have your IDs as well:

GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt vfio-pci.ids=1002:744c splash"
where 1002:744c is the vendor:device ID of the GPU you want to pass through.
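
After editing, regenerate the grub config, reboot, and verify the binding (use your own bus address in place of 0c:00.0):

sudo update-grub
lspci -nnk -s 0c:00.0

The passthrough GPU should now report "Kernel driver in use: vfio-pci".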
