Linux Kernel 6 seems to be incompatible with the vfio_pci module needed for PCI passthrough

Thanks for that!

I think the big difference between our setups is how / when we’re binding the PCI device to the vfio-pci module. I use a conf file in /etc/modprobe.d:

options vfio_pci ids=10de:1b06,10de:10ef

I’ve also successfully specified the IDs via kernel parameter (not at the same time) but that leads to the same broken framebuffer…
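For anyone comparing the two methods: the kernel-parameter form carries the same ID list as the modprobe.d line, just written as vfio-pci.ids=... on the kernel command line. Here is a rough sketch of a helper that derives one from the other. The function name ids_to_kernel_param is made up for illustration, and it assumes the conf file contains a single "options vfio_pci ids=..." line like the one above.

```shell
#!/bin/sh
# Sketch: turn the ids= value from a modprobe.d "options vfio_pci" line
# into the equivalent kernel command-line parameter.
# ids_to_kernel_param is a hypothetical helper name, not part of any tool.
ids_to_kernel_param() {
    conf_file=$1
    # Take the first "options vfio_pci ..." (or vfio-pci) line and extract
    # whatever follows "ids=" up to the next whitespace.
    ids=$(sed -n 's/^options[[:space:]]\{1,\}vfio[-_]pci[[:space:]].*ids=\([^[:space:]]*\).*/\1/p' "$conf_file" | head -n 1)
    [ -n "$ids" ] || return 1
    printf 'vfio-pci.ids=%s\n' "$ids"
}
```

Calling ids_to_kernel_param /etc/modprobe.d/vfio.conf (the path is an assumption) prints the string to append to the kernel command line. Use one method or the other, not both at once.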

That said, I’d never seen that vfio-pci-override.sh file option, but a little internet sleuthing led me to this script:

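#!/bin/sh
# Set driver_override so these PCI addresses bind to vfio-pci, then load the module.

DEVS="0000:00:00.0 0000:00:00.1..."

# Only set the override when an IOMMU is present.
if [ -n "$(ls -A /sys/class/iommu)" ]; then
    for DEV in $DEVS; do
        echo "vfio-pci" > "/sys/bus/pci/devices/$DEV/driver_override"
    done
fi
# -i ignores the "install" line in modprobe.d, so this script does not call itself.
modprobe -i vfio-pci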

I’ll give this approach a quick try later today to see if it changes anything on my end, but I’m not sure it’ll work with an encrypted drive as that location won’t exist until I unlock it.

@TieMajor && @SkaiiNyght - maybe this approach can provide a fix? Note: this looks to use the PCI address (bus:device.function), rather than the vendor:device ID. You can see both with lspci -nn
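To make the difference between the two kinds of ID concrete, here is a small sketch that reads them straight from sysfs and prints them side by side. The name list_pci_ids is hypothetical, and the optional directory argument is only there so it can be pointed at a test tree instead of the real /sys/bus/pci/devices.

```shell
#!/bin/sh
# Sketch: print each PCI address next to its vendor:device ID, reading sysfs.
# list_pci_ids is a hypothetical helper; the optional argument exists only so
# the function can be run against a test directory instead of the real sysfs.
list_pci_ids() {
    root=${1:-/sys/bus/pci/devices}
    for dev in "$root"/*; do
        [ -r "$dev/vendor" ] && [ -r "$dev/device" ] || continue
        # sysfs stores the IDs as e.g. "0x10de"; strip the "0x" prefix.
        vendor=$(sed 's/^0x//' "$dev/vendor")
        device=$(sed 's/^0x//' "$dev/device")
        printf '%s %s:%s\n' "${dev##*/}" "$vendor" "$device"
    done
}
```

The first column is the form the override script's DEVS list expects; the second is the form used in the ids= option.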

If I remember correctly, with the change from kernel 5.14 to 5.15 the following line became necessary with my GPU: “softdep amdgpu pre: vfio vfio_pci vfio-pci”. Without it, my system freezes when loading KDE.

cat /etc/modprobe.d/vfio.conf  
install vfio-pci /usr/local/bin/vfio-pci-override.sh
softdep amdgpu pre: vfio vfio_pci vfio-pci
softdep xhci_pci pre: vfio vfio_pci vfio-pci
softdep pcieport pre: vfio vfio_pci vfio-pci

@retox I did a reinstall of arch, for other reasons, and everything is working now. Not sure if there was an update in the time that I was busy reinstalling, or if the fresh start worked

Edit.

Never mind, I was booting from the LTS kernel. Zen and vanilla are still broken when following the Arch wiki on PCI passthrough via OVMF.

This is the approach that I actually was using previously, as my two GPUs share an ID for their sound portion. Still no dice. If I specify the PCI IDs I get a failure to boot, although for a different reason.

With it working for @Janos, that looks like the approach that I had to do when I ran debian. Maybe I’ll give that a try just with nvidia instead of AMD.

I finally got around to trying out the vfio-pci-override script. I added my device IDs and placed it in the boot directory, since I suspected that it tries to read the file before booting.

Funny enough - nothing happened after adding the file “hook” in my mkinitcpio.conf file. Looking at @Janos’s vfio.conf modprobe entry, I added the relevant lines there. Once that was added, I got the same behavior as before - a broken framebuffer, but still able to enter my encryption password and boot / bind normally afterwards.

I reverted back to what I had before. BUT someone updated the Arch forum cross-post of this issue saying they added the relevant driver BEFORE the vfio hooks. I’m going to give that a shot real quick…

UPDATE - that worked. I added “amdgpu” before the vfio hooks and the prompt came back (after a second long black screen as it switched to using the igpu). Seems like a decent workaround for the time being
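In case it helps anyone double-check their ordering, here is a rough sketch that tests whether a driver appears before the first vfio entry. It assumes the ordering lives in a MODULES=(...) line in mkinitcpio.conf, for example MODULES=(amdgpu vfio_pci vfio vfio_iommu_type1), which is an illustrative line and not copied from anyone's config here. The function name driver_before_vfio is made up.

```shell
#!/bin/sh
# Sketch: check that a GPU driver is listed before the first vfio module in
# the MODULES=(...) line of a mkinitcpio.conf-style file.
# driver_before_vfio is a hypothetical helper name.
driver_before_vfio() {
    conf_file=$1
    driver=$2
    # Pull the contents of the first uncommented MODULES=(...) line.
    modules=$(sed -n 's/^MODULES=(\(.*\)).*/\1/p' "$conf_file" | head -n 1)
    for mod in $modules; do
        case $mod in
            "$driver") return 0 ;;   # driver seen first: order is as intended
            vfio*)     return 1 ;;   # a vfio module came first
        esac
    done
    return 1                         # driver not listed at all
}
```

Something like driver_before_vfio /etc/mkinitcpio.conf amdgpu && echo ok would confirm the order before regenerating the initramfs.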

I’m on Manjaro too and I’m experiencing the same issue; the system doesn’t actually hang, but the vfio module seems to mess with the video framebuffers.
I have some issues with USB on kernel 6.0 too.
Some work remains to be done for vfio and 6.0.
More info here:

Well, at least now I can get into my system knowing that even though I don’t see the screen progressing, it is still actually doing something. Unlocking the system, logging in, and starting X are all done blind, but hey, it works. I would imagine a display manager would take the blind logging in and starting X out of the picture, but if I’m already unlocking the encrypted drives it isn’t much more difficult to just keep doing everything else.

Is this still a known issue in 6.x? I had issues when 6 first came out, and simply avoided it, going back to 5.19. I thought at the time it was an issue with openzfs.
Yesterday I had the time to try again and discovered it was the vfio kernel modules causing the issue. A web search brought me here, to a thread that is 5 months old now.

I have managed to work around this issue and posted the solution on my blog; hope this helps. hardcoded.info/post/2023/03/04/pci-passthrough-using-vfio-pci-for-linux-kernel-version-6-solution/

I’m sorry, but your workaround is super jank and super hard to pull off.

Although I have not applied the fix since I’m using 5.15.x kernel, I don’t know why you are saying this…It’s well described in every step together with simple explanations.

I do not understand what is going on? Are all of you using some obscure kernel version? I never had a problem with vfio-pci, neither on the regular kernel build nor on the hardened kernel build. Maybe you should revisit what you are doing, since I don’t think this is a current problem or ever has been since it is working for the majority of people.

Can anyone try modprobe vfio-pci on Linux 6.x without a desktop environment? As soon as I run that, I lose main screen output.

Works fine for me.

What’s your platform and the gpu model, and Linux version?

Arch, kernel 6.1.29, AMD RX550.

I am on Ubuntu 23.04. If I boot it with Linux 6.2.0-20, vfio-pci is not loaded on boot. If I try to load it manually, it trashes my primary framebuffer, while GPU passthrough works as normal. I just can’t get any screen output from the host iGPU.

If I boot it with Linux 5.19.0-40, everything works properly.

Thus, I believe it is either a bug in the Linux kernel or a compatibility issue in this Ubuntu release.

Unfortunately I have currently no means to test Ubuntu on my end, sorry.

Sounds a bit as if, when vfio-pci is loaded, it tries to bind something that is in the same IOMMU group as your iGPU. Have you set vfio-pci to bind anything at all at this stage, in the kernel parameters or somewhere else, or are you just trying to load the module without binding anything? Apart from that I cannot really help further; passthrough with iGPUs seems like trouble, since I have heard from multiple people now that they run into problems with it.
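To check the IOMMU group theory, something along these lines lists which PCI addresses share a group. It is only a sketch: list_iommu_groups is a made-up name, and the optional directory argument is there purely so it can be tried on a test tree rather than the real /sys/kernel/iommu_groups.

```shell
#!/bin/sh
# Sketch: list which PCI addresses share each IOMMU group.
# list_iommu_groups is a hypothetical helper; the optional argument is only
# there so the function can be run against a test directory.
list_iommu_groups() {
    root=${1:-/sys/kernel/iommu_groups}
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue
        group=${dev%/devices/*}   # .../iommu_groups/<n>
        group=${group##*/}        # <n>
        printf 'group %s: %s\n' "$group" "${dev##*/}"
    done
}
```

If the iGPU's address shows up under the same group number as a device being handed to vfio-pci, that would support the suspicion above.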

I use a boot parameter to set the vfio IDs.
I use the iGPU (7950X) for the host and an RTX 4070 Ti for the VM. I can verify that vfio-pci does not grab the iGPU when loaded, but it did unbind the iGPU framebuffer somehow.

Then remove the boot parameter and boot without any passthrough devices and try to load the module. This way you can see if there is a problem with the module or with your passthrough setup.
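For that test, a quick way to see what is bound to a device before and after loading the module is to read the driver symlink in sysfs. A minimal sketch, with bound_driver as a made-up helper name and an optional second argument that exists only for testing outside the real /sys/bus/pci/devices:

```shell
#!/bin/sh
# Sketch: report which kernel driver, if any, is bound to a PCI device.
# bound_driver is a hypothetical helper; the second argument is optional and
# only exists so the function can be tested outside the real sysfs.
bound_driver() {
    addr=$1
    root=${2:-/sys/bus/pci/devices}
    link="$root/$addr/driver"
    if [ -L "$link" ]; then
        # The "driver" entry is a symlink whose last component is the driver name.
        target=$(readlink "$link")
        printf '%s\n' "${target##*/}"
    else
        printf 'none\n'
    fi
}
```

Running it for the iGPU's address, then modprobe vfio-pci, then running it again would show whether the binding itself changes or only the framebuffer is affected.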