Linux Kernel 6 seems to be incompatible with the vfio_pci module needed for PCI passthrough

I’ve upgraded to the Arch zen 6.0.2 kernel and the system now gets stuck during boot. After turning off quiet boot, I found that it hangs just after the message ‘:: running early hook [udev]’ is displayed.

The modules that I had loaded as early loading were all related to VFIO - MODULES=( vfio_pci vfio vfio_iommu_type1 vfio_virqfd )

I’ve tested each module individually, and the only one that causes the boot failure is vfio_pci. I can’t find any documented breaking change that would explain why it no longer works.

I have tried the vanilla Arch kernel as well, to no avail. The lts kernel works, however, which leads me to believe the issue is related to kernel version 6. Is anyone else having this issue, or does anyone know of a workaround for the vfio_pci module? I can boot with vfio, vfio_iommu_type1 and vfio_virqfd just fine, but the vfio_pci module seems to be absolutely necessary to run my passthrough GPU with the vfio driver.

I should probably add that this was all working just fine with 5.19.x.

I’m seeing the same thing on my end, but haven’t traced it to the source yet…

That said, it’s not 100% broken for me. I have the encrypt hook (for my LUKS volume), and while the unlock prompt never appears, I can still wait a second and type my password. Everything works as expected afterwards.

This all happens too early to show up in the journal, so if anyone has a hint on how to log and debug this further, I’d appreciate it.

I am encountering the same issue after updating to linux-vfio on my Arch machine. It seems the framebuffer is broken (at startup and also at shutdown), but once the Xorg or Wayland session starts, everything works fine. I have kernel logs for both my old (functional) kernel and the new one. I looked for differences but nothing jumped out at me. If someone wants to take a look, let me know and I will post them here.
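
For anyone comparing logs the same way, here is a rough sketch. Kernel log lines from two boots rarely match byte for byte because of their timestamps, so stripping those first tends to make a diff much more readable. The function name and the file names are placeholders, not anything from the poster's setup.

```shell
#!/bin/sh
# Remove the leading "[    1.234567] " timestamp from dmesg-style lines
# so that logs from two different boots can be compared line by line.
strip_ts() {
    sed 's/^\[ *[0-9]*\.[0-9]*\] //'
}

# Usage sketch (old-kernel.log and new-kernel.log are hypothetical files,
# e.g. saved from `dmesg` on each kernel):
#   strip_ts < old-kernel.log > old.txt
#   strip_ts < new-kernel.log > new.txt
#   diff -u old.txt new.txt | grep -iE 'vfio|fb|drm|vga'
```

The grep filter at the end is only a guess at which subsystems are worth looking at first; drop it to see the full diff.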

I did a little troubleshooting and found that the issue does not appear if I do not specify the PCI IDs to isolate in the kernel parameters.

I am on Manjaro 6.0.2-2, no issues at all

lsmod | grep vfio
vfio_pci               16384  0
vfio_pci_core          77824  1 vfio_pci
irqbypass              16384  2 vfio_pci_core,kvm
vfio_virqfd            16384  1 vfio_pci_core
vfio_iommu_type1       45056  0
vfio                   32768  2 vfio_pci_core,vfio_iommu_type1

Interesting - what modules and hooks do you have in your mkinitcpio.conf file? Or are you enabling the modules early via kernel parameters?

Note: I have the same output from lsmod once things are up and running. The issue is during the initial boot.

cat /etc/mkinitcpio.conf
# vim:set ft=sh
# MODULES
# The following modules are loaded before any boot hooks are
# run.  Advanced users may wish to specify all system modules
# in this array.  For instance:
#     MODULES=(piix ide_disk reiserfs)
#MODULES=""
MODULES="vfio_pci vfio vfio_iommu_type1 vfio_virqfd"
# BINARIES
# This setting includes any additional binaries a given user may
# wish into the CPIO image.  This is run last, so it may be used to
# override the actual binaries included by a given hook
# BINARIES are dependency parsed, so you may safely ignore libraries
BINARIES=()

# FILES
# This setting is similar to BINARIES above, however, files are added
# as-is and are not parsed in any way.  This is useful for config files.
#FILES=""
FILES="/usr/local/bin/vfio-pci-override.sh"

# HOOKS
# This is the most important setting in this file.  The HOOKS control the
# modules and scripts added to the image, and what happens at boot time.
# Order is important, and it is recommended that you do not change the
# order in which HOOKS are added.  Run 'mkinitcpio -H <hook name>' for
# help on a given hook.
# 'base' is _required_ unless you know precisely what you are doing.
# 'udev' is _required_ in order to automatically load modules
# 'filesystems' is _required_ unless you specify your fs modules in MODULES
# Examples:
##   This setup specifies all modules in the MODULES setting above.
##   No raid, lvm2, or encrypted root is needed.
#    HOOKS=(base)
#
##   This setup will autodetect all modules for your system and should
##   work as a sane default
#    HOOKS=(base udev autodetect block filesystems)
#
##   This setup will generate a 'full' image which supports most systems.
##   No autodetection is done.
#    HOOKS=(base udev block filesystems)
#
##   This setup assembles a pata mdadm array with an encrypted root FS.
##   Note: See 'mkinitcpio -H mdadm' for more information on raid devices.
#    HOOKS=(base udev block mdadm encrypt filesystems)
#
##   This setup loads an lvm2 volume group on a usb device.
#    HOOKS=(base udev block lvm2 filesystems)
#
##   NOTE: If you have /usr on a separate partition, you MUST include the
#    usr, fsck and shutdown hooks.
HOOKS="base udev autodetect modconf block keyboard keymap consolefont filesystems fsck"

# COMPRESSION
# Use this to compress the initramfs image. By default, gzip compression
# is used. Use 'cat' to create an uncompressed image.
#COMPRESSION="gzip"
#COMPRESSION="bzip2"
#COMPRESSION="lzma"
#COMPRESSION="xz"
#COMPRESSION="lzop"
#COMPRESSION="lz4"
#COMPRESSION="zstd"

# COMPRESSION_OPTIONS
# Additional options for the compressor
#COMPRESSION_OPTIONS=()

GRUB_CMDLINE_LINUX_DEFAULT="iommu=pt amd_iommu=on amd_cpufreq=enable pcie_aspm=off vfio_iommu_type1.allow_unsafe_interrupts=1 mitigations=off default_hugepagesz=1G hugepagesz=1G udev.log_priority=3"

Thanks for that!

I think the big difference between our setups is how / when we’re binding the PCI device to the vfio-pci module. I use a conf file in /etc/modprobe.d:

options vfio_pci ids=10de:1b06,10de:10ef

I’ve also successfully specified the IDs via kernel parameter (not at the same time), but that leads to the same broken framebuffer…
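
For reference, the two ways of passing the IDs differ only in syntax. A minimal sketch that prints both forms from the same list of vendor:device IDs; the helper names are made up for illustration, and the IDs are the ones from the conf file above.

```shell
#!/bin/sh
# Join vendor:device IDs with commas (hypothetical helper).
join_ids() {
    (IFS=,; printf '%s' "$*")
}

# Line for a conf file in /etc/modprobe.d/
modprobe_line() {
    printf 'options vfio_pci ids=%s\n' "$(join_ids "$@")"
}

# Equivalent parameter for the kernel command line
cmdline_param() {
    printf 'vfio-pci.ids=%s\n' "$(join_ids "$@")"
}

modprobe_line 10de:1b06 10de:10ef
cmdline_param 10de:1b06 10de:10ef
```

Only one of the two should be in use at a time, as noted above.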

That said, I’d never seen that vfio-pci-override.sh file option before, but a little internet sleuthing led me to this script:

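#!/bin/sh

# PCI addresses (domain:bus:device.function) of the devices to hand to vfio-pci
DEVS="0000:00:00.0 0000:00:00.1..."

# Only set the overrides when an IOMMU is present
if [ -n "$(ls -A /sys/class/iommu)" ]; then
    for DEV in $DEVS; do
        # Make vfio-pci the only driver allowed to bind this device
        echo "vfio-pci" > "/sys/bus/pci/devices/$DEV/driver_override"
    done
fi
# -i ignores any "install" line in modprobe.d, so the real module gets loaded
modprobe -i vfio-pci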

I’ll give this approach a quick try later today to see if it changes anything on my end, but I’m not sure it’ll work with an encrypted drive as that location won’t exist until I unlock it.

@TieMajor && @SkaiiNyght - maybe this approach can provide a fix? Note: this looks to use the PCI address (the 0000:00:00.0 form) rather than the vendor:device ID (the 10de:1b06 form). You can see both with lspci -nn
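
To make the distinction concrete, here is a small sketch that pulls both values out of one line of lspci -nn output. The function name and the sample line are illustrative only; run lspci -nn (or lspci -Dnn to include the 0000: domain prefix that sysfs uses) for real values.

```shell
#!/bin/sh
# Hypothetical helper: split one line of `lspci -nn` output into the
# PCI address (first field) and the vendor:device ID (the last
# [xxxx:xxxx] group on the line).
pci_ids_from_line() {
    line=$1
    addr=${line%% *}
    id=$(printf '%s\n' "$line" |
        sed -n 's/.*\[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)\].*/\1/p')
    printf '%s %s\n' "$addr" "$id"
}

# Sample line, made up for illustration
sample='01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP102 [GeForce GTX 1080 Ti] [10de:1b06] (rev a1)'
pci_ids_from_line "$sample"
```

The first value is what goes into DEVS in the override script (with the domain prefix added); the second is what goes into the ids= option.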

If I remember correctly, with the change from kernel 5.14 to 5.15, the following entry became necessary with my GPU: “softdep amdgpu pre: vfio vfio_pci vfio-pci”. Without it, my system freezes when loading KDE.

cat /etc/modprobe.d/vfio.conf
install vfio-pci /usr/local/bin/vfio-pci-override.sh
softdep amdgpu pre: vfio vfio_pci vfio-pci
softdep xhci_pci pre: vfio vfio_pci vfio-pci
softdep pcieport pre: vfio vfio_pci vfio-pci

@retox I did a reinstall of Arch, for other reasons, and everything is working now. Not sure if there was an update in the time that I was busy reinstalling, or if the fresh start did it.

Edit:

Never mind, I was booting from the lts kernel. Zen and vanilla are still broken when set up according to the Arch wiki on PCI passthrough via OVMF.

This is the approach that I was actually using previously, as my two GPUs share an ID for their sound portion. Still no dice. If I specify the PCI IDs I get a failure to boot, although for a different reason.

With it working for @Janos, that looks like the approach I had to use when I ran Debian. Maybe I’ll give that a try, just with NVIDIA instead of AMD.

I finally got around to trying out the vfio-pci-override script. I added my device IDs and placed it in the boot directory, since I suspected that it tries to read the file before the system has booted.

Funny enough, nothing happened after adding the FILES entry to my mkinitcpio.conf. Looking at @Janos’s vfio.conf modprobe entry, I added the relevant lines there. Once that was added, I got the same behavior as before: a broken framebuffer, but I was still able to enter my encrypt password and boot / bind normally afterwards.

I reverted to what I had before. BUT someone updated the Arch forum cross-post of this issue saying they added the relevant GPU driver BEFORE the vfio hooks. I’m going to give that a shot real quick…

UPDATE - that worked. I added “amdgpu” before the vfio hooks and the prompt came back (after a second-long black screen as it switched to using the iGPU). Seems like a decent workaround for the time being.
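
A sketch of what that ordering could look like, assuming “hooks” here means the entries of the MODULES array in mkinitcpio.conf and borrowing the vfio module list from the first post. modules_line is a made-up helper that only prints the line; it does not edit any file.

```shell
#!/bin/sh
# Print a MODULES line with the host GPU driver ahead of the vfio modules.
# modules_line is a hypothetical helper; the vfio module names are the
# ones listed in the first post of this thread.
modules_line() {
    host_driver=$1
    shift
    printf 'MODULES=(%s %s)\n' "$host_driver" "$*"
}

modules_line amdgpu vfio_pci vfio vfio_iommu_type1 vfio_virqfd
```

The printed line would replace the existing MODULES line in /etc/mkinitcpio.conf, and the initramfs then has to be regenerated (mkinitcpio -P on Arch) before rebooting. Swap amdgpu for whichever driver runs the host display.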

I’m on Manjaro too and I’m experiencing the same issue; the system doesn’t actually hang, but the vfio module seems to interfere with the video framebuffers.
I have some issues with USB on kernel 6.0 too.
There is still some work to be done for vfio on 6.0.
More info here:

Well, at least now I can get into my system, knowing that even when the screen doesn’t show any progress it is still actually doing something. Unlocking the system, logging in, and starting X are all done blind, but hey, it works. I would imagine a display manager would take care of the blind login and X startup, but if I’m already unlocking the encrypted drives blind, it isn’t much more difficult to keep doing everything else that way.

Is this still a known issue in 6.x? I had issues when 6 first came out and simply avoided it by going back to 5.19. At the time I thought it was an issue with OpenZFS.
Yesterday I had time to try again and discovered that the vfio kernel modules were causing the issue. A web search brought me here, to a thread that is now 5 months old.

I have managed to work around this issue and posted the solution on my blog; hope this helps. hardcoded.info/post/2023/03/04/pci-passthrough-using-vfio-pci-for-linux-kernel-version-6-solution/

I’m sorry, but your workaround is super janky and super hard to pull off.

Although I have not applied the fix, since I’m using a 5.15.x kernel, I don’t know why you are saying this… Every step is well described, with simple explanations.

I do not understand what is going on. Are all of you using some obscure kernel version? I have never had a problem with vfio-pci, neither on the regular kernel build nor on the hardened kernel build. Maybe you should revisit what you are doing; I don’t think this is a current problem, or ever has been, since it works for the majority of people.

Can anyone try modprobe vfio-pci on Linux 6.x without a desktop environment? As soon as I run that, I lose main screen output.
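
If anyone does try this, it may help to record which driver owns each GPU function before and after loading the module. A minimal sketch, assuming the standard sysfs layout; bound_driver and the SYSFS_PCI override are made-up names, and the address in the usage comment is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: print the driver currently bound to a PCI device,
# or "none" if no driver is bound. SYSFS_PCI can be set to point at a
# different directory, which is only useful for testing.
bound_driver() {
    dev=$1
    root=${SYSFS_PCI:-/sys/bus/pci/devices}
    if [ -L "$root/$dev/driver" ]; then
        # The driver entry is a symlink whose last component is the driver name
        basename "$(readlink "$root/$dev/driver")"
    else
        echo none
    fi
}

# Usage sketch (0000:01:00.0 is a placeholder address):
#   bound_driver 0000:01:00.0    # before
#   modprobe vfio-pci
#   bound_driver 0000:01:00.0    # after
```

If the device driving the main screen shows up as bound to vfio-pci afterwards, that would at least explain the lost output.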

Works fine for me.