Dracut configuration is rendering my NVIDIA drivers useless

I’m on Fedora 39, and this dracut configuration is breaking the NVIDIA drivers on my host:

add_drivers+=" vfio vfio_iommu_type1 vfio_pci "

When this is added to dracut and the initramfs is rebuilt, every time I restart the computer I get an error on startup that says “nvidia kernel module missing, falling back to nouveau”. It doesn’t actually fall back to nouveau, since I have it blacklisted, and whenever I check with lspci the GPU ends up bound to vfio-pci.

Note that VFIO works completely fine. HOWEVER, when I remove the vfio drivers and load my nvidia drivers back in for gaming on the host, the commands run without any errors, but I cannot use the GPU for any GPU-accelerated tasks whatsoever.

This is how I switch from VFIO to nvidia:

sudo virsh nodedev-reattach pci_0000_01_00_0
sudo rmmod vfio_pci vfio_pci_core vfio_iommu_type1
sudo modprobe -i nvidia_modeset nvidia_uvm nvidia
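One thing worth ruling out in the last command: modprobe treats every word after the first module name as a parameter for that module, so `modprobe -i nvidia_modeset nvidia_uvm nvidia` loads only nvidia_modeset (plus whatever it depends on) and passes `nvidia_uvm` and `nvidia` to it as parameters. Loading several modules in one call needs `-a`. Below is a minimal sketch of the same switch with only that change, wrapped in a hypothetical `vfio_to_nvidia` function; the node device name and the `01:00.0` slot are assumptions taken from the commands above, and this is not a confirmed fix for the dracut problem itself.

```shell
#!/usr/bin/env bash
# Sketch only: vfio_to_nvidia is a hypothetical helper; the libvirt node device
# and PCI slot defaults are assumptions based on the commands in the post.
vfio_to_nvidia() {
  local nodedev=${1:-pci_0000_01_00_0}
  local slot=${2:-01:00.0}

  # Detach the GPU from vfio-pci and hand it back to the host
  sudo virsh nodedev-reattach "$nodedev" || return
  sudo rmmod vfio_pci vfio_pci_core vfio_iommu_type1 || return

  # -a (--all) loads every listed module; without it modprobe loads only the
  # first name and passes the remaining names to it as module parameters
  sudo modprobe -a -i nvidia_modeset nvidia_uvm nvidia || return

  # Show which kernel driver now owns the card ("Kernel driver in use: ...")
  lspci -k -s "$slot"
}
```

After sourcing the file, run `vfio_to_nvidia`, or pass the node device and slot explicitly, e.g. `vfio_to_nvidia pci_0000_01_00_0 01:00.0`. Each step stops the function on failure, so a failing `rmmod` will not be followed by a half-finished modprobe.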

But as soon as I remove the dracut configuration and reboot, my GPU works in every game as if nothing happened.

Any ideas what’s going wrong?

Quick sanity check: do you use the nvidia akmod drivers, and is the nvidia kernel module actually built and available? This might not be related to dracut at all.

nvidia kernel module missing, falling back to nouveau

On the surface this looks like “I have problem A, but I’m trying to solve unrelated problem B”.

Kernel updates have done this to me on Fedora a few times. I think a missing matching kernel-devel-X.Y.Z package made the akmod build silently fail for the newer kernel.

There have also been akmod build issues with certain driver and kernel version combinations, so check that too.

Booting the older kernel worked; booting the newer one led to exactly the same scenario.

However, I don’t use IOMMU passthrough and the like, so if it actually is related, I have nothing else to add.

TLDR:

  • if it says the module is missing, check that first; it can happen
  • check whether older kernels have the same missing-module issue
  • run a distro-sync and force-reinstall all kernel packages
  • re-enable nouveau for debugging, or use the integrated GPU if you can
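For the first two points, a rough sketch (the `check_nvidia_per_kernel` helper is hypothetical) that walks every kernel under /lib/modules and reports whether an nvidia module file exists for it, so you can see whether only the newest kernel is missing the akmod build. It assumes the built module is named nvidia.ko with an optional compression suffix, and it only checks that the file is present, not that it loads.

```shell
#!/usr/bin/env bash
# Sketch only: check_nvidia_per_kernel is a hypothetical helper. It assumes the
# akmod-built module file is named nvidia.ko* somewhere under each kernel's tree.
check_nvidia_per_kernel() {
  local moddir=${1:-/lib/modules}
  local k
  for k in "$moddir"/*/; do
    k=${k%/}        # drop the trailing slash from the glob match
    k=${k##*/}      # keep only the kernel release, e.g. 6.5.6-300.fc39.x86_64
    if find "$moddir/$k" -name 'nvidia.ko*' -print -quit | grep -q .; then
      echo "$k: nvidia module present"
    else
      echo "$k: nvidia module MISSING"
    fi
  done
}
```

If the kernel you boot shows MISSING, `rpm -q "kernel-devel-$(uname -r)"` tells you whether the matching kernel-devel package is installed, and `sudo akmods --force` retries the build.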

Absolutely. The first thing I did after installing was update everything, including the kernel, boot into the fresh kernel, install the proprietary nvidia drivers, and let the module finish building. I verified that the build had finished by running modinfo -F version nvidia and checking that it printed a version.

Then I restarted, installed a GPU-accelerated game, and played it successfully on my NVIDIA GPU. Next I updated my GRUB configuration for VFIO and restarted: still no issues.

But as soon as I rebuild the initramfs, that dreaded error comes straight back, and using my GPU on the host is impossible because the drivers seem to be broken. Then I remove the dracut config, rebuild the initramfs, and the error magically goes away and I’m back to gaming on my dGPU.
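One way to compare the two states is to list which GPU-related modules actually land in the image each time. The sketch below assumes you pipe the output of dracut’s `lsinitrd` into a hypothetical `gpu_modules_in_listing` helper, which pulls out any nvidia* or vfio* kernel module names; it only reports what is packed into the initramfs, not what gets loaded or bound at boot.

```shell
#!/usr/bin/env bash
# Sketch only: gpu_modules_in_listing is a hypothetical helper. It reads an
# lsinitrd listing on stdin and prints the nvidia*/vfio* module names it finds.
gpu_modules_in_listing() {
  # Match module files such as vfio-pci.ko.xz or nvidia.ko, with or without compression
  grep -oE '(nvidia|vfio)[a-z0-9_-]*\.ko(\.xz|\.zst|\.gz)?' |
    # Strip the .ko / .ko.xz / .ko.zst / .ko.gz suffix to leave the module name
    sed -E 's/\.ko(\.xz|\.zst|\.gz)?$//' |
    # Byte-wise sort so the order does not depend on the locale
    LC_ALL=C sort -u
}
```

For example, run `sudo lsinitrd | gpu_modules_in_listing` once with the vfio config in place and once after removing it and running `sudo dracut -f`, then compare the two lists.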

I believe the two are related, but I have no clue what to do from here.

Whoops, sorry, I somehow totally missed this rather important part of your post:

Note that VFIO works completely fine. HOWEVER, when I remove the vfio drivers and load my nvidia drivers back in for gaming on the host, the commands run without any errors, but I cannot use the GPU for any GPU-accelerated tasks whatsoever.

I will let someone less bad at reading answer the rest.

Yeah, the TLDR of my post is:

“The dracut configuration is messing up my host’s NVIDIA drivers, so when I switch from VFIO → NVIDIA I can’t use the GPU anymore”

I should’ve made that clearer, my apologies.

A few thoughts.

  • Maybe it’s as simple as dracut not packaging the nvidia driver with your custom config. Did you try adding “nvidia” to the add_drivers line with the vfio* modules?
  • You didn’t explain what you are trying to accomplish with the new dracut configuration (I can make an educated guess, but it would be better if you explained it). It’s possible that your config change causes dracut/Fedora to bind the vfio driver to the NVIDIA GPU, leaving you with no graphics output at boot. If this is what you want, do you have a second GPU that can output? How do you ensure that vfio leaves one GPU alone and that this is the correct (i)GPU?
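For the second point, `lspci -nnk` shows which driver owns each device. The hypothetical `gpu_drivers` helper below is a sketch that filters that output down to display controllers (the VGA, 3D, and Display classes) and prints each slot with its bound driver, or `none` if nothing is bound, which makes it easy to confirm that vfio-pci only claimed the card you meant it to.

```shell
#!/usr/bin/env bash
# Sketch only: gpu_drivers is a hypothetical helper. It reads `lspci -nnk`
# output on stdin and prints "<slot> <driver>" for every display controller.
gpu_drivers() {
  awk '
    # Device records start at column 0, e.g. "01:00.0 VGA compatible controller ..."
    /^[0-9a-f]/ {
      if (slot != "") print slot, "none"   # previous GPU had no driver bound
      slot = ($0 ~ /(VGA compatible|3D|Display) controller/) ? $1 : ""
      next
    }
    # Indented detail line naming the driver of the current GPU
    slot != "" && /Kernel driver in use:/ {
      sub(/.*Kernel driver in use: /, "")
      print slot, $0
      slot = ""
    }
    END { if (slot != "") print slot, "none" }
  '
}
```

Usage: `lspci -nnk | gpu_drivers`. Running it right after boot, and again after the VFIO → nvidia switch, shows whether the card actually moved from vfio-pci to nvidia.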