Nvidia + Fedora 38 upgrade = Epic Fail

This evening, after waiting for others to work out the inevitable rough edges, I went ahead and upgraded my Ryzen 3700 + GeForce 2070 Super system from Fedora 37 to 38. To keep a long story short, despite sacrificing a few goats and muttering a few incantations, I got a black screen and never could get to the desktop. Or even a console, for frick’s sake (yes, I tried single-user mode and recovery mode).

Finally said screw it, and popped in my old AMD R9 Fury. Boom, straight to desktop. Despite the ease of use, the Fury is an ancient card and way behind the 2070 in terms of performance. Any suggestions for fixing this? Other than sending Mr. Jensen a nastygram, that is…

I was stunned until I remembered that I had Secure Boot enabled. That prevented the nvidia module from being loaded.
After I disabled Secure Boot, Fedora 38 + nvidia came right up.

Assuming this is your issue, you can follow the rpmfusion howto on enabling Secure Boot with akmods, the tool that compiles the nvidia driver for new kernel versions.
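
Before working through the howto, it can help to confirm whether Secure Boot is actually in play. `mokutil --sb-state` prints `SecureBoot enabled` or `SecureBoot disabled`; the sketch below only classifies that text. The function name `secure_boot_state` is made up for illustration, and it assumes `mokutil` is installed.

```shell
# Hypothetical helper: classify the text printed by `mokutil --sb-state`
# (read from stdin) as enabled / disabled / unknown.
secure_boot_state() {
    case "$(cat)" in
        *"SecureBoot enabled"*)  echo "enabled" ;;
        *"SecureBoot disabled"*) echo "disabled" ;;
        *)                       echo "unknown" ;;
    esac
}
```

Typical use would be `mokutil --sb-state | secure_boot_state`. If it reports `enabled` and the nvidia module is not signed with an enrolled key, the kernel will refuse to load it.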

Also, the latest nvidia driver in the nvidia cuda repo is only 530, and they don’t technically have a Fedora 38 repo yet. So if you need cuda you will be stuck on 530 and cannot use the 6.3 kernel, as that needs 535.

Yeah I’m pretty much dead in the water too until the cuda repos ship something for f38 since I need cuda devel packages :confused:

If you use nvidia’s .run installer to install 535 you should be okay, but I have not tested this personally.

The rpmfusion cuda howto acknowledges this and provides workarounds in the KNOWN ISSUES section.

@jode Yep, turning off Secure Boot was one of the ‘incantations’ I already tried. It’s worked before, but not this time.

@Dynamic_Gravity I do believe that I have installed CUDA at some point in the past, so I searched for ‘cuda’ (‘dnf search cuda’) and just removed any drivers or packages that it found. I believe there were only three or so, but removing them didn’t solve my problem. Not sure what else to try, except maybe reinstalling the RPM Fusion nvidia packages in case removing CUDA broke something.

Boot into command line mode

  • On grub prompt, type e to edit the grub entry you’re on (typically the default/top entry)
  • Use arrow down key to navigate to the line that starts with linux ...
  • Use “End” key or arrow-right key to navigate to the end of what can be a very long line (potentially wrapping multiple times)
  • Type " 3" (space-three)
  • (optional, but can be helpful:) use arrow keys to navigate to and delete/backspace to eliminate keywords “rhgb” and “quiet” which limit text output while booting
  • Hit Ctrl-x on the keyboard to boot with the modified parameters
  • The boot will (should) end in a terminal with a login prompt.
  • Use “lsmod | grep nvidia” or “lsmod | grep nouv” to figure out which driver is currently loaded
  • Review dmesg/journal/log files to identify what may have gone wrong during install/boot.
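
The `lsmod | grep` step above can be wrapped in a small filter that says which driver ended up loaded. This is only a sketch: the name `detect_gpu_driver` is hypothetical, and it only knows about the two modules mentioned in this thread.

```shell
# Hypothetical helper: read `lsmod` output on stdin and report which
# GPU kernel module is loaded. Only the module name column is inspected.
detect_gpu_driver() {
    awk '
        $1 == "nvidia"  { nvidia = 1 }
        $1 == "nouveau" { nouveau = 1 }
        END {
            if (nvidia)       print "nvidia"
            else if (nouveau) print "nouveau"
            else              print "none"
        }
    '
}
```

Run it as `lsmod | detect_gpu_driver` from the text console. If it prints `none`, `journalctl -b -k | grep -iE 'nvidia|nouveau'` is a reasonable next place to look for the reason.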

I assume that following the removal steps from the rpmfusion Nvidia howto followed by a new installation from the same document may fix the issue.

Edits made with e at the grub prompt apply only to that one boot, so there is normally nothing to undo. If you made the changes permanent, you can revert them by finding the correct boot entry in /boot/loader/entries and using your favorite command line editor.
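
To see which parameters each boot entry currently carries, the `options` line of every entry file can be printed in one go. A minimal sketch, assuming the usual Boot Loader Specification layout where each `.conf` file has an `options ...` line; the function name `list_entry_options` is hypothetical.

```shell
# Hypothetical helper: print "<entry file>: <kernel options>" for every
# boot entry in a directory (defaults to /boot/loader/entries).
# The body runs in a subshell so its variables do not leak.
list_entry_options() (
    entries_dir="${1:-/boot/loader/entries}"
    for entry in "$entries_dir"/*.conf; do
        [ -e "$entry" ] || continue
        options=$(sed -n 's/^options[[:space:]]\{1,\}//p' "$entry")
        printf '%s: %s\n' "${entry##*/}" "$options"
    done
)
```

Reading /boot/loader/entries usually needs root, so in practice this would be run as `sudo sh -c '...'` or from a root shell. A stray `3` at the end of an options line means that entry will keep booting to a text console.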

FYI. Driver version 535.54.03 was just pushed by rpmfusion.

Thanks! I didn’t know about that. I’ve been living under a rock.

I have had very similar-sounding problems. Frustrating, isn’t it? Every upgrade I hold my breath and cross my fingers.

Anyway, for me at least, 530 from rpmfusion-nonfree and cuda are working at the moment.

my latest fix

$ dnf list installed | grep nvidia
akmod-nvidia.x86_64                                  3:530.41.03-1.fc38                  @rpmfusion-nonfree                             
kmod-nvidia-6.3.8-200.fc38.x86_64.x86_64             3:530.41.03-1.fc38                  @@commandline                                  
nvidia-persistenced.x86_64                           3:530.41.03-1.fc38                  @rpmfusion-nonfree                             
nvidia-settings.x86_64                               3:530.41.03-1.fc38                  @rpmfusion-nonfree                             
xorg-x11-drv-nvidia.x86_64                           3:530.41.03-1.fc38                  @rpmfusion-nonfree                             
xorg-x11-drv-nvidia-cuda.x86_64                      3:530.41.03-1.fc38                  @rpmfusion-nonfree                             
xorg-x11-drv-nvidia-cuda-libs.x86_64                 3:530.41.03-1.fc38                  @rpmfusion-nonfree                             
xorg-x11-drv-nvidia-kmodsrc.x86_64                   3:530.41.03-1.fc38                  @rpmfusion-nonfree                             
xorg-x11-drv-nvidia-libs.x86_64                      3:530.41.03-1.fc38                  @rpmfusion-nonfree                             
xorg-x11-drv-nvidia-power.x86_64                     3:530.41.03-1.fc38                  @rpmfusion-nonfree  
OS: Fedora Linux 38 (KDE Plasma) x86_64 
Kernel: 6.3.8-200.fc38.x86_64 
DE: Plasma 5.27.5 
WM: kwin 
Theme: [Plasma], Breeze [GTK2/3] 
Icons: [Plasma], breeze-dark [GTK2/3] 
CPU: AMD Ryzen 9 7900X (24) @ 4.700GHz 
GPU: NVIDIA GeForce RTX 4090 
Memory: 2847MiB / 31811MiB 
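
One thing a listing like this makes easy to check is that every nvidia package is on the same version, since a kernel module and userspace libraries from different driver releases will not work together. A rough sketch, with a made-up function name, that assumes the three-column `dnf list installed` layout shown above (dnf can wrap very long lines, which this does not handle):

```shell
# Hypothetical helper: read `dnf list installed` output on stdin and print
# each distinct version string seen on nvidia packages, one per line.
nvidia_package_versions() {
    awk '$1 ~ /nvidia/ && NF >= 3 { print $2 }' | sort -u
}
```

With `dnf list installed | nvidia_package_versions`, a single line of output means everything matches; more than one line suggests a half-finished update or packages mixed from different repos.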

Thanks for the tips. This is kind of weird; adding a " 3" to the grub line for the F38 kernel still dumped me to a black screen. However adding a 3 to the still-present F37 grub boot line did get me to a command prompt. That leaves me with the question, can I try to troubleshoot and reinstall things with the old kernel (and old modules?), or is that just going to mess things up worse?

(Edit: Is this putting the system in run level 3? Like ‘telinit 3’?)

Yes.
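
For reference, on a systemd distribution like Fedora the old runlevels are aliases for targets, so a trailing 3 on the kernel line selects multi-user.target (text mode, no display manager). A small lookup sketch; the function name `runlevel_to_target` is only for illustration.

```shell
# Hypothetical helper: map a SysV runlevel number to the systemd target
# it is aliased to.
runlevel_to_target() {
    case "$1" in
        0)     echo "poweroff.target" ;;
        1)     echo "rescue.target" ;;
        2|3|4) echo "multi-user.target" ;;
        5)     echo "graphical.target" ;;
        6)     echo "reboot.target" ;;
        *)     echo "unknown runlevel: $1" >&2; return 1 ;;
    esac
}
```

On a running system, `systemctl isolate multi-user.target` is the closest equivalent to `telinit 3`, and `systemctl get-default` shows which target a normal boot ends in.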

Look for /boot/initramfs-6.3.*.fc38.x86_64.img. Your boot may have failed due to a missing initramfs. That happened to me on every machine I upgraded from Fedora 37 to 38. If this is the case, the solution is to create an initramfs for your Fedora 38 kernel version, e.g. dracut /boot/initramfs-6.3.8-200.fc38.x86_64.img 6.3.8-200.fc38.x86_64
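
Rather than eyeballing /boot, the check can be scripted: for every installed kernel image, look for the matching initramfs. A minimal sketch, assuming Fedora's `vmlinuz-<version>` and `initramfs-<version>.img` naming; the function name `list_missing_initramfs` is hypothetical.

```shell
# Hypothetical helper: print the version of every kernel in a boot
# directory (default /boot) that has a vmlinuz-<version> file but no
# matching initramfs-<version>.img. Runs in a subshell to keep variables local.
list_missing_initramfs() (
    boot_dir="${1:-/boot}"
    for kernel in "$boot_dir"/vmlinuz-*; do
        [ -e "$kernel" ] || continue
        version="${kernel##*/vmlinuz-}"
        [ -e "$boot_dir/initramfs-$version.img" ] || printf '%s\n' "$version"
    done
)
```

Each version it prints is a candidate for the dracut command above, run as root with that version substituted in both places. No output means every kernel has its initramfs and the problem lies elsewhere.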

Yes.

I don’t think that’s the case, because that line did get me to a desktop when I temporarily swapped out my 2070 for an old AMD R9 Fury. Unless there’s some kind of failsafe for the initrd that works when an AMD card is installed…?

Need to address a few other things this afternoon, so I probably won’t have a chance to try and troubleshoot from F37 for a while. Which is another question; would it be ‘safer’ to drop in the Fury, rip out the Nvidia drivers, nouveau blacklist, etc., put the 2070 back in, and start over fresh?

EDIT: Actually, since I have a separate root and /home partition, could I just wipe F37 from root, and do a fresh install?

I don’t think either approach is “unsafe”. The trick with booting into runlevel 3 should allow you to work on the system in case the boot works in general, but somehow the nvidia driver fails to load. A swap to AMD graphics will do the same.

However, you report that for some reason the boot into the f38 kernel fails.

In my case, removing the keywords “rhgb” and “quiet” from the kernel boot line revealed a missing initramfs. The same move may reveal a valuable error message to you as well :slight_smile:

It seems that SELinux is complicating matters. On F37 it was disabled, with an selinux=0 parameter in the kernel boot line. If I try to boot F38 with selinux disabled (the default) and rhgb and quiet removed, I see a very brief message about the nvidia fallback service trying to start, then a black screen. If I deliberately enable SELinux (selinux=1), I get a message about a permission error, and the API filesystem failing to load.

If I haven’t mentioned it before, as a desktop user I hate SELinux and almost always disable it. Many desktop applications don’t play by SELinux’s rules, and spew errors or won’t even run with it installed.

Follow the documentation to disable SELinux: Changing SELinux states and modes :: Fedora Docs
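
Since the mode can come from two places (the `selinux=` kernel parameter and /etc/selinux/config), it is worth checking what the config file itself says. A small sketch that just extracts the `SELINUX=` value; the function name `selinux_config_mode` is made up.

```shell
# Hypothetical helper: print the mode set by the SELINUX= line of an
# SELinux config file (default /etc/selinux/config). Comment lines and
# the separate SELINUXTYPE= setting are ignored.
selinux_config_mode() {
    sed -n 's/^SELINUX=\([a-z]*\).*/\1/p' "${1:-/etc/selinux/config}"
}
```

`getenforce` reports the mode the running system actually ended up in, which is what counts when a `selinux=0` or `selinux=1` boot parameter overrides the file.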

@jode thanks for posting that guide.

I followed it and got F38 to work with the v535 drivers from the rawhide repository.

I had to do a full uninstall of my current nvidia-driver and cuda packages, disable their repos in my yum.repos.d directory, reboot, reinstall from rpmfusion, and then pin them with the versionlock file.

I’ll keep my eye out for once the F38 cuda repo ships but I should be okay for now.

Bonus that wayland works now, so now compositing is butter smooth.

I tried to remove the nvidia drivers with:

dnf erase xorg-x11-drv-nvidia* akmod-nvidia

That didn’t help, to put it mildly. Now I can’t even boot with the F37 kernel. So I thought, ok, screw it, I’ll just reinstall. Downloaded F38, ran the F38 media installer, rebooted and got a media error at 5%. :man_facepalming:

Time for plan B. I will install Nobara 38 (Fedora spin by Glorious Eggroll) to the root partition, then doctor /etc/fstab to point to the old /home partition. Done it before, but that’s no guarantee…

Hope you addressed the media error ahead of this.

Did it boot to black screen or just stop somewhere else? Did you turn off “quiet” in grub and get rid of all the nouveau blacklists?

Did a ‘bad blocks’ check with Rufus and it insists my media is fine. And Nobara 38 also failed the same media check. It appears to be this bug/non-bug:

https://bugzilla.redhat.com/show_bug.cgi?id=1282244

Best I can tell, it’s because Windows ends up writing some additional files to the drive after creation, which FUBARS the md5 check the installer does at boot-up. Apparently the solution is to skip the media check and just install.
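
If the on-stick media check can't be trusted, one fallback is to at least confirm the downloaded ISO itself is intact before skipping the check. A minimal sketch, assuming `sha256sum` from coreutils and a known-good hash copied from the distro's published checksum file; the function name `iso_matches_sha256` is hypothetical. Note that this verifies the download, not what ended up on the USB stick.

```shell
# Hypothetical helper: succeed if the SHA-256 of file $1 equals the
# expected lowercase hex digest $2. Runs in a subshell to keep variables local.
iso_matches_sha256() (
    actual=$(sha256sum "$1" | awk '{ print $1 }')
    [ "$actual" = "$2" ]
)
```

Usage would look like `iso_matches_sha256 some.iso <expected-hash> && echo OK`, with the file name and hash filled in from your own download.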

So having done that, and just tried to boot Nobara 38 off the USB stick sans media check…it still won’t initialize the #$*#(@$# Nvidia card. After a very brief scroll of diagnostic info, I get a flickering black screen, the tell-tale sign that X keeps trying, and repeatedly failing, to initialize the GPU.