Arch having issues with nvidia or nvidia-dkms or nvidia-open

I built a new PC. Until now I had an old RX 580 in that system while I was saving for the GPU. Everything was working great.

Now I bought the 5090 and of course I have had multiple issues so far.

If I install the nvidia or nvidia-dkms package, only one of the DisplayPort outputs for my two monitors works and the system is unable to start Hyprland. If I install nvidia-open, both monitors work and I can even write this topic. But nvidia-open seems a little bit shady to me, as it is somehow only 7 MB.

If I do certain things the display output crashes. I gave the journalctl output to the AI and it says the messages indicate that the GPU fell off the bus. I hope it didn’t, though that might explain some of the issues.

What drivers do you guys recommend?

current Kernel

uname -r
6.15.4-arch2-1

GPU

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.64                 Driver Version: 575.64         CUDA Version: 12.9     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 5090        Off |   00000000:01:00.0  On |                  N/A |
|  0%   60C    P5             71W /  575W |    1050MiB /  32607MiB |      7%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

Isn’t nvidia-open the correct driver these days?

Nvidia-open is just the kernel module. You still need nvidia-utils for the userspace drivers, along with linux-firmware for the, well, firmware. Those two pieces are the ones that make for the bulk of the stuff, the kernel module is just a small part of it.

Does that happen on the nvidia-open drivers? Can you paste the output of both journalctl and dmesg as well?
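If the full logs are too long to paste, something like this rough sketch could pull out just the NVIDIA Xid lines. It assumes the usual "NVRM: Xid (PCI:...): <code>" message format and that `journalctl -k -b` is where your kernel messages live; the function names are made up for illustration. As far as I know, Xid 79 is the code that goes with "GPU has fallen off the bus".

```python
import re
import subprocess

# Matches NVIDIA kernel-driver messages of the assumed form
# "NVRM: Xid (PCI:0000:01:00): 79, ... GPU has fallen off the bus."
XID_RE = re.compile(r"NVRM: Xid \((?P<dev>[^)]+)\): (?P<code>\d+)")


def find_xid_events(log_text):
    """Return (pci_device, xid_code, full_line) for each Xid line in log_text."""
    events = []
    for line in log_text.splitlines():
        match = XID_RE.search(line)
        if match:
            events.append(
                (match.group("dev"), int(match.group("code")), line.strip())
            )
    return events


def read_kernel_log():
    """Read this boot's kernel messages via journalctl (assumed to be installed)."""
    result = subprocess.run(
        ["journalctl", "-k", "-b", "--no-pager"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


# Typical use on the affected machine:
#   for dev, code, line in find_xid_events(read_kernel_log()):
#       print(f"Xid {code} on {dev}: {line}")
```

Running it right after a crash (or against the previous boot's log) should show whether the driver logged an Xid at all before the display went away.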

I’ve been using nvidia-open for years.
FWIW, Blackwell is not supported on any driver other than nvidia-open (well, I guess nouveau can boot it, but that’d be useless for the most part).

I’d try using nvidia-dkms with Secure Boot turned off. nvidia-open is okay but not very stable yet. Also, make sure the GPU is plugged in properly; that “fell off the bus” error can mean power or connection issues too. What Linux distro are you using?

Blackwell is not supported by the previous proprietary driver.

EDIT: source:

For cutting-edge platforms such as NVIDIA Grace Hopper or NVIDIA Blackwell, you must use the open-source GPU kernel modules. The proprietary drivers are unsupported on these platforms.

I got a little bit impatient, as I have to return (RMA) the 5090 within this week if it isn’t working properly. So I decided to up my testing game a little bit.

Of course I had to buy a second 5090. The same model. Still got the same errors.

What happens is, as soon as I put load on the GPU, the graphics crash. The computer itself seems to keep running. It’s not consistent, though; sometimes it hangs.

So I thought: well, how is it on Windows? My previous Windows PC, which I replaced, still has Windows on its hard drive. I simply swapped the drive in and tested everything again, with the new card and the newest NVIDIA drivers installed.

And as assumed, exactly the same issue shows up. For example, I start The Witcher 3 and the PC crashes.

current suspects:

CPU
PCIe Riser card
Motherboard
RAM
PSU

As I don’t have a second PSU that’s powerful enough to test the GPU, I can’t rule this component out. But if this is the only common denominator it should be fine, as I have two 5090s.

I will try testing it with the old PC once again, with Windows and Linux and both cards, and let you guys know how it went.

I tried that as well; it was unstable. Also, the second monitor didn’t work.

You don’t happen to be using G-Sync on the monitors? Turn down the refresh rate and disable G-Sync, as I believe there is a known bug.

Oh, if you have a riser, try manually configuring your PCIe gen within your mobo to 4.0 or even 3.0 to see if it helps.
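To see what the link actually negotiated, before and after changing the BIOS setting, a small sketch like this reads the standard sysfs attributes for the card. The bus address 0000:01:00.0 is taken from the nvidia-smi output above; the helper names are just illustrative. Keep in mind the card may drop to a lower link speed at idle to save power, so check while it is under load.

```python
from pathlib import Path

# Per-lane transfer rate in GT/s mapped to PCIe generation.
GT_PER_GEN = {2.5: 1, 5.0: 2, 8.0: 3, 16.0: 4, 32.0: 5}


def pcie_gen(link_speed):
    """Map a sysfs string such as '16.0 GT/s PCIe' to a generation, or None."""
    try:
        rate = float(link_speed.split()[0])
    except (ValueError, IndexError):
        return None  # e.g. 'Unknown' or an empty string
    return GT_PER_GEN.get(rate)


def read_link(bdf="0000:01:00.0", sysfs_root="/sys/bus/pci/devices"):
    """Read current and maximum link speed/width for one PCI device from sysfs."""
    dev = Path(sysfs_root) / bdf
    names = ("current_link_speed", "max_link_speed",
             "current_link_width", "max_link_width")
    return {name: (dev / name).read_text().strip() for name in names}


# Typical use on the affected machine:
#   info = read_link()
#   print(info, "-> running at gen", pcie_gen(info["current_link_speed"]))
```

If the current speed under load sits below what the BIOS is set to, or the width is less than x16, that would point at the riser or slot rather than the driver.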

You could try power limiting the GPU to 400W and seeing if that issue still happens.
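A minimal sketch of how that could be scripted, assuming the stock `nvidia-smi` tool and its `-i` (GPU index) and `-pl` (power limit in watts) options. The 575 W ceiling is the cap shown in the nvidia-smi output above; the function names are made up, and the driver also enforces its own minimum, which this does not check.

```python
import subprocess


def power_limit_cmd(watts, gpu_index=0, max_watts=575):
    """Build the nvidia-smi argument list that sets a power limit in watts."""
    if not 0 < watts <= max_watts:
        raise ValueError(f"limit must be between 1 and {max_watts} W, got {watts}")
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)]


def apply_power_limit(watts, gpu_index=0):
    """Apply the limit with sudo; raises CalledProcessError if nvidia-smi fails."""
    subprocess.run(["sudo", *power_limit_cmd(watts, gpu_index)], check=True)


# Typical use on the affected machine:
#   apply_power_limit(400)
```

The limit does not survive a reboot, so it is safe to experiment with: set it, run the game that crashes, and see whether the behaviour changes.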

I removed the old Windows hard drive from my new system and returned it to the old Intel i5 system. Drivers were immediately recognized and both cards worked like a charm.

Witcher 3 RT Ultra presets were no issue at all.

I even performed a dump analysis using the AI overlord:

The crash dump analysis reveals a VIDEO_TDR_FAILURE (0x116), 
which indicates the display driver stopped responding and failed to recover. 
WinDbg directly points to nvlddmkm.sys, the NVIDIA graphics kernel driver, 
as the module that failed. This strongly suggests an issue with your NVIDIA 
graphics card or its driver, but underlying hardware problems like an 
incompatible PCIe riser cable, unstable RAM, or motherboard issues could 
be the root cause. The primary troubleshooting step should be a clean installation 
of the latest NVIDIA drivers, followed by investigating potential hardware conflicts.

Will test both cards with risers on the i5 System.

Sounds like you’re using the same cables in both test systems? Any chance the cable you’re using has pin 20 connected (out of spec cable)? That can cause all kinds of spooky problems.

I had to wait for a new riser cable to be delivered. Both cables I had initially tested the 5090 with must have been either poor quality or PCIe 3.0 or something like that. With the PCIe 4.0 cable it worked without an issue. I had assumed as much after the previous test on the old Windows system. However, I had already converted the new AM5 CPU to liquid cooling and the 5090 didn’t fit the case, so I wasn’t able to test it without a riser.

I have the waterblock for the GPU already ordered but it hasn’t arrived yet. Seems like there are more cards than blocks currently available.

Thanks for everyone’s help.