NVIDIA 1080TI clock difference causes hang

Not sure if this is a hardware bug or the NVIDIA drivers, but I noticed that if the GPU or memory clocks on the two cards differ by too much, one card will eventually start to stutter and then hang outright. Has anyone else noticed this?

In my case I currently have two EVGA 1080 Tis that are not in SLI. Normally one of them is passed to a VM, but I haven’t done that lately, which is when I started noticing the hangs.

As for logs, they aren’t really helpful, at least not to me.

Syslog:

07:07:16 localhost kernel: nvidia-modeset: ERROR: GPU:0: Idling display engine timed out: 0x0000927c:0:0

X:

Xorg.0.log:[530314.307] (--) NVIDIA(GPU-0): DFP-7: 165.0 MHz maximum pixel clock
Xorg.0.log:[530314.307] (--) NVIDIA(GPU-0): 
Xorg.0.log:[541079.048] (EE) NVIDIA(GPU-0): WAIT (2, 8, 0x8000, 0x00002a50, 0x00005068)
Xorg.0.log:[541084.206] (EE) NVIDIA(GPU-0): WAIT (0, 8, 0x8000, 0x00005068, 0x00005068)
Xorg.0.log:[596458.307] (EE) NVIDIA(GPU-0): WAIT (2, 8, 0x8000, 0x00008598, 0x0000adb8)
Xorg.0.log:[596465.307] (EE) NVIDIA(GPU-0): WAIT (1, 8, 0x8000, 0x00008598, 0x0000adb8)
Xorg.0.log:[596755.678] (EE) NVIDIA(GPU-0): WAIT (2, 8, 0x8000, 0x00009148, 0x0000c074)
Xorg.0.log:[596762.678] (EE) NVIDIA(GPU-0): WAIT (1, 8, 0x8000, 0x00009148, 0x0000c074)
Xorg.0.log:[598210.796] (EE) NVIDIA(GPU-0): WAIT (2, 8, 0x8000, 0x000008a4, 0x00001f5c)
Xorg.0.log:[598217.411] (EE) NVIDIA(GPU-0): WAIT (0, 8, 0x8000, 0x00001f5c, 0x00001f5c)

And so on. There are some references to these errors online, but nothing conclusive beyond "something went wrong with the GPU", hence the cascade of errors bubbling up.

Does any of this ring a bell for anyone else? The hangs go away if the clocks are set the same on both cards. Funny thing is, simply having one card in a lower power/performance state is enough to cause it, so it’s not specific to overclocking but rather to the clock difference itself.
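For anyone who wants to check whether their cards are in the same situation, here is a rough Python sketch that compares the current clocks and performance state of each GPU. It assumes `nvidia-smi` is on the PATH and that the `pstate`, `clocks.sm` and `clocks.mem` query fields are supported by your driver. The function names and the sample values in the comment are made up for illustration, not taken from my machine.

```python
import subprocess

# Query fields understood by `nvidia-smi --query-gpu` (see `nvidia-smi --help-query-gpu`).
QUERY = "index,pstate,clocks.sm,clocks.mem"


def read_gpu_clocks(timeout_s=10):
    """Run nvidia-smi and return its raw CSV output, one line per GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        timeout=timeout_s,  # nvidia-smi itself can stall when a card is wedged
        check=True,
    )
    return result.stdout


def parse_gpu_clocks(csv_text):
    """Parse lines such as '0, P0, 1911, 5505' into a list of dicts.

    Assumes every field is reported as a number; a card that reports
    something like '[N/A]' would need extra handling.
    """
    gpus = []
    for line in csv_text.strip().splitlines():
        index, pstate, sm, mem = [field.strip() for field in line.split(",")]
        gpus.append(
            {"index": int(index), "pstate": pstate, "sm_mhz": int(sm), "mem_mhz": int(mem)}
        )
    return gpus


def clock_spread(gpus):
    """Return the largest difference in SM and memory clocks across all GPUs."""
    sm = [g["sm_mhz"] for g in gpus]
    mem = [g["mem_mhz"] for g in gpus]
    return {"sm_mhz": max(sm) - min(sm), "mem_mhz": max(mem) - min(mem)}
```

Calling `clock_spread(parse_gpu_clocks(read_gpu_clocks()))` around the time of a stutter should show whether the two cards have drifted into different performance states.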

396.54.09 driver

Seems like a power saving issue. You need to force the cards into performance mode on boot.
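One way to do that, as a sketch rather than a verified fix: set the PowerMizer mode through `nvidia-settings` from a login or session startup script. This assumes an X session is already running, that `nvidia-settings` is installed, and that `GpuPowerMizerMode=1` means "Prefer Maximum Performance" on your driver version. The helper names are hypothetical.

```python
import subprocess

# Assumed meaning of the attribute value: 1 = "Prefer Maximum Performance".
PREFER_MAX_PERFORMANCE = 1


def powermizer_command(gpu_index, mode=PREFER_MAX_PERFORMANCE):
    """Build the nvidia-settings command that sets the PowerMizer mode of one GPU."""
    return ["nvidia-settings", "-a", f"[gpu:{gpu_index}]/GpuPowerMizerMode={mode}"]


def force_performance_mode(gpu_indices):
    """Apply the setting to each GPU in turn; raises if nvidia-settings fails."""
    for gpu_index in gpu_indices:
        subprocess.run(powermizer_command(gpu_index), check=True)
```

With two cards you would call `force_performance_mode([0, 1])`. The setting does not persist across reboots, so it has to be reapplied every time the session starts.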

My GPU hangs have more to do with VLC and repeating a video file that has a resolution above 720p. They hang for 15 seconds and no one has found the root cause. I had to switch to an older version of mpv so that it doesn’t hang. Newer mpv builds also hang BADLY to where the system fully locks up playing a 4K file.

I don’t know if they are the same issue, but I can confirm that this has happened here too. Seeking in mpv tends to trigger it more often.

Another thing that happens is that the card(s) become completely unresponsive, even to nvidia-smi, which only returns results after a long wait. That makes me think they are soft-locked internally.
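To put a number on how long nvidia-smi stalls, something like the following Python sketch could be left running in a loop. It only times a command with a timeout; the function name and the 30-second limit are arbitrary choices for illustration.

```python
import subprocess
import time


def time_command(cmd, timeout_s):
    """Run cmd and return (timed_out, elapsed_seconds).

    Caveat: if the process is stuck in an uninterruptible kernel wait,
    killing it after the timeout may itself take a while, so elapsed
    can exceed timeout_s.
    """
    start = time.monotonic()
    try:
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
        timed_out = False
    except subprocess.TimeoutExpired:
        timed_out = True
    return timed_out, time.monotonic() - start
```

For example, `time_command(["nvidia-smi"], 30)` returns whether the call exceeded 30 seconds and how long it actually took, which can then be lined up against the WAIT timestamps in Xorg.0.log.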

There’s also nothing in the logs about the cards falling off the PCIe bus, so they aren’t resetting either. Thermals and power aren’t a problem.

If I had to guess, I’d say there are a couple of issues related to the clocks. I’d imagine that whatever makes nvidia-smi hang is also affecting other internal driver functions. Two cards that are NOT in SLI should be at least semi-isolated from each other.