System
ASRock X470 Taichi, Ryzen 3900X, 2x16 GB 3200 MHz RAM, EVGA 1000 G3 PSU, dual boot of Windows 10 & Ubuntu 19.10.
The GPUs are EVGA 1080 Ti cards: the SC2 & Black Edition models.
The only other PCIe device is an EVGA Nu Audio card.
Symptom
The GPU's video output stops working as soon as the desktop appears (or would appear) after booting, irrespective of OS. The only way to recover is a hard reset or power-off (Windows) or the magic SysRq keys (Linux).
When it happens
At its worst, as soon as the desktop appears. But it does not start out this severe; it progressively worsens. Before it reaches this level of severity, the system randomly loses video output after running for some time, forcing a hard reset or shutdown. The crashes then become more and more frequent. The Windows system logs indicate a DWM crash.
What it happens with
This occurs with different known-working 1080 Tis that work fine in other systems. That is, I take one out of a good system and put it in the problematic system; after a few days the problem starts appearing, and it gets progressively worse until the desktop never displays because the GPU crashes so quickly.
No device in the system is overclocked (not RAM, CPU, GPU, or GPU RAM).
The first time this occurred, I assumed it was the GPU. I took the GPU out of the problematic system and put it into another known-working system. The GPU did output to the display, but the image was scrambled, so I took it out and put it to one side for a week. Then, when another GPU started having the exact same problem, I realized something in the system itself must be causing it. When I put the original GPU back into the problematic system and retested it, it worked just fine... until today, when it crashed at random again.
How to troubleshoot?
What I have tried: downgrading the ASRock BIOS from 1.0.0.4 Patch B to 1.0.0.3 ABB. It makes no difference. I’m fairly certain it’s not a GPU driver issue, given that it happens under different driver versions (the Windows and Linux drivers are different versions) and that the GPUs are fine in other systems with the same driver.
Causes I can think of:
- Power supply unit itself delivering bad power output
- PSU cable
- Motherboard
- The PCIe audio card somehow causing electrical problems or incompatibilities on the PCIe bus (?)
Is there anything else I need to consider when troubleshooting this? Given the symptoms, is any one of these causes more likely than the others? Is it probably an electrical problem? Is it possible that, in the course of troubleshooting, the GPU could be permanently damaged if I don’t figure out the problem quickly enough?
Thanks in advance.