CPU: Threadripper 3960X
Mobo: TRX40 Gigabyte Aorus Pro Wifi
GPU: Gigabyte 3090 Gaming OC
RAM: 256GB (8x32GB) G.Skill Ripjaws V 3600 (F4-3600C18D-64GVK)
PSU: 1000W Corsair RM1000x 80 Plus Gold
UPS: Cyberpower 1500VA/1000W
OS: Windows 10 Pro
The past few months I’ve been having these mystery crashes.
The timing of the crashes is unpredictable: sometimes it’s during a game, sometimes when rendering in Blender or editing in Resolve, sometimes just watching YouTube.
Up until now they’ve been sparse, once every couple weeks. But yesterday it reached a critical mass, with over a dozen black screen shutdowns as I tried to troubleshoot. I’ve been doing my best to track down the source, but I’m hoping someone can offer any insights that come to mind.
Potential clues:
-It rarely bluescreens (when it does, the stop code is always ‘IRQL_NOT_LESS_OR_EQUAL’). Far more often it’s a total black screen crash.
-Almost every time it black screens, the BIOS resets, requiring me to redo my fan and RAM speed settings.
-Occasionally the computer has POSTed with major graphical artifacts: dancing pixels, colors, etc. However, these artifacts are ONLY present in the BIOS. Once it boots into Windows, the artifacts disappear.
-Several times, the Windows Event Viewer has shown “Event ID 14, nvlddmkm” occurring right before a crash. This seems to be related to the NVIDIA driver.
-When it crashes, the RGB on the motherboard and GPU remains on, but the system and power button are otherwise completely unresponsive, and I need to cut the power in order to restart.
-The motherboard status LED for DRAM has lit up in several instances. However, I also sometimes get the VGA status light.
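In case it helps anyone compare timestamps against their own crashes, here is a rough Python sketch for pulling those nvlddmkm entries out of the System log without clicking through Event Viewer. It shells out to the built-in Windows `wevtutil` tool. The function names `build_query` and `recent_events` are made up for illustration, and the default count of 20 is an arbitrary choice, so treat this as a starting point rather than a tested diagnostic.

```python
import subprocess


def build_query(provider="nvlddmkm", event_id=14, count=20):
    """Build a wevtutil command line for recent System log events.

    The provider name and event ID default to the ones mentioned in the
    post; the helper itself is hypothetical, not part of any library.
    """
    # XPath filter: match on the event source (provider) and the event ID.
    xpath = f"*[System[Provider[@Name='{provider}'] and (EventID={event_id})]]"
    return [
        "wevtutil", "qe", "System",  # query events from the System log
        f"/q:{xpath}",               # apply the XPath filter
        f"/c:{count}",               # return at most `count` events
        "/rd:true",                  # newest events first
        "/f:text",                   # human-readable text output
    ]


def recent_events(**kwargs):
    """Run the query and return wevtutil's text output (Windows only)."""
    result = subprocess.run(
        build_query(**kwargs), capture_output=True, text=True, check=True
    )
    return result.stdout
```

Running `print(recent_events())` on the affected machine should list the most recent matching entries with their timestamps, which can then be lined up against the times of the black screen crashes.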
Thoughts:
I’ve gone back and forth between suspecting the RAM, GPU, and motherboard.
The IRQL error seems to suggest a memory issue. It’s worth noting that, for some reason, my system was not able to run the RAM at the default XMP profile of 3600 MT/s, and I had to switch to manual settings and drop down to 3533. For the timings and voltage, I kept the XMP suggested settings.
The graphical artifacts and Event Viewer logs, however, point to a GPU issue. I’ve sent this card in once before, when it completely died. Gigabyte repaired and returned it, and I haven’t had any problems with it for over two years.
On the other hand, the BIOS resetting feels like a motherboard problem.
What I’ve tried, to no effect:
Changing CMOS battery
Updating the BIOS (previously FA, updated to FD)
Reseating the GPU
Updating GPU drivers
Reseating the RAM
Dropping the RAM further down to 3200.
Running memtest86+ on all the DIMMs individually. At one point, it actually crashed during the memtest!
As a final measure, I’ve swapped the 3090 out with my old GTX 1080. So far I’ve had no crashes or other irregularities, but I’m going to give it a couple days of regular use to put it to the test.
Thanks for sticking with this post so far. Any input or thoughts would be very much appreciated.