Hello everyone!
I am running into a strange issue with a new workstation I built for a friend. Here are the workstation’s specs:
AMD Ryzen Threadripper 7980X 64 Cores 128 Threads CPU
Gigabyte TRX50 Aero D STR5 Motherboard (REV 1.1)
G.SKILL Zeta R5 Neo 128GB (4 x 32GB) ECC Registered DDR5 6400 R-DIMM RAM
Noctua NH-U14S TR5-SP6 CPU Cooler
2X MSI Suprim Liquid X RTX 4090 GPU (Thinking about switching to one instead of two)
Be Quiet Dark Power Pro 13 1600W ATX 3.0 PSU
Crucial T700 Gen 5 NVMe 4TB Drive – OS Drive Win11
Sabrent Rocket 4 Plus NVMe 8TB Drive – Storage
Lian Li O11 Dynamic EVO XL PC Case
Before putting the system together, I confirmed that all of the components were compatible and did a test run with the hardware outside the PC case. Everything was operational. I wrapped up the build and tackled the software portion, which included a BIOS update, an installation of the latest Windows 11 release, and all the latest drivers for the workstation.
At this point, everything was going smoothly, and I moved on to benchmarking. I usually benchmark with AIDA64, Cinebench R23, and Heaven. I ran AIDA64 for about 20 minutes, and the system temps were stable in the low to mid 70s. I continued with CBR23 and completed two successful multicore passes with great temps for a CPU like this (again, low to mid 70s). I then moved on to Heaven for both GPUs, and that test ran without any issues.
At this point, I was confident that the system was stable, since it was running entirely stock, out-of-the-box settings, so I delivered it to my friend.
After about three weeks, he contacted me about two specific issues he was having:
- MOV playbacks in Adobe Premiere were shutting down the PC completely.
- CBR23 was shutting down the system a few minutes into the test, and CBR24 was shutting down the system as soon as the test started (I asked him to run these benchmarks to test the stability of the system).
I now have the system back and have started troubleshooting each component. This is what I’ve tested so far:
- Check and benchmark both GPUs – Both GPUs are fine, as far as I can tell. Interestingly enough, the MOV crashes seem to have been resolved. I say that because I ran MOVs in Adobe Premiere with a different GPU, and they played with no issues. I still need to test with one 4090 to be sure.
- Crystal Benchmark for NVMe drives – Both scored pretty close to their advertised speeds, and temps were normal.
- Memtest on R-DIMM RAM – All four DDR5 R-DIMMs checked out after a 12-hour test with no errors.
- Test PSU – I connected the Be Quiet Dark Power Pro 13 to my PSU tester, and all of the voltage values were normal. The tester did not detect anything wrong.
- Motherboard swap (ordered a new Gigabyte TRX50 Aero D STR5 mobo) – I transferred all of the components to the new motherboard, and the system shut down at the 2-minute mark of Cinebench R23. I tried to run R24, but the system crashed immediately. Kernel-Power Event ID 41. As an FYI, I tested this outside of the case.
At this point, I am really baffled as to what could be causing this Kernel-Power Event ID 41. The CPU is running at stock settings, and it shouldn’t behave like this at all. The system temps look fine in HWMonitor and Ryzen Master when benching. Now, I know my friend won’t be running workloads as strenuous as Cinebench, but the system should be stable at the very least, especially without any OC or EXPO applied.
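For anyone reading along who wants to dig into the Event ID 41 entries themselves, here is a minimal Python sketch of how one could sort them. It assumes each event has been copied out as XML (for example from the XML view in Event Viewer's Details tab) and that the event data carries the `BugcheckCode` and `PowerButtonTimestamp` fields, which is how I understand Microsoft's documentation for this event. The function name `classify_event41` and the returned labels are my own hypothetical choices, not part of any tool.

```python
import xml.etree.ElementTree as ET

# Namespace used by Windows event log XML.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}


def _to_int(text):
    """Parse a decimal or 0x-prefixed value; empty or missing counts as 0."""
    text = (text or "").strip()
    if not text:
        return 0
    return int(text, 16) if text.lower().startswith("0x") else int(text)


def classify_event41(event_xml):
    """Hypothetical helper: label one Kernel-Power 41 <Event> XML string."""
    root = ET.fromstring(event_xml)
    # Collect the named <Data> fields under <EventData>.
    fields = {
        d.get("Name"): d.text
        for d in root.findall("e:EventData/e:Data", NS)
    }
    # A nonzero bugcheck code means Windows recorded a stop error first.
    if _to_int(fields.get("BugcheckCode")) != 0:
        return "stop error recorded"
    # A nonzero timestamp suggests the power button was held down.
    if _to_int(fields.get("PowerButtonTimestamp")) != 0:
        return "power button held"
    # Neither field set: the machine lost power or hung without logging.
    return "abrupt power loss or hard hang"
```

If every entry comes back as the last label, that would only say Windows had no chance to log a stop error before the shutdown. It would not by itself point to the CPU, PSU, or any other specific part.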
I think there might be an issue with the CPU, which is possible, but it’s fairly uncommon to see bad CPUs from the factory. My apologies for the long post, but any feedback or guidance would be greatly appreciated! I feel like I am losing my mind over here, LOL! I have been building and troubleshooting computers for the past 14 years, and I have never encountered this type of scenario. Thanks!