Unable to POST with DRAM voltage > 1.20v

Sea_Monkey · October 13, 2023, 2:59pm

I recently upgraded from a Threadripper 3960X to a Threadripper 3990X, and my system wouldn’t POST with the DOCP profile that was perfectly stable on my 3960X. After a while of playing with manual RAM settings, I realized that regardless of frequency or RAM timings, if DRAM voltage was set at all above 1.20v, the system would fail to POST. Why would a new processor cause this, and is there anything I can do about it?

KleerKut · October 14, 2023, 11:38pm

If the RAM, Motherboard, and old CPU worked fine, and the only hardware change is the CPU, I’d be inclined to believe it is the CPU or more specifically the contact points. Try reseating it and check the pads for any discoloration.

Sea_Monkey · October 15, 2023, 7:48pm

Checked the pads and socket, reseated the CPU, and reseated the RAM. Same result. I’m currently running MemTest+ at 3200MHz with auto timings to verify stability at 1.2v with the much-looser-than-XMP timings.

It’s such a curious problem. I don’t see how bumping the voltage from 1.20v to 1.21v could actually make it unstable, which makes me inclined to believe it’s a BIOS bug that only affects the 3990X. I’m far from an expert on these matters though, so…

Edit: Also, I realized I never mentioned the board. It’s an ASUS Prime TRX40-Pro.

Sea_Monkey · October 25, 2023, 3:58am

Was running stable with RAM downclocked to 3166 (I think) until today, when the system shut off and wouldn’t POST after that. Tried putting my 3960X back in and it won’t POST either. Fuck.

gearhead · October 25, 2023, 8:16pm

Watch the SOC voltages on the 3990X… If you’re using an Asus MB, they will auto-boost the SOC voltages to 1.45V+, a lot higher than AMD’s recommended max of 1.15V (the SOC voltages feed directly into the CPU’s central IO die/memory controller). I’ve had Asus MBs fry 6 (six) expensive 3990X CPUs. I’ve found that a fried CPU can take out the MB too. Neither will ever POST again.

Asus took a shortcut with testing compatible DIMMs by just boosting voltages. While this works for quick benchmarks, I found that sustained high memory use will fry the CPU (heavy PostgreSQL extractions which use the bulk of the 3990X’s CPUs for hours). I recommend using only JEDEC clocks/voltages with a 3990X, especially on ASUS MBs.

The crux of the biscuit with AMD’s chiplets is that, while they are easier to scale, the cost is that all data must be serialized and then deserialized to move it across the dual-torus data loop AMD calls their Infinity Fabric. While around half of this load can be distributed amongst the many chiplets, the other half is all concentrated in the central IO die. What compounds the problem is that their central IO dies use the previous FAB process, which means larger trace sizes and more heat.

Silicon just DOES NOT want to run over 3 GHz. You run into the hockey stick part of the curve with respect to clocks and heat. This is why server gear almost always limits its clocks to 3 GHz or under. Even when you can keep the silicon cool, you’ll still get ion diffusion across the transistor gaps and increased leakage into the substrate with higher clocks…

All manufacturers care about is clearing their warranty date. They’ll run stuff faster for better benchmark press numbers but fry the chips as a result. I’ve got PC gear over 10 years old (all running at 2.8 GHz or under) which still runs great. The new “overclocked” stuff? It gets unstable after a few years unless you downclock it.

Just my 2 cents…