Motherboard killing RAM? Asking for help

Hello, first time posting here. I just wanted to hear the opinions of people more knowledgeable than me.

Made a computer with a Ryzen 5800X
32GB of RAM GSKILL F4-3600C16D-32GTZNC
Motherboard B550I AORUS PRO AX 1.0
Nvidia RTX3070
Case NZXT H1

After a few years the system started having stability problems. Since they began shortly after I moved, and the electricity in the new place is poor, I thought it was a mix of the SFF power supply and the power quality. As I no longer needed an SFF computer, I moved to a bigger case with a bigger and better power supply (a be quiet!) and also got a UPS.

I still had instability problems, so I ran a memory test (which I hadn't done before), and within a second the RAM started spitting out tons of errors. I checked the RAM in another Ryzen system and also got tons of errors, so I RMA'd the RAM.

The new RAM arrived and I ran memtest for 24 hours in each system I have (3 motherboards in total, with different CPUs). Everything looked fine.

A year and a half later the same system started having problems again. I ran memtest and the RAM is failing again.

So my options are:
1- I have bad luck and have had multiple RAM modules die on me for the first time in my life.
2- The motherboard is killing the RAM
3- The CPU is killing the RAM

What do you think it could be?

Sorry for the long post, but I wanted to explain everything in detail. Sorry for my writing too; English is not my main language.

Which specific be quiet! PSU do you have, and how many watts?

ITX mobos usually have good capacitors, especially Gigabyte's.

Consider running your RAM in non-overclocked/standard JEDEC speeds and rerunning your RAM tests.

Maybe one of your RAM sticks really is faulty?

Power supply: Dark Power 13 850 W
The UPS is a BR1500G-GR

Always running at XMP speeds.

The RAM sticks were checked individually at both XMP and JEDEC speeds in that motherboard, in a Gigabyte Aorus X570 Elite, and in an ASRock Rack X570D4U-2L2T/BCM.

The initial 2 RAM sticks that I sent for RMA were only checked at JEDEC speeds, and both gave errors. The new ones were checked at JEDEC and XMP speeds when they arrived and were fine, and now both are failing in all 3 motherboards. So either something is definitely damaging them or I had the worst of luck.
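
To make the cross-testing logic explicit, here is a minimal Python sketch of how the results can be read. The stick and board labels and the pass/fail values are placeholders that only mirror what is described above (both sticks now failing on all 3 boards), and `classify` is a hypothetical helper, not part of any memtest tool.

```python
def classify(results):
    """results: (module, board, passed) tuples from memtest runs.

    Returns (dead_modules, suspect_boards):
    - dead_modules: modules that fail somewhere and pass nowhere
    - suspect_boards: boards where a module fails although it
      passes on at least one other board
    """
    results = list(results)
    passed = {(m, b) for m, b, ok in results if ok}
    failed = {(m, b) for m, b, ok in results if not ok}
    passes_somewhere = {m for m, _ in passed}

    dead_modules = sorted({m for m, _ in failed if m not in passes_somewhere})
    suspect_boards = sorted({b for m, b in failed if m in passes_somewhere})
    return dead_modules, suspect_boards


# Placeholder data: both sticks fail on every board.
boards = ["B550I AORUS PRO AX", "X570 Elite", "X570D4U"]
current = [(s, b, False) for s in ("stick_A", "stick_B") for b in boards]
print(classify(current))
```

With every run failing, the sticks come out as damaged and no single board stands out. That matches the situation here: cross-testing proves the modules are bad now, but it cannot show which component damaged them.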

What speed are you running it at? Does it fail with XMP off? In my shop I've seen some memory degradation. I've had probably half a dozen or so AM4 machines come in for service this year, from DIY home builds to boutique prebuilts, that all failed memtest at XMP after a while. They are usually fine with XMP off, so the customers usually just pay for the bench time, have me disable XMP, and keep going. Even within warranty periods, XMP speeds are not guaranteed, and if a kit passes with XMP off, many manufacturers consider it good and will refuse a return.

I posted new info at the same time you posted.

Yeah, I reread that, but as I said, I've actually seen quite a few memory kits that are stable at XMP when you first get them and then fail after about a year or so. Usually, though, they'll still work fine at stock speeds. You could have something killing them, or you could just have bad luck with RAM. There's always a bit of silicon lottery too. The kit that's in my system now is rated 3200. With my previous CPU I could only manually tune it to 3000 and be memtest stable; now it runs at 3600 and has been stable for almost two years on my 5800X3D.

But if they are passing XMP initially, either you're getting cheap kits that don't really support that speed long term, or a voltage setting or something else is slightly off on your motherboard and is slowly degrading them. I'd make sure your BIOS is updated and then manually tune your next kit. Also, depending on how old your motherboard is, some of the newer B550 boards, despite technically being a lower-tier chipset, now include 2.5G NICs and faster USB, which older versions did not. So it might be worth swapping out.

The UEFI is updated and the motherboard already has 2.5 Gbps. What I don't know is whether I should ditch the motherboard as a bad one, or whether it could also be the CPU.

Before replacing those, try reseating the CPU a few times.

Then I’d switch out the PSU, which is easy for me because I have a spare. You might not though.

The modules were checked in 3 different systems.

Since you suspect that one system is killing your RAM, I would check whether all the voltages stay within the ranges they are supposed to be in. I have had systems that showed the recommended voltages at idle and then rose suddenly under some loads. All it took was setting the voltages manually in the BIOS, and the systems ran stable again.
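
As a rough illustration of that check, here is a minimal Python sketch that flags logged voltage readings outside an allowed band. Everything in it is an assumption for the example: the readings are made up, 1.35 V is used as a typical DDR4 XMP DRAM voltage, and the 5% tolerance is an arbitrary threshold, not a manufacturer limit. The readings would come from whatever monitoring tool you use to log sensors under load.

```python
def out_of_range(samples, nominal, tolerance=0.05):
    """Return (index, value) pairs for readings outside
    nominal * (1 - tolerance) .. nominal * (1 + tolerance)."""
    low = nominal * (1 - tolerance)
    high = nominal * (1 + tolerance)
    return [(i, v) for i, v in enumerate(samples) if not (low <= v <= high)]


# Made-up DRAM voltage log: mostly near 1.35 V, with one spike under load.
dram_log = [1.35, 1.356, 1.344, 1.45, 1.36]
print(out_of_range(dram_log, nominal=1.35))
```

Running the same check on a log taken at idle and on one taken during a stress test would show whether the board only misbehaves under load.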


FWIW, I had 12 of 19 5800X systems in a single environment start producing unacceptable amounts of memory errors. However, it's been a mix of bad memory and bad CPUs for me; this is significantly higher than “normal” error rates, perhaps by an order of magnitude.

I was never able to pin down the exact cause, but my suspicion is that it is a mix of different motherboards defaulting to OC features being on, poor transient voltage regulation, and AMD's weak/delicate IMC degrading. This episode ended up making one customer adopt a rule of no AMD processors without ECC memory. I'm not sure I agree with that, because after Zen 3 AMD's platform seemed to get more reliable.

One more thing: consider updating the BIOS of your motherboard. Sometimes the patches include fixes for stability issues. I don't know if you have auto update on for firmware as well, so there might have been a bad update somewhere down the line.

I haven't checked that. One would expect the configurations to stay within their specified values, but yes, considering how many times manufacturers have been found overvolting things under the hood, I should be more wary about that.

Same CPU. I'm not happy to read that, but it gives me some peace of mind knowing it's not something no one has ever seen.

I didn't know that was even possible, but no, no auto updates. The last update was about half a year ago, which I think is OK for a B550 board. There are not many changes or releases for boards that have been on the market for years, on a platform that has almost no changes (except the occasional release for a new CPU with minor updates).

That is the top end of what most Ryzen 5x were capable of overclocking the memory to. It doesn't matter if you have “always run XMP speeds”; it is still an overclock. Running for years with an overclock can lead to degraded hardware, so maybe you are simply going to have to start clocking down more to lower the strain on both the memory controller (the thing likely degrading) and the memory, and probably add even more voltage to bring stability back. That would make things stable, but it starts a loop: more voltage degrades the hardware further, so it needs more voltage, so it degrades faster.

I used to do some serious overclocking back in the socket 939 and LGA775 days. I would push things like the memory and bus well past the average and even get them stable at those high speeds, for a while. But hammering the hardware with that much current and voltage would cause the memory chips to degrade within 6-12 months, and I got tired of replacing those DDR-600 sticks and CPUs. I even killed a record-setting Opteron 165 that way. That was one of those golden samples that I was able to OC from 1.8GHz to 3.65GHz on air. For the past 15 years or so I have toned down the overclocking a lot, just doing basic stuff and never going to the top end of what the average is, so I haven't had problems with hardware stability. That has been a fine rule to follow for me, as I haven't killed any more hardware since.