Should I RMA my 3900x?

I had to RMA a dead 3900x, and now the replacement is flaky (L3 ECC machine check exceptions and random reboots) after about a month.

I purchased a cheap Gigabyte Aorus X570 Pro WiFi on eBay to troubleshoot with, and lo and behold, I can get it to be stable by doing any one of: setting a fixed multiplier of around x35, disabling global C states, OR setting Power Supply Idle Control to typical.

I don’t think I have that last power supply setting on my other MB, an ASUS X570 Prime Pro, but it seems to do the trick. It has the side effect of disabling C6, but here’s the weird thing: if I re-enable C6 after boot using zenstates.py, the system is still 100% stable, at least for a couple of days now. I confirmed the C6 states were active by monitoring with ryzen_smu and ryzen_monitor, which I found on GitHub. This is in Linux, BTW.
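
As a cross-check that doesn’t depend on ryzen_smu, the kernel’s own cpuidle counters can be read from sysfs. This is only a minimal sketch, assuming the usual `/sys/devices/system/cpu/cpuN/cpuidle/stateM/` layout; `read_cpuidle` is a made-up helper name, and which state name corresponds to core C6 depends on the idle driver in use, so that mapping needs checking on the actual board.

```python
from pathlib import Path


def read_cpuidle(cpu=0, root="/sys/devices/system/cpu"):
    """Return a list of (name, usage, time_us, disabled) for one CPU's idle states."""
    base = Path(root) / f"cpu{cpu}" / "cpuidle"
    if not base.is_dir():
        return []
    states = []
    # Sort numerically so state10 would come after state2.
    for d in sorted(base.glob("state[0-9]*"), key=lambda p: int(p.name[5:])):
        name = (d / "name").read_text().strip()
        usage = int((d / "usage").read_text())    # times the state was entered
        time_us = int((d / "time").read_text())   # total residency in microseconds
        disable_file = d / "disable"
        disabled = disable_file.exists() and disable_file.read_text().strip() != "0"
        states.append((name, usage, time_us, disabled))
    return states


if __name__ == "__main__":
    try:
        rows = read_cpuidle(0)
    except (OSError, ValueError) as exc:
        rows = []
        print(f"cpuidle not readable here: {exc}")
    if not rows:
        print("no cpuidle states found for cpu0")
    for name, usage, time_us, disabled in rows:
        flag = " (disabled)" if disabled else ""
        print(f"{name:8s} entries={usage} time_us={time_us}{flag}")
```

If the deepest state’s `usage` and `time` counters keep climbing while the box idles, the OS is at least requesting that state; it says nothing about what the package does underneath.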

So it’s stable now, but WWL1TD? Don’t trust the CPU and RMA it again?

And then there’s the question of what the idle current setting is actually doing. According to the manual, it’s simply disabling the package-level C6 state. But if I re-enable C6 with zenstates.py, it says both package and core C6 are enabled. There must be something else going on with this idle current setting besides C6.
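
For context, the "enabled" that zenstates.py prints is, as far as I can tell, just a readback of a few MSR bits, so it only shows the states are permitted, not what else the BIOS option changed. A rough sketch of that decode, assuming the layout zenstates.py appears to use (package C6: bit 32 of MSR 0xC0010292; core C6: bits 22, 14 and 6 of MSR 0xC0010296). Those register numbers and bit positions are an assumption here and should be checked against the script itself; the helper names are made up.

```python
import os
import struct

MSR_PKG_C6 = 0xC0010292   # assumed: bit 32 gates package C6
MSR_CORE_C6 = 0xC0010296  # assumed: bits 22, 14 and 6 gate core C6

PKG_C6_MASK = 1 << 32
CORE_C6_MASK = (1 << 22) | (1 << 14) | (1 << 6)


def package_c6_enabled(msr_value):
    """True if the assumed package C6 enable bit is set."""
    return bool(msr_value & PKG_C6_MASK)


def core_c6_enabled(msr_value):
    """True only when all three assumed core C6 enable bits are set."""
    return (msr_value & CORE_C6_MASK) == CORE_C6_MASK


def read_msr(msr, cpu=0):
    """Read a 64-bit MSR through /dev/cpu/N/msr (needs root and the msr module)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The MSR number is used as the file offset; each read returns 8 bytes.
        return struct.unpack("<Q", os.pread(fd, 8, msr))[0]
    finally:
        os.close(fd)
```

Something like `package_c6_enabled(read_msr(MSR_PKG_C6))` run as root would then mirror what the script reports for core 0.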

Are you running overclocked memory, or anything outside of spec for the memory configuration?

No overclocking. Tried default profile and XMP #1. Tried one stick at a time in different slots. 24hr memtest86 is clean. Don’t think it’s the memory. It’s Mushkin Redline 64GB, CL16, 3200MHz.

Latest BIOS and chipset installed? A lot of finicky issues are fixed by keeping those up to date.

I would RMA it and keep RMAing it if it’s not 100% stable at defaults (no DOCP/XMP and no PBO with standard sleep/idle settings). If you’re not 100% happy with it, send it back… The last thing you need is a platform you can’t trust… Also, not all new BIOS versions were as stable as previous ones. For the Asus Zenith II Extreme Alpha, version 1502 is better than 1603.

I just had a 3990x die – so yes, these CPUs do fail even with internal protections. Nothing in that workstation ever exceeded 56°C (IceGiant ProSiphon with a total of 23 fans and directed force-fed cold air ducts in the case). I had a setup similar to Wendel’s Zenith II Extreme Alpha with 256GB of G.Skill Trident RAM as shown in his YouTube video. My best guess is that the last Patch Tuesday Microsoft Update screwed up a BIOS update and put the old settings back in the wrong places in the new BIOS. Probably bumped a voltage and fried everything – CPU, motherboard and one of the SSDs. This update also screwed up my Radeon drivers on another platform (updated the GPU firmware and put the old settings in the wrong places) and yet another platform died as well… Three PC failures at once on the same day? No – not a coincidence… I’ve since killed all automatic updates on my platforms and will install just security updates myself from now on…

Check Vsoc/VDDP/VDDGs with Ryzen Master (in Windows). Also check LLC (load-line calibration); perhaps try it with max correction. If you run out of ideas, I would try different AGESA versions.

AGESA in general has been a mess! AMD’s partners have not been helping much either; some are better than others at BIOS work.

In case you’ve RMA’ed again already, please let us know if a new 3900x sample solves your issue.

Good suggestions. I did try most of those already. Tried various AGESA versions, discrete voltage boosts, and Vdroop settings. Nothing helps except disabling C6. It’s just a marginal part, and it is stable if I prevent the voltage from dropping too low, I guess. I told support I’d reopen a ticket if I got more L3 cache ECC errors. Well, I have. :( So I’m going to RMA it.

Good advice. I’ve been getting 1 or 2 L3 ECC errors per week, so it’s going back to AMD. At least they give you the full retail box incl. a Wraith Prism that I can hock on eBay, LOL.
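
For anyone wanting to put a number like that in an RMA ticket, a per-week tally can be pulled from the kernel log. A rough sketch, assuming each log line starts with an ISO 8601 timestamp (for example the output of `journalctl -k -o short-iso`) and relying on the `[Hardware Error]` prefix the Linux kernel puts on these messages; `errors_per_week` is a made-up name. One machine check can print several such lines, so treat the counts as a trend, not an exact event count.

```python
from collections import Counter
from datetime import datetime

MARKER = "[Hardware Error]"  # prefix the Linux kernel puts on hardware error messages


def errors_per_week(lines):
    """Count marker lines per ISO (year, week); expects a timestamp as the first field."""
    counts = Counter()
    for line in lines:
        if MARKER not in line:
            continue
        stamp = line.split(None, 1)[0]
        try:
            when = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S%z")
        except ValueError:
            continue  # skip lines without a parseable timestamp
        year, week, _ = when.isocalendar()
        counts[(year, week)] += 1
    return counts
```

Passing it the lines of a saved log (or `sys.stdin` when piping from journalctl) returns a `Counter` keyed by `(year, week)`.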

Yes, various BIOS versions with various AGESA versions, including the “stable” one a few versions back that folks generally recommended. Chipset drivers don’t apply as it’s Linux, but I did try a few different kernel versions, including one compiled from mainline, to be sure.
