Why am I still getting machine check errors with Zen 1?

Not long after I got my Ryzen 1700, I started getting machine check errors in Linux. Eventually I found a workaround: disabling the C6 power state. I still do this as a matter of course, since I overclock to 3.8 GHz by default.

But I was recently playing with the machine at stock settings, and sure enough it still threw an MCE soon after booting. This does not occur at all with Windows, or if I have C6 disabled in Linux.

Why is this still an issue with this CPU and Linux? As far as I know it’s a simple fix…

Wow, you sound just like Microsoft with Windows 11.

“Just toss it in a dumpster, problem solved.”

Note to self: Don’t mention that dual Westmere setup

That is not helpful at all. Are you going to say the same thing since I have an 8350?

Just help someone out instead of being a jerk!

Especially because you’re overclocked, but even if you weren’t, running a few stability tests is probably a good idea.

CoreCycler is well suited to testing for issues at idle or low load, and y-cruncher is good for testing memory and multithreaded stability.

http://www.numberworld.org/y-cruncher/

Maybe relevant? I know this was a huge issue for first gen Ryzens:

If this affects you, you might have missed your chance at an RMA, but supposedly there’s an alternative workaround:

You can disable the uop cache in AMD CBS settings in your BIOS (UEFI settings). uop as in micro-op, as in mu-op (μop). It hardly reduces performance if you disable it, but it completely fixes this issue.

Thanks, this is a great clue. However, my BIOS is locked down very tightly (OEM system). I’ll see if there is a Dell equivalent of this setting.

EDIT: Could this be disabled with the wrmsr command? Even if it can, this still begs the question…why hasn’t this been fixed in the kernel yet?

If you don’t mind me asking, how were you able to overclock if your BIOS is locked down?

Despite the BIOS lockout, software overclocking is still quite possible. In Windows, Ryzen Master mostly works fine (memory tuning is locked out, but CPU speeds and voltage are fully controllable). Similarly, ZenStates (a collection of Python scripts that toggle various MSRs) can change clock speeds and disable C6 in Linux.

See here: Overclock your Ryzen CPU from Linux

Disclaimer: This tool looks like it didn’t see much development past Zen+ and Zen 2. Anything newer than Zen+, I’d use it with extreme caution.
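To make the C6 part concrete, here is a minimal Python sketch of the kind of bit manipulation involved. The register addresses and bit positions are assumptions taken from my recollection of the ZenStates-Linux script, not from AMD documentation, and the helper names are made up for illustration. The sketch only reads registers and computes the new values; it never writes anything.

```python
import os
import struct

# Assumed values, recalled from the ZenStates-Linux script; verify before use.
MSR_PKG_C6 = 0xC0010292                           # package C6 control
MSR_CORE_C6 = 0xC0010296                          # core C6 control
PKG_C6_BIT = 1 << 32
CORE_C6_BITS = (1 << 22) | (1 << 14) | (1 << 6)


def read_msr(msr, cpu=0):
    """Read one 64-bit MSR via /dev/cpu/N/msr (needs root and the msr module)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, msr))[0]
    finally:
        os.close(fd)


def c6_state(pkg_value, core_value):
    """Return (package_c6_enabled, core_c6_enabled) for the given register values."""
    pkg_on = bool(pkg_value & PKG_C6_BIT)
    core_on = (core_value & CORE_C6_BITS) == CORE_C6_BITS
    return pkg_on, core_on


def c6_disabled(pkg_value, core_value):
    """Return both register values with the assumed C6 enable bits cleared."""
    return pkg_value & ~PKG_C6_BIT, core_value & ~CORE_C6_BITS


# Example (run as root after `modprobe msr`):
#   pkg, core = read_msr(MSR_PKG_C6), read_msr(MSR_CORE_C6)
#   print(c6_state(pkg, core))
#   print([hex(v) for v in c6_disabled(pkg, core)])
```

Writing the computed values back would have to be done on every core, and the setting does not survive a reboot, so it would need to be reapplied at each boot.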

Oh, I totally forgot about that. Having an old system, these tools tend to go over my head since I always use the BIOS for overclocking. Weird question, but do these issues also happen in Windows?

This is a good recommendation, but if you are unable to change settings in the BIOS, you should run the tool to check if you are affected, or check the manufacturing date encoded in the text on the heatspreader of the processor. If you are affected, you might never be able to be stable, since it is a hardware bug.

Nope, Windows is just peachy. That’s why I was asking why it was still a thing with Linux.

@H-i-v-e That’s the thing…if I disable C6, or boot up Windows, it’s almost perfectly stable. I seem to recall running that tool in the past, but got inconclusive results. I can try again of course…

Would the CPU stepping tell me anything? Mine is ZP-B1, B2 seems to be the more current version.
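For anyone who wants to compare what the kernel reports, family, model, and stepping can be read from /proc/cpuinfo. Below is a minimal Python sketch; the helper name `cpu_signature` is made up for illustration. Note that this only shows the numeric stepping. It does not tell you whether a chip is affected by the bug mentioned above, which is reportedly identified by manufacturing date.

```python
def cpu_signature(cpuinfo_text):
    """Return (family, model, stepping) of the first CPU listed in cpuinfo_text."""
    wanted = {"cpu family": None, "model": None, "stepping": None}
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        # "model name" is a different key, so it is skipped automatically.
        if sep and key in wanted and wanted[key] is None:
            wanted[key] = int(value)
    return wanted["cpu family"], wanted["model"], wanted["stepping"]


# Example on a live system:
#   with open("/proc/cpuinfo") as f:
#       print(cpu_signature(f.read()))
```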

Yeah, Zen 1 had the hardware marginality bug, so you might have one of those bugged CPUs.

I have one of those

What I have found useful is turning off some parts of the CPU, particularly the uop cache. I have no idea why that fixes the problem, but I stopped getting MCEs entirely after doing that along with disabling C-states. I disabled C-states first, and that alone had little effect. Then I disabled the uop cache in the ASUS UEFI and, boom, problems gone. No idea why that might be the case, other than that the math bug may be in the uop cache? I know little of the backstory, other than that the ordeal pissed me off because it happened on a Linux machine doing some server ops.