Not long after I got my Ryzen 1700, I started having issues with machine check errors in Linux. Eventually I found a workaround: disabling the C6 power state. I still do this as a matter of course, since I run the chip overclocked to 3.8 GHz by default.
But I was recently playing with the machine at stock settings, and sure enough it still threw an MCE soon after booting. This does not occur at all with Windows, or if I have C6 disabled in Linux.
Why is this still an issue with this CPU and Linux? As far as I know it’s a simple fix…
You can disable the uop cache in the AMD CBS settings in your BIOS (UEFI settings). "uop" is short for micro-op, also written μop. Disabling it hardly reduces performance, but it completely fixes this issue.
Despite the BIOS lockout, software overclocking is still quite possible. In Windows, Ryzen Master mostly works fine (memory tuning is locked out, but CPU speeds and voltage are fully controllable). In a similar manner, ZenStates (a collection of Python scripts that toggle various MSRs) can change clock speeds and disable C6 in Linux.
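For anyone curious what "toggling MSRs" looks like, here is a rough sketch of how a script could check the C6 bits through the Linux `msr` driver. The register addresses and bit positions are my recollection of what the ZenStates script uses on first-generation Ryzen, so treat them as assumptions and check them against the script itself before relying on them. The helper names (`read_msr`, `c6_enabled`, `c6_disabled_values`, `report`) are made up for illustration and are not part of ZenStates. This sketch only reads; it does not write anything back to the CPU.

```python
import os
import struct

# ASSUMPTION: addresses and bit masks recalled from the ZenStates script for
# first-generation Ryzen. Verify against the script before using them.
MSR_PKG_C6 = 0xC0010292
MSR_CORE_C6 = 0xC0010296
PKG_C6_BIT = 1 << 32
CORE_C6_BITS = (1 << 22) | (1 << 14) | (1 << 6)


def read_msr(address, cpu=0):
    """Read one 64-bit MSR via /dev/cpu/N/msr (needs root and `modprobe msr`)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The msr driver uses the file offset as the register address.
        raw = os.pread(fd, 8, address)
    finally:
        os.close(fd)
    return struct.unpack("<Q", raw)[0]


def c6_enabled(pkg_value, core_value):
    """Return (package_c6, core_c6) booleans decoded from raw MSR values."""
    package_c6 = bool(pkg_value & PKG_C6_BIT)
    core_c6 = (core_value & CORE_C6_BITS) == CORE_C6_BITS
    return package_c6, core_c6


def c6_disabled_values(pkg_value, core_value):
    """Return the two MSR values with the assumed C6 enable bits cleared."""
    return pkg_value & ~PKG_C6_BIT, core_value & ~CORE_C6_BITS


def report(cpu=0):
    """Print the decoded C6 state for one CPU (hypothetical helper, read-only)."""
    package_c6, core_c6 = c6_enabled(
        read_msr(MSR_PKG_C6, cpu), read_msr(MSR_CORE_C6, cpu)
    )
    print(f"cpu{cpu}: package C6 enabled={package_c6}, core C6 enabled={core_c6}")
```

Running `report()` as root after `modprobe msr` would print the decoded state for CPU 0. Actually disabling C6 would mean writing the values from `c6_disabled_values` back to every core, which is what I understand ZenStates does for you, so in practice it is safer to use that tool than to roll your own.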
Oh, I totally forgot about that. With an old system, these things tend to go over my head, since I always use the BIOS for overclocking. Weird question, but do these issues also happen in Windows?
This is a good recommendation, but if you are unable to change settings in the BIOS, you should run the tool to check whether you are affected, or check the manufacturing date encoded in the text on the processor's heatspreader. If you are affected, you might never be able to get it stable, since it is a hardware bug.
Nope, Windows is just peachy. That’s why I was asking why it was still a thing with Linux.
@H-i-v-e That’s the thing…if I disable C6, or boot up Windows, it’s almost perfectly stable. I seem to recall running that tool in the past, but got inconclusive results. I can try again of course…
What I have found useful is turning off some parts of the CPU, particularly the uOP cache. I have no idea why that fixes the problem, but I stopped getting MCEs entirely after doing that and disabling C-states. I disabled C-states first, though, and that had little effect. Then I disabled the uOP cache in the ASUS UEFI and boom, problems gone. No idea why that might be the case, other than that the math bug may be in the uOP cache? I know little of the backstory on this, other than that the ordeal pissed me off because it happened on a Linux machine doing some server ops.