Duplicating my post from Reddit
TL;DR: Four different Ryzen systems (one 1700 and three 2700X) that I have access to sometimes freeze with an "rcu_sched detected stalls" message. The worst part is that I can trigger the freeze by running Xubuntu 18.04 in VirtualBox, and that freezes the host (Windows 10, for example) as well. Since the issue exists on four different systems, I think it is unlikely to be a problem with one individual component.
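If you want to check whether a machine hit the same message, here is a minimal sketch that filters a saved kernel log for lines mentioning both rcu_sched and a stall. The function name is my own, and the matching is deliberately loose (a substring check), since the exact wording of the message can differ between kernel versions.

```python
def find_rcu_stall_lines(lines):
    """Return the log lines that mention both rcu_sched and a stall."""
    hits = []
    for line in lines:
        lowered = line.lower()
        # Loose substring match: the exact message text is not assumed here.
        if "rcu_sched" in lowered and "stall" in lowered:
            hits.append(line.rstrip("\n"))
    return hits
```

Since the machine is frozen when this happens, the log has to come from somewhere that survives the freeze, for example the output of `journalctl -k -b -1` after a reboot (assuming a persistent journal is enabled) saved to a file and passed in as `open("kernel.log")`, where kernel.log is a hypothetical file name.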
To trigger the freeze, I compile curl in a loop in tmpfs; it usually happens in less than 2 hours. This is the only way I have been able to trigger the freeze; otherwise the system seems stable in both Windows and Xubuntu. I would be grateful if someone could test this as well, either in a VM or in native Xubuntu 18.04.
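For anyone who wants to try, here is a rough sketch of such a loop. It is not my exact script: the function name, the directory and the build commands in the usage note are assumptions, and it expects the curl source tree to be unpacked on a tmpfs mount already.

```python
import subprocess
import time


def run_build_loop(steps, workdir, max_seconds, max_iterations=None):
    """Repeat the build steps until time runs out or a step fails.

    Returns (completed_iterations, failed_step); failed_step is None
    when every step succeeded.
    """
    deadline = time.monotonic() + max_seconds
    completed = 0
    while time.monotonic() < deadline:
        if max_iterations is not None and completed >= max_iterations:
            break
        for step in steps:
            # Each step is an argument list, e.g. ["make", "-j16"].
            result = subprocess.run(step, cwd=workdir)
            if result.returncode != 0:
                return completed, step
        completed += 1
        print(f"{time.strftime('%H:%M:%S')} finished iteration {completed}")
    return completed, None
```

A run for two hours could look like `run_build_loop([["./configure"], ["make", "-j16"], ["make", "distclean"]], "/dev/shm/curl-src", 2 * 60 * 60)`, where /dev/shm/curl-src and -j16 are only example values. If the machine freezes, the last printed timestamp gives a rough idea of how long it survived.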
Disabling SMT seems to help, but it is not exactly a solution. The strangest thing is that updating gcc from 7 to 8 also helps, but I don’t think gcc is the cause of the problem; it probably just manages to trigger the bug by chance. I believe so because it should not matter what kind of user applications you run in a VM: crashes inside it should not lead to a total system freeze that includes the host.
Things I’ve tried that didn’t help:
Obviously, the systems are not overclocked (but overclocking, unsurprisingly, didn’t help either)
Changing memory to another kit (anyway, memtest didn’t fail in 12 hours, so that’s something)
Setting memory to 2133MHz (kits are rated for 3000MHz)
Compiling the latest kernel (4.20.x) and adding various kernel boot parameters (idle=nomwait/idle=halt, processor.max_cstate=5, rcu_nocbs=0-15 with a recompiled kernel that supports that option). idle=halt helped a lot, but freezes still happen.
Using zenstates to disable C6
Increasing SoC voltage to 1.1 V
Increasing CPU voltage by 0.0125 V (don’t want to go any higher because XFR voltages for the 2700X are high enough already)
Increasing DRAM voltage to 1.3 V
Setting the mysterious BIOS parameter (Power Supply Idle Control) to Typical Current Idle
Disabling CPU cores, down to 2 cores (4 threads total)
Using another, high-end, PSU
Connecting a couple of mechanical HDDs to the PSU, because there were reports that some PSUs can’t handle low loads when the system idles
Setting cpu governor to performance
The Ryzen 1700 was previously RMA’d because of the segfault bug
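For the kernel boot parameters in the list above, it is easy to misspell one or forget to regenerate the bootloader config, so here is a small sketch that checks whether the parameters actually reached the running kernel. The helper names are my own; /proc/cmdline is the standard place where Linux exposes its command line.

```python
def missing_boot_params(cmdline, wanted):
    """Return the wanted parameters that are absent from the kernel command line."""
    present = set(cmdline.split())
    return [param for param in wanted if param not in present]


def read_cmdline(path="/proc/cmdline"):
    """Read the command line of the running kernel (Linux only)."""
    with open(path) as f:
        return f.read().strip()
```

For example, `missing_boot_params(read_cmdline(), ["idle=halt", "processor.max_cstate=5", "rcu_nocbs=0-15"])` returns whichever of the three were not passed. Note that this only shows a parameter was passed; it does not prove the kernel was built with support for it, which matters for rcu_nocbs.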
The temperatures are fine: I’ve never seen more than 65 °C (Tdie) on the 2700X or 45 °C on the non-overclocked 1700.
Ryzen 1700 + 2x8 Corsair 3000 MHz RAM + ASRock X370 Gaming K4 + GTX 1080 + Samsung 960 Evo + a couple of HDDs + Fractal Design Newton R3 800 W 80+ Platinum PSU
Ryzen 2700X + 4x16 Corsair 3000 MHz RAM + ASRock B450 Pro4 + GT 1030 + Samsung 860 Evo + Aerocool KCAS 650 W 80+ Gold PSU
2x (Ryzen 2700X + 4x16 Corsair 3000 MHz RAM + ASUS X470 Pro + GT 1030 + Samsung 860 Evo + Aerocool KCAS 650 W 80+ Gold PSU)