Ryzen crashing while idle

Yeah, a standard bash script launched directly won’t get proper root access unless it runs as a systemd service. Zenstates needs root, and applying it at startup needs elevated privileges. A systemd service takes care of that.

Yes, that’s right: put it in the line that contains quiet as well. With that set, rerun the update-grub command and reboot.
I don’t think you need zenstates then.

Try playing with the VDDG and VDDP voltages. 950/900 works well here. These kinds of crashes also happen with FCLK and memory overclocking.

Audio distortion and errors in event viewer are signs the problem lies here.

@FurryJackman Just crashed. Here are the error logs for sudo journalctl | grep -i "hardware err" and sudo journalctl -p 3 -xb:

Sep 19 23:32:08 Compy-3700X kernel: mce: [Hardware Error]: Machine check events logged
Sep 19 23:32:08 Compy-3700X kernel: mce: [Hardware Error]: CPU 6: Machine Check: 0 Bank 5: bea0000000000108
Sep 19 23:32:08 Compy-3700X kernel: mce: [Hardware Error]: TSC 0 ADDR 1ffffc08ddfb4 MISC d012000100000000 SYND 4d000000 IPID 500b000000000
Sep 19 23:32:08 Compy-3700X kernel: mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1600572722 SOCKET 0 APIC c microcode 8701021
Sep 20 17:41:50 Compy-3700X kernel: mce: [Hardware Error]: Machine check events logged
Sep 20 17:41:50 Compy-3700X kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 22: b2a000000002010b
Sep 20 17:41:50 Compy-3700X kernel: mce: [Hardware Error]: TSC 0 SYND 4d000000 IPID 1813e17000
Sep 20 17:41:50 Compy-3700X kernel: mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1600638104 SOCKET 0 APIC 0 microcode 8701021
Sep 20 17:41:50 Compy-3700X kernel: mce: [Hardware Error]: Machine check events logged
Sep 20 17:41:50 Compy-3700X kernel: mce: [Hardware Error]: CPU 7: Machine Check: 0 Bank 5: bea0000000000108
Sep 20 17:41:50 Compy-3700X kernel: mce: [Hardware Error]: TSC 0 ADDR 1ffffc0318c54 MISC d012000100000000 SYND 4d000000 IPID 500b000000000
Sep 20 17:41:50 Compy-3700X kernel: mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1600638104 SOCKET 0 APIC e microcode 8701021
Sep 20 20:33:09 Compy-3700X kernel: mce: [Hardware Error]: Machine check events logged
Sep 20 20:33:09 Compy-3700X kernel: mce: [Hardware Error]: CPU 9: Machine Check: 0 Bank 5: bea0000000000108
Sep 20 20:33:09 Compy-3700X kernel: mce: [Hardware Error]: TSC 0 ADDR 1ffffc00920c0 MISC d012000100000000 SYND 4d000000 IPID 500b000000000
Sep 20 20:33:09 Compy-3700X kernel: mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1600648383 SOCKET 0 APIC 3 microcode 8701021
Sep 20 20:33:09 Compy-3700X kernel: mce: [Hardware Error]: Machine check events logged
Sep 20 20:33:09 Compy-3700X kernel: mce: [Hardware Error]: CPU 13: Machine Check: 0 Bank 5: bea0000000000108
Sep 20 20:33:09 Compy-3700X kernel: mce: [Hardware Error]: TSC 0 ADDR 1ffff9ce0c7c8 MISC d012000100000000 SYND 4d000000 IPID 500b000000000
Sep 20 20:33:09 Compy-3700X kernel: mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1600648383 SOCKET 0 APIC b microcode 8701021
Sep 20 20:43:00 Compy-3700X kernel: mce: [Hardware Error]: Machine check events logged
Sep 20 20:43:00 Compy-3700X kernel: mce: [Hardware Error]: CPU 2: Machine Check: 0 Bank 5: bea0000000000108
Sep 20 20:43:00 Compy-3700X kernel: mce: [Hardware Error]: TSC 0 ADDR 1ffffc1232dce MISC d012000100000000 SYND 4d000000 IPID 500b000000000
Sep 20 20:43:00 Compy-3700X kernel: mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1600648974 SOCKET 0 APIC 4 microcode 8701021
Sep 20 20:43:00 Compy-3700X kernel: mce: [Hardware Error]: Machine check events logged
Sep 20 20:43:00 Compy-3700X kernel: mce: [Hardware Error]: CPU 4: Machine Check: 0 Bank 5: bea0000000000108
Sep 20 20:43:00 Compy-3700X kernel: mce: [Hardware Error]: TSC 0 ADDR 1ffffc09e5f98 MISC d012000100000000 SYND 4d000000 IPID 500b000000000
Sep 20 20:43:00 Compy-3700X kernel: mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1600648974 SOCKET 0 APIC 8 microcode 8701021
Oct 14 14:17:43 Compy-3700X kernel: mce: [Hardware Error]: Machine check events logged
Oct 14 14:17:43 Compy-3700X kernel: mce: [Hardware Error]: CPU 9: Machine Check: 0 Bank 5: bea0000000000108
Oct 14 14:17:43 Compy-3700X kernel: mce: [Hardware Error]: TSC 0 ADDR 1ffffc0c32a7e MISC d012000100000000 SYND 4d000000 IPID 500b000000000
Oct 14 14:17:43 Compy-3700X kernel: mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1602699457 SOCKET 0 APIC 3 microcode 8701021
Oct 17 17:46:30 Compy-3700X kernel: mce: [Hardware Error]: Machine check events logged
Oct 17 17:46:30 Compy-3700X kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 5: bea0000000000108
Oct 17 17:46:30 Compy-3700X kernel: mce: [Hardware Error]: TSC 0 ADDR 1f80154b346a2 MISC d0130fff00000000 SYND 4d000000 IPID 500b000000000
Oct 17 17:46:30 Compy-3700X kernel: mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1602971185 SOCKET 0 APIC 0 microcode 8701021
Oct 17 17:46:30 Compy-3700X kernel: mce: [Hardware Error]: Machine check events logged
Oct 17 17:46:30 Compy-3700X kernel: mce: [Hardware Error]: CPU 5: Machine Check: 0 Bank 5: bea0000000000108
Oct 17 17:46:30 Compy-3700X kernel: mce: [Hardware Error]: TSC 0 ADDR 7ffba42ac580 MISC d0130fff00000000 SYND 4d000000 IPID 500b000000000
Oct 17 17:46:30 Compy-3700X kernel: mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1602971185 SOCKET 0 APIC a microcode 8701021
Oct 21 03:21:30 Compy-3700X kernel: mce: [Hardware Error]: Machine check events logged
Oct 21 03:21:30 Compy-3700X kernel: mce: [Hardware Error]: CPU 2: Machine Check: 0 Bank 5: bea0000000000108
Oct 21 03:21:30 Compy-3700X kernel: mce: [Hardware Error]: TSC 0 ADDR 1ffffbbe2e50c MISC d012000100000000 SYND 4d000000 IPID 500b000000000
Oct 21 03:21:30 Compy-3700X kernel: mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1603264884 SOCKET 0 APIC 4 microcode 8701021
Oct 21 03:21:30 Compy-3700X kernel: mce: [Hardware Error]: Machine check events logged
Oct 21 03:21:30 Compy-3700X kernel: mce: [Hardware Error]: CPU 4: Machine Check: 0 Bank 5: bea0000000000108
Oct 21 03:21:30 Compy-3700X kernel: mce: [Hardware Error]: TSC 0 ADDR 1ffffc0bdca7e MISC d012000100000000 SYND 4d000000 IPID 500b000000000
Oct 21 03:21:30 Compy-3700X kernel: mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1603264884 SOCKET 0 APIC 8 microcode 8701021
Oct 22 05:33:16 Compy-3700X kernel: mce: [Hardware Error]: Machine check events logged
Oct 22 05:33:16 Compy-3700X kernel: mce: [Hardware Error]: CPU 9: Machine Check: 0 Bank 5: bea0000000000108
Oct 22 05:33:16 Compy-3700X kernel: mce: [Hardware Error]: TSC 0 ADDR 1f80250a301e4 MISC d0130fff00000000 SYND 4d000000 IPID 500b000000000
Oct 22 05:33:16 Compy-3700X kernel: mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1603359190 SOCKET 0 APIC 3 microcode 8701021
Oct 22 05:33:16 Compy-3700X kernel: mce: [Hardware Error]: Machine check events logged
Oct 22 05:33:16 Compy-3700X kernel: mce: [Hardware Error]: CPU 15: Machine Check: 0 Bank 5: bea0000000000108
Oct 22 05:33:16 Compy-3700X kernel: mce: [Hardware Error]: TSC 0 ADDR 77a496c4 MISC d0130fff00000000 SYND 4d000000 IPID 500b000000000
Oct 22 05:33:16 Compy-3700X kernel: mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1603359190 SOCKET 0 APIC f microcode 8701021
Oct 24 10:17:14 Compy-3700X kernel: mce: [Hardware Error]: Machine check events logged
Oct 24 10:17:14 Compy-3700X kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 5: bea0000000000108
Oct 24 10:17:14 Compy-3700X kernel: mce: [Hardware Error]: TSC 0 ADDR 1ffffb2e2e50c MISC d012000100000000 SYND 4d000000 IPID 500b000000000
Oct 24 10:17:14 Compy-3700X kernel: mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1603549028 SOCKET 0 APIC 0 microcode 8701021
Oct 24 10:17:14 Compy-3700X kernel: mce: [Hardware Error]: Machine check events logged
Oct 24 10:17:14 Compy-3700X kernel: mce: [Hardware Error]: CPU 5: Machine Check: 0 Bank 5: bea0000000000108
Oct 24 10:17:14 Compy-3700X kernel: mce: [Hardware Error]: TSC 0 ADDR 1ffffc19a2a7e MISC d012000100000000 SYND 4d000000 IPID 500b000000000
Oct 24 10:17:14 Compy-3700X kernel: mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1603549028 SOCKET 0 APIC a microcode 8701021
Oct 27 12:17:56 Compy-3700X kernel: mce: [Hardware Error]: Machine check events logged
Oct 27 12:17:56 Compy-3700X kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 5: b6a0000000000108
Oct 27 12:17:56 Compy-3700X kernel: mce: [Hardware Error]: TSC 0 ADDR 7fb806a7b804 SYND 4d000000 IPID 500b000000000
Oct 27 12:17:56 Compy-3700X kernel: mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1603815470 SOCKET 0 APIC 0 microcode 8701021
Oct 27 12:17:56 Compy-3700X kernel: mce: [Hardware Error]: Machine check events logged
Oct 27 12:17:56 Compy-3700X kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 22: b2a000000002010b
Oct 27 12:17:56 Compy-3700X kernel: mce: [Hardware Error]: TSC 0 SYND 4d000000 IPID 1813e17000
Oct 27 12:17:56 Compy-3700X kernel: mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1603815470 SOCKET 0 APIC 0 microcode 8701021
Nov 11 07:21:44 Compy-3700X kernel: mce: [Hardware Error]: Machine check events logged
Nov 11 07:21:44 Compy-3700X kernel: mce: [Hardware Error]: CPU 2: Machine Check: 0 Bank 5: bea0000000000108
Nov 11 07:21:44 Compy-3700X kernel: mce: [Hardware Error]: TSC 0 ADDR 1ffffc0befa62 MISC d012000100000000 SYND 4d000000 IPID 500b000000000
Nov 11 07:21:44 Compy-3700X kernel: mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1605097297 SOCKET 0 APIC 4 microcode 8701021
Nov 11 07:21:44 Compy-3700X kernel: mce: [Hardware Error]: Machine check events logged
Nov 11 07:21:44 Compy-3700X kernel: mce: [Hardware Error]: CPU 6: Machine Check: 0 Bank 5: bea0000000000108
Nov 11 07:21:44 Compy-3700X kernel: mce: [Hardware Error]: TSC 0 ADDR 1ffffc0bef500 MISC d012000100000000 SYND 4d000000 IPID 500b000000000
Nov 11 07:21:44 Compy-3700X kernel: mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1605097297 SOCKET 0 APIC c microcode 8701021
Nov 22 15:59:16 Compy-3700X kernel: mce: [Hardware Error]: Machine check events logged
Nov 22 15:59:16 Compy-3700X kernel: mce: [Hardware Error]: CPU 3: Machine Check: 0 Bank 5: bea0000000000108
Nov 22 15:59:16 Compy-3700X kernel: mce: [Hardware Error]: TSC 0 ADDR 1ffffc0fafebe MISC d0130fff00000000 SYND 4d000000 IPID 500b000000000
Nov 22 15:59:16 Compy-3700X kernel: mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1606078750 SOCKET 0 APIC 6 microcode 8701021
Nov 22 15:59:16 Compy-3700X kernel: mce: [Hardware Error]: Machine check events logged
Nov 22 15:59:16 Compy-3700X kernel: mce: [Hardware Error]: CPU 7: Machine Check: 0 Bank 5: bea0000000000108
Nov 22 15:59:16 Compy-3700X kernel: mce: [Hardware Error]: TSC 0 ADDR 1ffffb427157a MISC d0130fff00000000 SYND 4d000000 IPID 500b000000000
Nov 22 15:59:16 Compy-3700X kernel: mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1606078750 SOCKET 0 APIC e microcode 8701021
Nov 23 18:09:57 Compy-3700X kernel: mce: [Hardware Error]: Machine check events logged
Nov 23 18:09:57 Compy-3700X kernel: mce: [Hardware Error]: CPU 3: Machine Check: 0 Bank 5: bea0000000000108
Nov 23 18:09:57 Compy-3700X kernel: mce: [Hardware Error]: TSC 0 ADDR 1ffffc08eca7e MISC d012000100000000 SYND 4d000000 IPID 500b000000000
Nov 23 18:09:57 Compy-3700X kernel: mce: [Hardware Error]: PROCESSOR 2:870f10 TIME 1606172991 SOCKET 0 APIC 6 microcode 8701021
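
For anyone trying to read the status words in these mce lines, here is a minimal Python sketch that pulls the CPU, bank and status out of a log line and names the top flag bits of the status register. It only covers the architectural x86 bits 63 to 57; what each bank number means is model-specific and is not decoded here. The function names are made up for illustration.

```python
import re

# Top flag bits of an x86 machine-check status register (MCi_STATUS).
STATUS_FLAGS = {
    63: "VAL",    # the register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # the error was not corrected by hardware
    60: "EN",     # reporting was enabled for this error
    59: "MISCV",  # the MISC register is valid
    58: "ADDRV",  # the ADDR register is valid
    57: "PCC",    # processor context may be corrupt
}

MCE_LINE = re.compile(r"CPU (\d+): Machine Check: \d+ Bank (\d+): ([0-9a-fA-F]+)")


def parse_mce_line(line):
    """Return (cpu, bank, status_hex) for a 'Machine Check' line, else None."""
    match = MCE_LINE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2)), match.group(3)


def decode_status(status_hex):
    """Return the names of the flag bits that are set in the status word."""
    status = int(status_hex, 16)
    return [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]


if __name__ == "__main__":
    # Sample line copied from the log above.
    sample = ("Sep 19 23:32:08 Compy-3700X kernel: mce: [Hardware Error]: "
              "CPU 6: Machine Check: 0 Bank 5: bea0000000000108")
    cpu, bank, status = parse_mce_line(sample)
    print(f"CPU {cpu} bank {bank}: {' '.join(decode_status(status))}")
```

For the status bea0000000000108 that dominates the log, this reports VAL, UC, EN, MISCV, ADDRV and PCC, i.e. an uncorrected error, which fits the hard crashes rather than a harmless corrected event.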

-- Logs begin at Sat 2020-09-19 22:29:04 EDT, end at Mon 2020-11-23 19:59:03 EST. --
Nov 23 19:53:55 Compy-3700X kernel: nvme 0000:01:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0011 address=0xfedfc000 flags=0x0050]
Nov 23 19:53:56 Compy-3700X kernel: msr: Write to unrecognized MSR 0xc0010292 by zenstates
Please report to x86@kernel
Nov 23 19:53:56 Compy-3700X kernel: msr: Write to unrecognized MSR 0xc0010292 by zenstates
Please report to x86@kernel
Nov 23 19:53:56 Compy-3700X kernel: msr: Write to unrecognized MSR 0xc0010292 by zenstates
Please report to x86@kernel
Nov 23 19:53:56 Compy-3700X kernel: msr: Write to unrecognized MSR 0xc0010292 by zenstates
Please report to x86@kernel
Nov 23 19:53:56 Compy-3700X kernel: msr: Write to unrecognized MSR 0xc0010292 by zenstates
Please report to x86@kernel
Nov 23 19:53:56 Compy-3700X kernel: msr: Write to unrecognized MSR 0xc0010292 by zenstates
Please report to x86@kernel
Nov 23 19:53:56 Compy-3700X kernel: msr: Write to unrecognized MSR 0xc0010292 by zenstates
Please report to x86@kernel
Nov 23 19:53:56 Compy-3700X kernel: msr: Write to unrecognized MSR 0xc0010292 by zenstates
Please report to x86@kernel
Nov 23 19:53:56 Compy-3700X kernel: msr: Write to unrecognized MSR 0xc0010292 by zenstates
Please report to x86@kernel
Nov 23 19:53:56 Compy-3700X kernel: msr: Write to unrecognized MSR 0xc0010292 by zenstates
Please report to x86@kernel
Nov 23 19:54:35 Compy-3700X pulseaudio[1771]: GetManagedObjects() failed: org.freedesktop.DBus.Error.TimedOut: Failed to activate service ‘org.bluez’: timed out (service_start_timeout=25000ms)

Edited to comply with link limit.

@modzilla Ok, I’ll try that and see if it makes a difference. Thank you.

Here is my grub config. Let me know if anything seems out of the ordinary.

GRUB_DEFAULT=saved
GRUB_TIMEOUT=10
GRUB_TIMEOUT_STYLE=menu
GRUB_DISTRIBUTOR="Manjaro"
GRUB_CMDLINE_LINUX_DEFAULT="quiet apparmor=1 security=apparmor resume=UUID=c8e8be7f-12f8-4260-b15b-8ee0d6203cf8 udev.log_priority=3"
GRUB_CMDLINE_LINUX="processor.max_cstate=5 rcu_nocbs=0-11 quiet splash"

# If you want to enable the save default function, uncomment the following
# line, and set GRUB_DEFAULT to saved.
GRUB_SAVEDEFAULT=true

# Preload both GPT and MBR modules so that they are not missed
GRUB_PRELOAD_MODULES="part_gpt part_msdos"

# Uncomment to enable booting from LUKS encrypted devices
#GRUB_ENABLE_CRYPTODISK=y

# Uncomment to use basic console
GRUB_TERMINAL_INPUT=console

# Uncomment to disable graphical terminal
#GRUB_TERMINAL_OUTPUT=console

# The resolution used on graphical terminal
# note that you can use only modes which your graphic card supports via VBE
# you can see them in real GRUB with the command 'videoinfo'
GRUB_GFXMODE=auto

# Uncomment to allow the kernel use the same resolution used by grub
GRUB_GFXPAYLOAD_LINUX=keep

# Uncomment if you want GRUB to pass to the Linux kernel the old parameter
# format "root=/dev/xxx" instead of "root=/dev/disk/by-uuid/xxx"
#GRUB_DISABLE_LINUX_UUID=true

# Uncomment to disable generation of recovery mode menu entries
GRUB_DISABLE_RECOVERY=true

# Uncomment and set to the desired menu colors.  Used by normal and wallpaper
# modes only.  Entries specified as foreground/background.
GRUB_COLOR_NORMAL="light-gray/black"
GRUB_COLOR_HIGHLIGHT="green/black"

# Uncomment one of them for the gfx desired, a image background or a gfxtheme
#GRUB_BACKGROUND="/usr/share/grub/background.png"
GRUB_THEME="/usr/share/grub/themes/manjaro/theme.txt"

# Uncomment to get a beep at GRUB start
#GRUB_INIT_TUNE="480 440 1"

That is correct. Did you run update-grub?
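
A quick way to confirm that the parameters actually reached the running kernel after update-grub and a reboot is to look at /proc/cmdline. Below is a minimal Python sketch of that check; the helper names are hypothetical, the expected values are just the ones from the config above, and it does not handle quoted values that contain spaces.

```python
from pathlib import Path


def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def missing_params(cmdline, expected):
    """Return the expected parameter names that are absent or set differently."""
    active = parse_cmdline(cmdline)
    return [name for name, value in expected.items()
            if name not in active or active[name] != value]


if __name__ == "__main__":
    # /proc/cmdline holds the command line the running kernel was booted with.
    proc_cmdline = Path("/proc/cmdline")
    # Example values taken from the GRUB config in this thread.
    wanted = {"processor.max_cstate": "5", "rcu_nocbs": "0-11"}
    if proc_cmdline.exists():
        print("missing or different:", missing_params(proc_cmdline.read_text(), wanted))
    else:
        print("/proc/cmdline not found (not a Linux system?)")
```

An empty list means both parameters are active; anything listed was either not applied or has a different value than expected.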

Something seems very wrong with your power delivery then. It might be time to RMA your motherboard.

I did. It hasn’t crashed yet, but it hasn’t been very long.

@FurryJackman Will MSI accept an RMA request if I don’t have the socket cover?

I see TSC in the kernel messages, so I’d assume your TSC clock source is unstable and that the issues arise when the CPU steps between frequencies.

While I see others have posted about limiting C-states, you could just use make menuconfig and compile the kernel without any power management support for the CPU.

This should cause the CPU to run at a fixed clock speed. Make sure to also turn off anything in the BIOS that manipulates the frequency.

Latency is also a bit better when you have all the frequency crap disabled anyway.

Guys who have dual-socket systems should know what I am talking about; latency is a big problem for us in KVM environments.

When trying to use the TSC clock source on dual-socket systems, it’s almost a must to disable all the power management stuff anyway.
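
Before rebuilding a kernel, it may be worth checking which clock source the kernel is actually using. The sketch below reads the standard clocksource entries in sysfs; the function name is hypothetical. As far as I know, a kernel that has marked the TSC as unstable normally switches to another source such as hpet, so seeing tsc as the current source would argue against that theory.

```python
from pathlib import Path

CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base=CLOCKSOURCE_DIR):
    """Return (current clock source, list of available clock sources)."""
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


if __name__ == "__main__":
    if CLOCKSOURCE_DIR.is_dir():
        current, available = read_clocksources()
        print(f"current:   {current}")
        print(f"available: {', '.join(available)}")
    else:
        print("clocksource sysfs directory not found")
```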

Best Regards,

@Gandhi Did you get anywhere?
I’m having similar issues with my 5900X on an Asus Crosshair VIII Hero. XMP turned off, no PBO.

The system freezes, becomes unresponsive and then reboots. Sometimes the sound also “freezes”. It seems to happen when a program, script etc. is starting or stopping. It can happen after 5 min of running or after ~2 days.

I tried the PSU idle voltage and it’s still crashing.
I haven’t tried the CPU core voltage or the CPU NB/SoC voltage offset yet.

Besides the mce hardware error I’m also getting:

>     Nov 26 14:02:33 mymachine kernel: __common_interrupt: 10.55 No irq handler for vector
> 
>     Nov 26 14:02:33 mymachine kernel: sp5100-tco sp5100-tco: Watchdog hardware is disabled
> 
>     Nov 26 14:02:34 mymachine kernel: EDAC amd64: Error: F0 not found, device 0x1650 (broken BIOS?)

Additional things I’ve tried:

  • BIOS versions: 2402, 2502, 2702
  • Disabled XMP and PBO (XMP with these sticks was no problem on a 7700K & Asus Strix 270E Gaming)
  • Latest linux-lts kernel (5.4)
  • Different PSU (Be Quiet! Dark Power 12 1500 W)

What’s kinda weird is that the benchmarks seem to run stable:

  • “stress-ng --cpu 6 --vm 6 --verify 1 --vm-bytes 80%” ran without issues for 20 min
  • “phoronix-test-suite” completes: Appleseed, browser suite, x264, x265, Embree, CppPerformanceBenchmarks
  • “memtest86+” completed an iteration without errors; I didn’t have the time to run more yet

I was thinking of trying the following:

  • Try Windows
  • Try kernel 5.10
  • Try a different GPU (but I only have a super old Radeon and a 1080 Ti)
  • Try different RAM sticks (I have a pair of 8 GB)
  • Try a CPU NB/SoC voltage offset
  • Try setting the core voltage to normal
  • Try to do something about the C-states? But zenstates only works for Zen 2 so far.

Sadly, no. I’ve tried Windows 10 and Linux kernel 5.10, but I still get crashes on both. I’ve also tried new RAM with no success. Tried a voltage offset, no luck. Disabled the C-states both in the BIOS and using Zenstates.py, again no luck. Haven’t tried a new GPU though, and I don’t think I’ve tried setting the core voltage to normal. However, at this point I’m beginning to think there is something very wrong with either the CPU or the motherboard. Fortunately, I have a friend with a Ryzen 5 3600 that I can trade with for a bit and see if the problem persists. Whichever it turns out to be, I’ll probably have to RMA. I’ll let you guys know how it goes.

Ok. I also tried Windows 10 now :stuck_out_tongue:
Also crashes…
Trying less RAM now. And then I’ll try the 2nd set of RAM sticks…
And then I’ll RMA it. It’s a bundle anyway.

How finicky is Ryzen with RAM?
I.e. should any set of RAM work at 2133 MHz, even if it’s not on the list of supported RAM?

Yeah, 2133 shouldn’t be a problem at all, especially not for a 5000 series. Since you already ran memtest, the problem is most likely either the mainboard or the CPU.

I’m having the same issue:

EDAC amd64: Error: F0 not found, device 0x1650 (broken BIOS?)

Dec 06 03:27:04 TreeOfLight kernel: [Hardware Error]: Corrected error, no action required.
Dec 06 03:27:04 TreeOfLight kernel: [Hardware Error]: CPU:1 (19:21:0) MC2_STATUS[-|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0x9c20400004020136
Dec 06 03:27:04 TreeOfLight kernel: [Hardware Error]: Error Addr: 0x0000000406c70650
Dec 06 03:27:04 TreeOfLight kernel: [Hardware Error]: IPID: 0x000200b000000000, Syndrome: 0x000111081a44352c
Dec 06 03:27:04 TreeOfLight kernel: [Hardware Error]: L2 Cache Ext. Error Code: 2, L2M Data Array ECC Error.
Dec 06 03:27:04 TreeOfLight kernel: [Hardware Error]: cache level: L2, tx: DATA, mem-tx: DRD

[    0.028616] Booting paravirtualized kernel on bare hardware
[    0.528204] mce: [Hardware Error]: Machine check events logged
[    0.528205] mce: [Hardware Error]: CPU 1: Machine Check: 0 Bank 2: bea0200004020152
[    0.528207] mce: [Hardware Error]: TSC 0 ADDR 2ddea9c50 MISC d012000100000000 SYND 167101d442129 IPID 200b000000000 
[    0.528209] mce: [Hardware Error]: PROCESSOR 2:a20f10 TIME 1607216150 SOCKET 0 APIC 2 microcode a201009
[    3.818545] systemd[1]: Condition check resulted in Rebuild Hardware Database being skipped.
[ 1873.059558] mce: [Hardware Error]: Machine check events logged
[ 1873.059561] [Hardware Error]: Corrected error, no action required.
[ 1873.059565] [Hardware Error]: CPU:1 (19:21:0) MC2_STATUS[-|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0x9c20400004020136
[ 1873.059569] [Hardware Error]: Error Addr: 0x0000000406c70650
[ 1873.059570] [Hardware Error]: IPID: 0x000200b000000000, Syndrome: 0x000111081a44352c
[ 1873.059572] [Hardware Error]: L2 Cache Ext. Error Code: 2, L2M Data Array ECC Error.
[ 1873.059574] [Hardware Error]: cache level: L2, tx: DATA, mem-tx: DRD

Ryzen 5600X, Asus ROG Strix X570-E, BIOS “Version 2816 Beta Version”

Ubuntu 20.10
Linux TreeOfLight 5.8.0-31-generic #33-Ubuntu SMP Mon Nov 23 18:44:54 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Sensors don’t work :frowning: so I have no idea what my CPU voltage is. It seems to be around 1.34 V in the BIOS/UEFI, which doesn’t seem normal at all. When I apply an offset it says 1.1 V is normal, but the BIOS voltage is still 1.34 V.

(Dunno if it’s related, but I’m getting USB slowness errors too.)
/usr/libexec/gdm-x-session[2483]: (EE) event2 - ASUS ROG GLADIUS: client bug: event processing lagging behind by 11ms, your system is too slow
/usr/libexec/gdm-x-session[2483]: (EE) event6 - Logitech USB Keyboard: client bug: event processing lagging behind by 30ms, your system is too slow

Okay, I’m seeing this a lot with a lot of the recent BIOSes. Downgrade your BIOS and turn off C-states until a proper fix is in place. It seems the hardware errors are related to deep sleep states and it affects USB 2.0 ports as well:

USB 2.0 weirdness and Hardware Errors are currently common with the recent batch of Beta BIOSes.

For Gigabyte, use B550 series F11J BIOS and for X570 use F31K.

@JoneK
I’m only running it with 2 sticks of RAM now and it’s gotten more stable…
But I got the same mce hardware errors as you once!
so wuhuu ?? :thinking:

I sent a ticket to AMD. No response yet.
I saw that the Hero VIII non-wifi got a new BIOS today…
Maybe that will make it better…

@FurryJackman
Does C-state disabling in the BIOS affect Linux or do you need ZenStates? AFAIK Zenstates unfortunately only works for Zen 2 so far.
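
One way to see what the running kernel ended up with, regardless of whether the C-states were changed in the BIOS or with a tool, is to list the idle states it exposes through the cpuidle sysfs interface. This is a minimal Python sketch under that assumption; the function name is hypothetical, it only looks at cpu0, and it only reads the state rather than changing anything.

```python
from pathlib import Path

CPUIDLE_DIR = Path("/sys/devices/system/cpu/cpu0/cpuidle")


def list_idle_states(base=CPUIDLE_DIR):
    """Return one dict per idle state the kernel exposes under `base`."""
    if not base.is_dir():
        return []  # no cpuidle directory, so no idle states are exposed here
    state_dirs = sorted(base.glob("state[0-9]*"), key=lambda p: int(p.name[5:]))
    states = []
    for state_dir in state_dirs:
        states.append({
            "state": state_dir.name,
            "name": (state_dir / "name").read_text().strip(),
            "disabled": (state_dir / "disable").read_text().strip() != "0",
            "usage": int((state_dir / "usage").read_text()),
        })
    return states


if __name__ == "__main__":
    for entry in list_idle_states():
        flag = "disabled" if entry["disabled"] else "enabled"
        print(f'{entry["state"]}: {entry["name"]} ({flag}, entered {entry["usage"]} times)')
```

If a deep state is still listed and its usage counter keeps growing between two runs, the CPU is still entering it.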

Maybe I’ll try using only 3.0 ports…

Zenstates is supposed to work unless you need a new kernel to expose more MSRs.

On my 5900X I needed to set VDDG and VDDP, as I mentioned above, before it was stable. Not a single problem since changing that shortly after release.
The mobo is an Aorus X570 Pro.

Are you saying that, apart from changing VDDG and VDDP, no other changes have been made and the system no longer runs into MCE errors and reboots?

I want to jump on the thread too. I built a system 9 days ago and I’ve had a random reboot every single day.

I’m running a 5900X on an MSI X570 Tomahawk with a G.Skill 32 GB (2x16) kit at 3600 MHz CL16.

My current summary is:

  • If the system goes from the Off state to the On state (power on), it randomly reboots
  • If the system goes from the Sleep state to the On state (wake from sleep), it randomly reboots, normally within 5 minutes
  • Once the reboot has happened, it never happens again, no matter the load or uptime since the last reboot, unless one of the above occurs again

Playing around with settings in the BIOS had no effect. I’ve tried power settings, disabling PBO, IOMMU, etc.

As per one of the suggestions in a lengthy BZ discussion, I tried disabling SMT and for now it’s running stable, but I didn’t have a lot of time today to play around.

So just a quick update from my side.
I had to set a tiny SoC voltage offset (I think 0.0025V). I also set the idle to “typical”.
Since then my computer has been stable…
I don’t even need to disable C6 states.

I will test XMP and PBO in the next few days.