Here is my grub config. Let me know if anything seems out of the ordinary.
GRUB_DEFAULT=saved
GRUB_TIMEOUT=10
GRUB_TIMEOUT_STYLE=menu
GRUB_DISTRIBUTOR="Manjaro"
GRUB_CMDLINE_LINUX_DEFAULT="quiet apparmor=1 security=apparmor resume=UUID=c8e8be7f-12f8-4260-b15b-8ee0d6203cf8 udev.log_priority=3"
GRUB_CMDLINE_LINUX="processor.max_cstate=5 rcu_nocbs=0-11 quiet splash"
# If you want to enable the save default function, uncomment the following
# line, and set GRUB_DEFAULT to saved.
GRUB_SAVEDEFAULT=true
# Preload both GPT and MBR modules so that they are not missed
GRUB_PRELOAD_MODULES="part_gpt part_msdos"
# Uncomment to enable booting from LUKS encrypted devices
#GRUB_ENABLE_CRYPTODISK=y
# Uncomment to use basic console
GRUB_TERMINAL_INPUT=console
# Uncomment to disable graphical terminal
#GRUB_TERMINAL_OUTPUT=console
# The resolution used on graphical terminal
# note that you can use only modes which your graphic card supports via VBE
# you can see them in real GRUB with the command 'videoinfo'
GRUB_GFXMODE=auto
# Uncomment to allow the kernel use the same resolution used by grub
GRUB_GFXPAYLOAD_LINUX=keep
# Uncomment if you want GRUB to pass to the Linux kernel the old parameter
# format "root=/dev/xxx" instead of "root=/dev/disk/by-uuid/xxx"
#GRUB_DISABLE_LINUX_UUID=true
# Uncomment to disable generation of recovery mode menu entries
GRUB_DISABLE_RECOVERY=true
# Uncomment and set to the desired menu colors. Used by normal and wallpaper
# modes only. Entries specified as foreground/background.
GRUB_COLOR_NORMAL="light-gray/black"
GRUB_COLOR_HIGHLIGHT="green/black"
# Uncomment one of them for the gfx desired, a image background or a gfxtheme
#GRUB_BACKGROUND="/usr/share/grub/background.png"
GRUB_THEME="/usr/share/grub/themes/manjaro/theme.txt"
# Uncomment to get a beep at GRUB start
#GRUB_INIT_TUNE="480 440 1"
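On the "anything out of the ordinary" question, one small thing stands out. `quiet` is set in both `GRUB_CMDLINE_LINUX` and `GRUB_CMDLINE_LINUX_DEFAULT`, and grub-mkconfig puts both on the normal boot entry's kernel command line, so it gets passed twice. That is harmless, just redundant. Below is a minimal Python sketch that flags this kind of duplicate. It assumes a simple `KEY="value"` file like the one above and does not evaluate shell expansions. The function names are made up for illustration.

```python
import shlex


def parse_grub_defaults(text):
    """Parse KEY=value lines from an /etc/default/grub-style file.

    Blank lines and comment lines are skipped; quotes are removed with shlex,
    roughly as the shell does when grub-mkconfig sources the file.
    """
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        values[key.strip()] = " ".join(shlex.split(raw))
    return values


def duplicate_params(values):
    """Return (name, first_key, repeat_key) for kernel params set more than once.

    Parameters are compared by name only (the part before '='), across
    GRUB_CMDLINE_LINUX and GRUB_CMDLINE_LINUX_DEFAULT.
    """
    seen = {}
    duplicates = []
    for key in ("GRUB_CMDLINE_LINUX", "GRUB_CMDLINE_LINUX_DEFAULT"):
        for param in values.get(key, "").split():
            name = param.split("=", 1)[0]
            if name in seen:
                duplicates.append((name, seen[name], key))
            else:
                seen[name] = key
    return duplicates
```

On the config above, `duplicate_params(parse_grub_defaults(open("/etc/default/grub").read()))` reports only `quiet`. After editing the file, the menu still has to be regenerated (e.g. `grub-mkconfig -o /boot/grub/grub.cfg`) for the change to take effect.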
I see TSC in the kernel messages. I'd assume your TSC clock source is unstable and the issues arise around your CPU's frequency stepping.
While I see others have posted about limiting C-states, you could instead use make menuconfig and compile the kernel without any CPU power management support.
This should make the CPU run at a fixed clock speed. Make sure to also turn off anything in the BIOS that manipulates frequency.
Latency is also a bit better with all the frequency-scaling stuff disabled.
Anyone with a dual-socket system should know what I'm talking about; latency is a big problem for us in KVM environments.
When using the TSC clock source on dual-socket systems, it's almost a must to disable all the power management features anyway.
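To check the TSC theory, two things are worth looking at. The first is which clocksource the kernel is currently using. The second is whether the kernel log mentions the TSC being marked unstable or a switch to another clocksource. Below is a rough Python sketch. It assumes the standard sysfs path `/sys/devices/system/clocksource/clocksource0/` and kernel-log text taken from something like `journalctl -k`. The regex is a loose match I picked, not an exhaustive list of kernel messages.

```python
import re
from pathlib import Path

CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def current_clocksource(base=CLOCKSOURCE_DIR):
    """Return (current, available) clocksource names as reported by sysfs."""
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


# Loose match for kernel lines about the TSC becoming unstable or the kernel
# switching clocksource (e.g. "Marking clocksource 'tsc' as unstable").
_TSC_EVENT = re.compile(r"tsc.*unstable|unstable.*tsc|switched to clocksource",
                        re.IGNORECASE)


def tsc_events(kernel_log):
    """Return the kernel-log lines that look like TSC/clocksource events."""
    return [line.strip() for line in kernel_log.splitlines()
            if _TSC_EVENT.search(line)]
```

If `current_clocksource()` reports something other than `tsc` (for example `hpet`) and `tsc_events()` finds a "Marking clocksource 'tsc' as unstable" line, that would support the unstable-TSC explanation. An empty result would at least not support it.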
@Gandhi Did you get anywhere?
I’m having similar issues with my 5900X on an Asus Crosshair VIII Hero. XMP is turned off, no PBO.
The system freezes, becomes unresponsive and then reboots. Sometimes the sound also “freezes”. It seems to happen when a program or script starts or stops. It can happen after 5 minutes of running or after ~2 days.
I tried the PSU idle voltage setting and it’s still crashing.
I haven’t tried the CPU core voltage or the CPU NB/SoC voltage offset yet.
Besides the MCE hardware error, I’m also getting:
> Nov 26 14:02:33 mymachine kernel: __common_interrupt: 10.55 No irq handler for vector
>
> Nov 26 14:02:33 mymachine kernel: sp5100-tco sp5100-tco: Watchdog hardware is disabled
>
> Nov 26 14:02:34 mymachine kernel: EDAC amd64: Error: F0 not found, device 0x1650 (broken BIOS?)
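Since the machine reboots after these events, the interesting messages are usually in the previous boot's kernel log. Below is a minimal Python sketch that groups lines matching the messages quoted above, plus machine-check reports, which the kernel prefixes with `[Hardware Error]`. The category names and patterns are my own choices, not anything standard.

```python
import re

# Categories are illustrative; patterns match the messages quoted in this thread
# plus the "[Hardware Error]" prefix the kernel uses for machine-check reports.
PATTERNS = {
    "mce": re.compile(r"\bmce\b|\[Hardware Error\]", re.IGNORECASE),
    "no_irq_handler": re.compile(r"No irq handler for vector"),
    "edac": re.compile(r"EDAC amd64: Error"),
}


def classify(journal_text):
    """Group journal lines by which pattern they match (a line can match several)."""
    hits = {name: [] for name in PATTERNS}
    for line in journal_text.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name].append(line.strip())
    return hits
```

For example, you could feed it `subprocess.run(["journalctl", "-k", "-b", "-1"], capture_output=True, text=True).stdout`. Here `-k` limits output to kernel messages and `-b -1` selects the previous boot, assuming the journal is kept across reboots.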
Additional things I’ve tried:
- BIOS versions: 2402, 2502, 2702
- Disabled XMP and PBO (XMP with these sticks was no problem on a 7700K & Asus Strix 270E Gaming)
- Latest linux-lts kernel (5.4)
- Different PSU (Be Quiet! Dark Power 12 1500 W)
What’s kind of weird is that the benchmarks seem to run stably:
`stress-ng --cpu 6 --vm 6 --verify 1 --vm-bytes 80%` ran without issues for 20 minutes.
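If you want to repeat that test for a fixed, longer window, a small Python wrapper like the one below is one option. It assumes stress-ng is installed and on PATH. `--timeout` accepts suffixes such as `m` and `h`, and `--verify` is passed as a plain flag here. Keep in mind that a long constant-load run probably won't reproduce crashes that happen when a program starts or stops, as described above.

```python
import shlex
import subprocess


def stress_cmd(cpu=6, vm=6, vm_bytes="80%", timeout="20m"):
    """Build a stress-ng command line similar to the one above, with a time limit."""
    return ["stress-ng", "--cpu", str(cpu), "--vm", str(vm),
            "--vm-bytes", vm_bytes, "--verify",
            "--timeout", timeout, "--metrics-brief"]


def run_stress(**kwargs):
    """Run stress-ng and return its exit status (non-zero means an error or a failed stressor)."""
    cmd = stress_cmd(**kwargs)
    print("running:", shlex.join(cmd))
    return subprocess.run(cmd).returncode
```

For example, `run_stress(timeout="2h")` runs the same mix for two hours.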
Sadly, no. I’ve tried Windows 10 and Linux kernel 5.10, but I still get crashes on both. I’ve also tried new RAM with no success. Tried a voltage offset, no luck. Disabled C-states both in the BIOS and with Zenstates.py, again no luck. I haven’t tried a new GPU though, and I don’t think I’ve tried setting the core voltage to normal. At this point I’m beginning to think there is something very wrong with either the CPU or the motherboard. Fortunately, I have a friend with a Ryzen 5 3600 whom I can trade with for a bit to see if the problem persists. Whichever it turns out to be, I’ll probably have to RMA it. I’ll let you guys know how it goes.
OK, I’ve now also tried Windows 10.
It also crashes…
Trying less RAM now, and then I’ll try the second set of RAM sticks…
After that I’ll RMA it. It’s a bundle anyway.
How finicky is Ryzen with RAM?
I.e., should any set of RAM work at 2133 MHz, even if it’s not on the list of supported RAM?
Yeah, 2133 shouldn’t be a problem at all, especially not for a 5000 series CPU, and since you already ran memtest, the problem is most likely either the mainboard or the CPU.
Ubuntu 20.10
Linux TreeOfLight 5.8.0-31-generic #33-Ubuntu SMP Mon Nov 23 18:44:54 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Sensors don’t work, so I have no idea what my CPU voltage is. It seems to be around 1.34 V in the BIOS/UEFI, which doesn’t seem normal at all. When I apply an offset, it says 1.1 V is normal, but the BIOS voltage still reads 1.34 V.
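On the sensors question: lm-sensors reads the kernel's hwmon interface, so listing `/sys/class/hwmon` directly shows the raw readings the kernel exposes. Below is a minimal Python sketch. Voltages only appear if a hwmon driver for the board's sensor chip is loaded. Per the kernel's hwmon sysfs convention, `in*_input` values are in millivolts. If it returns nothing, that is consistent with `sensors` showing no voltages, and the BIOS reading may be the only one available.

```python
from pathlib import Path

HWMON_ROOT = Path("/sys/class/hwmon")


def read_voltages(root=HWMON_ROOT):
    """Return {(chip, label): volts} for every in*_input file under root."""
    readings = {}
    for hw in sorted(root.glob("hwmon*")):
        name_file = hw / "name"
        chip = name_file.read_text().strip() if name_file.exists() else hw.name
        for inp in sorted(hw.glob("in*_input")):
            channel = inp.name[: -len("_input")]  # e.g. "in0"
            label_file = hw / f"{channel}_label"
            label = label_file.read_text().strip() if label_file.exists() else channel
            try:
                millivolts = int(inp.read_text())
            except (OSError, ValueError):
                continue  # some sensors refuse reads or return junk; skip them
            readings[(chip, label)] = millivolts / 1000
    return readings
```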
(Not sure if it’s related, but I’m also getting slow USB errors.)
> /usr/libexec/gdm-x-session[2483]: (EE) event2 - ASUS ROG GLADIUS: client bug: event processing lagging behind by 11ms, your system is too slow
>
> /usr/libexec/gdm-x-session[2483]: (EE) event6 - Logitech USB Keyboard: client bug: event processing lagging behind by 30ms, your system is too slow
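Those `client bug: event processing lagging behind` lines come from libinput in the X session. They mean input events were handled late, which may be a symptom of the whole system stalling rather than a USB fault as such. The Python sketch below shows which devices are affected and how large the delays get, for example across a whole `journalctl -b` dump. The regex is written against the two log lines above and may need loosening for other libinput versions.

```python
import re

# Matches e.g. "event6 - Logitech USB Keyboard: client bug: event processing
# lagging behind by 30ms" and captures the device name and the delay.
LAG = re.compile(
    r"event\d+ - (?P<device>.+?): client bug: "
    r"event processing lagging behind by (?P<ms>\d+)ms"
)


def lag_report(log_text):
    """Return the worst reported delay (ms) per input device."""
    worst = {}
    for m in LAG.finditer(log_text):
        device, ms = m["device"], int(m["ms"])
        worst[device] = max(ms, worst.get(device, 0))
    return worst
```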
Okay, I’m seeing this a lot with many of the recent BIOSes. Downgrade your BIOS and turn off C-states until a proper fix is in place. The hardware errors seem to be related to deep sleep states, and this affects USB 2.0 ports as well:
USB 2.0 weirdness and hardware errors are currently common with the recent batch of beta BIOSes.
For Gigabyte, use B550 series F11J BIOS and for X570 use F31K.
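Before and after changing C-state settings in the BIOS, it can help to confirm what Linux actually sees. The kernel's cpuidle sysfs interface lists each idle state per CPU, with a `disable` flag and a `usage` counter. Below is a minimal Python sketch assuming that standard layout (`/sys/devices/system/cpu/cpuN/cpuidle/stateM/`). State names such as `C1`/`C2` depend on the idle driver in use, so don't read too much into them.

```python
from pathlib import Path

CPU0 = Path("/sys/devices/system/cpu/cpu0")


def read_idle_states(cpu_dir=CPU0):
    """List the cpuidle states the kernel exposes for one CPU, in index order."""
    rows = []
    states = sorted((cpu_dir / "cpuidle").glob("state[0-9]*"),
                    key=lambda p: int(p.name[len("state"):]))
    for state in states:
        rows.append({
            "state": state.name,
            "name": (state / "name").read_text().strip(),
            # "1" in the disable file means the governor will not pick this state
            "disabled": (state / "disable").read_text().strip() == "1",
            # how many times this CPU has entered the state since boot
            "usage": int((state / "usage").read_text()),
        })
    return rows


def still_enabled(rows):
    """Names of the idle states that are not disabled."""
    return [r["name"] for r in rows if not r["disabled"]]


def disable_state(index, cpu_dir=CPU0):
    """Disable one idle state on one CPU (needs root; resets on reboot)."""
    (cpu_dir / "cpuidle" / f"state{index}" / "disable").write_text("1")
```

The sketch looks at a single CPU. To cover all of them you would loop over `Path("/sys/devices/system/cpu").glob("cpu[0-9]*")`.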
On my 5900X I needed to set VDDG and VDDP, as I mentioned above, before it was stable. Not a single problem since I changed that shortly after release.
The mobo is an Aorus X570 Pro.
Are you saying that, apart from changing VDDG and VDDP, no other changes have been made and the system no longer runs into MCE errors and reboots?
I want to jump on the thread too. I built a system 9 days ago and I’ve had a random reboot every single day.
I’m running a 5900X on an MSI X570 Tomahawk with a G.Skill 32 GB (2x16 GB) kit at 3600 MHz CL16.
My current summary is:
- If the system goes from off to on (power on), it randomly reboots.
- If the system goes from sleep to on (wake from sleep), it randomly reboots, normally within 5 minutes.
- Once the reboot has happened, it doesn’t happen again regardless of load or uptime, unless one of the above occurs.
Playing around with BIOS settings had no effect; I’ve tried power settings, disabling PBO, IOMMU, etc.
Following one of the suggestions in a lengthy BZ discussion, I’ve tried disabling SMT, and for now it’s running stable, but I didn’t have a lot of time today to play around.
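For anyone wanting to try the SMT experiment without going back into the BIOS: Linux exposes SMT control in sysfs under `/sys/devices/system/cpu/smt/`. Below is a minimal Python sketch, assuming your kernel provides that interface. Writing `off` needs root and only lasts until the next reboot.

```python
from pathlib import Path

SMT_DIR = Path("/sys/devices/system/cpu/smt")


def smt_status(smt_dir=SMT_DIR):
    """Return (control, active).

    control is the text of the control file, e.g. "on", "off", "forceoff"
    or "notsupported"; active is True if sibling threads are currently online.
    """
    control = (smt_dir / "control").read_text().strip()
    active = (smt_dir / "active").read_text().strip() == "1"
    return control, active


def set_smt(enabled, smt_dir=SMT_DIR):
    """Turn SMT on or off at runtime (needs root; lasts until the next reboot)."""
    (smt_dir / "control").write_text("on" if enabled else "off")
```

To keep SMT off from boot, the `nosmt` kernel parameter (e.g. added to `GRUB_CMDLINE_LINUX_DEFAULT`) has the same effect.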
So, just a quick update from my side.
I had to set a tiny SoC voltage offset (I think 0.0025 V). I also set the idle to “typical”.
Since then my computer has been stable…
I don’t even need to disable C6 states.
Yep, I adjusted those voltages (and RAM, but that’s separate) and set the RAM to 3733 with tight timings. It’s been rock solid ever since, on both Windows and Linux. Sometimes it was left compiling for hours on 24 threads with no problems whatsoever.
I RMA’ed my 3700X and received a new one just before Christmas. While things looked promising initially, the computer crashed again shortly after logging into Windows. I haven’t had any problems with Linux yet. MSI has released a beta BIOS update for my motherboard, and I may try that in the meantime.
I do wonder about my power supply, though. I’ve only had it for a few years, but it is an older model: an Antec HCP 1000W Platinum. Would its age matter? Also, would it matter whether the CPU cable is plugged into the 12V1 connector as opposed to 12V2, 12V3, or 12V4? I really know next to nothing about power supplies.