Crashes - Centos 7 with AMD Ryzen 7 and Asus PRIME-B350M-A

Sure thing. Kernel dump logs are sometimes disabled for security reasons. I don’t know the default state for Centos. I should know … but alas I do not.

Just to update: the ryzen-kill script passed. No errors after 24 hours.

I wouldn’t think AMD would send you another faulty CPU, but glad to hear it’s confirmed.

Did you manage to get a successful kernel dump?

The systemd logs are very worrying to me. Have you tried reinstalling your OS?

Which specific Ryzen 7 are you using, and are you overclocking it? Overclocking can lead to crashes if the CPU doesn’t get enough voltage.

Do you have another Ryzen motherboard? I’m starting to suspect something may be wrong with the power delivery on this board.


To recap the things that have been tried:

  • checksum installer (known good)
  • memtest passed.
  • changed PSU
  • RMA CPU
  • installed kernel 4.14
  • installed bios 3402
  • tried LiveUSB, no sata

Interesting conditions:

  • SVM disabled allows 10 days uptime
  • 8 days under full prime95 load.

Just an update.

I am running kernel 4.14.5 with SVM enabled in the BIOS, but this time with the option “iommu=pt” added to the kernel boot parameters.

It has been running for more than 9 days and 20 hours.

Got the tip to add “iommu=pt” in the kernel options from the following link.
https://forum.level1techs.com/t/just-finished-building-ryzen-system-kernel-panic-whenever-iommu-is-enabled-in-bios/121907/8
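For anyone who wants to try the same option on CentOS 7: kernel options normally live in the GRUB_CMDLINE_LINUX line of /etc/default/grub. Below is a minimal Python sketch of that edit. The helper name add_kernel_arg and the sample file content are made up for illustration, and the function only transforms text; it never touches the real file.

```python
import re


def add_kernel_arg(grub_text, arg):
    """Return grub_text with `arg` appended to GRUB_CMDLINE_LINUX if absent.

    Hypothetical helper for illustration only: it works on a string and
    does not read or write /etc/default/grub.
    """
    pattern = re.compile(r'^(GRUB_CMDLINE_LINUX=")([^"]*)(")$', re.MULTILINE)

    def patch(match):
        args = match.group(2).split()
        if arg not in args:  # keep the edit idempotent
            args.append(arg)
        return match.group(1) + " ".join(args) + match.group(3)

    return pattern.sub(patch, grub_text)


if __name__ == "__main__":
    # Sample content (an assumption, not copied from the machine in this thread).
    sample = 'GRUB_TIMEOUT=5\nGRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet"\n'
    print(add_kernel_arg(sample, "iommu=pt"))
```

After changing the real file, the GRUB config still has to be regenerated (on a BIOS-boot CentOS 7 install that is usually `grub2-mkconfig -o /boot/grub2/grub.cfg`; the output path differs on UEFI installs) and the machine rebooted.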


Back to the drawing board again. I spoke too soon and was happy too early. The computer crashed again yesterday. 😥😭

Have you tried using Windows 10 for a while, as was suggested before?

Couldn’t this just be CentOS not really working well with Ryzen?
Or maybe with your particular board?
Have you already tried Fedora 27?

I don’t know for sure, but maybe your motherboard wasn’t designed to run 24 hours a day for a long time. Maybe you would be better served by a server motherboard. I also see a trend when other community members complain about issues like the ones you are describing: those running a Ryzen 7 on anything other than an X370 board are having all kinds of issues. There is such a thing as having the right tool for the job. I did read somewhere that Zen had all kinds of issues with the Linux kernel at first, but I thought AMD released an update to the Linux community which fixed a lot of those issues. Maybe if @pvanthony upgraded his Linux kernel to the latest non-beta kernel it would fix his issue. Keep in mind that the latest platform I have experience with is the Intel 3000 series, and I have only just started messing around with Linux.

Not yet, because I really do not think it is a hardware issue. Check my upcoming posts.

I think it is more likely that Ryzen still has issues with Linux on the whole.
Yes, I have tried Fedora 27.

It’s true that server motherboards are cool. I usually use SuperMicro motherboards and they are great. Having said that, I have a few servers running on normal desktop motherboards, and they have been running for years without any problems.
Server motherboards have a watchdog, IPMI and console redirection, but I do not need those features for this server.


I have found more people complaining about Ryzen crashing with Linux. So far there are two main possibilities.

  1. The voltage is too low for the CPU and RAM.
    It seems that Linux is very efficient and does not run the CPU much during idle. From some reports, it seems that Windows keeps the CPU busier during idle. I am not sure about this because I do not use Windows. This would explain why the crash only happens during idle. When the CPU is really working, there is no problem for many days.

So a possible solution is to:
a. disable "C6 state"
b. disable "Cool'n'Quiet"
c. disable "ASLR"
d. increase the voltage slightly for the SOC and RAM

  2. Kernel 4.15 also has some fixes for Ryzen, along with some boot parameters. I am now waiting for the release of 4.15.

Got the above solutions from the following sites.
https://bugzilla.kernel.org/show_bug.cgi?id=196683
https://bugs.launchpad.net/linux/+bug/1690085
https://forums.linuxmint.com/viewtopic.php?f=49&t=256296&sid=a070c781d80a7a126b0fadec10af89be&start=20
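Two of the items in that list can be inspected from a running Linux system before changing anything. Here is a rough Python sketch, assuming the standard procfs/sysfs locations; the function names are made up for illustration, and whether a C6-like state appears in the list (and under what name) depends on the cpuidle driver in use, so treat the output as a hint only.

```python
from pathlib import Path

# Kernel-side ASLR setting; 0, 1 and 2 are the documented values.
ASLR_PATH = Path("/proc/sys/kernel/randomize_va_space")

ASLR_MEANING = {
    "0": "disabled",
    "1": "partial (stack, mmap and vDSO randomized)",
    "2": "full (heap randomized as well)",
}


def describe_aslr(raw):
    """Translate the content of randomize_va_space into words."""
    value = raw.strip()
    return ASLR_MEANING.get(value, "unknown value: " + value)


def idle_state_names(cpu=0):
    """List the idle states the kernel exposes for one CPU (may be empty)."""
    base = Path(f"/sys/devices/system/cpu/cpu{cpu}/cpuidle")
    return [p.read_text().strip() for p in sorted(base.glob("state*/name"))]


if __name__ == "__main__":
    if ASLR_PATH.exists():
        print("ASLR:", describe_aslr(ASLR_PATH.read_text()))
    print("idle states on cpu0:", idle_state_names(0))
```

This only reads values. The C6 and Cool'n'Quiet switches themselves are firmware settings, so the script can show what the kernel sees but cannot change them.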


Just an update on the problem.

I am now sure that it is not a hardware issue. Especially since there are bug reports on this at kernel.org.

The good news is that it has been running for 14 days and 15 hours now. I am planning to keep it running for another 5 days.

I have not done the bios changes suggested in the post above. I have only done the changes in the boot parameters. Here are the boot parameters for kernel 4.14.5,
“rcu_nocbs=0-15 iommu=pt modprobe.blacklist=nouveau”

I think “rcu_nocbs=0-15”, together with kernel 4.14.5, is the magic that did it. From what I understand from kernel.org, 4.15 should be even better.
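To double-check that the parameters actually reached the running kernel, /proc/cmdline can be compared against the expected list. A small Python sketch follows, assuming a Ryzen 7 with 16 logical CPUs (which is what the range 0-15 implies); the function names are hypothetical.

```python
import os


def rcu_nocbs_all(n_cpus):
    """Build an rcu_nocbs value covering every logical CPU, e.g. 0-15 for 16."""
    if n_cpus <= 1:
        return "rcu_nocbs=0"
    return f"rcu_nocbs=0-{n_cpus - 1}"


def missing_params(cmdline, expected):
    """Return the expected boot parameters that are absent from cmdline."""
    present = set(cmdline.split())
    return [param for param in expected if param not in present]


if __name__ == "__main__":
    expected = [rcu_nocbs_all(os.cpu_count() or 1), "iommu=pt"]
    try:
        with open("/proc/cmdline") as handle:
            running = handle.read()
    except OSError:
        running = ""  # not on Linux, or procfs unavailable
    print("missing:", missing_params(running, expected))
```

An empty list means both options are active on the command line. Note that rcu_nocbs only has an effect on kernels built with CONFIG_RCU_NOCB_CPU, which this check does not verify.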

Will keep you guys/gals updated.

Just wanted to share that the system has now been running for 21 days. All is good so far. I am going to stop the machine and upgrade to kernel 4.15.

OK, please let us know if the so-called fixes for Meltdown and Spectre slow down your server. Some people are reporting all kinds of problems with the Meltdown and Spectre workarounds included in kernel 4.15.

It’s the first time I am getting to use this machine, so I do not have anything to compare against. I just hope nothing new comes up.

A correction to the above: I did make one BIOS change. I deactivated EPU in the BIOS. When EPU is enabled, the machine crashes after about 9 days.

Hmm, that’s interesting. Are you satisfied with leaving EPU disabled?

Sounds like Asus is trying to drop the power consumption a bit too much when you enable EPU.

You might want to report this to Asus customer support.