Crashes - CentOS 7 with AMD Ryzen 7 and Asus PRIME-B350M-A

Hi,

I am having random crashes with CentOS 7, an AMD Ryzen 7 and an Asus Prime-B350M-A motherboard.

I did a minimal install and just ran “top” on the command line. The computer crashes between the 3rd and 4th day after boot.

I have done the following.

  1. When installing CentOS 7, verified the checksum to make sure the installer image is good.
  2. Ran memtest on the 32GB of RAM (4 sticks of 8GB). All passed over multiple passes.
  3. Changed the power supply. Still the same behaviour.
  4. Did an RMA for the AMD Ryzen CPU to get one from after the week 25 production batch. Still the same behaviour.
  5. Installed the latest kernel, version 4.14. Still the same behaviour.
  6. Installed the latest BIOS, version 3402. Still the same behaviour.
  7. Disconnected the SATA hard drive and ran a Fedora 27 live image from USB. Still the same behaviour.

The only way I got 10 days without crashing was by disabling the SVM (virtualization) setting in the BIOS. I stopped the computer after 10 days.

Then I ran prime95 with the default BIOS settings. The default BIOS settings have SVM turned off. It ran at 100% on all the cores for 8 days. I stopped prime95 after the 8th day.

I also ran memtester from within CentOS and the memory passed.

Has anyone had a similar experience? Is there anything else I can do to make it run longer?

I was hoping to run it as a server. I need advice and help.

P.V.Anthony

What kernel is CentOS using? I believe CentOS stays way behind the curve, so the kernel probably does not have some of the things needed to play nice with Ryzen.
My bad, I missed the part about the latest kernel.


Thank you for the reply.

If you come across any website or any information about this, please share.

P.V.Anthony

In what way does it crash? Kernel panic or hard lockup? Is the display frozen, blank, or will it not come on at all? Are there any logs recorded of the crash?

How did you install the kernel? ELRepo? Did you update GRUB to use it?

I would rule out the OS by installing Windows on it and running some additional torture testing. You may have a fault in the motherboard that only shows up under load.


What kind of crash are you encountering? Segfault, kernel panic, system lockup are all different and have different methods of troubleshooting.

If you haven’t already, please clone and run this test. You may suffer from the segfault bug.

It was a kernel panic.
I checked /var/log/messages and unfortunately there is nothing there.

Is there any other place that stores crash logs?

Yes, I am using ELRepo: kernel-ml-4.14.5-1.el7.elrepo.x86_64
I did update GRUB so that the 4.14 kernel boots by default.


This is one thing that I have not done. I do not have Windows. I wonder if Windows 10 is available as a demo for 14 days.

It is a download from Microsoft. It will run and just annoy you about the key.


I did an RMA to get a replacement from AMD. Having said that, I am now running the test again to be very sure.


Will do that once it crashes again. I am running the kill-ryzen test now.

Yeah like @sanfordvdev said, you can go to microsoft.com and download the latest version and it will run. It will not be a signed system until it gets a key but no big deal.

If you have systemd you can use journalctl: journalctl -b -1 will give you the logs from the previous boot.
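
One caveat with journalctl -b -1: it only helps if journald keeps logs across reboots. As far as I know, a stock CentOS 7 install keeps the journal under /run/log/journal, which is lost on reboot, unless /var/log/journal exists. A rough sketch of checking for that (the function name journal_is_persistent is made up for illustration, and the default-storage behaviour is an assumption about an unmodified install):

```shell
#!/bin/sh
# Succeeds if the persistent journal directory exists.
# The directory can be passed as an argument; /var/log/journal is the usual default.
journal_is_persistent() {
    [ -d "${1:-/var/log/journal}" ]
}

if journal_is_persistent; then
    echo "persistent journal found; after a crash try: journalctl -b -1 -p err"
else
    echo "no /var/log/journal; logs from earlier boots are probably not kept"
    # Assumed fix for the default Storage=auto setting (run as root):
    echo "to keep them: mkdir -p /var/log/journal && systemctl restart systemd-journald"
fi
```

journalctl --list-boots shows which boots the journal still has. If only the current boot is listed, there is nothing for -b -1 to show.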

Noted. Will try once there is a crash.

I ran journalctl -b and did not see anything suspicious.

I did notice the following, though. Maybe this is why I am not getting any crash logs. Any idea how to fix it?

Jan 02 02:22:16 localhost.localdomain systemd[1]: Starting Crash recovery kernel arming…
Jan 02 02:22:16 localhost.localdomain kdumpctl[1303]: No memory reserved for crash kernel
Jan 02 02:22:16 localhost.localdomain kdumpctl[1303]: Starting kdump: [FAILED]
Jan 02 02:22:16 localhost.localdomain systemd[1]: kdump.service: main process exited, code=exited, status=1/FAILURE
Jan 02 02:22:16 localhost.localdomain systemd[1]: Failed to start Crash recovery kernel arming.
Jan 02 02:22:16 localhost.localdomain systemd[1]: Unit kdump.service entered failed state.
Jan 02 02:22:16 localhost.localdomain systemd[1]: kdump.service failed.
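
For context, "No memory reserved for crash kernel" usually means the running kernel did not reserve memory via a crashkernel= boot parameter. One possible cause here, which is an assumption and not something confirmed in this thread: crashkernel=auto is handled by the distribution kernel, and a mainline kernel such as kernel-ml may not honor it, so a fixed size may be needed. A small sketch that reports what the current boot asked for (crashkernel_value is a hypothetical helper name):

```shell
#!/bin/sh
# Prints the value of the crashkernel= parameter from a kernel command line
# string and succeeds; fails (prints nothing) if the parameter is absent.
crashkernel_value() {
    for arg in $1; do
        case "$arg" in
            crashkernel=*) printf '%s\n' "${arg#crashkernel=}"; return 0 ;;
        esac
    done
    return 1
}

value=$(crashkernel_value "$(cat /proc/cmdline 2>/dev/null)") || value=""
case "$value" in
    "")   echo "no crashkernel= parameter: nothing is reserved for kdump" ;;
    auto) echo "crashkernel=auto: a mainline kernel may ignore this; try a fixed size" ;;
    *)    echo "crashkernel=$value requested on this boot" ;;
esac

# Possible fix, run as root (256M is an assumed size, adjust as needed):
#   grubby --update-kernel=ALL --args="crashkernel=256M"
# then reboot and check with: systemctl status kdump
```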

This should have you covered:

Wow! Thank you very much for the documentation. Currently I am running the kill-ryzen script.

I will set this up once there is a crash.

Thank you again for helping.


If it’s a kernel panic, it’s most likely not the hardware bug.