Linux stability issues

Damn, sorry to hear that.

Some positive things here, though: the GPU power consumption is low and temperatures are under control, which tells me it’s extremely unlikely to be a power-supply fault, as your system is very lightly loaded.

Unfortunately, though, it does look like you have an issue with your motherboard. Not much else can cause a system reboot under Linux like the one you describe.

I do see you have quite a few things running when it crashes, like Firefox and Spotify. Can you reproduce the crash without any applications running at all?

Another thing which might be useful, though I still have to dig into it a bit more: it seems that when I’m in a console (CTRL + ALT + F2), the system is stable and doesn’t crash.

When I was doing some tests, I reached the point where the system would freeze instantly after I entered my password (it never got past the KDE Plasma gear screen). However, if I logged in on the console, it worked.

I will try using the console for an extended period tonight and report back.

A motherboard issue would be the best thing that could happen, since it is still under warranty :sweat_smile:

Do you mean just sitting on the desktop without doing anything?
I could try leaving my computer on for the day and see how it is when I come back tonight.

Yeah, try to reduce the noise in the logs.

The dmesg output you provided, by the way, seems rather short; I would have expected it to contain entries going all the way back to system boot.
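As a rough sanity check, a capture that really goes back to boot should start with a timestamp very close to [    0.000000]. The Python sketch below is a hypothetical helper (not part of any existing tool); it assumes the default dmesg format with a [seconds.micros] prefix on each line, and the one-second tolerance is an arbitrary choice.

```python
import re

# Matches the "[    1.008265]" prefix (seconds since boot) that dmesg prints by default.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def first_timestamp(log_text):
    """Return the first kernel timestamp in a dmesg dump, or None if there is none."""
    for line in log_text.splitlines():
        match = TIMESTAMP.match(line.strip())
        if match:
            return float(match.group(1))
    return None


def covers_boot(log_text, tolerance=1.0):
    """Heuristic: a complete capture should start within `tolerance` seconds of boot."""
    first = first_timestamp(log_text)
    return first is not None and first <= tolerance
```

Feed it the text of a saved dmesg dump: a capture whose first line is the "Linux version" banner at 0.000000 passes, while one that starts at something like [   27.797784] is missing the early boot messages.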

I didn’t think you would need it.

I’ve just restarted the system and I’m letting it run without any other applications interfering. I’ll provide the full log once it crashes.

I may not, but in this instance it would be a good idea to provide it all, as there might be subtle hints.


I don’t know if this has been linked to yet, but there is a bit on random Ryzen re-boots over in the Arch Linux wiki:

https://wiki.archlinux.org/title/Ryzen#Random_reboots

It notes exactly what you have observed - runs OK in Windows, but reboots in Linux. Apparently, the OSes are running the CPU at slightly different voltages.


I will check this, thanks.

I don’t know after how long, but it eventually rebooted today while just sitting on the desktop doing nothing.

The logs: [ 0.000000] Linux version 5.15.0-78-generic (buildd@lcy02-amd64-008) (gcc (Ub - Pastebin.com

[    1.008265] mce: [Hardware Error]: TSC 0 ADDR 14f079c80 MISC d012000000000000 IPID 100b000000000
[    1.008268] mce: [Hardware Error]: PROCESSOR 2:a20f12 TIME 1692689832 SOCKET 0 APIC 1e microcode a20120a

CPU fault, perhaps motherboard related.
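To pull these fields out of a long log, something like the Python sketch below could help. It is a hypothetical helper that assumes the "KEY value" layout of the two lines above. The kernel prints TIME as seconds since the Unix epoch, so for this record it converts to 22 August 2023, 07:37:12 UTC (the moment the record was logged, which is not necessarily the moment of the fault).

```python
import re
from datetime import datetime, timezone

# "KEY value" pairs printed on "mce: [Hardware Error]:" lines,
# e.g. "TSC 0 ADDR 14f079c80 MISC d012000000000000 IPID 100b000000000".
FIELD = re.compile(r"\b(TSC|ADDR|MISC|IPID|PROCESSOR|TIME|SOCKET|APIC|microcode) (\S+)")


def parse_mce(log_text):
    """Collect the fields of the MCE lines in log_text into one dict.

    If several events are logged, later values overwrite earlier ones,
    so feed it the lines of one event at a time.
    """
    fields = {}
    for line in log_text.splitlines():
        if "mce: [Hardware Error]" in line:
            fields.update(FIELD.findall(line))
    return fields


def mce_time(fields):
    """Convert the TIME field (Unix epoch seconds) to a UTC datetime."""
    return datetime.fromtimestamp(int(fields["TIME"]), tz=timezone.utc)
```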

[   27.797784] cups-proxyd[3284]: segfault at 18 ip 000055dd00a1bd75 sp 00007ffdaa7d1930 error 4 in cups-proxyd[55dd00a18000+7000]
[   27.797796] Code: 83 3d ee b2 00 00 00 41 54 55 48 89 fd 53 0f 85 f4 00 00 00 48 8d 1d 69 3d 00 00 48 63 45 1c 48 89 df 48 c1 e0 05 48 03 45 08 <48> 8b 50 18 8b 70 14 e8 0f d0 ff ff 44 8b 65 18 48 89 c7 45 85 e4

Random segfault in a random process, likely due to the CPU/MB issue.
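For reading a segfault line like this one: on x86 the "error N" value is the page-fault error code, printed in hex. Bit 0 distinguishes a protection violation from a page that was not present, bit 1 marks a write, bit 2 user mode, bit 3 a reserved-bit violation and bit 4 an instruction fetch. The small Python sketch below (the helper names are my own) decodes it; for this line, error 4 comes out as a user-mode read of a non-present page at address 0x18, just above NULL.

```python
import re

# Fields of an x86 Linux "segfault at ADDR ip IP sp SP error CODE" message; all are hex.
SEGFAULT = re.compile(r"segfault at ([0-9a-f]+) ip ([0-9a-f]+) sp ([0-9a-f]+) error ([0-9a-f]+)")


def parse_segfault(line):
    """Return (fault_address, ip, sp, error_code) as ints, or None if the line does not match."""
    match = SEGFAULT.search(line)
    if match is None:
        return None
    return tuple(int(group, 16) for group in match.groups())


def decode_pf_error(code):
    """Split the x86 page-fault error code into named flags."""
    return {
        "protection_violation": bool(code & 0x01),  # False: the page was not present
        "write": bool(code & 0x02),                 # False: it was a read
        "user_mode": bool(code & 0x04),             # False: kernel mode
        "reserved_bit": bool(code & 0x08),          # reserved bit set in a page-table entry
        "instruction_fetch": bool(code & 0x10),
    }
```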

  1. Double-check that your CPU configuration in the BIOS is all set to auto; reset to defaults if needed. Do NOT undervolt.
  2. Double-check that your RAM is not overclocked; disable XMP profiles, etc.

If it happens again, RMA the motherboard as it’s the more likely culprit.

Edit: @glenjo seems to be on the money here. While pushing the voltage up may resolve this, I personally would consider this a hardware failure and RMA the CPU.

I read through the thread before posting.

The simple fact is that NVIDIA still has issues with Wayland.
It’s very well known and is constantly discussed on the Arch subreddit and other Arch forums.

You aren’t going to get NVIDIA working 100% with Wayland, as numerous issues arise with every driver update. With each release NVIDIA lists more supported Wayland features, but support is still spotty at times.

Then why are you posting suggestions that have ZERO impact on the issue here? As stated numerous times, issues with a PCIe device of any kind CANNOT cause a sudden forced system reboot.

Again, not the issue at hand here, the system is spontaneously rebooting.

[    1.008265] mce: [Hardware Error]: TSC 0 ADDR 14f079c80 MISC d012000000000000 IPID 100b000000000
[    1.008268] mce: [Hardware Error]: PROCESSOR 2:a20f12 TIME 1692689832 SOCKET 0 APIC 1e microcode a20120a

This IS the definitive root cause of the problem. An MCE (Machine Check Exception) is a low-level fault reported by the CPU; it is 100% a hardware issue and has nothing to do with software.

I mean . . .

:facepalm:

https://wiki.archlinux.org/title/Ryzen#Random_reboots

Windows seems to run the CPUs at higher voltage and lower peak frequencies compared to the stock Linux kernel, which, depending on your draw from the silicon lottery, could cause a host of random application crashes or hardware errors that lead to reboots. You will recognise those by dmesg logs that look like:

kernel: mce: [Hardware Error]: Machine check events logged
kernel: mce: [Hardware Error]: CPU 22: Machine Check: 0 Bank 1: bc800800060c0859
kernel: mce: [Hardware Error]: TSC 0 ADDR 7ea8f5b00 MISC d012000000000000 IPID 100b000000000
kernel: mce: [Hardware Error]: PROCESSOR 2:a20f10 TIME 1636645367 SOCKET 0 APIC d microcode a201016
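The long hex value after "Bank 1:" in that example is the bank's MCA status register. On x86 its top bits are architectural flags (VAL, OVER, UC, EN, MISCV, ADDRV, PCC), while the lower bits are vendor- and bank-specific, so the Python sketch below (the function name is my own) only decodes the flags. For the example above it reports a valid, uncorrected error with valid MISC and ADDR data.

```python
# Architectural flag bits at the top of an x86 MCA bank status register.
STATUS_FLAGS = [
    (63, "VAL"),    # the register holds a valid error
    (62, "OVER"),   # an earlier error was lost (overflow)
    (61, "UC"),     # the error was not corrected by hardware
    (60, "EN"),     # error reporting was enabled for this error type
    (59, "MISCV"),  # the MISC register contents are valid
    (58, "ADDRV"),  # the ADDR register contents are valid
    (57, "PCC"),    # processor context may be corrupt
]


def decode_status(hex_status):
    """Return the names of the flag bits set in a status value such as 'bc800800060c0859'."""
    value = int(hex_status, 16)
    return [name for bit, name in STATUS_FLAGS if (value >> bit) & 1]
```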

If switching to Wayland/Xorg “fixes it”, you might as well tell the OP to just run Windows to fix it. It provides no additional diagnostic information whatsoever.

There is no software case at all that would trigger a spontaneous system reboot based on the display server that the OP is running; the worst that should happen is a loss of display output while still being able to remote into the PC via SSH. Anything more than this indicates a hardware failure, be it CPU, RAM or motherboard.

The fun never ends. I think my Linux install ended up getting corrupted, because I can’t boot into it anymore. I just get a black screen and nothing more. I can’t even access the console.

I tried booting from another Ubuntu install I did for testing, and dmesg doesn’t show any MCE errors.

I think it’s trying to drive me nuts T_T

Well, with that in mind, have you tried booting the system into a live environment and running the Linux equivalent of Windows’ sfc /scannow?

Did you try the suggestion from the Arch wiki and push your voltages up a little? The MCEs will be random, as per the information provided, since they depend on peak loads and frequency.

Also, edit the GRUB command line during boot and remove the quiet parameter.

Not needed under Linux; at a minimum, the kernel should output informational messages during boot even if there is HDD corruption.

I didn’t know that! Thank you for the information.

Quick update.

I fixed my booting issue. It seems that it was trying to boot with kernel 6.2, which doesn’t work. I don’t get any log output even after removing quiet.

Anyway, I switched back to 5.19 and it boots. Another problem for another day.

I tried the fix mentioned on the Arch Wiki. I’m not very familiar with these things, so I took pictures for you to check whether I’ve done it correctly.

At first it seemed to work fine. I was able to use my computer for more than an hour; however, it ended up crashing.

I think I’ll try to contact the seller to see if it can be RMAed.


I ended up sending the CPU back to AMD. They sent me a new one and I don’t have any issue anymore.

Thanks, all, for your help.
