Linux System crashing randomly

I recently set up a server running Jellyfin and a Caddy reverse proxy on Ubuntu Server 23.04.
However, I noticed that the system randomly crashes about once a day.
Specs are as follows:

  • Intel Core i5-9400F
  • ASUS B360-F
  • DDR4 8GB Single Stick 2400 MT/s
  • 2x WD Green 120GB SSD (media is mounted on this host via an SMB share)
  • Intel Arc A750 GPU
  • FSP Hexa 85+ 350 W PSU

What might be the problem and how do I troubleshoot this thing?

UPDATE:
I ran Memtest86+ on it and it passed.

You need to look at the logs, but I don’t know which ones are pertinent.

Can you infer what time it happens and what activity coincides with it?
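One way to narrow down the time, as a rough sketch: the last journal entry of the previous boot is usually written shortly before the machine hung. This assumes the journal is persistent (i.e. `/var/log/journal` exists) and that `journalctl` is readable by your user; the function names are made up for illustration.

```python
import json
import subprocess
from datetime import datetime, timezone


def parse_entry_time(json_line: str) -> datetime:
    """Return the UTC timestamp of one `journalctl -o json` entry."""
    entry = json.loads(json_line)
    # __REALTIME_TIMESTAMP is microseconds since the Unix epoch, stored as a string.
    micros = int(entry["__REALTIME_TIMESTAMP"])
    return datetime.fromtimestamp(micros / 1_000_000, tz=timezone.utc)


def last_logged_time(boot_offset: int = -1) -> datetime:
    """Timestamp of the final journal entry of an earlier boot (-1 = previous boot)."""
    out = subprocess.run(
        ["journalctl", f"--boot={boot_offset}", "-n", "1", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    # With -n 1 there should be a single JSON object per line; take the last one.
    return parse_entry_time(out.strip().splitlines()[-1])
```

Calling `last_logged_time()` after the next crash gives an approximate "last seen alive" time, and the journal entries just before it (`journalctl --boot=-1 -e`) are the ones worth reading first.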

Crash and reboot? … or crash and get stuck?

Does any of this work: Kernel crash dump | Ubuntu

Can you force a crash? (to get more debugging opportunity?)

Crashes and gets stuck

No, as I don’t know exactly when it crashes. I only find out it has crashed when I try to access it and find it down.

I don’t know, how do I test?

I don’t know how to do this.

If you look closer on that page there’s a test section.

Re: forcing/triggering crashes… not sure. We don’t know whether it’s hardware or software… you could try stressing the system with other burn in tools.

Since you mentioned stuckness, after checking up on crashlogs, look for information on watchdogs.


Are you using the Intel Arc for transcoding? You could also try stressing the GPU part of the system by starting and stopping jellyfin playback a lot somehow, … not sure how to script it.

Right now, I’m leaning towards either power supply+motherboard not being able to handle big transients well… or Intel Arc drivers… but this is wildly speculating.

Yes, but it never crashed while I’m trying to playback something.

The system immediately hung the first time I triggered a crash from the console.
The second time I triggered a crash, it got stuck at pstore: crypto_comp_compress failed, ret = -22!

The grub config file mentioned in that section for changing the memory size doesn’t exist on my system

I’m leaning more towards hardware and power stuff. That thing is supposed to return the length of the compressed data; -22 makes no sense unless it’s some wonky undocumented error code.

… can you fiddle with power options/CPU governor maybe, make it not go down all the way to C10 or those fancy power saving states somehow? … do you have a screen/keyboard… can you check the UEFI for CPU power states?

(btw, changing this just as a diagnostic thing, not as a permanent solution).

Clearing the UEFI settings might also be a thing you want to try (not sure if you were tuning it to save on your power bill).

The BIOS shows that states all the way down to C10 are supported, and C-states is set to Auto. Should I disable C-states?

I wanted to tune the clock speed down but it won’t let me go below base clock so I left it on auto.

I disabled C-states and tried to trigger the crash again. It got stuck at kernel offset: 0x… from 0x… (relocation range: 0x…-0x…)

Maybe something is really off with the crash handling and triggering code (oh Ubuntu how you disappoint).

C10 is a save-a-lot state, try doing CPU governor performance, disable C8/C10, and just use the box for a day or two, see what happens. (unless someone has a better suggestion).
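To check what the kernel actually thinks is enabled before and after changing things, something like this rough sketch might help. It only reads the standard sysfs files (`cpuidle/state*/name`, `cpuidle/state*/disable`, `cpufreq/scaling_governor`); the function names are made up, and whether these directories exist depends on the cpuidle/cpufreq drivers in use.

```python
from pathlib import Path

# Assumed default location of per-CPU sysfs entries on Linux.
CPU0 = Path("/sys/devices/system/cpu/cpu0")


def read_idle_states(cpu_dir: Path = CPU0) -> list[tuple[str, bool]]:
    """Return (state name, is_disabled) for every cpuidle state of one CPU."""
    state_dirs = sorted(
        (cpu_dir / "cpuidle").glob("state[0-9]*"),
        key=lambda p: int(p.name[len("state"):]),  # numeric order: state2 before state10
    )
    states = []
    for state in state_dirs:
        name = (state / "name").read_text().strip()
        disabled = (state / "disable").read_text().strip() != "0"
        states.append((name, disabled))
    return states


def read_governor(cpu_dir: Path = CPU0) -> str:
    """Return the active cpufreq governor, e.g. 'performance' or 'powersave'."""
    return (cpu_dir / "cpufreq" / "scaling_governor").read_text().strip()
```

This is read-only on purpose. Disabling a state means writing `1` to its `disable` file as root, and that does not survive a reboot, which is fine for a diagnostic run.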

Try looking up other burn in options in the meantime.

I think this works as a two-for-one ddx: you spend two days, and if crashes increase you know it’s not software, if crashes decrease you know it’s not software. … (shitty that it’d take two days).

(btw, sorry I can’t think of something better).

For a 7600K, going from C0 to C8 saved about 4.5 W when the OS goes idle.

My 12100 doesn’t go deeper than C3; I tried everything in the BIOS.

I have set up a little script on my system that pings the server periodically, then prints the time and exits if the ping fails 5 consecutive times.
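For anyone wanting to do the same, here is a minimal sketch of that kind of watcher. It is not the actual script: the 30-second interval and the function names are assumptions, and `ping -c 1 -W 2` assumes the Linux iputils `ping`.

```python
import subprocess
import time
from datetime import datetime
from typing import Callable


def ping_once(host: str, timeout_s: int = 2) -> bool:
    """Send one ICMP echo request; True if a reply arrived within timeout_s."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def watch(
    host: str,
    ping: Callable[[str], bool] = ping_once,
    max_failures: int = 5,
    interval_s: float = 30.0,  # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
) -> datetime:
    """Ping host until max_failures consecutive pings fail, then report the time."""
    failures = 0
    while True:
        if ping(host):
            failures = 0  # any success resets the streak
        else:
            failures += 1
            if failures >= max_failures:
                now = datetime.now()
                print(f"{host} stopped responding, noticed at {now.isoformat()}")
                return now
        sleep(interval_s)
```

Run it from another machine with something like `watch("server.example")` (hypothetical hostname). The printed time lags the real crash by up to `max_failures * interval_s` plus the ping timeouts, so it gives a window to search in the logs rather than an exact moment.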