Help with investigating a kernel panic with kdump

So I’ve been having this rather annoying and mysterious kernel panic/full system crash problem for the last few months on my personal desktop computer. Every few days the whole system locks up and the screen goes black, but not off. SSH also fails to connect, so I’m led to believe this isn’t a simple Xorg crash. At this point all the fans are spinning, the monitor is on but displaying only black, and my only real option is forcing it off via the power button.

I haven’t had much time to debug it all summer, but last week I was fed up and went and read through /var/log/syslog. It had logs of normal DHCP requests and cron jobs, then a few hours are skipped with no indication of a power down. After that it logged modules being loaded and all the other shit from me turning it back on. Naturally I assumed that I’m dealing with a kernel panic (or maybe systemd?) since it failed to log any sort of crash (I checked /var/log/messages and found a similar lack of information).
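For anyone wanting to repeat the log check, here is a rough sketch of looking at the tail end of the previous boot through the journal instead of the flat syslog files. It assumes systemd-journald is keeping a persistent journal (a /var/log/journal directory exists); without that, only the current boot is available and the second command just reports an error. The function name prev_boot_tail is made up for this example.

```shell
# Sketch: show what was logged right before the machine went down.
# Assumes a persistent journal; prev_boot_tail is a hypothetical helper name.
prev_boot_tail() {
    # $1 = number of lines to show from the end of the previous boot (default 50)
    journalctl -b -1 -n "${1:-50}" --no-pager
}

if command -v journalctl >/dev/null 2>&1; then
    # List the boots journald knows about, then the end of the previous one.
    journalctl --list-boots --no-pager || echo "could not list boots"
    prev_boot_tail 50 || echo "no journal for the previous boot (is it persistent?)"
else
    echo "journalctl not found"
fi
```

If the last lines before the gap are ordinary cron and DHCP noise, that matches what the syslog file showed: nothing got written out before the hang.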

Now before I go any further let me explain my setup a little bit. A few months ago, when this problem first started happening, I reinstalled my OS (Debian 9 at the time), and when that didn’t seem to fix it I went and bought a new motherboard, RAM, and processor (I needed an upgrade anyway since I was running an old-ass i3).

Now I have

  • MSI x470ac (new)
  • AMD Ryzen 5 2600X (new)
  • 16GB of RAM (new)
  • AMD Sapphire 290x GPU (old)
  • 128GB sata SSD (old)
  • 2x1TB sata WD HDD (old)

Additionally, when I installed my new hardware I also reinstalled Debian, this time testing instead of stable since I needed some slightly newer software. That said, I’ve experienced this crash on just about every kernel version from 4.9 to 4.18. It’s also relatively standard hardware, so I’m pretty sure the problem isn’t me running Debian testing.

After seeing the log messages last week (or rather the lack thereof), I installed kdump. I’m pretty sure it gets auto-configured on Debian, since when I reboot it tells me the crash kernel is being loaded and such. I couldn’t find any good information about extra setup steps for kdump that are specific to Debian, and the man page proved to be pretty unhelpful and confusing.
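Since "it says it's loading at boot" is the only evidence so far, here is a small read-only sketch for checking whether a crash kernel is actually armed. It assumes the Debian kdump-tools package, which ships a kdump-config command; the function name kdump_preflight is invented for this example, and none of the checks change anything on the system.

```shell
# Sketch: three read-only checks that a crash kernel is reserved and loaded.
# kdump_preflight is a hypothetical helper name, not part of any package.
kdump_preflight() {
    # 1. Memory for the crash kernel must be reserved on the kernel command line.
    if grep -q 'crashkernel=' /proc/cmdline 2>/dev/null; then
        echo "cmdline: crashkernel= present"
    else
        echo "cmdline: crashkernel= missing"
    fi
    # 2. 1 means a crash kernel is loaded via kexec, 0 means it is not.
    echo "kexec_crash_loaded: $(cat /sys/kernel/kexec_crash_loaded 2>/dev/null || echo unknown)"
    # 3. kdump-tools ships kdump-config; 'show' prints the current settings.
    if command -v kdump-config >/dev/null 2>&1; then
        kdump-config show
    else
        echo "kdump-config: not installed"
    fi
}

kdump_preflight
```

If crashkernel= is missing or kexec_crash_loaded is 0, a real panic would have nowhere to dump to, regardless of what the boot messages say.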

Today my computer crashed again. I turned it off and then on and was excited to finally dig into some nice juicy kdump generated log files. I ran crash as root and was presented with this message:

    crash: cannot find booted kernel -- please enter namelist argument

Now, I’ve pretty much never used kdump or crash before so there’s a good chance I’m missing some important step. I thought the crash kernel was loaded though, since it says it’s loading when my computer is turning on, plus when I installed kdump it ran update-grub automatically.
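The "namelist" in that message is a vmlinux with debug symbols, and crash also wants a dump file to look at. Below is a sketch that only prints the crash command line it would use, rather than running it. The two paths are assumptions based on Debian defaults as I understand them (debug vmlinux from the matching -dbg package, dumps in timestamped directories under /var/crash) and may well differ on a given system. newest_dump is a made-up helper.

```shell
# Sketch: work out the arguments for crash without running it.
# Both paths are assumptions about Debian defaults; adjust as needed.
VMLINUX="/usr/lib/debug/boot/vmlinux-$(uname -r)"   # assumed -dbg package path
CRASH_DIR="/var/crash"                               # assumed kdump-tools dump dir

newest_dump() {
    # Print the last dump.* file under $1 in name order (directories are
    # assumed to be named by timestamp), or an empty line if there is none.
    latest=""
    for f in "$1"/*/dump.*; do
        [ -e "$f" ] && latest="$f"
    done
    printf '%s\n' "$latest"
}

dump="$(newest_dump "$CRASH_DIR")"
if [ -z "$dump" ]; then
    echo "no dump found under $CRASH_DIR"
elif [ ! -r "$VMLINUX" ]; then
    echo "debug vmlinux missing: $VMLINUX"
else
    echo "crash $VMLINUX $dump"    # print the command rather than run it
fi
```

If this prints "no dump found", the earlier crash never produced a vmcore, which would point back at the kdump setup rather than at crash itself.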

I determined the best course of action was to make sure kdump and crash were functioning by simply triggering a kernel panic manually.

I became root with su, then ran echo c > /proc/sysrq-trigger. That did absolutely nothing and my system continued chugging along. Then I saw that maybe I need to run echo 1 > /proc/sysrq-trigger first. So I ran both commands a bunch of times, yet nothing happened and nothing crashed.
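A note on the two commands: as far as the kernel's sysrq documentation describes it, the on/off switch for the keyboard SysRq combo lives in /proc/sys/kernel/sysrq, while digits written to /proc/sysrq-trigger only set the console log level. So the "echo 1" step is usually meant for the former file. The sketch below is a guarded way to do the test panic; trigger_test_panic and the "really-crash" argument are invented for this example, and by default it only prints what it would do.

```shell
# Sketch of a guarded test panic. Nothing is written unless the function is
# called with the literal argument "really-crash" (hypothetical safety word).
trigger_test_panic() {
    if [ "${1:-}" != "really-crash" ]; then
        echo "dry run: would write 'c' to /proc/sysrq-trigger"
        return 0
    fi
    # Refuse to panic if no crash kernel is loaded; the dump would be lost.
    if [ "$(cat /sys/kernel/kexec_crash_loaded 2>/dev/null)" != "1" ]; then
        echo "no crash kernel loaded, not triggering" >&2
        return 1
    fi
    sync                              # flush disks before going down
    echo c > /proc/sysrq-trigger      # needs a real root shell
}

# Keyboard SysRq setting (0 = off, 1 = all functions, other values = bitmask).
cat /proc/sys/kernel/sysrq 2>/dev/null || echo "sysrq sysctl not readable"

trigger_test_panic                    # prints the dry-run message only
```

One thing worth ruling out when the write seems to do nothing: with something like sudo echo c > /proc/sysrq-trigger, the redirect is performed by the unprivileged shell, not by root. A shell obtained through su should not have that problem.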

Frustrated, I tried running kill -9 1, which printed out a message about systemd crashing, but otherwise my computer was still operating just fine. kill -6 1 did basically the same thing.

Next I turned off my swap with swapoff /dev/sda5, then ran for r in /dev/ram*; do cat /dev/zero > $r; done to fill up my memory and hopefully cause a kernel panic that way. The for loop just exited after about half my RAM was used and said it was out of memory. The computer was still going strong.
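Running out of memory does not panic the kernel by default; the OOM killer normally just kills processes unless vm.panic_on_oom is set. Here is a read-only sketch that prints the sysctl knobs that decide whether those situations turn into a panic. show_panic_knobs is a made-up name, and which of these files exist depends on the kernel configuration.

```shell
# Sketch: print the sysctls that control whether OOM/oops conditions panic.
# show_panic_knobs is a hypothetical helper; it only reads from /proc/sys.
show_panic_knobs() {
    for key in vm/panic_on_oom kernel/panic_on_oops kernel/panic; do
        if [ -r "/proc/sys/$key" ]; then
            echo "$key = $(cat "/proc/sys/$key")"
        else
            echo "$key = (not readable)"
        fi
    done
}

show_panic_knobs
```

A value of 0 for vm/panic_on_oom means an out-of-memory condition is handled by killing processes, which would fit the loop exiting with an error while the machine kept running.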

The only hardware that is the same since this whole thing started is my mouse, keyboard, monitor, power supply, GPU, hard drives, and the case. At this point I have no idea what the fuck is going on with this computer, so any suggestions or help would be really appreciated.
