[SOLVED] Kdump cannot use NVMe on Debian

I am aware that I can pass retain_initrd to the main system kernel. But I don’t know of any way to force the kdump kernel to use it.

Also, I’d have no idea how to configure my main initramfs to have different behavior when in kdump.

My google-fu is once again failing me. Can’t seem to find any instance of somebody pulling this off.

Not even sure how doing this could help. Wouldn’t I still panic, since the initramfs would still be unable to find the root fs after kexec?

(Also, all good questions IMO, and that’s not a euphemism. Thank you for the constructive discussion.)

I don’t know how kdump works, but when you manually kexec (not kdump), you pass the kernel to load, and the initrd.

I’d guess kdump probably does that prior to the crash?


how to configure my main initramfs to have different behavior when in kdump.

You need to mount /proc and check whether /proc/vmcore exists; it is only present in the kdump (crash) kernel.
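
A minimal sketch of that check, written in Python for readability. A real initramfs-tools script would be POSIX shell, where the equivalent is something like a `[ -e /proc/vmcore ]` test. It assumes /proc is already mounted; the function name `in_kdump_kernel` and its `vmcore_path` parameter are made up for illustration.

```python
import os


def in_kdump_kernel(vmcore_path="/proc/vmcore"):
    """Return True when running inside a crash (kdump) kernel.

    /proc/vmcore is only exposed by a kernel that was booted after a
    panic with the crashed kernel's memory preserved, so its presence
    is used as the signal here. Assumes /proc is already mounted.
    """
    return os.path.exists(vmcore_path)


if __name__ == "__main__":
    if in_kdump_kernel():
        print("kdump kernel: take the crash-dump path")
    else:
        print("normal boot: take the usual root-mount path")
```

The path is a parameter only so the check can be exercised against a scratch file instead of the real /proc.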


Wouldn’t I still panic since initramfs would still be unable to find the root fs after kexec?

No, you don’t need NVMe at all at that point. The initramfs contains a filesystem, and the first script that runs is just running from tmpfs (or is it ramfs? There’s a doc on kernel.org with the exact difference).

mkinitramfs has some docs, but last time I messed with it, I just read the shell scripts. The whole initramfs filesystem that the kernel execs is mostly just busybox, a bunch of symlinks, a bunch of shell scripts, and maybe a kernel module or some additional utility binary here and there.

IIRC the shell script that runs as PID 1 reads /proc/cmdline, tries to mount the root device named there (as /mnt or similar), and then calls pivot_root and execs /usr/lib/systemd-init or some such thing.

Try reading the shell scripts listed by dpkg -L initramfs-tools-core.
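
To make the "reads /proc/cmdline" step concrete, here is a rough Python sketch of pulling the root device out of a kernel command line. It is not the initramfs-tools code (that is shell); `parse_cmdline` is a hypothetical helper, and the sample command line is invented.

```python
def parse_cmdline(text):
    """Split a kernel command line into a dict of parameters.

    Bare flags such as "ro" map to None. Simplified: quoted values that
    contain spaces are not handled, and a repeated key keeps the last
    value seen.
    """
    params = {}
    for token in text.split():
        # Split on the first "=" only, so root=UUID=... stays intact.
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


if __name__ == "__main__":
    # Sample line for illustration only; on a live system read /proc/cmdline.
    sample = "BOOT_IMAGE=/vmlinuz root=UUID=1234-abcd ro quiet"
    print(parse_cmdline(sample)["root"])
```

Whatever `root=` resolves to is what the init script then tries to mount, which is why a root device that never appears after kexec ends in a panic.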

Well, I’m confused. When the system’s IOMMU is disabled, kdump works perfectly.
Unfortunately this happens to be a VM hosting server, so running without the IOMMU is not an option. This server’s whole purpose is to be the production VM host.

I don’t understand why it makes a difference. Even when in production, no device inside the same IOMMU group as the root NVMe drive ever gets used in a VM. The SSD in question shouldn’t ever get touched by IOMMU.
I understand that any PCIe devices being passed to a VM would probably be in a broken state after the host suddenly kexecs, but the root SSD was never part of that, it should be completely unaffected.
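
The group membership described above can be double-checked from sysfs. A rough sketch, assuming the standard `/sys/kernel/iommu_groups/<n>/devices/` layout; the function names are hypothetical, and the PCI address of the root SSD is not given in this thread, so it has to be supplied by whoever runs it.

```python
import os


def iommu_groups(base="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted PCI addresses inside it."""
    groups = {}
    if not os.path.isdir(base):
        return groups  # IOMMU disabled or sysfs not mounted
    for group in os.listdir(base):
        devices_dir = os.path.join(base, group, "devices")
        if os.path.isdir(devices_dir):
            groups[group] = sorted(os.listdir(devices_dir))
    return groups


def group_of(pci_address, groups):
    """Return the group that contains pci_address, or None if not found."""
    for group, devices in groups.items():
        if pci_address in devices:
            return group
    return None


if __name__ == "__main__":
    groups = iommu_groups()
    for group in sorted(groups, key=int):
        print(group, " ".join(groups[group]))
```

If the NVMe controller shares its group only with devices that are never bound to vfio, the assumption in the post holds as far as grouping goes.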

Oooo ! You found a bug! Congratulations!

Can you narrow the kdump failure down to a specific driver or device?

(you’re using vfio or … ?)

I have vfio enabled for some devices, but not on the root SSD. That always uses the nvme driver. So IOMMU should be irrelevant here. Yet it isn’t. I don’t understand.

I know this nvme driver works, because everything is flawless when the system’s IOMMU is disabled.
I don’t see how the nvme driver’s behavior could possibly be affected by whether IOMMU was just on or off.

Yeah, some driver for something is assuming that the hardware (the IOMMU?) is configured a particular way on startup, when in reality it’s not.

Every driver should assume kexec to “be a thing”, and it should either explicitly reinitialize the piece of hardware on startup, or explicitly discover and adopt the state as is.

Then again, software developers cut corners, engineer brand-new bugs, and call it technical debt that’s clearly not their fault because constraints forced tradeoffs. That’s part of the job, all the time. The fact that something was considered less carefully is not in itself a problem.

However, maybe you can kexec the same kernel with different command line params to disable some stuff … if you knew what to disable.
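
As an illustration of "same kernel, different command line", here is a small Python sketch that derives a crash-kernel command line from the running one. `adjust_cmdline` is a hypothetical helper, the sample command line is invented, and turning the IOMMU off in the crash kernel is only a guess at what might be worth trying, not a known fix.

```python
def adjust_cmdline(cmdline, remove=(), add=()):
    """Return cmdline without the parameters named in `remove`, plus `add`.

    A parameter is matched by its name, i.e. the part before "=".
    """
    kept = [
        token
        for token in cmdline.split()
        if token.partition("=")[0] not in remove
    ]
    return " ".join(kept + list(add))


if __name__ == "__main__":
    # Invented example of a VFIO host's command line.
    running = "root=/dev/nvme0n1p2 ro quiet intel_iommu=on iommu=pt"
    crash = adjust_cmdline(
        running,
        remove=("intel_iommu", "iommu"),
        add=("intel_iommu=off",),  # assumption: untested guess
    )
    print(crash)
```

On Debian, kdump-tools takes its command line settings from /etc/default/kdump-tools (the KDUMP_CMDLINE and KDUMP_CMDLINE_APPEND variables), so a string like this would go there rather than into a manual kexec call.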

My current kdump kernel is just a symlink to the main system kernel. This is the default behavior of Debian’s kdump package.

Right, but you can always keep multiple kernels installed, and you can configure kdump differently and/or use dpkg-divert so that the package does not interfere with your setup.

Okay. What changes should I make to kdump’s kernel to try and rectify this? I have no idea why NVMe is failing after kexec normally, so I’d have no idea where to start with trying to get kdump’s kernel to work.

Fixed by replacing the SSD.

That fix sounds extremely random. Good find, nonetheless.
