Linux installation corrupted by memory errors

This is a follow-up to the previous thread about problems running Linux after a RAM upgrade.

The above thread also contains the boot messages with errors. I can paste them here if needed.
Since this situation is likely my mistake rather than a hardware fault, I decided to split the thread off here in the software section.

SITREP:

Corrupted Linux installation due to memory error(s). The system boots up until the lukscrypt containing the installation is opened. After this there are a lot of messages like: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
Then the process hangs at various stages, never reaching a shell or X session.

  • Devuan (Debian fork without systemd), unstable ceres (sid)
  • root, var, swap, tmp, home are set up as logical volumes (LVM) inside lukscrypt, everything else is outside the crypt
  • crypt can be successfully opened during boot process
  • crypt can be opened and closed with live distro
  • can’t open GRUB menu (I have GRUB_TIMEOUT set to 0 and tried holding Esc, Shift and F4 - without effect) so I couldn’t boot with older kernels

I tried to open and mount the home partition from a live distro. Here is what I found:

  • lsblk shows correct block devices

  • crypt containing Linux install can be successfully opened with sudo cryptsetup luksOpen ...

  • /dev/mapper contains the correct volume under the name I provided

  • before mounting I checked that the correct volume group exists with sudo pvs, and the output shows the correct VG

  • when I run sudo lvdisplay /dev/user-vg, the output shows the correct logical volumes such as tmp, home, …

  • but when I try to mount the /home partition with sudo mount /dev/user-vg/home /mnt I get: mount: /mnt: special device /dev/user-vg/home does not exist. And indeed user-vg is not present inside /dev.

I have a backup of /home and /etc from around a month ago, so well before the incident. All major data is safely backed up on other drives too.
I know I can do a full reinstall, but I want to try a recovery if possible.

Just after opening the lukscrypt:

I’m still trying to find a way to mount the volume group, but honestly I feel a little lost, as I can’t find any info on how to proceed if the VG is not present.

I’m starting to lose hope that any recovery is possible.
Any help will be really appreciated, thanks in advance.

Hmmmm… my LVs show up under /dev/mapper/. Double-check that your path isn’t /dev/mapper/user-vg-home instead.


Checked, and /dev/mapper (in the live distro) contains only control and the opened lukscrypt.

If I open the crypt with sudo cryptsetup luksOpen /dev/nvme0n1p3 TANK_REC, then TANK_REC shows up in /dev/mapper, but nothing else.

No signs of individual volumes, either in /dev or /dev/mapper.

Are the volumes active?
Can you post the output of lvs -a?

lvs -a

  LV     VG     Attr       LSize    Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  home   rex-vg -wi------- <790.80g                                                    
  root   rex-vg -wi-------   23.28g                                                    
  swap_1 rex-vg -wi------- <127.88g                                                    
  tmp    rex-vg -wi-------   <1.86g                                                    
  var    rex-vg -wi-------    9.31g
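
The Attr column in the listing above already says whether the volumes are active: in lvs output the fifth character of Attr is the state flag, and `a` means active. A minimal sketch of a filter that lists the volumes that are not active, assuming the default lvs column order shown above (the function name `inactive_lvs` is made up for illustration):

```shell
#!/bin/sh
# Sketch only: read `lvs -a` style output on stdin and print "VG/LV"
# for every logical volume whose state flag (5th character of Attr)
# is not "a". Assumes the default columns: LV, VG, Attr, ...
inactive_lvs() {
    awk '
        $3 == "Attr" { next }                 # skip the header line
        NF >= 3 && substr($3, 5, 1) != "a" {  # state flag is not "active"
            print $2 "/" $1
        }
    '
}

# Usage on a live system (needs root):
#   lvs -a | inactive_lvs
```

With the listing above as input, all five volumes of rex-vg would be reported, since every Attr value has `-` in the fifth position.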

@NukeDukem
you may want to activate them before trying to mount them

lvchange -ay rex-vg

Many thanks @MadMatt!
The activation worked. I’ve never had to do it manually, so I didn’t know that this step was needed.
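
For anyone landing here with the same problem, the whole sequence from a live distro can be sketched roughly as below. The device, mapper name and volume group are the ones from this thread. Mounting read-only, the `run` wrapper with its `DRY_RUN` switch, and the function name `mount_home_from_live` are assumptions added for illustration, not something that was actually run here.

```shell
#!/bin/sh
# Sketch only: open the LUKS container, activate the logical volumes,
# then mount /home read-only. Set DRY_RUN=1 to print the commands
# instead of executing them (hypothetical helper, not an LVM feature).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

mount_home_from_live() {
    dev=$1      # LUKS partition, e.g. /dev/nvme0n1p3
    name=$2     # mapper name, e.g. TANK_REC
    vg=$3       # volume group, e.g. rex-vg
    target=$4   # mount point, e.g. /mnt

    run cryptsetup luksOpen "$dev" "$name" &&
    run lvchange -ay "$vg" &&
    run mount -o ro "/dev/$vg/home" "$target"
}

# Usage (as root on the live distro):
#   mount_home_from_live /dev/nvme0n1p3 TANK_REC rex-vg /mnt
```

Read-only is a cautious choice for a filesystem that may be damaged. Drop `-o ro` if the volume has to be written to.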

Anyway, I’m in the process of backing up /home now.

I’ve never recovered an install after memory corruption, and I don’t know how to assess the extent of the damage.
Namely, which parts of the install can be safely backed up, things like /etc for example.

Ok. I’ve made backups of /boot, /var and / too.

Now I’m wondering what would be a good angle of attack.
How safe is chrooting into a mounted, corrupted install? Should I assume this can make things even worse?

Post-mortem

I did a full reinstall but reused most of the config files from /etc and /home, so setting things up wasn’t too annoying. So far, after a couple of days, everything runs fine.

Packages contain checksums of the files they installed.

Check one package:

rpm -qV $PACKAGE
dpkg --verify $PACKAGE

Check all packages:

rpm -qaV
dpkg --verify
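
On a Debian-style system the full report can get long, and changed config files are usually just local edits. A minimal sketch of a filter for `dpkg --verify` output, assuming its default rpm-style format: a 9-character result field with `5` in the third position on a checksum mismatch, an optional `c` marking a conffile, then the path. The function name `suspect_files` is made up for illustration.

```shell
#!/bin/sh
# Sketch only: read `dpkg --verify` output on stdin and print the paths
# of files that failed the checksum test and are not marked as
# conffiles ("c"). Lines reporting missing files are ignored.
suspect_files() {
    awk '
        # $1 is the result field; its 3rd character is "5" on a mismatch.
        substr($1, 3, 1) == "5" && $2 != "c" {
            # Paths are absolute, so print from the first "/" onward
            # (this keeps paths that contain spaces intact).
            print substr($0, index($0, "/"))
        }
    '
}

# Usage:
#   dpkg --verify | suspect_files
```

`dpkg -S` on a reported path shows which package owns the file, which is the package to reinstall.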

Thanks. Might be useful in the future.