A year and a half after losing a LUKS partition, it broke again

Hey :slight_smile:
This is kind of a follow-up to “Can't unlock luks since fedora 34 5.13.10”.

It’s the third time that LUKS has made me feel like I’d forgotten my passphrase, or had always typed it wrong and suddenly couldn’t anymore. So the last time I formatted my SSD, I used my password manager to enter the password in the LUKS field, just in case. That was a year ago.

Last week (during Xmas… yay…) I could once again no longer unlock my OS, or only about 1 time out of 50 tries with the same password.
I used one of those successful unlocks to make a dd dump of the unlocked partition, but while the system was running from it.
I’ve booted the Fedora 36 live CD to wipe the LUKS partition and make a new one, but that doesn’t work either (see video)


Can it be a hardware issue? Why would both the latest kernel from Fedora 37 and an old one from a Fedora 36 live CD fail to open a LUKS partition?

I’m running an AMD Ryzen 9 5900X with 32 GB of RAM on an X570 Taichi Razer motherboard. This has been very unstable under Linux, but I never had the luck to find out why… I’ll be on a new Windows install for a couple of weeks while I try to understand what’s wrong with LUKS.

wow, super annoying.

I’ve only ever touched LUKS via command line (that I remember).

here’s a few things to get you started:

  • lsblk

  • dmsetup info will list your device mapper volumes … including stuff currently configured with lvm and unlocked with luks.

  • cryptsetup isLuks /dev/nvme0n1p4 && echo yes || echo no will tell you whether there’s a luks header on the partition

  • cryptsetup luksDump /dev/nvme0n1p4 shows the header contents

  • cryptsetup luksHeaderBackup / cryptsetup luksHeaderRestore are there to help in case of block-level corruption of the header: with this data (and your passphrase) you can still decrypt everything else, so it’s worth backing up.

  • cryptsetup open /dev/nvme0n1p4 some_decrypted_volume_name should ask you for a password to “unlock the volume” and it should be available in /dev/mapper/some_decrypted_volume_name afterwards. cryptsetup close does the inverse

  • mount ... with whatever filesystem you choose
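
As a rough sketch, the commands in the list above can be strung together like this. The device path and mapping name are the placeholders used above; the `run` helper, the `/mnt` mount point and the header backup file name are assumptions made up for illustration. By default it only prints the commands, so nothing is touched until you set `DRY_RUN=0` and run it as root.

```shell
#!/bin/sh
# Sketch only: DEV and NAME are the placeholders from the list above.
DEV="${DEV:-/dev/nvme0n1p4}"
NAME="${NAME:-some_decrypted_volume_name}"

# Print each command instead of running it, unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

luks_walkthrough() {
    run lsblk
    run dmsetup info
    # Stop early if there is no LUKS header on the partition.
    run cryptsetup isLuks "$DEV" || return 1
    run cryptsetup luksDump "$DEV"
    # Back up the header before anything else (file name is an assumption).
    run cryptsetup luksHeaderBackup "$DEV" --header-backup-file "$NAME.header.img"
    run cryptsetup open "$DEV" "$NAME"
    # /mnt is an assumed mount point.
    run mount "/dev/mapper/$NAME" /mnt
}
```

Calling `luks_walkthrough` with the defaults just lists the seven commands in order, which is a cheap way to double-check the device path before doing it for real.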


no idea how fedora does things, but I know systemd makes luks units based on whatever is stored inside of crypttab and optionally integrates with kernel keyrings and so on - the commands I cited above bypass all of that “helpful infrastructure” and just poke at the kernel and device mapper using syscalls and ioctls and do things at a fairly low level


have you already filed any bugs with fedora folks?

I need to see if this is a thing with LUKS2: because everything is saved in the kernel keyring, I couldn’t change the key :frowning:

I’ve tried this a lot without success. Since it got … “corrupted”… somehow, I only managed to unlock it at boot once, after tons of tries and retries.
Never since.

No
I’ve tried once or twice. I never know how to make sure it’s not a duplicate, I never get a reply, and it eventually gets closed after 2 major releases of Fedora because it’s outdated.

I do it for GitHub projects and whatnot, but for distros and major software, I’ve given up.

yes, it is - everyone should just use luks2 these days, it’s a bigger and more flexible header. (uses json IIRC).

The kernel keyring, if used at all, would actually only store your password or key; it wouldn’t store the key material used directly for the actual aes-xts (or whatever you’re using). The “real/actual key material” is stored in the header, encrypted with your password/key.

you can still decrypt it and look at it and do whatever e.g. if you want to decrypt blocks with dd and openssl, but generally it doesn’t need to leave dm-crypt.


Try doing something via that gui, and then drop down to command line and have a look around (maybe even before AND after).

… or try launching the utility via the terminal with --verbose … or --debug ; or look around with --help to see if you can find what flags the utility supports. It might tell you what other lower level command line utilities are being run by the gui.

I’ve finished reimporting the dump I made on the running system … and I fucked up XD
I backed up the LUKS partition … not the one under it :facepalm:

luksHeaderBackup returns a bin file I would need to decipher.
luksDump returns this:

[liveuser@localhost-live bckp]$ sudo cryptsetup luksDump /dev/nvme0n1p4
LUKS header information
Version:       	2
Epoch:         	3
Metadata area: 	16384 [bytes]
Keyslots area: 	16744448 [bytes]
UUID:          	32ec8869-4751-4ce3-8c58-593d37642f1f
Label:         	(no label)
Subsystem:     	(no subsystem)
Flags:       	(no flags)

Data segments:
  0: crypt
	offset: 16777216 [bytes]
	length: (whole device)
	cipher: aes-xts-plain64
	sector: 512 [bytes]

Keyslots:
  0: luks2
	Key:        512 bits
	Priority:   normal
	Cipher:     aes-xts-plain64
	Cipher key: 512 bits
	PBKDF:      argon2id
	Time cost:  12
	Memory:     1048576
	Threads:    4
	Salt:       d9 ce fd 41 1e dc 51 56 fd c5 e9 79 a7 c9 23 a6 
	            67 e1 4c 2b 62 0b 18 97 ff f2 35 24 0f 5e 9a 74 
	AF stripes: 4000
	AF hash:    sha256
	Area offset:32768 [bytes]
	Area length:258048 [bytes]
	Digest ID:  0
Tokens:
Digests:
  0: pbkdf2
	Hash:       sha256
	Iterations: 275650
	Salt:       d0 39 1e f3 af e3 d5 5c 61 28 b1 5c ef d5 0d 3d 
	            41 90 6e ae 7c 86 e2 b9 26 90 67 44 68 f5 92 06 
	Digest:     bb 65 d2 d1 6c 7a a3 6f ad 88 7e 8b d8 5e 8e 48 
	            6c 99 30 33 13 fa 3e b8 2e f0 91 07 b6 d3 f4 36

open doesn’t work with my password
let’s wipe it all again and see why I can’t make a clean one

weirdly enough, I can’t

[liveuser@localhost-live bckp]$ sudo wipefs -a /dev/nvme0n1p4
[liveuser@localhost-live bckp]$ sudo cryptsetup luksFormat -v /dev/nvme0n1p4

WARNING!
========
This will overwrite data on /dev/nvme0n1p4 irrevocably.

Are you sure? (Type 'yes' in capital letters): yes
Operation aborted.

Command failed with code -1 (wrong or missing parameters).
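
One thing worth ruling out from the transcript above: the prompt asks for 'yes' in capital letters, and the answer given was lowercase `yes`, which cryptsetup treats as a refusal ("Operation aborted"). A minimal sketch of the same two steps follows; `reformat_luks` is a hypothetical helper name, and the device is whatever you pass in.

```shell
# Hypothetical helper: wipes signatures and creates a fresh LUKS header.
# This destroys all data on the target, so it refuses to run without an argument.
reformat_luks() {
    dev="${1:?usage: reformat_luks /dev/<partition>}"
    sudo wipefs -a "$dev"
    # Interactive step: answer YES in capitals; lowercase "yes" aborts.
    sudo cryptsetup luksFormat -v "$dev"
}
```

If it still fails after answering `YES`, then the problem is somewhere else and the output of the failing command is what to look at next.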

also, I just happened to notice - unrelatedly:

sector: 512 [bytes]

this is a bit “stupid”, what does lsblk -t look like for you (I think maybe you forgot to nvme format .. before starting to use the drive)?

possible, I used the Fedora install wizard for it as it was a production system, didn’t feel like tweaking it

[liveuser@localhost-live ~]$ lsblk -t
NAME        ALIGNMENT MIN-IO OPT-IO PHY-SEC LOG-SEC ROTA SCHED RQ-SIZE  RA WSAME
loop0               0    512      0     512     512    1 none      128 128    0B
loop1               0    512      0     512     512    1 none      128 128    0B
├─live-rw           0    512      0     512     512    1           128 128    0B
└─live-base         0    512      0     512     512    1           128 128    0B
loop2               0    512      0     512     512    0 none      128 128    0B
└─live-rw           0    512      0     512     512    1           128 128    0B
sda                 0   4096      0    4096     512    0 bfq        64 128    0B
├─sda1              0   4096      0    4096     512    0 bfq        64 128    0B
├─sda2              0   4096      0    4096     512    0 bfq        64 128    0B
├─sda3              0   4096      0    4096     512    0 bfq        64 128    0B
└─sda4              0   4096      0    4096     512    0 bfq        64 128    0B
sdb                 0    512      0     512     512    1 bfq         2 128    0B
└─sdb1              0    512      0     512     512    1 bfq         2 128    0B
sdc                 0   4096      0    4096    4096    1 bfq         2 128    0B
sr0                 0   2048      0    2048    2048    1 bfq         2 128    0B
zram0               0   4096   4096    4096    4096    0           128 128    0B
nvme0n1             0    512      0     512     512    0 none     1023 128    0B
├─nvme0n1p4         0    512      0     512     512    0 none     1023 128    0B
├─nvme0n1p1         0    512      0     512     512    0 none     1023 128    0B
├─nvme0n1p2         0    512      0     512     512    0 none     1023 128    0B
└─nvme0n1p3         0    512      0     512     512    0 none     1023 128    0B

Weird stuff … I tried a dnf update from the live CD and had to ignore GPG as it wouldn’t validate Oo
Is it possible for a hardware issue to break cryptographic operations?
How can I test that?
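
One crude way to poke at that question, sketched under the assumption that flaky hardware would show up as the same input occasionally hashing to a different value: hash one file many times and count the distinct digests. `hash_consistency_check` is a made-up helper name and the run count is arbitrary. A clean result proves nothing, but a mismatch is a strong hint that something below the OS is wrong; a proper memory tester is the real tool for this.

```shell
# Hashes the same file repeatedly and prints how many distinct digests were seen.
# 1 means every run agreed; anything higher means the machine disagreed with itself.
hash_consistency_check() {
    file="$1"
    runs="${2:-20}"
    i=0
    while [ "$i" -lt "$runs" ]; do
        sha256sum "$file" | cut -d' ' -f1
        i=$((i + 1))
    done | sort -u | wc -l | tr -d ' '
}

# Usage (pick a large file so more memory is exercised):
#   hash_consistency_check /path/to/some/large.iso 50
```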

Not exactly… A LUKS volume has a header with keyslots (password slots), the encrypted master key used for the data encryption, and some other stuff. When you unlock a LUKS volume, your password is hashed according to info in the header and matched against these slots. If a match is found, the master key (also located in the header) is decrypted, placed in the keyring, and subsequently used to access data in the LUKS volume. So the LUKS header is vital for accessing data in the encrypted volume, and whenever the header is modified (add/remove/change passwords etc.), it should absolutely be backed up using the luksHeaderBackup command and stored someplace safe (offline storage).

If the header on the drive becomes corrupted due to hardware failure, dirty shutdowns, or whatever else, you can recover the working header with luksHeaderRestore from your backup(s). If the physical location where the header should be restored to is damaged, you can use ddrescue to recover the encrypted volume onto another medium and restore the header there.

Following these steps should ensure that data loss due to using encryption is not a real issue at all. A bad sector will only result in that particular sector being lost (and maybe the following sector if using a chained block cipher mode).

So, no need to decipher the header. In fact, what the binary copy of the header contains is absolutely irrelevant for backup purposes.

Theoretically anything is possible, but to open/unlock a volume you only need the header to match up with your password. Header isn’t really changed often either. I’d expect you’d be seeing all kinds of other issues elsewhere before running into an issue with the teeny tiny only rarely performed opening of the volume.

It’s far more likely that the disk tool you’re using is somehow buggy.

Since you’ve lost data from there already (no old headers you can unlock), do the nvme format to 4k sector/block size, and build a new set of partitions with cgdisk, cryptsetup, and mkfs… and email yourself the header backup afterwards. And look at manpages along the way - it might be helpful to future you.

I really don’t get it.
You know the image of the LUKS partition I took before wiping it?
I put it on an HDD, moved it to my work laptop (Fedora 35), and I could unlock and mount it without any issue with my password

So there is SOMETHING that breaks multiple ISOs and the GRUB LUKS unlock thingy on my desktop…

Honestly, I’m coming back to a hardware issue; I don’t see what could be the cause software-wise, with multiple OSes and the original disk

Well, my RAM has been failing memtest86 …

So, it looks like my once-stable overclock on the RAM and Infinity Fabric degraded over time …
I’ve wiped the BIOS settings and redone everything; I’m now stable again, but at 3666 MT/s rather than 3800 MT/s on 4000 MT/s-capable RAM.

I’ve managed to boot the OS from the SSD again and unlock the LUKS partition.
Sadly, it looks like Btrfs didn’t like having broken RAM …

sudo btrfs scrub status /dev/mapper/luks-32ec8869-4751-4ce3-8c58-593d37642f1f
UUID:             0c48d9d9-d31b-47ad-9b02-d93aa975ba45
Scrub started:    Mon Jan  2 23:57:50 2023
Status:           finished
Duration:         0:01:24
Total to scrub:   409.22GiB
Rate:             4.87GiB/s
Error summary:    verify=1 csum=325
  Corrected:      4
  Uncorrectable:  322
  Unverified:     0

I have backups from the duplicity thing, but I will try to track down each uncorrectable file and see if I can’t just delete them or reinstall the package… setting everything back up would take more time than I have left in my holiday, and I’ve already done nothing of what was planned because of this.

Right now dmesg only returns 74 matches out of the 322:
dmesg | grep "checksum error at" | cut -d' ' -f24- | sed 's/.$//' | sort | uniq | wc -l
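
A variant that does not depend on counting fields, as a sketch: it assumes the kernel messages end in `(path: <file>)`, which is how Btrfs scrub warnings usually look, and it keeps a per-file error count instead of collapsing duplicates. `btrfs_bad_paths` is a made-up helper name.

```shell
# Reads kernel log lines on stdin and prints "<count> <path>" for every file
# named in a checksum error, most affected files first.
btrfs_bad_paths() {
    grep 'checksum error at' \
        | sed -n 's/.*(path: \(.*\))$/\1/p' \
        | sort | uniq -c | sed 's/^ *//' | sort -rn
}

# Usage:
#   sudo dmesg | btrfs_bad_paths
```

One file can account for several errors, and older messages may already have rotated out of the kernel ring buffer, so fewer unique paths than scrub errors is not necessarily surprising.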

Could be uniq is removing similar lines.

Try something along the lines of:

journalctl --since="2012-10-30 18:17:16" --output=cat --grep='BTRFS .* i/o error' | less

from Identify damaged files - ArchWiki

and fix the more recently detected errors first.

I don’t know about Fedora, but the Debian and Arch package managers have a feature that verifies that all installed package files match their package checksums. I’m sure Fedora has the same thing; try that first… even if it crashes halfway through, it’ll be useful info for you.
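
On Fedora that check is `rpm -Va`. A small sketch for picking out only the files whose content digest no longer matches (a `5` in the third column of the flags); `rpm_digest_mismatches` is a made-up helper name, and it assumes file names without spaces.

```shell
# Reads `rpm -Va` output on stdin and prints the files whose digest differs
# from what the package shipped (third flag character is "5").
rpm_digest_mismatches() {
    awk 'substr($1, 3, 1) == "5" { print $NF }'
}

# Usage:
#   sudo rpm -Va | rpm_digest_mismatches
# Then, for each file, find the owning package and reinstall it:
#   rpm -qf /path/to/file
#   sudo dnf reinstall <package>
```

Config files you edited yourself will show up too (they are marked with a `c`), so not every hit is corruption.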


Also, consider adding some crons to check journals for different types of problems and have it email you a report. (In case things degrade further).
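
Something like this could be a starting point, as a sketch only. `journal_report` is a made-up helper and the pattern is just an example; it relies on cron's usual behaviour of mailing any output a job produces, so it stays silent when nothing matches.

```shell
# Prints a short report when stdin contains lines matching the pattern,
# and prints nothing at all otherwise.
journal_report() {
    pattern="$1"
    matches="$(grep -E "$pattern" || true)"
    [ -n "$matches" ] || return 0
    count="$(printf '%s\n' "$matches" | wc -l | tr -d ' ')"
    printf '%s line(s) matched "%s":\n%s\n' "$count" "$pattern" "$matches"
}

# Usage from a daily cron job (pattern is an example, adjust to taste):
#   journalctl -k --since yesterday --no-pager | journal_report 'BTRFS (error|warning)'
```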

Sorry if it’s dumb, but I’m reading that Btrfs doesn’t have a static sector size?

I was talking about the nvme itself, underlying any partitions.
Going with 4k sector sizes reduces some of the processing overhead in nvme and in the storage drivers and inside LUKS.
It’s like “jumbo frames” but for a block device.

btrfs can work on either 512-byte or 4 KiB sectors, and I wouldn’t be surprised if other sizes were supported too.
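
To see which disks are still exposing 512-byte logical sectors, a quick sketch; `list_512_logical` is a made-up helper name that just filters `lsblk` columns.

```shell
# Reads "NAME PHY-SEC LOG-SEC" rows on stdin and prints the devices whose
# logical sector size is 512 bytes.
list_512_logical() {
    awk '$3 == 512 { print $1 }'
}

# Usage (whole disks only, no header line):
#   lsblk -dn -o NAME,PHY-SEC,LOG-SEC | list_512_logical
# For an NVMe drive, the supported LBA formats can be listed with:
#   sudo nvme id-ns -H /dev/nvme0n1
# and switched with `nvme format --lbaf=<index>`, which erases the drive.
```

Not every drive offers a 4k LBA format, so check the `id-ns` output before planning around it.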

It’s always memory errors, get some ECC and random corruption will be a thing of the past! :smiley: