I'm having quite a peculiar problem with my NVMe SSD that I haven't been able to solve on my own; perhaps someone here can help.
Here's a TL;DR of what happened: I was updating my Linux Mint install when my system suddenly went into read-only mode. I went into recovery mode and ran 'dpkg --configure -a', then booted up again, and the OS immediately went into read-only mode once more.
This time I booted from a live usb and ran:
mint@mint:~$ sudo e2fsck -cfpv /dev/mapper/vgmint-root
/dev/mapper/vgmint-root: Updating bad block inode.
579180 inodes used (0.47%, out of 122011648)
2162 non-contiguous files (0.4%)
1290 non-contiguous directories (0.2%)
# of inodes with ind/dind/tind blocks: 0/0/0
Extent depth histogram: 469942/772/2
239252046 blocks used (49.02%, out of 488031232)
125 bad blocks
35 large files
402062 regular files
67201 directories
55 character device files
25 block device files
0 fifos
55787 links
109812 symbolic links (108360 fast symbolic links)
16 sockets
------------
634958 files
Also:
mint@mint:~$ sudo nvme smart-log /dev/nvme0
Smart Log for NVME device:nvme0 namespace-id:ffffffff
critical_warning : 0
temperature : 38 C
available_spare : 100%
available_spare_threshold : 5%
percentage_used : 2%
data_units_read : 88660714
data_units_written : 60548722
host_read_commands : 845681396
host_write_commands : 584031973
controller_busy_time : 2137
power_cycles : 1705
power_on_hours : 16467
unsafe_shutdowns : 35
media_errors : 5261
num_err_log_entries : 8061
Warning Temperature Time : 0
Critical Composite Temperature Time : 0
Thermal Management T1 Trans Count : 31
Thermal Management T2 Trans Count : 24
Thermal Management T1 Total Time : 14467
Thermal Management T2 Total Time : 672
One of the packages I updated was 'linux-firmware', which I suspect is what caused all this.
I’ve also tried running ‘sudo nvme error-log /dev/nvme0’, but it only gives me 63 entries, all of which are basically empty.
Well, that's something to work with. Is it identifying your device as the correct size? Can you read ANYTHING past that halfway mark? (dd with a skip= parameter, or ddrescue, should help you find out.) If it's just a few bad blocks that can't be read, you can overwrite them and your drive will be good to go.
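To make the dd probe concrete, here's a rough sketch, run against a scratch image file so it's harmless to try; on the real drive you'd point IMG at the block device itself (e.g. /dev/nvme0n1, assuming that's your device node) and use its real size:

```shell
# Demonstration on a scratch 8 MiB image; on a real drive, substitute
# the block device (e.g. /dev/nvme0n1) for IMG.
IMG=$(mktemp)
dd if=/dev/zero of="$IMG" bs=1M count=8 status=none

# Size in bytes, then the halfway point expressed in 1 MiB blocks
# (for a real device, use: blockdev --getsize64 /dev/nvme0n1).
SIZE=$(stat -c%s "$IMG")
HALF=$(( SIZE / 2 / 1048576 ))

# Try to read 1 MiB starting at the halfway mark; a clean exit means
# that region is readable, an I/O error suggests a bad block there.
if dd if="$IMG" of=/dev/null bs=1M skip="$HALF" count=1 status=none; then
    echo "readable at offset ${HALF} MiB"
fi
rm -f "$IMG"
```

Stepping skip= forward in chunks (or letting ddrescue map the whole device) tells you whether the unreadable region is a handful of sectors or a large contiguous span.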
The easy answer is to backup all your data from the drive elsewhere, then wipe the drive and restore everything.
smartctl -x /dev/nvme0 may give some insight into whether the drive hardware is failing.
You can’t, without severe risk of corrupting your file system. A backup, full wipe, and restore is the best option. It’s good to know it isn’t the whole second half of your drive being inaccessible or something like that, and rules out any way I can think of that your system update could have caused the drive to malfunction.
If you really want to try something, you can mount the partition rw, create a huge file that fills up all unused space, then erase it, sync, and fstrim, and see how your drive fares in subsequent tests. Might fix several issues, but it might not.
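For anyone who wants the concrete steps, something along these lines — shown here against a scratch directory so it's safe to dry-run; on the real system TARGET would be the mountpoint of the rw-mounted filesystem (e.g. /mnt/root, which is an example path), and you'd finish with the fstrim:

```shell
# Fill-then-trim sketch, demonstrated on a scratch directory.
TARGET=$(mktemp -d)

# Fill free space with zeroes (capped at 4 MiB here for the demo; on
# the real filesystem, drop count= and let dd run until "disk full",
# which forces the controller to write every unallocated block).
dd if=/dev/zero of="$TARGET/fillfile" bs=1M count=4 status=none

# Remove the file and flush pending writes.
rm "$TARGET/fillfile"
sync

# On the real mountpoint, finish by discarding the now-free blocks:
#   fstrim -v /mnt/root
rm -rf "$TARGET"
```

The idea is that writing every free block gives the drive's firmware a chance to remap failing cells, and the trim tells it which blocks it no longer needs to preserve.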
Seems like quite a few Media and Data Integrity Errors there. Your drive should work again after a wipe, but it is now suspect, and I would recommend frequent backups, plus weekly Tripwire-style integrity checks of all files on it, to be sure data isn't being silently corrupted going forward.
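A lightweight stand-in for Tripwire is just a checksum manifest: hash everything once, then re-verify on a schedule. A minimal sketch (paths and the sample file are purely illustrative):

```shell
# Baseline + verify cycle, demonstrated on a scratch directory; on the
# real system DIR would be the tree you want to watch.
DIR=$(mktemp -d)
MANIFEST=$(mktemp)
echo "hello" > "$DIR/a.txt"

# Baseline: hash every regular file into a manifest kept OUTSIDE the
# watched tree (otherwise the manifest would hash itself).
( cd "$DIR" && find . -type f -exec sha256sum {} + ) > "$MANIFEST"

# Later (e.g. weekly from cron): re-verify; any silently corrupted
# file is reported as FAILED and sha256sum exits non-zero.
( cd "$DIR" && sha256sum -c "$MANIFEST" )
```

Keeping the manifest on a different drive (or at least backing it up) matters here, since the whole point is not trusting this disk.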