Hi Gang!
I know it’s a longshot, but I'm wondering if anyone has ideas on how I might recover a corrupted ZFS pool. Not sure what diagnostic information to provide, so I’ll post what I have and what I’ve tried so far.
I originally had a 12-drive pool arranged as a single raidz2 vdev. Due to power distribution issues (too many SATA splitters and extension cables) I would get random drive drops and server crashes. After the last crash (May 29th) the pool refused to import, so I have had the server shut down while I built custom SATA power cables and cleaned up the data cabling. The server is running Fedora 35 and the latest stable DKMS release of OpenZFS.
I do have backups of the important data from this pool, but they’re cloud backups, so restoring them will take a while. Thought I would give recovering the pool a shot first.
First off, here’s the output from zpool import showing the pool, including one missing device: a drive that has completely failed and is no longer accessible. That drive has been pulled from the server, and it’s inaccessible on my test machine as well.
[root@superx10 ~]# zpool import
   pool: wide
     id: 12866334539261191398
  state: FAULTED
 status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-5E
 config:

        wide                                 FAULTED  corrupted data
          raidz2-0                           DEGRADED
            wwn-0x5000c50063bbf42a           UNAVAIL
            wwn-0x5000c50063c0fea6           ONLINE
            wwn-0x5000c50063c75cff           ONLINE
            wwn-0x5000c50063c9e54d           ONLINE
            wwn-0x5000c5006435dab6           ONLINE
            wwn-0x5000c5006435f0ba           ONLINE
            wwn-0x5000c50092937c98           ONLINE
            wwn-0x5000c50092939808           ONLINE
            wwn-0x5000c5009295a5e5           ONLINE
            wwn-0x5000c5009296f867           ONLINE
            wwn-0x5000c5009297eb04           ONLINE
            wwn-0x5000c50092989b50           ONLINE
        logs
          mirror-1                           ONLINE
            nvme-eui.0025385581b1b75e-part4  ONLINE
            nvme-eui.6479a751d0c0005a-part4  ONLINE
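For anyone wanting to cross-check which physical drive the UNAVAIL wwn-* id corresponds to, this is the kind of thing I use (standard Linux tooling, nothing ZFS-specific; smartctl is from the smartmontools package):

```shell
# List the by-id symlinks so the wwn-* names map to /dev/sdX devices.
ls -l /dev/disk/by-id/ | grep '^l.*wwn-'

# Query one of the surviving drives (device name taken from the pool
# listing above); the failed drive simply no longer enumerates.
smartctl -i /dev/disk/by-id/wwn-0x5000c50063c0fea6
```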
Next step, trying basic recovery…
[root@superx10 ~]# zpool import -f -F wide
cannot import 'wide': I/O error
Destroy and re-create the pool from a backup source.
[root@superx10 ~]# zpool import -d /dev/disk/by-id -f -F wide
cannot import 'wide': I/O error
Destroy and re-create the pool from a backup source.
[root@superx10 ~]# zpool import -d /dev/disk/by-id -f -F -XN wide
cannot import 'wide': one or more devices is currently unavailable
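One thing I haven’t tried yet, but am considering based on zpool-import(8), is combining the rewind with a read-only import. Since readonly=on writes nothing back to the disks, it shouldn’t be able to make things worse, and if it succeeds I could copy data off before attempting anything destructive (this is a sketch, not something I’ve run on this pool yet):

```shell
# Read-only rewind import attempt, per zpool-import(8).
# -o readonly=on : mount nothing writable, replay no intent log
# -F             : rewind to the last importable txg if needed
zpool import -d /dev/disk/by-id -o readonly=on -f -F wide
```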
I tried setting zfs_recover in the module parameters, but it hasn’t changed the output of any of the recovery steps above.
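For reference, this is how I’m toggling the parameter. (I’m also aware of zfs_max_missing_tvds, which allows a read-only import with a missing top-level vdev; it shouldn’t be needed here since raidz2 ought to absorb one lost disk, but I’m noting it in case someone thinks it’s relevant.)

```shell
# Enable recovery mode at runtime:
echo 1 > /sys/module/zfs/parameters/zfs_recover

# Persist it across module reloads / reboots:
cat > /etc/modprobe.d/zfs-recover.conf <<'EOF'
options zfs zfs_recover=1
EOF
```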
Found an article from 2011 about label corruption that recommended the diagnostic command zdb -lll, so here’s my output, showing all four labels intact:
Here is the output from zdb without any extra arguments:
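For anyone wanting to reproduce the zdb checks: since the pool won’t import, zdb has to be pointed at it as an exported pool. These are the invocations I’m using (device path is one of the pool members listed above):

```shell
# Dump all four labels (and their uberblocks) from one member disk;
# repeat for each device to compare.
zdb -lll /dev/disk/by-id/wwn-0x5000c50063c0fea6

# Examine the pool itself without importing it:
# -e : pool is exported/not imported, -p : directory to search for devices
zdb -e -p /dev/disk/by-id wide

# Same, but only dump the best uberblock:
zdb -e -p /dev/disk/by-id -u wide
```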