So the resilver finished overnight. This is what I was looking at afterwards:
  pool: data
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: resilvered 4.45T in 10h45m with 10 errors on Sat Nov 21 04:07:19 2020
config:

        NAME                                            STATE     READ WRITE CKSUM
        data                                            DEGRADED     0     0    18
          raidz2-0                                      DEGRADED     0     0   134
            wwn-0x5000c500b2b39f7b                      ONLINE       0     0     0
            ata-ST4000DM000-1F2168_S300GPKC             ONLINE       0     0     0
            ata-ST4000DM000-1F2168_Z300TQVA             ONLINE       0     0     0
            ata-ST4000DM000-1F2168_Z300TMRG             REMOVED      0     0     0
            spare-4                                     ONLINE       0     0     0
              ata-ST4000VN008-2DR166_ZGY31ED3           ONLINE       0     0     0
              ata-ST4000VN008-2DR166_ZGY7PJGC           ONLINE       0     0     0
              ata-ST4000VN008-2DR166_ZGY7PJKY           ONLINE       0     0     0
              ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K6UNP786  ONLINE       0     0     0
            ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K2JYANKV    ONLINE       0     0     0
            ata-ST4000VN008-2DR166_ZM403X3Z             ONLINE       0     0     0
            ata-ST4000VN008-2DR166_ZM40355X             ONLINE       0     0     0
            15234179876330307149                        UNAVAIL      0     0     0  was /dev/disk/by-id/ata-ST4000VN008-2DR166_ZGY3CCMN-part1
            ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K3CC3UZ5    DEGRADED     0     0     0  too many errors
        logs
          wwn-0x5002538d702bc018-part3                  ONLINE       0     0     0
        cache
          wwn-0x5002538d702bc018-part4                  ONLINE       0     0     0
        spares
          ata-ST4000VN008-2DR166_ZGY7PJGC               INUSE     currently in use
          ata-ST4000VN008-2DR166_ZGY7PJKY               INUSE     currently in use
          ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K6UNP786      INUSE     currently in use
          ata-ST4000DM000-1F2168_S300J2JV               UNAVAIL

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x7e>
        /data/Archive/root-2020-08/usr/lib/jvm/java-11-openjdk-amd64/jmods/java.base.jmod
Interestingly, the metadata error stayed, two other files dropped off the list of files with “permanent errors”, and a new one was added to it.
During the resilver, one of the drives experienced some errors. It’s the one that has state “REMOVED” here (ata4 in dmesg, /dev/sdd). This is most likely an unrelated problem though: this desktop-grade disk probably just didn’t like the resilver. It’s also one of the oldest ones in there, I think.
Anyway, this specific disk failing does not worry me by itself, but since another one is absent as well, the raidz2 is already missing two members and has no parity left to lose. I need to be able to actually use 2 of the 3 spares that are currently “in use”.
Unfortunately though, I seem to be unable to detach them from their current use:
sudo zpool detach data ata-ST4000VN008-2DR166_ZGY31ED3
cannot detach ata-ST4000VN008-2DR166_ZGY31ED3: no valid replicas
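The “no parity left” observation can be checked mechanically from the status text. The following is only a minimal sketch: the function name count_missing_members is made up for illustration, it assumes the pool has a single raidz vdev, and it simply counts members listed as REMOVED, UNAVAIL, FAULTED or OFFLINE before the logs/cache/spares sections. It does not try to work out whether a spare is already covering for one of them.

```shell
# Hypothetical helper: count raidz members that are not contributing data.
# Reads `zpool status` text on stdin and prints a single number.
# Assumption: one raidz vdev per pool; spare coverage is ignored.
count_missing_members() {
  awk '
    # Start counting once the raidz vdev line has been seen.
    $1 ~ /^raidz/ { in_vdev = 1; next }
    # The auxiliary sections are not part of the raidz vdev.
    $1 == "logs" || $1 == "cache" || $1 == "spares" { in_vdev = 0 }
    in_vdev && ($2 == "REMOVED" || $2 == "UNAVAIL" || $2 == "FAULTED" || $2 == "OFFLINE") { n++ }
    END { print n + 0 }
  '
}

# Usage (not executed here):
#   sudo zpool status data | count_missing_members
```

For the status output above this counts the REMOVED disk and the UNAVAIL one, so it prints 2, which is exactly the number of failures a raidz2 can absorb.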
To make matters worse, after running sudo zpool clear data ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K3CC3UZ5 to get rid of that vdev’s DEGRADED state, a big resilver started again:
  pool: data
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Nov 21 08:32:58 2020
        187G scanned out of 15.6T at 304M/s, 14h44m to go
        53.3G resilvered, 1.17% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        data                                            DEGRADED     0     0    18
          raidz2-0                                      DEGRADED     0     0   134
            wwn-0x5000c500b2b39f7b                      ONLINE       0     0     0  (resilvering)
            ata-ST4000DM000-1F2168_S300GPKC             ONLINE       0     0     0
            ata-ST4000DM000-1F2168_Z300TQVA             ONLINE       0     0     0
            ata-ST4000DM000-1F2168_Z300TMRG             REMOVED      0     0     0
            spare-4                                     ONLINE       0     0     0
              ata-ST4000VN008-2DR166_ZGY31ED3           ONLINE       0     0     0  (resilvering)
              ata-ST4000VN008-2DR166_ZGY7PJGC           ONLINE       0     0     0  (resilvering)
              ata-ST4000VN008-2DR166_ZGY7PJKY           ONLINE       0     0     0  (resilvering)
              ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K6UNP786  ONLINE       0     0     0  (resilvering)
            ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K2JYANKV    ONLINE       0     0     0
            ata-ST4000VN008-2DR166_ZM403X3Z             ONLINE       0     0     0  (resilvering)
            ata-ST4000VN008-2DR166_ZM40355X             ONLINE       0     0     0  (resilvering)
            15234179876330307149                        UNAVAIL      0     0     0  was /dev/disk/by-id/ata-ST4000VN008-2DR166_ZGY3CCMN-part1
            ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K3CC3UZ5    ONLINE       0     0     0
        logs
          wwn-0x5002538d702bc018-part3                  ONLINE       0     0     0
        cache
          wwn-0x5002538d702bc018-part4                  ONLINE       0     0     0
        spares
          ata-ST4000VN008-2DR166_ZGY7PJGC               INUSE     currently in use
          ata-ST4000VN008-2DR166_ZGY7PJKY               INUSE     currently in use
          ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K6UNP786      INUSE     currently in use
          ata-ST4000DM000-1F2168_S300J2JV               UNAVAIL
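As a sanity check on the “14h44m to go” figure, the estimate can be roughly reproduced from the three numbers on the scan line: remaining data divided by the current scan rate. This is a small sketch, not how ZFS computes it internally. The function name resilver_eta is made up, and it assumes the T, G and M suffixes are binary units (TiB, GiB, MiB).

```shell
# Hypothetical helper: estimate remaining resilver time.
# Arguments: total size in TiB, amount scanned in GiB, scan rate in MiB/s.
# Assumption: zpool prints sizes in binary units.
resilver_eta() {
  awk -v total_tib="$1" -v scanned_gib="$2" -v rate_mib="$3" 'BEGIN {
    # Convert everything to MiB, then divide by the rate to get seconds.
    remaining_mib = total_tib * 1024 * 1024 - scanned_gib * 1024
    seconds = remaining_mib / rate_mib
    hours = int(seconds / 3600)
    minutes = int((seconds - hours * 3600) / 60)
    printf "%dh%02dm\n", hours, minutes
  }'
}

resilver_eta 15.6 187 304   # prints 14h46m
```

The two-minute gap to the reported 14h44m is what you would expect from feeding in the rounded values that zpool status displays, so the estimate is consistent with a full pass over all 15.6T at the current rate.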
What I am going to do now is shut her down and see if I can MacGyver some way to get all the disks hooked up without using the backplane. I probably should have done that in the first place.
Afterwards I still have to get rid of all of those in-use-but-not-really spares.
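For that last step, the zpool man page describes two different outcomes of detach on a spare pairing: detaching the hot spare itself cancels the replacement and returns the spare to the available list, while detaching the original device makes the spare a permanent member. The failed detach attempt above targeted ata-ST4000VN008-2DR166_ZGY31ED3, which is not listed under spares. Below is a cautious sketch that only prints the detach commands for the spares marked INUSE, so they can be reviewed before anything is run. The function name print_spare_detach is made up, and whether ZFS accepts these commands while the pool is in this state is untested here.

```shell
# Hypothetical helper: print (do not run) a detach command per INUSE spare.
# Reads `zpool status` text on stdin; the pool name is the first argument.
print_spare_detach() {
  awk -v pool="$1" '
    # Only look at lines between the "spares" header and the "errors:" line.
    $1 == "spares" { in_spares = 1; next }
    $1 == "errors:" { in_spares = 0 }
    in_spares && $2 == "INUSE" { print "zpool detach " pool " " $1 }
  '
}

# Usage (not executed here):
#   sudo zpool status data | print_spare_detach data
```

Printing instead of executing is deliberate: with no redundancy left, every command that touches the vdev layout deserves a second look first.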