Zpool Offline: Any Hope of Recovering Anything?

So I was on the couch with my wife and we were about to start a movie; the time was about 8:00. The movie is hosted on my TrueNAS box, and we were watching it through Plex, which is also hosted on the same box. Everything seemed normal, but the movie wouldn't play. I had been home all day and hadn't noticed any problems or received any alerts from my NAS.

I came down to my office to take a look and was getting weird I/O errors when I tried to access files on my SMB share. I signed into TrueNAS and it said my pool was degraded, with some 40,000 errors on one of the drives in one of my mirrors. Okay, no problem: I'll eject the bad drive from the pool in the GUI and then go physically replace it with a cold spare. Instead, the system froze for several minutes and I couldn't do anything. I could still navigate the SMB share, but the TrueNAS UI was unresponsive. I tried to SSH into the box, and the CLI was not responding. Then I got an email:

TrueNAS @ prod

New alert:

  • Pool sadness state is DEGRADED: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
    The following devices are not healthy:
    • Disk HUH721010AL4200 7PG33KKR is UNAVAIL
    • Disk HUH721010AL4200 7PG3RYSR is DEGRADED

The following alert has been cleared:

  • Pool sadness state is DEGRADED: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
    The following devices are not healthy:
    • Disk HUH721010AL4200 7PG33KKR is UNAVAIL
    • Disk HUH721010AL4200 7PG27ZZR is DEGRADED
    • Disk HUH721010AL4200 7PG3RYSR is DEGRADED

Current alerts:

  • Failed to check for alert ActiveDirectoryDomainHealth:
    Traceback (most recent call last):
      File "/usr/lib/python3/dist-packages/middlewared/plugins/alert.py", line 776, in __run_source
        alerts = (await alert_source.check()) or []
      File "/usr/lib/python3/dist-packages/middlewared/alert/source/active_directory.py", line 46, in check
        await self.middleware.call("activedirectory.check_nameservers", conf["domainname"], conf["site"])
      File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1386, in call
        return await self._call(
      File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1335, in call
        return await methodobj(*prepared_call.args)
      File "/usr/lib/python3/dist-packages/middlewared/plugins/activedirectory/dns.py", line 210, in check_nameservers
        resp = await self.middleware.call('dnsclient.forward_lookup', {
      File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1386, in call
        return await self._call(
      File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1335, in _call
        return await methodobj(*prepared_call.args)
      File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1318, in nf
        return await func(*args, **kwargs)
      File "/usr/lib/python3/dist-packages/middlewared/schema.py", line 1186, in nf
        res = await f(args, **kwargs)
      File "/usr/lib/python3/dist-packages/middlewared/plugins/dns_client.py", line 108, in forward_lookup
        results = await asyncio.gather([
      File "/usr/lib/python3/dist-packages/middlewared/plugins/dns_client.py", line 40, in resolve_name
        ans = await r.resolve(
      File "/usr/lib/python3/dist-packages/dns/asyncresolver.py", line 114, in resolve
        timeout = self._compute_timeout(start, lifetime)
      File "/usr/lib/python3/dist-packages/dns/resolver.py", line 950, in _compute_timeout
        raise Timeout(timeout=duration)
    dns.exception.Timeout: The DNS operation timed out after 12.403959512710571 seconds
  • Pool sadness state is DEGRADED: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
    The following devices are not healthy:
    • Disk HUH721010AL4200 7PG33KKR is UNAVAIL
    • Disk HUH721010AL4200 7PG3RYSR is DEGRADED

Then, immediately after, I received this email:

ZFS has detected that a device was removed.

impact: Fault tolerance of the pool may be compromised.

eid: 30800

class: statechange

state: UNAVAIL

host: prod

time: 2023-04-17 20:22:34-0400

vpath: /dev/disk/by-partuuid/e2f3d3c3-0033-4300-8c38-a7a56513f145

vguid: 0xF52756F46C368319

pool: sadness (0x8A76BCC157F6D093)

After about 7 or 8 minutes of nothing, I told the server to reset via IPMI.
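(For what it's worth, resetting a box whose OS is wedged can also be done through the BMC from another machine with ipmitool; something like the lines below, where the BMC address and credentials are obviously placeholders.)

# hard power-cycle through the BMC; 10.0.0.50 / admin / 'password' are placeholder values
ipmitool -I lanplus -H 10.0.0.50 -U admin -P 'password' chassis power reset
# check what the BMC reports afterwards
ipmitool -I lanplus -H 10.0.0.50 -U admin -P 'password' chassis power status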

When the server came back up, the pool was in an errored state. I thought I'd export the pool and re-import it, as I've had some success with that in the past. But once I exported it, the option to import the pool wasn't there. I SSH'd back into the server and tried to manually import the pool:

root@prod[/var/log]# zpool import -a

cannot import ‘sadness’: no such pool or dataset

Destroy and re-create the pool from
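In case anyone wants exact commands, the next import variants worth trying from the CLI look roughly like this (-d points zpool at a specific device directory instead of letting it scan /dev, and -f forces the import if ZFS thinks the pool belongs to another host; sadness is just my pool name):

# list anything importable using only the stable partuuid symlinks
zpool import -d /dev/disk/by-partuuid
# if the pool shows up in that listing, try importing it by name
zpool import -d /dev/disk/by-partuuid -f sadness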

At this point, all of the disks show up in the UI as belonging to the pool called sadness, but exported. All, that is, except the one I removed earlier.

I tried to remove one of the SAS cables from each of the shelves, so that I had 1 SAS cable going from a single HBA to each of the two shelves. That didn’t help matters any.
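To sanity-check what each HBA actually sees with a given cabling (rather than guessing from the UI), the by-path names are handy, since they encode the controller's PCI address; roughly:

# whole disks visible per path; the pci-0000:... prefix identifies which HBA each device sits behind
ls /dev/disk/by-path/ | grep -v -- '-part' | sort
# if lsscsi is available, this shows the same devices grouped by SCSI host number
lsscsi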

Digging into the logs, I'm not seeing much of anything useful in /var/log/messages from before the system restarted at around 8:30. At 2 AM today, though, I see some weird events:

Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm1: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm1: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm1: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm1: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm1: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm1: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm1: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm1: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:14 prod.fusco.me kernel: ses 13:0:9:0: Power-on or device reset occurred

Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:14 prod.fusco.me kernel: ses 3:0:14:0: Power-on or device reset occurred

Apr 17 02:07:14 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:14 prod.fusco.me kernel: ses 3:0:28:0: Power-on or device reset occurred

Apr 17 02:07:16 prod.fusco.me kernel: mpt3sas_cm1: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:16 prod.fusco.me kernel: mpt3sas_cm1: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:16 prod.fusco.me kernel: mpt3sas_cm1: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:16 prod.fusco.me kernel: mpt3sas_cm1: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:16 prod.fusco.me kernel: mpt3sas_cm1: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:16 prod.fusco.me kernel: mpt3sas_cm1: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:16 prod.fusco.me kernel: mpt3sas_cm1: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:16 prod.fusco.me kernel: mpt3sas_cm1: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:16 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:16 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:16 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:16 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:16 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:16 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:16 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:16 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:16 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:16 prod.fusco.me kernel: ses 3:0:28:0: Power-on or device reset occurred

Apr 17 02:07:17 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:17 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:17 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:17 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:17 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:18 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:20 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:20 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:20 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:20 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:20 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:20 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

Apr 17 02:07:20 prod.fusco.me kernel: mpt3sas_cm0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
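That flood goes on for a while; a quick way to gauge how bad it was, per controller, is just to count occurrences of that log_info code and pull the 02:07 window out of the log:

grep -c 'mpt3sas_cm0: log_info(0x3112011a)' /var/log/messages
grep -c 'mpt3sas_cm1: log_info(0x3112011a)' /var/log/messages
grep 'Apr 17 02:0' /var/log/messages | less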

Then I tried to restart a VM hosted on this pool, which I hadn't noticed had crashed, at around 8:17:

Apr 17 20:17:36 prod.fusco.me kernel: br0: port 2(vnet1) entered disabled state

Apr 17 20:17:37 prod.fusco.me kernel: device vnet1 left promiscuous mode

Apr 17 20:17:37 prod.fusco.me kernel: br0: port 2(vnet1) entered disabled state

Apr 17 20:17:37 prod.fusco.me kernel: kauditd_printk_skb: 7 callbacks suppressed

Apr 17 20:17:37 prod.fusco.me kernel: audit: type=1400 audit(1681777057.381:67): apparmor=“STATUS” operation=“profile_remove” profile=“unconfined” name="libvir>

Apr 17 20:17:38 prod.fusco.me middlewared[7268]: libvirt: QEMU Driver error : Domain not found: no domain with matching name ‘12_casey’

Apr 17 20:17:39 prod.fusco.me kernel: audit: type=1400 audit(1681777059.949:68): apparmor=“STATUS” operation=“profile_load” profile=“unconfined” name="libvirt->

Apr 17 20:17:40 prod.fusco.me kernel: audit: type=1400 audit(1681777060.061:69): apparmor=“STATUS” operation=“profile_replace” profile=“unconfined” name="libvi>

Apr 17 20:17:40 prod.fusco.me kernel: audit: type=1400 audit(1681777060.205:70): apparmor=“STATUS” operation=“profile_replace” profile=“unconfined” name="libvi>

Apr 17 20:17:40 prod.fusco.me kernel: audit: type=1400 audit(1681777060.317:71): apparmor=“STATUS” operation=“profile_replace” info="same as current profile, s>

Apr 17 20:17:40 prod.fusco.me kernel: audit: type=1400 audit(1681777060.489:72): apparmor=“STATUS” operation=“profile_replace” profile=“unconfined” name="libvi>

Apr 17 20:17:40 prod.fusco.me kernel: br0: port 2(vnet2) entered blocking state

Apr 17 20:17:40 prod.fusco.me kernel: br0: port 2(vnet2) entered disabled state

Apr 17 20:17:40 prod.fusco.me kernel: device vnet2 entered promiscuous mode

Apr 17 20:17:40 prod.fusco.me kernel: br0: port 2(vnet2) entered blocking state

Apr 17 20:17:40 prod.fusco.me kernel: br0: port 2(vnet2) entered listening state

Apr 17 20:17:40 prod.fusco.me kernel: audit: type=1400 audit(1681777060.729:73): apparmor=“STATUS” operation=“profile_replace” profile=“unconfined” name="libvi>

Apr 17 20:17:40 prod.fusco.me kernel: audit: type=1400 audit(1681777060.809:74): apparmor=“DENIED” operation=“capable” profile=“libvirtd” pid=16394 comm="rpc-w>

Apr 17 20:17:40 prod.fusco.me kernel: audit: type=1400 audit(1681777060.813:75): apparmor=“DENIED” operation=“capable” profile=“libvirtd” pid=16394 comm="rpc-w>

Apr 17 20:17:40 prod.fusco.me kernel: audit: type=1400 audit(1681777060.881:76): apparmor=“DENIED” operation=“capable” profile=“libvirtd” pid=16394 comm="rpc-w>

Apr 17 20:17:42 prod.fusco.me kernel: br0: port 2(vnet2) entered disabled state

Apr 17 20:17:42 prod.fusco.me kernel: device vnet2 left promiscuous mode

Apr 17 20:17:42 prod.fusco.me kernel: br0: port 2(vnet2) entered disabled state

Apr 17 20:17:43 prod.fusco.me kernel: audit: type=1400 audit(1681777063.029:77): apparmor=“STATUS” operation=“profile_remove” profile=“unconfined” name="libvir>

Apr 17 20:17:51 prod.fusco.me middlewared[7268]: libvirt: QEMU Driver error : Domain not found: no domain with matching name ‘12_casey’

Apr 17 20:17:51 prod.fusco.me kernel: audit: type=1400 audit(1681777071.782:78): apparmor=“STATUS” operation=“profile_load” profile=“unconfined” name="libvirt->

Apr 17 20:17:51 prod.fusco.me kernel: audit: type=1400 audit(1681777071.898:79): apparmor=“STATUS” operation=“profile_replace” profile=“unconfined” name="libvi>

Apr 17 20:17:52 prod.fusco.me kernel: audit: type=1400 audit(1681777072.010:80): apparmor=“STATUS” operation=“profile_replace” profile=“unconfined” name="libvi>

Apr 17 20:17:52 prod.fusco.me kernel: audit: type=1400 audit(1681777072.126:81): apparmor=“STATUS” operation=“profile_replace” info="same as current profile, s>

Apr 17 20:17:52 prod.fusco.me kernel: audit: type=1400 audit(1681777072.294:82): apparmor=“STATUS” operation=“profile_replace” profile=“unconfined” name="libvi>

Apr 17 20:17:52 prod.fusco.me kernel: br0: port 2(vnet3) entered blocking state

Apr 17 20:17:52 prod.fusco.me kernel: br0: port 2(vnet3) entered disabled state

Apr 17 20:17:52 prod.fusco.me kernel: device vnet3 entered promiscuous mode

Apr 17 20:17:52 prod.fusco.me kernel: br0: port 2(vnet3) entered blocking state

Apr 17 20:17:52 prod.fusco.me kernel: br0: port 2(vnet3) entered listening state

Apr 17 20:17:52 prod.fusco.me kernel: audit: type=1400 audit(1681777072.558:83): apparmor=“STATUS” operation=“profile_replace” profile=“unconfined” name="libvi>

Apr 17 20:17:52 prod.fusco.me kernel: audit: type=1400 audit(1681777072.890:84): apparmor=“DENIED” operation=“capable” profile=“libvirtd” pid=16394 comm="rpc-w>

Apr 17 20:18:07 prod.fusco.me kernel: br0: port 2(vnet3) entered learning state

Apr 17 20:18:22 prod.fusco.me kernel: br0: port 2(vnet3) entered forwarding state

Apr 17 20:18:22 prod.fusco.me kernel: br0: topology change detected, propagating

Apr 17 20:18:23 prod.fusco.me kernel: br0: port 2(vnet3) entered disabled state

Apr 17 20:18:23 prod.fusco.me kernel: device vnet3 left promiscuous mode

Apr 17 20:18:23 prod.fusco.me kernel: br0: port 2(vnet3) entered disabled state

Apr 17 20:18:24 prod.fusco.me kernel: audit: type=1400 audit(1681777104.122:85): apparmor=“STATUS” operation=“profile_remove” profile=“unconfined” name="libvir>

Apr 17 20:18:24 prod.fusco.me middlewared[7268]: libvirt: QEMU Driver error : Domain not found: no domain with matching name ‘12_casey’

Apr 17 20:18:25 prod.fusco.me kernel: audit: type=1400 audit(1681777105.639:86): apparmor=“STATUS” operation=“profile_load” profile=“unconfined” name="libvirt->

Apr 17 20:18:25 prod.fusco.me kernel: audit: type=1400 audit(1681777105.759:87): apparmor=“STATUS” operation=“profile_replace” profile=“unconfined” name="libvi>

Apr 17 20:18:25 prod.fusco.me kernel: audit: type=1400 audit(1681777105.871:88): apparmor=“STATUS” operation=“profile_replace” profile=“unconfined” name="libvi>

Apr 17 20:18:26 prod.fusco.me kernel: audit: type=1400 audit(1681777105.991:89): apparmor=“STATUS” operation=“profile_replace” info="same as current profile, s>

Apr 17 20:18:26 prod.fusco.me kernel: audit: type=1400 audit(1681777106.171:90): apparmor=“STATUS” operation=“profile_replace” profile=“unconfined” name="libvi>

Apr 17 20:18:26 prod.fusco.me kernel: br0: port 2(vnet4) entered blocking state

Apr 17 20:18:26 prod.fusco.me kernel: br0: port 2(vnet4) entered disabled state

Apr 17 20:18:26 prod.fusco.me kernel: device vnet4 entered promiscuous mode

Apr 17 20:18:26 prod.fusco.me kernel: br0: port 2(vnet4) entered blocking state

Apr 17 20:18:26 prod.fusco.me kernel: br0: port 2(vnet4) entered listening state

Apr 17 20:18:26 prod.fusco.me kernel: audit: type=1400 audit(1681777106.771:94): apparmor=“DENIED” operation=“capable” profile=“libvirtd” pid=16394 comm="rpc-w>

Apr 17 20:18:41 prod.fusco.me kernel: br0: port 2(vnet4) entered learning state

Apr 17 20:18:43 prod.fusco.me kernel: br0: port 2(vnet4) entered disabled state

Apr 17 20:18:43 prod.fusco.me kernel: device vnet4 left promiscuous mode

Apr 17 20:18:43 prod.fusco.me kernel: br0: port 2(vnet4) entered disabled state

Apr 17 20:18:44 prod.fusco.me kernel: audit: type=1400 audit(1681777124.027:95): apparmor=“STATUS” operation=“profile_remove” profile=“unconfined” name="libvir>

Apr 17 20:18:45 prod.fusco.me middlewared[7268]: libvirt: QEMU Driver error : Domain not found: no domain with matching name ‘12_casey’

Apr 17 20:18:46 prod.fusco.me kernel: audit: type=1400 audit(1681777126.615:96): apparmor=“STATUS” operation=“profile_load” profile=“unconfined” name="libvirt->

Apr 17 20:18:46 prod.fusco.me kernel: audit: type=1400 audit(1681777126.739:97): apparmor=“STATUS” operation=“profile_replace” profile=“unconfined” name="libvi>

Apr 17 20:18:46 prod.fusco.me kernel: audit: type=1400 audit(1681777126.843:98): apparmor=“STATUS” operation=“profile_replace” profile=“unconfined” name="libvi>

Apr 17 20:18:47 prod.fusco.me kernel: audit: type=1400 audit(1681777126.975:99): apparmor=“STATUS” operation=“profile_replace” info="same as current profile, s>

Apr 17 20:18:47 prod.fusco.me kernel: audit: type=1400 audit(1681777127.143:100): apparmor=“STATUS” operation=“profile_replace” profile=“unconfined” name="libv>

Apr 17 20:18:47 prod.fusco.me kernel: br0: port 2(vnet5) entered blocking state

Apr 17 20:18:47 prod.fusco.me kernel: br0: port 2(vnet5) entered disabled state

Apr 17 20:18:47 prod.fusco.me kernel: device vnet5 entered promiscuous mode

Apr 17 20:18:47 prod.fusco.me kernel: br0: port 2(vnet5) entered blocking state

Apr 17 20:18:47 prod.fusco.me kernel: br0: port 2(vnet5) entered listening state

Apr 17 20:18:47 prod.fusco.me kernel: audit: type=1400 audit(1681777127.387:101): apparmor=“STATUS” operation=“profile_replace” profile=“unconfined” name="libv>

Apr 17 20:19:02 prod.fusco.me kernel: br0: port 2(vnet5) entered learning state

Apr 17 20:19:09 prod.fusco.me kernel: br0: port 2(vnet5) entered disabled state

Apr 17 20:19:09 prod.fusco.me kernel: device vnet5 left promiscuous mode

Apr 17 20:19:09 prod.fusco.me kernel: br0: port 2(vnet5) entered disabled state

Apr 17 20:19:10 prod.fusco.me kernel: audit: type=1400 audit(1681777150.100:102): apparmor=“STATUS” operation=“profile_remove” profile=“unconfined” name="libvi>

Apr 17 20:19:12 prod.fusco.me middlewared[7268]: libvirt: QEMU Driver error : Domain not found: no domain with matching name ‘12_casey’

Apr 17 20:19:12 prod.fusco.me middlewared[7268]: libvirt: QEMU Driver error : Domain not found: no domain with matching name ‘12_casey’

Apr 17 20:19:13 prod.fusco.me middlewared[7268]: libvirt: QEMU Driver error : Domain not found: no domain with matching name ‘12_casey’

Apr 17 20:19:14 prod.fusco.me kernel: audit: type=1400 audit(1681777154.436:103): apparmor=“STATUS” operation=“profile_load” profile=“unconfined” name="libvirt>

Apr 17 20:19:14 prod.fusco.me kernel: audit: type=1400 audit(1681777154.568:104): apparmor=“STATUS” operation=“profile_replace” profile=“unconfined” name="libv>

Apr 17 20:19:14 prod.fusco.me kernel: audit: type=1400 audit(1681777154.684:105): apparmor=“STATUS” operation=“profile_replace” profile=“unconfined” name="libv>

Apr 17 20:19:14 prod.fusco.me kernel: audit: type=1400 audit(1681777154.800:106): apparmor=“STATUS” operation=“profile_replace” info="same as current profile, >

Apr 17 20:19:15 prod.fusco.me kernel: audit: type=1400 audit(1681777154.976:107): apparmor=“STATUS” operation=“profile_replace” profile=“unconfined” name="libv>

Apr 17 20:19:15 prod.fusco.me kernel: br0: port 2(vnet6) entered blocking state

Apr 17 20:19:15 prod.fusco.me kernel: br0: port 2(vnet6) entered disabled state

Apr 17 20:19:15 prod.fusco.me kernel: device vnet6 entered promiscuous mode

Apr 17 20:19:15 prod.fusco.me kernel: br0: port 2(vnet6) entered blocking state

Apr 17 20:19:15 prod.fusco.me kernel: br0: port 2(vnet6) entered listening state

Apr 17 20:19:15 prod.fusco.me kernel: audit: type=1400 audit(1681777155.180:108): apparmor=“STATUS” operation=“profile_replace” profile=“unconfined” name="libv>

Apr 17 20:19:30 prod.fusco.me kernel: br0: port 2(vnet6) entered learning state

Apr 17 20:19:45 prod.fusco.me kernel: br0: port 2(vnet6) entered forwarding state

Apr 17 20:19:45 prod.fusco.me kernel: br0: topology change detected, propagating

Apr 17 20:20:02 prod.fusco.me kernel: br0: port 2(vnet6) entered disabled state

Apr 17 20:20:02 prod.fusco.me kernel: device vnet6 left promiscuous mode

Apr 17 20:20:02 prod.fusco.me kernel: br0: port 2(vnet6) entered disabled state

Apr 17 20:20:03 prod.fusco.me kernel: audit: type=1400 audit(1681777203.273:109): apparmor=“STATUS” operation=“profile_remove” profile=“unconfined” name="libvi>

Apr 17 20:20:09 prod.fusco.me middlewared[7268]: libvirt: QEMU Driver error : Domain not found: no domain with matching name ‘12_casey’

Apr 17 20:20:10 prod.fusco.me kernel: audit: type=1400 audit(1681777210.958:110): apparmor=“STATUS” operation=“profile_load” profile=“unconfined” name="libvirt>

Apr 17 20:20:11 prod.fusco.me kernel: audit: type=1400 audit(1681777211.078:111): apparmor=“STATUS” operation=“profile_replace” profile=“unconfined” name="libv>

Apr 17 20:20:11 prod.fusco.me kernel: audit: type=1400 audit(1681777211.194:112): apparmor=“STATUS” operation=“profile_replace” profile=“unconfined” name="libv>

Apr 17 20:20:11 prod.fusco.me kernel: audit: type=1400 audit(1681777211.306:113): apparmor=“STATUS” operation=“profile_replace” info="same as current profile, >

Apr 17 20:20:11 prod.fusco.me kernel: audit: type=1400 audit(1681777211.478:114): apparmor=“STATUS” operation=“profile_replace” profile=“unconfined” name="libv>

Apr 17 20:20:11 prod.fusco.me kernel: br0: port 2(vnet7) entered blocking state

Apr 17 20:20:11 prod.fusco.me kernel: br0: port 2(vnet7) entered disabled state

Apr 17 20:20:11 prod.fusco.me kernel: device vnet7 entered promiscuous mode

Apr 17 20:20:11 prod.fusco.me kernel: br0: port 2(vnet7) entered blocking state

Apr 17 20:20:11 prod.fusco.me kernel: br0: port 2(vnet7) entered listening state

Apr 17 20:20:11 prod.fusco.me kernel: audit: type=1400 audit(1681777211.698:115): apparmor=“STATUS” operation=“profile_replace” profile=“unconfined” name="libv>

Apr 17 20:20:12 prod.fusco.me kernel: audit: type=1400 audit(1681777212.038:116): apparmor=“DENIED” operation=“capable” profile=“libvirtd” pid=16394 comm="rpc->

Apr 17 20:20:21 prod.fusco.me kernel: br0: port 2(vnet7) entered disabled state

Apr 17 20:20:21 prod.fusco.me kernel: device vnet7 left promiscuous mode

Apr 17 20:20:21 prod.fusco.me kernel: br0: port 2(vnet7) entered disabled state

Apr 17 20:20:21 prod.fusco.me kernel: audit: type=1400 audit(1681777221.554:117): apparmor=“STATUS” operation=“profile_remove” profile=“unconfined” name="libvi>

Apr 17 20:22:35 prod.fusco.me kernel: WARNING: Pool ‘sadness’ has encountered an uncorrectable I/O failure and has been suspended.

The entry at 20:22 is the first sign in the log that something was wrong, and just after that is when I tried to remove the bad disk. But then there's nothing in the log until 20:30, when the server started coming back up after the freeze and the IPMI reset.
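If anyone wants to correlate, pulling just that evening window out of the journal is something like this (assuming the journal survived the reboot; times are local):

journalctl --since '2023-04-17 20:15:00' --until '2023-04-17 20:35:00' --no-pager
# narrow it down to the storage-related lines
journalctl --since '2023-04-17 20:15:00' --until '2023-04-17 20:35:00' --no-pager | grep -Ei 'zfs|mpt3sas'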

Any help is appreciated :slight_smile:

Specs of my server are:

NewProd Server | SCALE 22.12 RC1
Supermicro H12SSL-I | EPYC 7282 | 256GB DDR4-3200 | 2x LSI 9500-8e to 2x EMC 15-bay shelves | Intel X710-DA2 | 28x 10TB drives in 2-way mirrors | 4x Samsung PM9A1 512GB 2-way mirrored SPECIAL | 2x Optane 905P 960GB mirrored

The shelves are connected to the two HBAs in an “X” configuration, if that’s helpful.

Hmmm. So I'm beginning to think the problem is one of the controllers inside one of the shelves. Since multipathd doesn't exist in SCALE (at least not in the home version), I think I'm screwed and the system got all sorts of corrupted.
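One way to tell whether the box is currently seeing the same disks twice through both shelf controllers (the situation multipathd would normally paper over) is to look for duplicate serial numbers; a rough check, skipping the zvols which have no serial:

# any line printed here is a serial that appears on more than one block device, i.e. the same disk down two paths
lsblk -d -n -o NAME,SERIAL,SIZE | grep -v '^zd' | sort -k2 | awk 'seen[$2]++ { print "duplicate:", $0 }'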

There are 28 disks in my shelves. Each disk is a mirror of the one in the same slot in the other shelf.

Which looks like this:

Right now I have the shelves plugged into the NAS like this:

NAS Side:

Shelf Side:

When I have one SAS cable going to controller A in each of the disk shelves, I get this: the UI says it sees 26 unassigned disks, which is incorrect. In addition to the 28 data disks, I should see the 4 SPECIAL metadata devices and 2 L2ARC devices, plus the drive I had inserted as a hot spare (WHICH I FORGOT ABOUT), so I should see 35 in total:

[screenshot: 1681782770607.png]

If I move the cables on the disk-shelf side over to the other controller, I actually do see all 35:

[screenshot: 1681783509224.png]

But the pool still won’t import :frowning:

This is why we can’t have nice things.

RAID is not a backup, and neither is ZFS! I hope you have backups, as you'd probably need to rebuild this system, I'm afraid.

You can, if you keep to the KISS principle.

Not able to help, I'm afraid, as I don't run TrueNAS (or any ZFS-based stuff).

None of the data is irreplaceable. It's just 70TB of data. I know @wendell was able to mount a dirty ZFS pool for Linus, but I'm not sure how that's done. If I could take some of the data off locally, I'd be happier than having to pull it all down from my peering.
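On the "mount a dirty pool" question: as I understand it, the general shape of that kind of recovery is a read-only import, rewinding to an older txg if the current one won't load, and (since the zdb output further down complains about a missing top-level vdev) temporarily telling ZFS to tolerate missing top-level vdevs. Treat the following as a sketch rather than a recipe; readonly=on is the important part:

# module tunable, default 0; lets an import proceed with missing top-level vdevs (read-only strongly advised)
echo 1 > /sys/module/zfs/parameters/zfs_max_missing_tvds
# dry run: shows whether a rewind to an earlier txg would work, without changing anything
zpool import -o readonly=on -f -F -n sadness
# the real thing, mounted under /mnt
zpool import -o readonly=on -f -F -R /mnt sadness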

What's the full output of lsblk from the CLI?

root@prod[/var/log]# lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda           8:0    0   9.1T  0 disk
sdb           8:16   0   9.1T  0 disk
├─sdb1        8:17   0     2G  0 part
└─sdb2        8:18   0   9.1T  0 part
sdc           8:32   0   9.1T  0 disk
├─sdc1        8:33   0     2G  0 part
└─sdc2        8:34   0   9.1T  0 part
sdd           8:48   0   9.1T  0 disk
├─sdd1        8:49   0     2G  0 part
└─sdd2        8:50   0   9.1T  0 part
sde           8:64   0   9.1T  0 disk
├─sde1        8:65   0     2G  0 part
└─sde2        8:66   0   9.1T  0 part
sdf           8:80   0   9.1T  0 disk
├─sdf1        8:81   0     2G  0 part
└─sdf2        8:82   0   9.1T  0 part
sdg           8:96   0   9.1T  0 disk
├─sdg1        8:97   0     2G  0 part
└─sdg2        8:98   0   9.1T  0 part
sdh           8:112  0   9.1T  0 disk
├─sdh1        8:113  0     2G  0 part
└─sdh2        8:114  0   9.1T  0 part
sdi           8:128  0   9.1T  0 disk
├─sdi1        8:129  0     2G  0 part
└─sdi2        8:130  0   9.1T  0 part
sdj           8:144  0   9.1T  0 disk
├─sdj1        8:145  0     2G  0 part
└─sdj2        8:146  0   9.1T  0 part
sdk           8:160  0   9.1T  0 disk
├─sdk1        8:161  0     2G  0 part
└─sdk2        8:162  0   9.1T  0 part
sdl           8:176  0   9.1T  0 disk
├─sdl1        8:177  0     2G  0 part
└─sdl2        8:178  0   9.1T  0 part
sdm           8:192  0   9.1T  0 disk
├─sdm1        8:193  0     2G  0 part
└─sdm2        8:194  0   9.1T  0 part
sdn           8:208  0   9.1T  0 disk
├─sdn1        8:209  0     2G  0 part
└─sdn2        8:210  0   9.1T  0 part
sdo           8:224  0   9.1T  0 disk
├─sdo1        8:225  0     2G  0 part
└─sdo2        8:226  0   9.1T  0 part
sdp           8:240  0   9.1T  0 disk
├─sdp1        8:241  0     2G  0 part
└─sdp2        8:242  0   9.1T  0 part
sdq          65:0    0   9.1T  0 disk
├─sdq1       65:1    0     2G  0 part
└─sdq2       65:2    0   9.1T  0 part
sdr          65:16   0   9.1T  0 disk
├─sdr1       65:17   0     2G  0 part
└─sdr2       65:18   0   9.1T  0 part
sds          65:32   0   9.1T  0 disk
├─sds1       65:33   0     2G  0 part
└─sds2       65:34   0   9.1T  0 part
sdt          65:48   0   9.1T  0 disk
├─sdt1       65:49   0     2G  0 part
└─sdt2       65:50   0   9.1T  0 part
sdu          65:64   0 111.8G  0 disk
├─sdu1       65:65   0     1M  0 part
├─sdu2       65:66   0   512M  0 part
├─sdu3       65:67   0  95.3G  0 part
└─sdu4       65:68   0    16G  0 part
sdv          65:80   0   9.1T  0 disk
├─sdv1       65:81   0     2G  0 part
└─sdv2       65:82   0   9.1T  0 part
sdw          65:96   0   9.1T  0 disk
├─sdw1       65:97   0     2G  0 part
└─sdw2       65:98   0   9.1T  0 part
sdx          65:112  0   9.1T  0 disk
├─sdx1       65:113  0     2G  0 part
└─sdx2       65:114  0   9.1T  0 part
sdy          65:128  0   9.1T  0 disk
├─sdy1       65:129  0     2G  0 part
└─sdy2       65:130  0   9.1T  0 part
sdz          65:144  0   9.1T  0 disk
├─sdz1       65:145  0     2G  0 part
└─sdz2       65:146  0   9.1T  0 part
sdaa         65:160  0   9.1T  0 disk
├─sdaa1      65:161  0     2G  0 part
└─sdaa2      65:162  0   9.1T  0 part
sdab         65:176  0   9.1T  0 disk
├─sdab1      65:177  0     2G  0 part
└─sdab2      65:178  0   9.1T  0 part
sdac         65:192  0   9.1T  0 disk
├─sdac1      65:193  0     2G  0 part
└─sdac2      65:194  0   9.1T  0 part
sdad         65:208  0   9.1T  0 disk
├─sdad1      65:209  0     2G  0 part
└─sdad2      65:210  0   9.1T  0 part
zd0         230:0    0   120G  0 disk
zd16        230:16   0   120G  0 disk
zd32        230:32   0   500G  0 disk
zd48        230:48   0   200G  0 disk
zd64        230:64   0   100G  0 disk
zd80        230:80   0   100G  0 disk
zd96        230:96   0   120G  0 disk
nvme4n1     259:0    0 110.3G  0 disk
└─nvme4n1p1 259:2    0 110.3G  0 part
nvme5n1     259:1    0 110.3G  0 disk
└─nvme5n1p1 259:3    0 110.3G  0 part
nvme6n1     259:4    0 894.3G  0 disk
├─nvme6n1p1 259:5    0     2G  0 part
│ └─md127     9:127  0     2G  0 raid1
│   └─md127 253:0    0     2G  0 crypt [SWAP]
└─nvme6n1p2 259:6    0 892.3G  0 part
nvme0n1     259:7    0 476.9G  0 disk
└─nvme0n1p1 259:8    0 476.9G  0 part
nvme3n1     259:9    0 476.9G  0 disk
└─nvme3n1p1 259:16   0 476.9G  0 part
nvme7n1     259:10   0 894.3G  0 disk
├─nvme7n1p1 259:12   0     2G  0 part
│ └─md127     9:127  0     2G  0 raid1
│   └─md127 253:0    0     2G  0 crypt [SWAP]
└─nvme7n1p2 259:13   0 892.3G  0 part
nvme2n1     259:11   0 476.9G  0 disk
└─nvme2n1p1 259:14   0 476.9G  0 part
nvme1n1     259:15   0 476.9G  0 disk
└─nvme1n1p1 259:17   0 476.9G  0 part
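Since the pool configuration below refers to everything by partuuid, mapping those back to the sdX names above is roughly:

# show each partition alongside its partuuid so the zdb paths can be matched to the disks above
lsblk -o NAME,SIZE,PARTUUID
# or resolve a single partuuid taken from the zdb output
readlink -f /dev/disk/by-partuuid/480d7ade-f786-4511-bb76-6e7c0b64ab48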

Output from zdb -e

 zdb -e sadness

Configuration for import:
        vdev_children: 21
        version: 5000
        pool_guid: 9977369563076415635
        name: 'sadness'
        state: 0
        hostid: 2042556907
        hostname: 'prod'
        vdev_tree:
            type: 'root'
            id: 0
            guid: 9977369563076415635
            children[0]:
                type: 'mirror'
                id: 0
                guid: 12254861329070289191
                metaslab_array: 183
                metaslab_shift: 34
                ashift: 12
                asize: 9998678884352
                is_log: 0
                create_txg: 4
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 5086712831611402237
                    whole_disk: 0
                    DTL: 4836
                    create_txg: 4
                    path: '/dev/disk/by-partuuid/255f91c5-6fd8-4d11-bfe1-bb0b0995bde1'
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 15943223402894770756
                    whole_disk: 0
                    DTL: 4834
                    create_txg: 4
                    path: '/dev/disk/by-partuuid/2db30682-bb8d-44b4-8279-960e7071ed66'
            children[1]:
                type: 'mirror'
                id: 1
                guid: 12066259498466103666
                metaslab_array: 182
                metaslab_shift: 34
                ashift: 12
                asize: 9998678360064
                is_log: 0
                create_txg: 4
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 4550735796668586180
                    whole_disk: 0
                    DTL: 4848
                    create_txg: 4
                    path: '/dev/disk/by-partuuid/e3fbe854-0307-473e-9f39-37a84d4747d1'
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 6366035862544255253
                    whole_disk: 0
                    DTL: 4847
                    create_txg: 4
                    path: '/dev/disk/by-partuuid/49e58faf-2b18-43b6-bd50-29ef9c9bc30f'
            children[2]:
                type: 'mirror'
                id: 2
                guid: 14985023760802459005
                metaslab_array: 181
                metaslab_shift: 34
                ashift: 12
                asize: 9998678360064
                is_log: 0
                create_txg: 4
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 8946863842827945649
                    whole_disk: 0
                    DTL: 4841
                    create_txg: 4
                    path: '/dev/disk/by-partuuid/480d7ade-f786-4511-bb76-6e7c0b64ab48'
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 4700051863249598752
                    whole_disk: 0
                    DTL: 4840
                    create_txg: 4
                    path: '/dev/disk/by-partuuid/a6b0d83f-4413-45af-91bb-f26a27c56165'
            children[3]:
                type: 'mirror'
                id: 3
                guid: 4976704069116612581
                metaslab_array: 180
                metaslab_shift: 34
                ashift: 12
                asize: 9998678360064
                is_log: 0
                create_txg: 4
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 15502149421824249219
                    whole_disk: 0
                    DTL: 4831
                    create_txg: 4
                    path: '/dev/disk/by-partuuid/10ebed85-ab73-472c-b556-c25c14afd966'
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 13866702663057467586
                    whole_disk: 0
                    DTL: 4830
                    create_txg: 4
                    path: '/dev/disk/by-partuuid/a299b22e-e339-4e48-8c5b-a980a4057237'
            children[4]:
                type: 'mirror'
                id: 4
                guid: 13951913235312177868
                metaslab_array: 179
                metaslab_shift: 34
                ashift: 12
                asize: 9998678360064
                is_log: 0
                create_txg: 4
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 5691914841095922244
                    whole_disk: 0
                    DTL: 4839
                    create_txg: 4
                    path: '/dev/disk/by-partuuid/762e8aa7-1be0-4e75-b297-f53161ecb047'
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 11916929020424420111
                    whole_disk: 0
                    DTL: 4837
                    create_txg: 4
                    path: '/dev/disk/by-partuuid/a1f3d1eb-55e0-4e4a-8015-20c372c3001a'
            children[5]:
                type: 'mirror'
                id: 5
                guid: 15816290078836845158
                metaslab_array: 178
                metaslab_shift: 34
                ashift: 12
                asize: 9998678884352
                is_log: 0
                create_txg: 4
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 1790559324010269292
                    whole_disk: 0
                    DTL: 4850
                    create_txg: 4
                    path: '/dev/disk/by-partuuid/d2aef666-ff6a-4d4a-9442-cd70f409f43c'
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 6360281643637752359
                    whole_disk: 0
                    DTL: 4849
                    create_txg: 4
                    path: '/dev/disk/by-partuuid/8d399a3a-7ecc-496f-bfbd-6ae48a2f89ee'
            children[6]:
                type: 'mirror'
                id: 6
                guid: 8591766545015033896
                metaslab_array: 177
                metaslab_shift: 34
                ashift: 12
                asize: 9998678360064
                is_log: 0
                create_txg: 4
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 16275786482488365167
                    whole_disk: 0
                    DTL: 4833
                    create_txg: 4
                    path: '/dev/disk/by-partuuid/4b60c0ba-f4cf-477a-b230-cb8c4e310112'
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 17197248781955331245
                    whole_disk: 0
                    DTL: 4832
                    create_txg: 4
                    path: '/dev/disk/by-partuuid/749b9f6f-c208-4900-b210-e623146c830f'
            children[7]:
                type: 'mirror'
                id: 7
                guid: 7468040530148448787
                metaslab_array: 176
                metaslab_shift: 34
                ashift: 12
                asize: 9998678360064
                is_log: 0
                create_txg: 4
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 16019146947386432102
                    whole_disk: 0
                    DTL: 4844
                    create_txg: 4
                    path: '/dev/disk/by-partuuid/e44f5a4c-6463-40a2-8042-d0b9dea3a4c5'
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 789684125560567178
                    whole_disk: 0
                    DTL: 4842
                    create_txg: 4
                    path: '/dev/disk/by-partuuid/ce167dd2-9f11-4bf8-9ccb-e86042d4aa11'
            children[8]:
                type: 'mirror'
                id: 8
                guid: 2962212927076889190
                metaslab_array: 175
                metaslab_shift: 34
                ashift: 12
                asize: 9998678360064
                is_log: 0
                create_txg: 4
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 13279394592555200764
                    whole_disk: 0
                    DTL: 4846
                    create_txg: 4
                    path: '/dev/disk/by-partuuid/ca2fed1e-edd8-4f91-9126-a9a2f667dc34'
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 6383596830989093462
                    whole_disk: 0
                    DTL: 4845
                    create_txg: 4
                    path: '/dev/disk/by-partuuid/34cfa66f-66c5-4bf1-a084-7d018f18efdd'
            children[9]:
                type: 'missing'
                id: 9
                guid: 0
            children[10]:
                type: 'missing'
                id: 10
                guid: 0
            children[11]:
                type: 'mirror'
                id: 11
                guid: 13730618831942771340
                metaslab_array: 1284
                metaslab_shift: 32
                ashift: 12
                asize: 512105381888
                is_log: 0
                create_txg: 82
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 14731919620377538683
                    whole_disk: 0
                    DTL: 4827
                    create_txg: 82
                    path: '/dev/disk/by-partuuid/63de864e-c4c4-41c0-b495-1bd1fd723c64'
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 2033729016575030711
                    whole_disk: 0
                    DTL: 4826
                    create_txg: 82
                    path: '/dev/disk/by-partuuid/be807c16-c6d9-417e-a5b9-19b6af5ec837'
            children[12]:
                type: 'mirror'
                id: 12
                guid: 5809140082036246482
                metaslab_array: 1419
                metaslab_shift: 32
                ashift: 12
                asize: 512105381888
                is_log: 0
                create_txg: 92
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 17493500226162294994
                    whole_disk: 0
                    DTL: 4829
                    create_txg: 92
                    path: '/dev/disk/by-partuuid/cad9688f-85b8-41f2-8f89-5a66c67789a7'
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 6073052702381961461
                    whole_disk: 0
                    DTL: 4828
                    create_txg: 92
                    path: '/dev/disk/by-partuuid/323a1964-db36-40c2-be4f-93bc2cb24843'
            children[13]:
                type: 'missing'
                id: 13
                guid: 0
            children[14]:
                type: 'missing'
                id: 14
                guid: 0
            children[15]:
                type: 'mirror'
                id: 15
                guid: 15140286079497109367
                metaslab_array: 18893
                metaslab_shift: 34
                ashift: 12
                asize: 9998678884352
                is_log: 0
                create_txg: 758879
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 17831527895260049240
                    whole_disk: 0
                    DTL: 73960
                    create_txg: 758879
                    path: '/dev/disk/by-partuuid/6d9d3acf-94b3-4819-a78c-1c23b53212a2'
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 4275542926759592415
                    whole_disk: 0
                    DTL: 73959
                    create_txg: 758879
                    path: '/dev/disk/by-partuuid/b976ef3b-a7eb-4347-9d36-245f738098be'
            children[16]:
                type: 'mirror'
                id: 16
                guid: 5296809474692138764
                metaslab_array: 19904
                metaslab_shift: 34
                ashift: 12
                asize: 9998678360064
                is_log: 0
                create_txg: 759189
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 1596451084264006543
                    whole_disk: 0
                    DTL: 73963
                    create_txg: 759189
                    path: '/dev/disk/by-partuuid/430faa5c-a3f4-44fd-8e99-5db535f146d6'
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 11509495317434492829
                    whole_disk: 0
                    DTL: 73961
                    create_txg: 759189
                    path: '/dev/disk/by-partuuid/a81914dd-31c0-4e83-a369-cd4568484c42'
            children[17]:
                type: 'mirror'
                id: 17
                guid: 10107206358176273262
                metaslab_array: 97115
                metaslab_shift: 34
                ashift: 12
                asize: 9998678360064
                is_log: 0
                create_txg: 1674631
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 16179806153641235865
                    whole_disk: 0
                    DTL: 112681
                    create_txg: 1674631
                    path: '/dev/disk/by-partuuid/c2640dc1-ecde-4638-8937-169b740b88aa'
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 6519077389205892531
                    whole_disk: 0
                    DTL: 112680
                    create_txg: 1674631
                    path: '/dev/disk/by-partuuid/379daf79-69ac-4968-9a27-5a6b503bbcc4'
            children[18]:
                type: 'mirror'
                id: 18
                guid: 4971576989779035714
                metaslab_array: 22189
                metaslab_shift: 34
                ashift: 12
                asize: 9998678360064
                is_log: 0
                create_txg: 1724558
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 9307995113113626143
                    whole_disk: 0
                    DTL: 115427
                    create_txg: 1724558
                    path: '/dev/disk/by-partuuid/822db76c-4def-4ead-9a8c-5b1175a49be8'
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 17260282938147507352
                    whole_disk: 0
                    DTL: 115426
                    create_txg: 1724558
                    path: '/dev/disk/by-partuuid/1dd58ac2-e8b8-4afd-a358-01d5e69bd07e'
            children[19]:
                type: 'missing'
                id: 19
                guid: 0
            children[20]:
                type: 'spare'
                id: 20
                guid: 6204868430839235276
                whole_disk: 0
                metaslab_array: 78756
                metaslab_shift: 34
                ashift: 12
                asize: 9998678360064
                is_log: 0
                create_txg: 1949674
                children[0]:
                    type: 'disk'
                    id: 0
                    guid: 17540057732824797402
                    whole_disk: 0
                    DTL: 84654
                    create_txg: 1949674
                    degraded: 1
                    aux_state: 'err_exceeded'
                    path: '/dev/disk/by-partuuid/e1da746c-2b0a-4297-bb4b-a30088cec248'
                children[1]:
                    type: 'disk'
                    id: 1
                    guid: 13181857595583535955
                    whole_disk: 0
                    is_spare: 1
                    DTL: 82025
                    create_txg: 1949674
                    path: '/dev/disk/by-partuuid/7793a8b5-da95-4d28-893e-fdf468afdc1c'
        load-policy:
            load-request-txg: 18446744073709551615
            load-rewind-policy: 2
zdb: can't open 'sadness': No such file or directory

ZFS_DBGMSG(zdb) START:
spa.c:6107:spa_import(): spa_import: importing sadness
spa_misc.c:418:spa_load_note(): spa_load(sadness, config trusted): LOADING
vdev.c:160:vdev_dbgmsg(): disk vdev '/dev/disk/by-partuuid/480d7ade-f786-4511-bb76-6e7c0b64ab48': best uberblock found for spa sadness. txg 1967384
spa_misc.c:418:spa_load_note(): spa_load(sadness, config untrusted): using uberblock with txg=1967384
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 5086712831611402237: vdev_enc_sysfs_path changed from '/sys/class/enclosure/3:0:14:0/0' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 15943223402894770756: vdev_enc_sysfs_path changed from '/sys/class/enclosure/3:0:28:0/0' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 4550735796668586180: vdev_enc_sysfs_path changed from '/sys/class/enclosure/13:0:9:0/1' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 6366035862544255253: vdev_enc_sysfs_path changed from '/sys/class/enclosure/13:0:19:0/1' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 8946863842827945649: vdev_enc_sysfs_path changed from '/sys/class/enclosure/13:0:9:0/2' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 4700051863249598752: vdev_enc_sysfs_path changed from '/sys/class/enclosure/13:0:19:0/2' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 15502149421824249219: vdev_enc_sysfs_path changed from '/sys/class/enclosure/13:0:19:0/3' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 13866702663057467586: vdev_enc_sysfs_path changed from '/sys/class/enclosure/13:0:9:0/3' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 5691914841095922244: vdev_enc_sysfs_path changed from '/sys/class/enclosure/3:0:28:0/9' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 11916929020424420111: vdev_enc_sysfs_path changed from '/sys/class/enclosure/13:0:9:0/4' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 1790559324010269292: vdev_enc_sysfs_path changed from '/sys/class/enclosure/3:0:28:0/5' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 6360281643637752359: vdev_enc_sysfs_path changed from '/sys/class/enclosure/3:0:14:0/5' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 16275786482488365167: vdev_enc_sysfs_path changed from '/sys/class/enclosure/3:0:14:0/6' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 17197248781955331245: vdev_enc_sysfs_path changed from '/sys/class/enclosure/3:0:28:0/6' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 16019146947386432102: vdev_enc_sysfs_path changed from '/sys/class/enclosure/3:0:28:0/4' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 789684125560567178: vdev_enc_sysfs_path changed from '/sys/class/enclosure/13:0:9:0/8' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 13279394592555200764: vdev_enc_sysfs_path changed from '/sys/class/enclosure/13:0:9:0/7' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 6383596830989093462: vdev_enc_sysfs_path changed from '/sys/class/enclosure/3:0:28:0/7' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 2033729016575030711: vdev_enc_sysfs_path changed from '/sys/bus/pci/slots/0' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 17831527895260049240: vdev_enc_sysfs_path changed from '/sys/class/enclosure/3:0:14:0/9' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 4275542926759592415: vdev_enc_sysfs_path changed from '/sys/class/enclosure/3:0:28:0/10' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 1596451084264006543: vdev_enc_sysfs_path changed from '/sys/class/enclosure/3:0:28:0/8' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 11509495317434492829: vdev_enc_sysfs_path changed from '/sys/class/enclosure/3:0:14:0/10' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 16179806153641235865: vdev_enc_sysfs_path changed from '/sys/class/enclosure/3:0:28:0/11' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 6519077389205892531: vdev_enc_sysfs_path changed from '/sys/class/enclosure/13:0:9:0/11' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 9307995113113626143: vdev_enc_sysfs_path changed from '/sys/class/enclosure/3:0:28:0/13' to '(null)'
vdev.c:2430:vdev_copy_path_impl(): vdev_copy_path: vdev 17260282938147507352: vdev_enc_sysfs_path changed from '/sys/class/enclosure/13:0:9:0/13' to '(null)'
spa_misc.c:418:spa_load_note(): spa_load(sadness, config trusted): vdev tree has 1 missing top-level vdevs.
spa_misc.c:418:spa_load_note(): spa_load(sadness, config trusted): current settings allow for maximum 0 missing top-level vdevs at this stage.
spa_misc.c:403:spa_load_failed(): spa_load(sadness, config trusted): FAILED: unable to open vdev tree [error=2]
vdev.c:212:vdev_dbgmsg_print_tree():   vdev 0: root, guid: 9977369563076415635, path: N/A, can't open
vdev.c:212:vdev_dbgmsg_print_tree():     vdev 0: mirror, guid: 12254861329070289191, path: N/A, healthy
vdev.c:212:vdev_dbgmsg_print_tree():       vdev 0: disk, guid: 5086712831611402237, path: /dev/disk/by-partuuid/255f91c5-6fd8-4d11-bfe1-bb0b0995bde1, healthy
vdev.c:212:vdev_dbgmsg_print_tree():       vdev 1: disk, guid: 15943223402894770756, path: /dev/disk/by-partuuid/2db30682-bb8d-44b4-8279-960e7071ed66, healthy
vdev.c:212:vdev_dbgmsg_print_tree():     vdev 1: mirror, guid: 12066259498466103666, path: N/A, healthy
vdev.c:212:vdev_dbgmsg_print_tree():       vdev 0: disk, guid: 4550735796668586180, path: /dev/disk/by-partuuid/e3fbe854-0307-473e-9f39-37a84d4747d1, healthy
vdev.c:212:vdev_dbgmsg_print_tree():       vdev 1: disk, guid: 6366035862544255253, path: /dev/disk/by-partuuid/49e58faf-2b18-43b6-bd50-29ef9c9bc30f, healthy
vdev.c:212:vdev_dbgmsg_print_tree():     vdev 2: mirror, guid: 14985023760802459005, path: N/A, healthy
vdev.c:212:vdev_dbgmsg_print_tree():       vdev 0: disk, guid: 8946863842827945649, path: /dev/disk/by-partuuid/480d7ade-f786-4511-bb76-6e7c0b64ab48, healthy
vdev.c:212:vdev_dbgmsg_print_tree():       vdev 1: disk, guid: 4700051863249598752, path: /dev/disk/by-partuuid/a6b0d83f-4413-45af-91bb-f26a27c56165, healthy
vdev.c:212:vdev_dbgmsg_print_tree():     vdev 3: mirror, guid: 4976704069116612581, path: N/A, healthy
vdev.c:212:vdev_dbgmsg_print_tree():       vdev 0: disk, guid: 15502149421824249219, path: /dev/disk/by-partuuid/10ebed85-ab73-472c-b556-c25c14afd966, healthy
vdev.c:212:vdev_dbgmsg_print_tree():       vdev 1: disk, guid: 13866702663057467586, path: /dev/disk/by-partuuid/a299b22e-e339-4e48-8c5b-a980a4057237, healthy
vdev.c:212:vdev_dbgmsg_print_tree():     vdev 4: mirror, guid: 13951913235312177868, path: N/A, healthy
vdev.c:212:vdev_dbgmsg_print_tree():       vdev 0: disk, guid: 5691914841095922244, path: /dev/disk/by-partuuid/762e8aa7-1be0-4e75-b297-f53161ecb047, healthy
vdev.c:212:vdev_dbgmsg_print_tree():       vdev 1: disk, guid: 11916929020424420111, path: /dev/disk/by-partuuid/a1f3d1eb-55e0-4e4a-8015-20c372c3001a, healthy
vdev.c:212:vdev_dbgmsg_print_tree():     vdev 5: mirror, guid: 15816290078836845158, path: N/A, healthy
vdev.c:212:vdev_dbgmsg_print_tree():       vdev 0: disk, guid: 1790559324010269292, path: /dev/disk/by-partuuid/d2aef666-ff6a-4d4a-9442-cd70f409f43c, healthy
vdev.c:212:vdev_dbgmsg_print_tree():       vdev 1: disk, guid: 6360281643637752359, path: /dev/disk/by-partuuid/8d399a3a-7ecc-496f-bfbd-6ae48a2f89ee, healthy
vdev.c:212:vdev_dbgmsg_print_tree():     vdev 6: mirror, guid: 8591766545015033896, path: N/A, healthy
vdev.c:212:vdev_dbgmsg_print_tree():       vdev 0: disk, guid: 16275786482488365167, path: /dev/disk/by-partuuid/4b60c0ba-f4cf-477a-b230-cb8c4e310112, healthy
vdev.c:212:vdev_dbgmsg_print_tree():       vdev 1: disk, guid: 17197248781955331245, path: /dev/disk/by-partuuid/749b9f6f-c208-4900-b210-e623146c830f, healthy
vdev.c:212:vdev_dbgmsg_print_tree():     vdev 7: mirror, guid: 7468040530148448787, path: N/A, healthy
vdev.c:212:vdev_dbgmsg_print_tree():       vdev 0: disk, guid: 16019146947386432102, path: /dev/disk/by-partuuid/e44f5a4c-6463-40a2-8042-d0b9dea3a4c5, healthy
vdev.c:212:vdev_dbgmsg_print_tree():       vdev 1: disk, guid: 789684125560567178, path: /dev/disk/by-partuuid/ce167dd2-9f11-4bf8-9ccb-e86042d4aa11, healthy
vdev.c:212:vdev_dbgmsg_print_tree():     vdev 8: mirror, guid: 2962212927076889190, path: N/A, healthy
vdev.c:212:vdev_dbgmsg_print_tree():       vdev 0: disk, guid: 13279394592555200764, path: /dev/disk/by-partuuid/ca2fed1e-edd8-4f91-9126-a9a2f667dc34, healthy
vdev.c:212:vdev_dbgmsg_print_tree():       vdev 1: disk, guid: 6383596830989093462, path: /dev/disk/by-partuuid/34cfa66f-66c5-4bf1-a084-7d018f18efdd, healthy
vdev.c:212:vdev_dbgmsg_print_tree():     vdev 9: indirect, guid: 9332782597973287530, path: N/A, healthy
vdev.c:212:vdev_dbgmsg_print_tree():     vdev 10: indirect, guid: 4218078770841086833, path: N/A, healthy
vdev.c:212:vdev_dbgmsg_print_tree():     vdev 11: mirror, guid: 13730618831942771340, path: N/A, healthy
vdev.c:212:vdev_dbgmsg_print_tree():       vdev 0: disk, guid: 14731919620377538683, path: /dev/disk/by-partuuid/63de864e-c4c4-41c0-b495-1bd1fd723c64, healthy
vdev.c:212:vdev_dbgmsg_print_tree():       vdev 1: disk, guid: 2033729016575030711, path: /dev/disk/by-partuuid/be807c16-c6d9-417e-a5b9-19b6af5ec837, healthy
vdev.c:212:vdev_dbgmsg_print_tree():     vdev 12: mirror, guid: 5809140082036246482, path: N/A, healthy
vdev.c:212:vdev_dbgmsg_print_tree():       vdev 0: disk, guid: 17493500226162294994, path: /dev/disk/by-partuuid/cad9688f-85b8-41f2-8f89-5a66c67789a7, healthy
vdev.c:212:vdev_dbgmsg_print_tree():       vdev 1: disk, guid: 6073052702381961461, path: /dev/disk/by-partuuid/323a1964-db36-40c2-be4f-93bc2cb24843, healthy
vdev.c:212:vdev_dbgmsg_print_tree():     vdev 13: indirect, guid: 235302787419978197, path: N/A, healthy
vdev.c:212:vdev_dbgmsg_print_tree():     vdev 14: indirect, guid: 1381446463215791984, path: N/A, healthy
vdev.c:212:vdev_dbgmsg_print_tree():     vdev 15: mirror, guid: 15140286079497109367, path: N/A, healthy
vdev.c:212:vdev_dbgmsg_print_tree():       vdev 0: disk, guid: 17831527895260049240, path: /dev/disk/by-partuuid/6d9d3acf-94b3-4819-a78c-1c23b53212a2, healthy
vdev.c:212:vdev_dbgmsg_print_tree():       vdev 1: disk, guid: 4275542926759592415, path: /dev/disk/by-partuuid/b976ef3b-a7eb-4347-9d36-245f738098be, healthy
vdev.c:212:vdev_dbgmsg_print_tree():     vdev 16: mirror, guid: 5296809474692138764, path: N/A, healthy
vdev.c:212:vdev_dbgmsg_print_tree():       vdev 0: disk, guid: 1596451084264006543, path: /dev/disk/by-partuuid/430faa5c-a3f4-44fd-8e99-5db535f146d6, healthy
vdev.c:212:vdev_dbgmsg_print_tree():       vdev 1: disk, guid: 11509495317434492829, path: /dev/disk/by-partuuid/a81914dd-31c0-4e83-a369-cd4568484c42, healthy
vdev.c:212:vdev_dbgmsg_print_tree():     vdev 17: mirror, guid: 10107206358176273262, path: N/A, healthy
vdev.c:212:vdev_dbgmsg_print_tree():       vdev 0: disk, guid: 16179806153641235865, path: /dev/disk/by-partuuid/c2640dc1-ecde-4638-8937-169b740b88aa, healthy
vdev.c:212:vdev_dbgmsg_print_tree():       vdev 1: disk, guid: 6519077389205892531, path: /dev/disk/by-partuuid/379daf79-69ac-4968-9a27-5a6b503bbcc4, healthy
vdev.c:212:vdev_dbgmsg_print_tree():     vdev 18: mirror, guid: 4971576989779035714, path: N/A, healthy
vdev.c:212:vdev_dbgmsg_print_tree():       vdev 0: disk, guid: 9307995113113626143, path: /dev/disk/by-partuuid/822db76c-4def-4ead-9a8c-5b1175a49be8, healthy
vdev.c:212:vdev_dbgmsg_print_tree():       vdev 1: disk, guid: 17260282938147507352, path: /dev/disk/by-partuuid/1dd58ac2-e8b8-4afd-a358-01d5e69bd07e, healthy
vdev.c:212:vdev_dbgmsg_print_tree():     vdev 19: mirror, guid: 7567707362911221306, path: N/A, can't open
vdev.c:212:vdev_dbgmsg_print_tree():       vdev 0: disk, guid: 10910638645217480881, path: /dev/disk/by-partuuid/9318f0f4-72fd-4ad1-8292-21e8a8d8b82c, can't open
vdev.c:212:vdev_dbgmsg_print_tree():       vdev 1: disk, guid: 17665183671171580697, path: /dev/disk/by-partuuid/e2f3d3c3-0033-4300-8c38-a7a56513f145, can't open
vdev.c:212:vdev_dbgmsg_print_tree():     vdev 20: spare, guid: 6204868430839235276, path: N/A, degraded
vdev.c:212:vdev_dbgmsg_print_tree():       vdev 0: disk, guid: 17540057732824797402, path: /dev/disk/by-partuuid/e1da746c-2b0a-4297-bb4b-a30088cec248, degraded
vdev.c:212:vdev_dbgmsg_print_tree():       vdev 1: disk, guid: 13181857595583535955, path: /dev/disk/by-partuuid/7793a8b5-da95-4d28-893e-fdf468afdc1c, healthy
spa_misc.c:418:spa_load_note(): spa_load(sadness, config trusted): UNLOADING
ZFS_DBGMSG(zdb) END
root@prod[/var/log]#

So that’s saying it won’t import because there is one missing vdev.

This actually might be fine, but I don’t want to force an import just yet.

The output there gives you the /dev/disk/by-partuuid paths.

ls that directory as well.
Make a spreadsheet and confirm that you have just one missing vdev, please. I’d do it via a script, but I’m doing day-job stuff right now.

You could make a spreadsheet, or script it, something like the sketch below.
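
(A rough sketch of what such a check script could look like; scraping partuuids out of the zpool import scan with a regex is just one illustrative approach, not anything from this thread:)

```
# Sketch: pull every by-partuuid member out of `zpool import`'s scan output and
# check whether the device node exists, to confirm only one vdev's disks are missing.
zpool import 2>/dev/null \
  | grep -Eo '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' \
  | sort -u \
  | while read -r u; do
      [ -e "/dev/disk/by-partuuid/$u" ] && echo "present  $u" || echo "MISSING  $u"
    done
```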

FreeNAS does weird things when the failure hits the pool containing the system dataset (like the GUI becoming unresponsive), and that always seems to be the disk that goes.

Just make sure that, after you get all set up again, you move the system dataset (it is always auto-created on the first pool you build) off of anything containing actual DATA.

I move mine to my OS drives (non-USB). This can have other concerns, but it is the lesser of the evils as far as I am concerned.
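
(A quick way to check which pool currently hosts it; this assumes the usual TrueNAS convention of a “.system” child dataset:)

```
# Show where the TrueNAS system dataset currently lives.
zfs list -r -o name,used,mountpoint | grep '\.system'
```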

You are correct.

root@prod[/var/log]# zpool import
   pool: sadness
     id: 9977369563076415635
  state: UNAVAIL
status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-5E
 config:

        sadness                                   UNAVAIL  insufficient replicas
          mirror-0                                ONLINE
            255f91c5-6fd8-4d11-bfe1-bb0b0995bde1  ONLINE
            2db30682-bb8d-44b4-8279-960e7071ed66  ONLINE
          mirror-1                                ONLINE
            e3fbe854-0307-473e-9f39-37a84d4747d1  ONLINE
            49e58faf-2b18-43b6-bd50-29ef9c9bc30f  ONLINE
          mirror-2                                ONLINE
            480d7ade-f786-4511-bb76-6e7c0b64ab48  ONLINE
            a6b0d83f-4413-45af-91bb-f26a27c56165  ONLINE
          mirror-3                                ONLINE
            10ebed85-ab73-472c-b556-c25c14afd966  ONLINE
            a299b22e-e339-4e48-8c5b-a980a4057237  ONLINE
          mirror-4                                ONLINE
            762e8aa7-1be0-4e75-b297-f53161ecb047  ONLINE
            a1f3d1eb-55e0-4e4a-8015-20c372c3001a  ONLINE
          mirror-5                                ONLINE
            d2aef666-ff6a-4d4a-9442-cd70f409f43c  ONLINE
            8d399a3a-7ecc-496f-bfbd-6ae48a2f89ee  ONLINE
          mirror-6                                ONLINE
            4b60c0ba-f4cf-477a-b230-cb8c4e310112  ONLINE
            749b9f6f-c208-4900-b210-e623146c830f  ONLINE
          mirror-7                                ONLINE
            e44f5a4c-6463-40a2-8042-d0b9dea3a4c5  ONLINE
            ce167dd2-9f11-4bf8-9ccb-e86042d4aa11  ONLINE
          mirror-8                                ONLINE
            ca2fed1e-edd8-4f91-9126-a9a2f667dc34  ONLINE
            34cfa66f-66c5-4bf1-a084-7d018f18efdd  ONLINE
          indirect-9                              ONLINE
          indirect-10                             ONLINE
          mirror-11                               ONLINE
            63de864e-c4c4-41c0-b495-1bd1fd723c64  ONLINE
            be807c16-c6d9-417e-a5b9-19b6af5ec837  ONLINE
          mirror-12                               ONLINE
            cad9688f-85b8-41f2-8f89-5a66c67789a7  ONLINE
            323a1964-db36-40c2-be4f-93bc2cb24843  ONLINE
          indirect-13                             ONLINE
          indirect-14                             ONLINE
          mirror-15                               ONLINE
            6d9d3acf-94b3-4819-a78c-1c23b53212a2  ONLINE
            b976ef3b-a7eb-4347-9d36-245f738098be  ONLINE
          mirror-16                               ONLINE
            430faa5c-a3f4-44fd-8e99-5db535f146d6  ONLINE
            a81914dd-31c0-4e83-a369-cd4568484c42  ONLINE
          mirror-17                               ONLINE
            c2640dc1-ecde-4638-8937-169b740b88aa  ONLINE
            379daf79-69ac-4968-9a27-5a6b503bbcc4  ONLINE
          mirror-18                               ONLINE
            822db76c-4def-4ead-9a8c-5b1175a49be8  ONLINE
            1dd58ac2-e8b8-4afd-a358-01d5e69bd07e  ONLINE
          mirror-19                               UNAVAIL  insufficient replicas
            9318f0f4-72fd-4ad1-8292-21e8a8d8b82c  UNAVAIL
            e2f3d3c3-0033-4300-8c38-a7a56513f145  UNAVAIL
          spare-20                                ONLINE
            e1da746c-2b0a-4297-bb4b-a30088cec248  ONLINE
            7793a8b5-da95-4d28-893e-fdf468afdc1c  ONLINE

I don’t know what all of the “indirect” entries mean, but I told ZFS to ignore that one data vdev was missing.

I ran:


echo 1 > /sys/module/zfs/parameters/zfs_max_missing_tvds

and was able to import the pool read-only:

root@prod[/sys/module/zfs/parameters]# zpool import -o readonly=on sadness

I was able to browse through all of the directories and see files. The VDEV that failed was only recently added so if that data goes bye-bye I don’t really care at all.
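
(For anyone repeating this later, a hedged sketch of the full sequence, including setting the tunable back to its default of 0 once the evacuation is done so it doesn’t mask problems on future imports:)

```
echo 1 > /sys/module/zfs/parameters/zfs_max_missing_tvds   # tolerate 1 missing top-level vdev
zpool import -o readonly=on sadness                        # read-only import for data evacuation
# ... copy the data off ...
zpool export sadness
echo 0 > /sys/module/zfs/parameters/zfs_max_missing_tvds   # restore the default
```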

But I can’t import the pool read-write.

root@prod[~]# zpool import -mfR /mnt sadness
cannot import 'sadness': one or more devices is currently unavailable

This is saying this mirror pair is unavailable. That’s sorta really bad.

Yeah… I get that. But I don’t know why?

If I do


zpool import -FX -R /mnt sadness

Any chance it’ll run?

Look in /dev for those UUIDs and see if they are there. Then, if not, see if you can figure out which disk(s) they are in your shelf. You could plug at least one of them in directly, not via the shelf, probably?
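
(A sketch of that check, using the two mirror-19 partuuids from the output above; lsblk’s SERIAL column is just one way to match the disks that are present against the drive labels in the shelf:)

```
# Check whether the missing mirror-19 members exist as device nodes at all.
for u in 9318f0f4-72fd-4ad1-8292-21e8a8d8b82c e2f3d3c3-0033-4300-8c38-a7a56513f145; do
  ls -l "/dev/disk/by-partuuid/$u" 2>/dev/null || echo "$u: not present"
done

# Map the disks that are present to sdX names and serial numbers.
lsblk -o NAME,SIZE,SERIAL,PARTUUID
```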

Indirect vdevs show up if you have changed the pool geometry in the past (for example, after a zpool remove of a top-level vdev, ZFS keeps an indirect placeholder for the remapped blocks). I need to try to remember about that and look at this properly… I’m out of pocket at the moment.

No problem. I can try that. I know what disks they are.
I’m headed out for a couple of hours. Appreciate your responses.

I’m glad the read-only import works. zpool status -v sadness should also list the corrupted files, which you may not want to bother evacuating if you can’t solve this entirely.

If you can narrow down if it’s a hardware problem (disks dead or controller/cabling freaking out) or just software, that would be helpful.

You only need one disk of the missing mirror vdev back online and you’re fine.
So that’s the job. I doubt the data on the disk is actually gone or corrupted beyond recovery.

I wouldn’t roll back the TXG just yet. That’s a (desperate) measure of last resort and can’t be reverted once committed.


I just took the three drives that were affected:

  • Pool sadness state is DEGRADED: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
    The following devices are not healthy:
    • Disk HUH721010AL4200 7PG33KKR is UNAVAIL
    • Disk HUH721010AL4200 7PG27ZZR is DEGRADED
    • Disk HUH721010AL4200 7PG3RYSR is DEGRADED

and plugged them into a 9207 with an SFF-8087 to SAS breakout cable.

root@prod[~]# zpool import
   pool: sadness
     id: 9977369563076415635
  state: UNAVAIL
status: One or more devices contains corrupted data.
 action: The pool cannot be imported due to damaged devices or data.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-5E
 config:

        sadness                                   UNAVAIL  insufficient replicas
          mirror-0                                ONLINE
            255f91c5-6fd8-4d11-bfe1-bb0b0995bde1  ONLINE
            2db30682-bb8d-44b4-8279-960e7071ed66  ONLINE
          mirror-1                                ONLINE
            e3fbe854-0307-473e-9f39-37a84d4747d1  ONLINE
            49e58faf-2b18-43b6-bd50-29ef9c9bc30f  ONLINE
          mirror-2                                ONLINE
            480d7ade-f786-4511-bb76-6e7c0b64ab48  ONLINE
            a6b0d83f-4413-45af-91bb-f26a27c56165  ONLINE
          mirror-3                                ONLINE
            10ebed85-ab73-472c-b556-c25c14afd966  ONLINE
            a299b22e-e339-4e48-8c5b-a980a4057237  ONLINE
          mirror-4                                ONLINE
            762e8aa7-1be0-4e75-b297-f53161ecb047  ONLINE
            a1f3d1eb-55e0-4e4a-8015-20c372c3001a  ONLINE
          mirror-5                                ONLINE
            d2aef666-ff6a-4d4a-9442-cd70f409f43c  ONLINE
            8d399a3a-7ecc-496f-bfbd-6ae48a2f89ee  ONLINE
          mirror-6                                ONLINE
            4b60c0ba-f4cf-477a-b230-cb8c4e310112  ONLINE
            749b9f6f-c208-4900-b210-e623146c830f  ONLINE
          mirror-7                                ONLINE
            e44f5a4c-6463-40a2-8042-d0b9dea3a4c5  ONLINE
            ce167dd2-9f11-4bf8-9ccb-e86042d4aa11  ONLINE
          mirror-8                                ONLINE
            ca2fed1e-edd8-4f91-9126-a9a2f667dc34  ONLINE
            34cfa66f-66c5-4bf1-a084-7d018f18efdd  ONLINE
          indirect-9                              ONLINE
          indirect-10                             ONLINE
          mirror-11                               ONLINE
            63de864e-c4c4-41c0-b495-1bd1fd723c64  ONLINE
            be807c16-c6d9-417e-a5b9-19b6af5ec837  ONLINE
          mirror-12                               ONLINE
            cad9688f-85b8-41f2-8f89-5a66c67789a7  ONLINE
            323a1964-db36-40c2-be4f-93bc2cb24843  ONLINE
          indirect-13                             ONLINE
          indirect-14                             ONLINE
          mirror-15                               ONLINE
            6d9d3acf-94b3-4819-a78c-1c23b53212a2  ONLINE
            b976ef3b-a7eb-4347-9d36-245f738098be  ONLINE
          mirror-16                               ONLINE
            430faa5c-a3f4-44fd-8e99-5db535f146d6  ONLINE
            a81914dd-31c0-4e83-a369-cd4568484c42  ONLINE
          mirror-17                               ONLINE
            c2640dc1-ecde-4638-8937-169b740b88aa  ONLINE
            379daf79-69ac-4968-9a27-5a6b503bbcc4  ONLINE
          mirror-18                               ONLINE
            822db76c-4def-4ead-9a8c-5b1175a49be8  ONLINE
            1dd58ac2-e8b8-4afd-a358-01d5e69bd07e  ONLINE
          mirror-19                               UNAVAIL  insufficient replicas
            9318f0f4-72fd-4ad1-8292-21e8a8d8b82c  UNAVAIL
            e2f3d3c3-0033-4300-8c38-a7a56513f145  UNAVAIL
          spare-20                                ONLINE
            e1da746c-2b0a-4297-bb4b-a30088cec248  ONLINE
            7793a8b5-da95-4d28-893e-fdf468afdc1c  ONLINE

Still says unavailable unfortunately.

How do I map a GUID in ZFS to an sdX device?

I don’t see any listed in /dev?

EDIT: I figured it out

 blkid
/dev/sdb2: LABEL="sadness" UUID="9977369563076415635" UUID_SUB="13181857595583535955" BLOCK_SIZE="4096" TYPE="zfs_member" PARTUUID="**7793a8b5-da95-4d28-893e-fdf468afdc1c**"
/dev/sdc2: LABEL="sadness" UUID="9977369563076415635" UUID_SUB="17540057732824797402" BLOCK_SIZE="4096" TYPE="zfs_member" PARTUUID="**e1da746c-2b0a-4297-bb4b-a30088cec248**"

Which is different than expected:

          mirror-19                               UNAVAIL  insufficient replicas
            9318f0f4-72fd-4ad1-8292-21e8a8d8b82c  UNAVAIL
            e2f3d3c3-0033-4300-8c38-a7a56513f145  UNAVAIL

And both are marked as spares:

spare-20 ONLINE
e1da746c-2b0a-4297-bb4b-a30088cec248 ONLINE
7793a8b5-da95-4d28-893e-fdf468afdc1c ONLINE

The other one, serial number 7PG27ZZR (also sda), is the one I had told the TrueNAS UI to remove.

Only one of the drives was ever marked as a spare…

So it seems that, for some reason or another, two disks were removed and one of them was marked by ZFS as a spare?

I’ve basically given up; it seems I am not going to get much further. I’ve mounted the pool as read-only and I am copying files over manually, and I will just reconcile the differences between my most recent backup and now and pull that back down from my off-site.

Thanks all for your help.

This is probably best, but ZFS will tell you exactly what files, if any, are actually damaged.

If ZFS gives you a file, it’s the file you gave it.
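
(Concretely, on the read-only import that list comes from the status output’s own hint:)

```
# List the files affected by the unrecoverable errors on the imported pool.
zpool status -v sadness
```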


So I’ve gotten quite a few of my files moved off the pool. Based on my calculations, I’ve eaten through my cold spares and some other pools I have in my lab. I will have JUST ENOUGH space to get all of my data off if I can remove the two drives marked as spare in the broken pool.

Is that safe? If it is, how do I remove drives by UUID? I’ve only ever done it by /dev/adaX or whatever.

root@prod[/sadness/movies/need rencode]# zpool status
  pool: sadness
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: resilvered 9.09G in 00:01:55 with 23920 errors on Mon Apr 17 02:09:10 2023
remove: Removal of vdev 14 copied 3.06G in 0h0m, completed on Fri Dec 30 16:59:46 2022
        214K memory used for removed device mappings
config:

        NAME                                      STATE     READ WRITE CKSUM
        sadness                                   DEGRADED     0     0     0
          mirror-0                                ONLINE       0     0     0
            255f91c5-6fd8-4d11-bfe1-bb0b0995bde1  ONLINE       0     0     0
            2db30682-bb8d-44b4-8279-960e7071ed66  ONLINE       0     0     0
          mirror-1                                ONLINE       0     0     0
            e3fbe854-0307-473e-9f39-37a84d4747d1  ONLINE       0     0     0
            49e58faf-2b18-43b6-bd50-29ef9c9bc30f  ONLINE       0     0     0
          mirror-2                                ONLINE       0     0     0
            480d7ade-f786-4511-bb76-6e7c0b64ab48  ONLINE       0     0     0
            a6b0d83f-4413-45af-91bb-f26a27c56165  ONLINE       0     0     0
          mirror-3                                ONLINE       0     0     0
            10ebed85-ab73-472c-b556-c25c14afd966  ONLINE       0     0     0
            a299b22e-e339-4e48-8c5b-a980a4057237  ONLINE       0     0     0
          mirror-4                                ONLINE       0     0     0
            762e8aa7-1be0-4e75-b297-f53161ecb047  ONLINE       0     0     0
            a1f3d1eb-55e0-4e4a-8015-20c372c3001a  ONLINE       0     0     0
          mirror-5                                ONLINE       0     0     0
            d2aef666-ff6a-4d4a-9442-cd70f409f43c  ONLINE       0     0     0
            8d399a3a-7ecc-496f-bfbd-6ae48a2f89ee  ONLINE       0     0     0
          mirror-6                                ONLINE       0     0     0
            4b60c0ba-f4cf-477a-b230-cb8c4e310112  ONLINE       0     0     0
            749b9f6f-c208-4900-b210-e623146c830f  ONLINE       0     0     0
          mirror-7                                ONLINE       0     0     0
            e44f5a4c-6463-40a2-8042-d0b9dea3a4c5  ONLINE       0     0     0
            ce167dd2-9f11-4bf8-9ccb-e86042d4aa11  ONLINE       0     0     0
          mirror-8                                ONLINE       0     0     0
            ca2fed1e-edd8-4f91-9126-a9a2f667dc34  ONLINE       0     0     0
            34cfa66f-66c5-4bf1-a084-7d018f18efdd  ONLINE       0     0     0
          mirror-15                               ONLINE       0     0     0
            6d9d3acf-94b3-4819-a78c-1c23b53212a2  ONLINE       0     0     0
            b976ef3b-a7eb-4347-9d36-245f738098be  ONLINE       0     0     0
          mirror-16                               ONLINE       0     0     0
            430faa5c-a3f4-44fd-8e99-5db535f146d6  ONLINE       0     0     0
            a81914dd-31c0-4e83-a369-cd4568484c42  ONLINE       0     0     0
          mirror-17                               ONLINE       0     0     0
            c2640dc1-ecde-4638-8937-169b740b88aa  ONLINE       0     0     0
            379daf79-69ac-4968-9a27-5a6b503bbcc4  ONLINE       0     0     0
          mirror-18                               ONLINE       0     0     0
            822db76c-4def-4ead-9a8c-5b1175a49be8  ONLINE       0     0     0
            1dd58ac2-e8b8-4afd-a358-01d5e69bd07e  ONLINE       0     0     0
          mirror-19                               UNAVAIL      0     0     0  insufficient replicas
            10910638645217480881                  UNAVAIL      0     0     0  was /dev/disk/by-partuuid/9318f0f4-72fd-4ad1-8292-21e8a8d8b82c
            17665183671171580697                  UNAVAIL      0     0     0  was /dev/disk/by-partuuid/e2f3d3c3-0033-4300-8c38-a7a56513f145
          spare-20                                DEGRADED     0     0     0
            e1da746c-2b0a-4297-bb4b-a30088cec248  DEGRADED     0     0     0  too many errors
            7793a8b5-da95-4d28-893e-fdf468afdc1c  ONLINE       0     0     0
        special
          mirror-11                               ONLINE       0     0     0
            63de864e-c4c4-41c0-b495-1bd1fd723c64  ONLINE       0     0     0
            be807c16-c6d9-417e-a5b9-19b6af5ec837  ONLINE       0     0     0
          mirror-12                               ONLINE       0     0     0
            cad9688f-85b8-41f2-8f89-5a66c67789a7  ONLINE       0     0     0
            323a1964-db36-40c2-be4f-93bc2cb24843  ONLINE       0     0     0
        cache
          81f7d0b3-f493-4674-8aeb-6e1da5059321    ONLINE       0     0     0
          d41f4ac8-4850-46a9-ae77-6fb0a4317b58    ONLINE       0     0     0
        spares
          7793a8b5-da95-4d28-893e-fdf468afdc1c    INUSE     currently in use

errors: 23950 data errors, use '-v' for a list

It’s the same, just use /dev/disk/by-partuuid/whatever.
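
(A hedged sketch of what that looks like on a writable pool, using the in-use spare’s partuuid from the status above; as the next post notes, this won’t work while the pool is imported read-only:)

```
# Detach the in-use spare from the degraded spare-20 grouping, returning it to AVAIL,
# then remove it from the pool's spare list so the disk can be reused elsewhere.
zpool detach sadness /dev/disk/by-partuuid/7793a8b5-da95-4d28-893e-fdf468afdc1c
zpool remove sadness /dev/disk/by-partuuid/7793a8b5-da95-4d28-893e-fdf468afdc1c
```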


For those following along at home:
You CANNOT remove a device, even a spare, from a read-only pool. With some cajiggering, I was able to remove the UUID from the spare disks, ignore the fact that the “spares” were “damaged” in the pool, and then use them as “new” disks for a different pool.
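
(The post doesn’t say exactly what that cajiggering was; one guess at the usual way to wipe old ZFS labels so a disk can join a new pool, using the sdb device mapped via blkid earlier, would be:)

```
# DESTRUCTIVE: only for disks you no longer want in the old pool.
zpool labelclear -f /dev/sdb2   # clear the ZFS label on the old data partition
wipefs -a /dev/sdb              # wipe remaining partition-table signatures on the disk
```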

With the one VDEV damaged there was some data loss, but that particular VDEV was very new and so it was minimal. Most of the lost data was successfully backed up to my off-site backup at a friend’s house, and I can pull it down (while he has gigabit download, he has only 30 Mbps upload, which is why I tried so hard to not rely on it).

As for the rest of the data, I copied it to some individual disks (in other words, I have 6 individual “pools” of 1 disk each). To move the data off of the “bad” read-only pool, I am just using:
cp -r

That process is complete. @wendell is right: I’ve validated that the data is in fact good and the files are not corrupted. ZFS will TELL YOU when there’s something wrong. If this were ANY OTHER FILESYSTEM, I’d be receiving files with holes in them and I wouldn’t even know. THANK YOU ZFS. In other words, if you give it a file, it will give you the same file back, exactly as it was when it first got it.

Now I have re-created my pool, again as mirrors, only 7 wide for now. I am copying the data off of those “pools” and onto my “new” pool using the same process. If all goes well and nothing dies over the next day or so we should be back to square one.

Hopefully this thread acts both as a cautionary tale and as a celebration of why we are all here in the first place. TrueNAS really is great.

Also, I hate shell games. I feel like most of what I have done as a professional in IT is some form of shell game. lol.


Depending on your snapshotting strategy, you may be able to zfs send complete snapshots (ones that were created before you added the failed devices). That could have saved a lot of ISP bandwidth.

Also, use cp -a to keep file properties; if you use SELinux, cp -aZ.
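
(A minimal sketch of both suggestions, assuming a pre-failure snapshot named @pre-failure and a new pool called newpool; both names are hypothetical:)

```
# Replicate a dataset and its children from a pre-failure snapshot to the new pool.
zfs send -R sadness/movies@pre-failure | zfs recv -u newpool/movies

# For plain file copies off the read-only import, preserve ownership, times, and xattrs
# (the destination dataset/directory must already exist).
cp -a /mnt/sadness/movies/. /mnt/newpool/movies/    # add -Z on SELinux systems
```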

Cheers!
