Storage server bugging out

I’m no expert, but from what I’ve read the SLOG does not have to be redundant; it mostly improves performance and protects against data corruption at, e.g., power loss. It’s been a while though, so maybe I’m remembering wrong.

AFAIK the L2ARC can go on whatever device you like; it’s just a read cache.
And “logs” in the status output indicates that you created a separate intent log, and so far everything I’ve read suggested it should be redundant.

But I haven’t tested it first-hand, because I usually mirror everything anyway, so I never dug into this very deeply.

I guess I’ll have to test this in practice when I have time, unless someone sets me straight.
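One low-risk way to test this kind of thing, assuming a box with OpenZFS installed, is a throwaway pool on sparse files (all paths and the pool name here are made up; the zpool steps are left as comments so nothing below touches a real pool):

```shell
# Create sparse backing files for a disposable test pool.
truncate -s 1G /tmp/zt-d1 /tmp/zt-d2 /tmp/zt-slog

# On a machine with ZFS, the actual experiment would be roughly:
#   zpool create testpool mirror /tmp/zt-d1 /tmp/zt-d2 log /tmp/zt-slog
#   zpool status testpool            # the "logs" vdev should show up
#   zpool export testpool
#   rm /tmp/zt-slog                  # simulate losing the SLOG device
#   zpool import -d /tmp testpool    # does it come back without the log?
#   zpool destroy testpool

ls -l /tmp/zt-d1 /tmp/zt-d2 /tmp/zt-slog
```

Sparse files cost almost no disk space, so you can simulate device loss as many times as you like without risking real data.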

It looks like I remembered correctly:

Terminology

Before we can begin, we need to get a few terms out of the way that seem to be confusing people on forums, blog posts, mailing lists, and general discussion. It confused me, even though I understood the end goal, right up to the writing of this post. So, let’s get at it:

  • ZFS Intent Log, or ZIL- A logging mechanism where all of the data to be written is stored, then later flushed as a transactional write. Similar in function to a journal for journaled filesystems, like ext3 or ext4. Typically stored on platter disk. Consists of a ZIL header, which points to a list of records, ZIL blocks and a ZIL trailer. The ZIL behaves differently for different writes. For writes smaller than 64KB (by default), the ZIL stores the write data. For larger writes, the write data is not stored in the ZIL; instead, the ZIL maintains pointers to the synched data that is stored in the log record.

  • Separate Intent Log, or SLOG- A separate logging device that caches the synchronous parts of the ZIL before flushing them to slower disk. This would either be a battery-backed DRAM drive or a fast SSD. The SLOG only caches synchronous data, and does not cache asynchronous data. Asynchronous data will flush directly to spinning disk. Further, blocks are written a block-at-a-time, rather than as simultaneous transactions to the SLOG. If the SLOG exists, the ZIL will be moved to it rather than residing on platter disk. Everything in the SLOG will always be in system memory.

from: https://pthree.org/2012/12/06/zfs-administration-part-iii-the-zfs-intent-log/
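The key takeaway from that terminology is that the ZIL/SLOG only sees synchronous writes. You can generate both kinds from userspace with GNU `dd` to see the distinction (the `/tmp` paths are just for illustration):

```shell
# Async write: lands in the page cache / ZFS dirty data; the ZIL is not
# involved unless the application later calls fsync().
dd if=/dev/zero of=/tmp/async.bin bs=4k count=64 2>/dev/null

# Sync write: each block must reach stable storage before dd continues --
# on ZFS this is exactly the traffic a SLOG absorbs.
dd if=/dev/zero of=/tmp/sync.bin bs=4k count=64 oflag=sync 2>/dev/null

ls -l /tmp/async.bin /tmp/sync.bin
```

On a pool with a fast SLOG, the `oflag=sync` run should be dramatically faster than the same command against a pool logging to spinning disk; the async run is largely unaffected either way.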

Yes, I was wrong. Sometimes when you see some mantra many times, you just stop thinking about it. Mea culpa.

And I was about to post to set this right, because I just checked it myself to be sure. I was able to recover the zpool with the logs vdev missing:

(screenshot: zpool status output)

Existing data seems intact, but some data in flight was probably lost. That doesn’t really matter, unless it’s some very important database or similar.

Sigh… Still having trouble. So the resilver finished and the status was this:

  pool: data
 state: ONLINE
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: resilvered 4.45T in 278h1m with 2 errors on Wed Dec  2 22:34:40 2020
config:

	NAME                                            STATE     READ WRITE CKSUM
	data                                            ONLINE       0     0     2
	  raidz2-0                                      ONLINE       0     0     4
	    wwn-0x5000c500b2b39f7b                      ONLINE       0     0     0
	    ata-ST4000DM000-1F2168_S300GPKC             ONLINE       0     0     0
	    ata-ST4000DM000-1F2168_Z300TQVA             ONLINE       0     0     0
	    ata-ST4000DM000-1F2168_Z300TMRG             ONLINE       0     0     0
	    spare-4                                     ONLINE       0     0     0
	      ata-ST4000VN008-2DR166_ZGY31ED3           ONLINE       0     0     0
	      ata-ST4000VN008-2DR166_ZGY7PJGC           ONLINE       0     0     0
	      ata-ST4000VN008-2DR166_ZGY7PJKY           ONLINE       0     0     0
	      ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K6UNP786  ONLINE       0     0     0
	    ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K2JYANKV    ONLINE       0     0     0
	    ata-ST4000VN008-2DR166_ZM403X3Z             ONLINE       0     0     0
	    ata-ST4000VN008-2DR166_ZM40355X             ONLINE       0     0     0
	    ata-ST4000VN008-2DR166_ZGY3CCMN             ONLINE       0     0     0
	    ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K3CC3UZ5    ONLINE       0     0     0
	logs
	  wwn-0x5002538d702bc018-part3                  ONLINE       0     0     0
	cache
	  wwn-0x5002538d702bc018-part4                  ONLINE       0     0     0
	spares
	  ata-ST4000VN008-2DR166_ZGY7PJGC               INUSE     currently in use
	  ata-ST4000VN008-2DR166_ZGY7PJKY               INUSE     currently in use
	  ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K6UNP786      INUSE     currently in use
	  ata-ST4000DM000-1F2168_S300J2JV               AVAIL   

errors: Permanent errors have been detected in the following files:

        data:<0xca8594>

I then rebooted the machine, and now I’m looking at this:

  pool: data
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Dec  2 22:41:36 2020
	509G scanned out of 15.6T at 398M/s, 11h1m to go
	147G resilvered, 3.19% done
config:

	NAME                                            STATE     READ WRITE CKSUM
	data                                            ONLINE       0     0     0
	  raidz2-0                                      ONLINE       0     0     0
	    wwn-0x5000c500b2b39f7b                      ONLINE       0     0     0  (resilvering)
	    ata-ST4000DM000-1F2168_S300GPKC             ONLINE       0     0     0
	    ata-ST4000DM000-1F2168_Z300TQVA             ONLINE       0     0     0
	    ata-ST4000DM000-1F2168_Z300TMRG             ONLINE       0     0     0  (resilvering)
	    spare-4                                     ONLINE       0     0     0
	      ata-ST4000VN008-2DR166_ZGY31ED3           ONLINE       0     0     0  (resilvering)
	      ata-ST4000VN008-2DR166_ZGY7PJGC           ONLINE       0     0     0  (resilvering)
	      ata-ST4000VN008-2DR166_ZGY7PJKY           ONLINE       0     0     0  (resilvering)
	      ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K6UNP786  ONLINE       0     0     0  (resilvering)
	    ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K2JYANKV    ONLINE       0     0     0
	    ata-ST4000VN008-2DR166_ZM403X3Z             ONLINE       0     0     0  (resilvering)
	    ata-ST4000VN008-2DR166_ZM40355X             ONLINE       0     0     0  (resilvering)
	    ata-ST4000VN008-2DR166_ZGY3CCMN             ONLINE       0     0     0  (resilvering)
	    ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K3CC3UZ5    ONLINE       0     0     0
	logs
	  wwn-0x5002538d702bc018-part3                  ONLINE       0     0     0
	cache
	  wwn-0x5002538d702bc018-part4                  ONLINE       0     0     0
	spares
	  ata-ST4000VN008-2DR166_ZGY7PJGC               INUSE     currently in use
	  ata-ST4000VN008-2DR166_ZGY7PJKY               INUSE     currently in use
	  ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K6UNP786      INUSE     currently in use
	  ata-ST4000DM000-1F2168_S300J2JV               AVAIL   

errors: Permanent errors have been detected in the following files:

        data:<0xca8594>

I don’t understand why this is resilvering. I also have no idea how to get rid of the data:<xxx> file, although I suppose these things are related.
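For what it’s worth, the `data:<0xca8594>` entry isn’t a file name: it’s a dataset object number in hex, shown that way because ZFS can’t resolve it to a path (often because the file has since been deleted). If you want to poke at it with `zdb`, converting to decimal first is handy (the exact `zdb` invocation varies by version, so treat the commented command as an assumption and check your man page):

```shell
# Convert the hex object id from the "errors:" output to decimal.
printf '%d\n' 0xca8594   # -> 13272468

# Then, on the affected system (not runnable here):
#   zdb -dddd data 13272468
```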

Well, after your recent mishaps with the connections it’s expected that some blocks might be corrupted. But in my experience that usually comes with some read/write/cksum errors as well.

But you may have a newer version of ZFS than I do, so maybe someone else can explain this; I haven’t seen anything like it. Also, I don’t use spares.

Also, a few times one of my arrays did weird things after a similar cable failure. After the resilver finished I exported it and imported it back, and everything was fine after that.

BTW, keep an eye on dmesg/journalctl/SMART.
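For the dmesg side, grepping for link events is usually enough to spot a flaky cable or port. Demonstrated here on a captured sample (the sample lines are from this thread; on a live box you’d pipe in `dmesg` or `journalctl -k` instead):

```shell
# A few kernel log lines to stand in for real dmesg output.
cat <<'EOF' > /tmp/dmesg-sample.txt
[    2.039025] ata9: SATA link down (SStatus 0 SControl 300)
[    2.039799] ata1: SATA link down (SStatus 0 SControl 300)
[    2.354469] ata2: SATA link down (SStatus 0 SControl 300)
EOF

# Count link up/down events per port; repeated flapping on one port
# points at that cable/backplane slot.
grep -oE 'ata[0-9]+: SATA link (down|up)' /tmp/dmesg-sample.txt | sort | uniq -c
```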

I woke up to this:

config:

	NAME                                            STATE     READ WRITE CKSUM
	data                                            ONLINE       0     0     0
	  raidz2-0                                      ONLINE       0     0     0
	    wwn-0x5000c500b2b39f7b                      ONLINE       0     0     0
	    ata-ST4000DM000-1F2168_S300GPKC             ONLINE       0     0     0
	    ata-ST4000DM000-1F2168_Z300TQVA             ONLINE       0     0     0
	    ata-ST4000DM000-1F2168_Z300TMRG             ONLINE       0     0     0
	    spare-4                                     ONLINE       0     0     0
	      ata-ST4000VN008-2DR166_ZGY31ED3           ONLINE       0     0     0
	      ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K6UNP786  ONLINE       0     0     0
	    ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K2JYANKV    ONLINE       0     0     0
	    ata-ST4000VN008-2DR166_ZM403X3Z             ONLINE       0     0     0
	    ata-ST4000VN008-2DR166_ZM40355X             ONLINE       0     0     0
	    ata-ST4000VN008-2DR166_ZGY3CCMN             ONLINE       0     0     0
	    ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K3CC3UZ5    ONLINE       0     0     0
	logs
	  wwn-0x5002538d702bc018-part3                  ONLINE       0     0     0
	cache
	  wwn-0x5002538d702bc018-part4                  ONLINE       0     0     0
	spares
	  ata-ST4000VN008-2DR166_ZGY7PJGC               AVAIL   
	  ata-ST4000VN008-2DR166_ZGY7PJKY               AVAIL   
	  ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K6UNP786      INUSE     currently in use
	  ata-ST4000DM000-1F2168_S300J2JV               AVAIL   

errors: No known data errors

After a reboot, this stayed. I was then able to `zpool detach data ata-ST4000VN008-2DR166_ZGY31ED3` and `zpool add data spare ata-ST4000VN008-2DR166_ZGY31ED3`.

The second resilver seems to have resolved everything.

Finally.


Glad to hear it :) Hope that with proper HBAs your headaches are over.

I guess not.

  pool: data
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun Jan 10 07:54:25 2021
	2.22T scanned out of 20.3T at 364M/s, 14h30m to go
	649G resilvered, 10.92% done
config:

	NAME                                          STATE     READ WRITE CKSUM
	data                                          ONLINE       0     0     0
	  raidz2-0                                    ONLINE       0     0     0
	    wwn-0x5000c500b2b39f7b                    ONLINE       0     0     0
	    spare-1                                   ONLINE       0     0     0
	      ata-ST4000DM000-1F2168_S300GPKC         ONLINE       0     0     0
	      ata-ST4000DM000-1F2168_S300J2JV         ONLINE       0     0     0  (resilvering)
	    spare-2                                   ONLINE       0     0     3
	      ata-ST4000DM000-1F2168_Z300TQVA         ONLINE       0     0     0  (resilvering)
	      ata-ST4000VN008-2DR166_ZGY7PJGC         ONLINE       0     0     0  (resilvering)
	    spare-3                                   ONLINE       0     0     3
	      ata-ST4000DM000-1F2168_Z300TMRG         ONLINE       0     0     0  (resilvering)
	      ata-ST4000VN008-2DR166_ZGY7PJKY         ONLINE       0     0     0  (resilvering)
	    ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K6UNP786  ONLINE       0     0     0
	    ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K2JYANKV  ONLINE       0     0     2  (resilvering)
	    ata-ST4000VN008-2DR166_ZM403X3Z           ONLINE       0     0     2  (resilvering)
	    ata-ST4000VN008-2DR166_ZM40355X           ONLINE       0     0     1  (resilvering)
	    ata-ST4000VN008-2DR166_ZGY3CCMN           ONLINE       0     0     0
	    ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K3CC3UZ5  ONLINE       0     0     2  (resilvering)
	logs
	  wwn-0x5002538d702bc018-part3                ONLINE       0     0     0
	cache
	  wwn-0x5002538d702bc018-part4                ONLINE       0     0     0
	spares
	  ata-ST4000VN008-2DR166_ZGY7PJGC             INUSE     currently in use
	  ata-ST4000VN008-2DR166_ZGY7PJKY             INUSE     currently in use
	  ata-ST4000DM000-1F2168_S300J2JV             INUSE     currently in use
	  ata-ST4000VN008-2DR166_ZGY31ED3             AVAIL   

errors: No known data errors

This is after I had to pull the plug because I was moving the server and it wouldn’t shut down. Obviously today is the 17th, not the 10th, which is when ZFS claims the resilver started. Likely some drives started reporting errors, nobody noticed the pool had become unresponsive, and it only started actually resilvering after I pulled the plug and booted her up again.

Only 2 of the disks have errors in their SMART history, but I think those are from before, not new ones. In any case, 9 disks are now resilvering, 3 spares are in use, and 6 devices show checksum errors.
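With outputs this long it helps to filter `zpool status` down to just the devices that are resilvering or carrying checksum errors. A minimal sketch, shown against a trimmed sample (on the real box you’d feed in `zpool status -v data` instead):

```shell
# A few device lines copied from the status output above.
cat <<'EOF' > /tmp/zstatus-sample.txt
	    ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K2JYANKV  ONLINE       0     0     2  (resilvering)
	    ata-ST4000VN008-2DR166_ZM403X3Z           ONLINE       0     0     2  (resilvering)
	    ata-ST4000VN008-2DR166_ZGY3CCMN           ONLINE       0     0     0
EOF

# Print only devices that are resilvering or have a non-zero CKSUM column
# (field 5 on a device line: NAME STATE READ WRITE CKSUM ...).
awk '$5 != "0" || /resilvering/ {print $1, "cksum=" $5}' /tmp/zstatus-sample.txt
```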

I’m at a loss.

Edit: I don’t think the hard drives are having read errors like before; this time they seem to be randomly resetting.

$ dmesg | grep ata
[    0.000000] Memory: 98809088K/100557944K available (12300K kernel code, 2482K rwdata, 4276K rodata, 2440K init, 2724K bss, 1748856K reserved, 0K cma-reserved)
[    0.372024] libata version 3.00 loaded.
[    1.126679] acpi_cpufreq: overriding BIOS provided _PSD data
[    1.620262] Write protecting the kernel read-only data: 20480k
[    1.724892] ata1: SATA max UDMA/133 abar m131072@0xba780000 port 0xba780100 irq 56
[    1.724982] ata2: SATA max UDMA/133 abar m131072@0xba780000 port 0xba780180 irq 56
[    1.725070] ata3: SATA max UDMA/133 abar m131072@0xba780000 port 0xba780200 irq 56
[    1.725159] ata4: SATA max UDMA/133 abar m131072@0xba780000 port 0xba780280 irq 56
[    1.725248] ata5: SATA max UDMA/133 abar m131072@0xba780000 port 0xba780300 irq 56
[    1.725335] ata6: SATA max UDMA/133 abar m131072@0xba780000 port 0xba780380 irq 56
[    1.725423] ata7: SATA max UDMA/133 abar m131072@0xba780000 port 0xba780400 irq 56
[    1.725510] ata8: SATA max UDMA/133 abar m131072@0xba780000 port 0xba780480 irq 56
[    1.726558] ata9: SATA max UDMA/133 abar m4096@0xbaf08000 port 0xbaf08100 irq 61
[    1.727593] ata10: SATA max UDMA/133 abar m4096@0x8e700000 port 0x8e700100 irq 63
[    2.039025] ata9: SATA link down (SStatus 0 SControl 300)
[    2.039799] ata1: SATA link down (SStatus 0 SControl 300)
[    2.043141] ata10: SATA link down (SStatus 0 SControl 300)
[    2.090117] scsi 6:0:0:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[    2.150166] scsi 11:0:0:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[    2.311614] scsi 11:0:1:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[    2.324029] scsi 6:0:1:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[    2.354469] ata2: SATA link down (SStatus 0 SControl 300)
[    2.561014] scsi 11:0:2:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[    2.666933] ata3: SATA link down (SStatus 0 SControl 300)
[    2.810756] scsi 11:0:3:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[    2.820334] scsi 11:0:4:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[    2.824414] scsi 11:0:5:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[    2.827350] scsi 11:0:6:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[    2.982768] ata4: SATA link down (SStatus 0 SControl 300)
[    3.294881] ata5: SATA link down (SStatus 0 SControl 300)
[    3.434840] scsi 12:0:0:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[    3.437699] scsi 12:0:1:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[    3.440771] scsi 12:0:2:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[    3.443216] scsi 12:0:3:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[    3.455140] scsi 12:0:4:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[    3.611932] ata6: SATA link down (SStatus 0 SControl 300)
[    3.796314] scsi 12:0:5:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[    3.926540] ata7: SATA link down (SStatus 0 SControl 300)
[    4.038157] scsi 12:0:6:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[    4.067390] scsi 12:0:7:0: atapi(n), ncq(y), asyn_notify(n), smart(y), fua(y), sw_preserve(y)
[    4.238984] ata8: SATA link down (SStatus 0 SControl 300)
[    5.461374] EXT4-fs (sdj2): mounted filesystem with ordered data mode. Opts: (null)
[    6.210200] systemd[1]: Listening on LVM2 metadata daemon socket.
[    7.208700] cfg80211: Loading compiled-in X.509 certificates for regulatory database
[    7.852306] EXT4-fs (nvme0n1p1): mounted filesystem with ordered data mode. Opts: (null)
[  176.461561] nouveau 0000:42:00.0: msvld: unable to load firmware data
$ dmesg | grep sd
[    1.715735] ahci 0000:01:00.1: flags: 64bit ncq sntf stag pm led clo only pmp pio slum part sxs deso sadm sds apst 
[    2.325130] sd 6:0:0:0: Power-on or device reset occurred
[    2.325219] sd 6:0:0:0: Attached scsi generic sg0 type 0
[    2.325391] sd 6:0:1:0: Power-on or device reset occurred
[    2.325478] sd 6:0:1:0: Attached scsi generic sg1 type 0
[    2.326899] sd 6:0:1:0: [sdb] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
[    2.326993] sd 6:0:1:0: [sdb] 4096-byte physical blocks
[    2.343427] sd 6:0:0:0: [sda] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
[    2.343521] sd 6:0:0:0: [sda] 4096-byte physical blocks
[    2.383231] sd 6:0:0:0: [sda] Write Protect is off
[    2.383305] sd 6:0:0:0: [sda] Mode Sense: 7f 00 10 08
[    2.385124] sd 6:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    2.481855]  sda: sda1 sda9
[    2.527518] sd 6:0:0:0: [sda] Attached SCSI disk
[    2.619188] sd 6:0:1:0: [sdb] Write Protect is off
[    2.619261] sd 6:0:1:0: [sdb] Mode Sense: 7f 00 10 08
[    2.622972] sd 6:0:1:0: [sdb] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    2.756641]  sdb: sdb1 sdb9
[    2.828727] sd 11:0:0:0: Attached scsi generic sg2 type 0
[    2.828910] sd 11:0:1:0: Power-on or device reset occurred
[    2.828995] sd 11:0:1:0: Attached scsi generic sg3 type 0
[    2.829164] sd 11:0:2:0: Power-on or device reset occurred
[    2.829249] sd 11:0:2:0: Attached scsi generic sg4 type 0
[    2.829414] sd 11:0:3:0: Power-on or device reset occurred
[    2.829505] sd 11:0:3:0: Attached scsi generic sg5 type 0
[    2.829715] sd 11:0:4:0: Attached scsi generic sg6 type 0
[    2.829909] sd 11:0:5:0: Attached scsi generic sg7 type 0
[    2.830096] sd 11:0:6:0: Attached scsi generic sg8 type 0
[    2.830440] sd 11:0:0:0: [sdc] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
[    2.830538] sd 11:0:0:0: [sdc] 4096-byte physical blocks
[    2.831313] sd 11:0:2:0: [sde] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
[    2.831419] sd 11:0:2:0: [sde] 4096-byte physical blocks
[    2.831506] sd 11:0:3:0: [sdf] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
[    2.831600] sd 11:0:3:0: [sdf] 4096-byte physical blocks
[    2.832140] sd 11:0:6:0: [sdi] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
[    2.832245] sd 11:0:6:0: [sdi] 4096-byte physical blocks
[    2.832254] sd 11:0:5:0: [sdh] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
[    2.832414] sd 11:0:5:0: [sdh] 4096-byte physical blocks
[    2.833715] sd 11:0:4:0: [sdg] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
[    2.833810] sd 11:0:4:0: [sdg] 4096-byte physical blocks
[    2.834649] sd 11:0:3:0: [sdf] Write Protect is off
[    2.834723] sd 11:0:3:0: [sdf] Mode Sense: 7f 00 10 08
[    2.834943] sd 11:0:2:0: [sde] Write Protect is off
[    2.835015] sd 11:0:2:0: [sde] Mode Sense: 7f 00 10 08
[    2.835493] sd 11:0:3:0: [sdf] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    2.835758] sd 11:0:2:0: [sde] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    2.861286] sd 6:0:1:0: [sdb] Attached SCSI disk
[    2.865663] sd 11:0:4:0: [sdg] Write Protect is off
[    2.865736] sd 11:0:4:0: [sdg] Mode Sense: 7f 00 10 08
[    2.867379] sd 11:0:4:0: [sdg] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    2.898335] sd 11:0:1:0: [sdd] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
[    2.898429] sd 11:0:1:0: [sdd] 4096-byte physical blocks
[    2.922889] sd 11:0:1:0: [sdd] Write Protect is off
[    2.922961] sd 11:0:1:0: [sdd] Mode Sense: 7f 00 10 08
[    2.924569] sd 11:0:1:0: [sdd] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    2.930249]  sdg: sdg1 sdg9
[    2.932273] sd 11:0:5:0: [sdh] Write Protect is off
[    2.932345] sd 11:0:5:0: [sdh] Mode Sense: 7f 00 10 08
[    2.935586] sd 11:0:6:0: [sdi] Write Protect is off
[    2.935659] sd 11:0:6:0: [sdi] Mode Sense: 7f 00 10 08
[    2.936271] sd 11:0:5:0: [sdh] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    2.939571] sd 11:0:6:0: [sdi] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    2.955484] sd 11:0:0:0: [sdc] Write Protect is off
[    2.955557] sd 11:0:0:0: [sdc] Mode Sense: 7f 00 10 08
[    2.959253] sd 11:0:0:0: [sdc] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    2.979196] sd 11:0:4:0: [sdg] Attached SCSI disk
[    3.012897]  sdd: sdd1 sdd9
[    3.035324]  sdi: sdi1 sdi9
[    3.035818]  sdh: sdh1 sdh9
[    3.056409] sd 11:0:1:0: [sdd] Attached SCSI disk
[    3.074504]  sdc: sdc1 sdc9
[    3.127638] sd 11:0:6:0: [sdi] Attached SCSI disk
[    3.134332] sd 11:0:5:0: [sdh] Attached SCSI disk
[    3.177556] sd 11:0:0:0: [sdc] Attached SCSI disk
[    3.273227]  sde: sde1 sde9
[    3.280788] sd 11:0:2:0: [sde] Attached SCSI disk
[    3.283839]  sdf: sdf1 sdf9
[    3.291335] sd 11:0:3:0: [sdf] Attached SCSI disk
[    4.070586] sd 12:0:0:0: Attached scsi generic sg9 type 0
[    4.070918] sd 12:0:1:0: Attached scsi generic sg10 type 0
[    4.071514] sd 12:0:2:0: Attached scsi generic sg11 type 0
[    4.072203] sd 12:0:3:0: Attached scsi generic sg12 type 0
[    4.072828] sd 12:0:4:0: Attached scsi generic sg13 type 0
[    4.073475] sd 12:0:2:0: [sdl] 488397168 512-byte logical blocks: (250 GB/233 GiB)
[    4.073558] sd 12:0:5:0: Power-on or device reset occurred
[    4.073991] sd 12:0:2:0: [sdl] 4096-byte physical blocks
[    4.075870] sd 12:0:5:0: Attached scsi generic sg14 type 0
[    4.075997] sd 12:0:0:0: [sdj] 976773168 512-byte logical blocks: (500 GB/466 GiB)
[    4.076170] sd 12:0:2:0: [sdl] Write Protect is off
[    4.076657] sd 12:0:6:0: Power-on or device reset occurred
[    4.076659] sd 12:0:1:0: [sdk] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
[    4.076660] sd 12:0:1:0: [sdk] 4096-byte physical blocks
[    4.076764] sd 12:0:6:0: Attached scsi generic sg15 type 0
[    4.077263] sd 12:0:7:0: Attached scsi generic sg16 type 0
[    4.077300] sd 12:0:2:0: [sdl] Mode Sense: 7f 00 10 08
[    4.082159] sd 12:0:1:0: [sdk] Write Protect is off
[    4.082322] sd 12:0:3:0: [sdm] 488397168 512-byte logical blocks: (250 GB/233 GiB)
[    4.082399] sd 12:0:1:0: [sdk] Mode Sense: 7f 00 10 08
[    4.082411] sd 12:0:2:0: [sdl] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    4.083015] sd 12:0:3:0: [sdm] 4096-byte physical blocks
[    4.083270] sd 12:0:0:0: [sdj] Write Protect is off
[    4.083271] sd 12:0:0:0: [sdj] Mode Sense: 7f 00 10 08
[    4.083578] sd 12:0:0:0: [sdj] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    4.084506] sd 12:0:1:0: [sdk] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    4.086496] sd 12:0:3:0: [sdm] Write Protect is off
[    4.087155] sd 12:0:3:0: [sdm] Mode Sense: 7f 00 10 08
[    4.087233] sd 12:0:7:0: [sdq] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
[    4.087836] sd 12:0:7:0: [sdq] 4096-byte physical blocks
[    4.087851] sd 12:0:4:0: [sdn] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
[    4.088511] sd 12:0:3:0: [sdm] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    4.088913]  sdl: sdl1 sdl9
[    4.089164] sd 12:0:4:0: [sdn] 4096-byte physical blocks
[    4.089677]  sdj: sdj1 sdj2 sdj3 sdj4
[    4.091381] sd 12:0:2:0: [sdl] Attached SCSI disk
[    4.093402] sd 12:0:0:0: [sdj] Attached SCSI disk
[    4.094579]  sdm: sdm1 sdm9
[    4.097261] sd 12:0:3:0: [sdm] Attached SCSI disk
[    4.107014] sd 12:0:6:0: [sdp] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
[    4.107306] sd 12:0:6:0: [sdp] 4096-byte physical blocks
[    4.118033] sd 12:0:5:0: [sdo] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
[    4.118284] sd 12:0:5:0: [sdo] 4096-byte physical blocks
[    4.126414] sd 12:0:7:0: [sdq] Write Protect is off
[    4.126644] sd 12:0:7:0: [sdq] Mode Sense: 7f 00 10 08
[    4.128282] sd 12:0:7:0: [sdq] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    4.134995] sd 12:0:4:0: [sdn] Write Protect is off
[    4.135230] sd 12:0:4:0: [sdn] Mode Sense: 7f 00 10 08
[    4.136865] sd 12:0:4:0: [sdn] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    4.142808] sd 12:0:5:0: [sdo] Write Protect is off
[    4.143045] sd 12:0:5:0: [sdo] Mode Sense: 7f 00 10 08
[    4.144656] sd 12:0:5:0: [sdo] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    4.185394]  sdq: sdq1 sdq9
[    4.197848]  sdn: sdn1 sdn9
[    4.230611] sd 12:0:7:0: [sdq] Attached SCSI disk
[    4.231826]  sdo: sdo1 sdo9
[    4.249221] sd 12:0:4:0: [sdn] Attached SCSI disk
[    4.277054] sd 12:0:5:0: [sdo] Attached SCSI disk
[    4.372418] sd 12:0:6:0: [sdp] Write Protect is off
[    4.372695] sd 12:0:6:0: [sdp] Mode Sense: 7f 00 10 08
[    4.374341] sd 12:0:6:0: [sdp] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    4.472296]  sdp: sdp1 sdp9
[    4.484647]  sdk: sdk1 sdk9
[    4.493556] sd 12:0:1:0: [sdk] Attached SCSI disk
[    4.516732] sd 12:0:6:0: [sdp] Attached SCSI disk
[    5.004538] EXT4-fs (sdj2): INFO: recovery required on readonly filesystem
[    5.004828] EXT4-fs (sdj2): write access will be enabled during recovery
[    5.457686] EXT4-fs (sdj2): recovery complete
[    5.461374] EXT4-fs (sdj2): mounted filesystem with ordered data mode. Opts: (null)
[    6.246569] EXT4-fs (sdj2): re-mounted. Opts: (null)
[    6.339548] Installing knfsd (copyright (C) 1996 [email protected]).

Yes, on further investigation: on the 10th ZFS had a kernel oops and no one noticed:

$ journalctl -k --boot -1 --since '2021-01-10'
-- Logs begin at Fri 2019-02-01 05:05:42 UTC, end at Sun 2021-01-17 16:45:01 UTC. --
Jan 10 04:43:17 goliath kernel: perf: interrupt took too long (2501 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
Jan 10 09:19:12 goliath kernel: BUG: unable to handle kernel paging request at 0000004000000038
Jan 10 09:19:12 goliath kernel: IP: __mutex_lock.isra.5+0x176/0x4e0
Jan 10 09:19:12 goliath kernel: PGD 1822b5d067 P4D 1822b5d067 PUD 0 
Jan 10 09:19:12 goliath kernel: Oops: 0000 [#1] SMP NOPTI
Jan 10 09:19:12 goliath kernel: Modules linked in: cpuid ip_vs_rr xt_ipvs ip_vs nf_defrag_ipv6 xt_REDIRECT nf_nat_redirect xt_nat xt_tcpudp veth vxlan ip6_udp_tunnel udp_tunnel iptable_mangle xt_mark xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xf
Jan 10 09:19:12 goliath kernel:  sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor asy
Jan 10 09:19:12 goliath kernel: CPU: 31 PID: 4952 Comm: z_rd_int_1 Tainted: P           O     4.15.0-129-generic #132-Ubuntu
Jan 10 09:19:12 goliath kernel: Hardware name: Gigabyte Technology Co., Ltd. X399 DESIGNARE EX/X399 DESIGNARE EX-CF, BIOS F11e 09/17/2018
Jan 10 09:19:12 goliath kernel: RIP: 0010:__mutex_lock.isra.5+0x176/0x4e0
Jan 10 09:19:12 goliath kernel: RSP: 0018:ffffa9cc882a3c50 EFLAGS: 00010206
Jan 10 09:19:12 goliath kernel: RAX: 0000004000000000 RBX: ffff90debc2815c0 RCX: 0000000000000000
Jan 10 09:19:12 goliath kernel: RDX: 0000004000000000 RSI: ffff90d67f9e8000 RDI: ffff90debc2815c0
Jan 10 09:19:12 goliath kernel: RBP: ffffa9cc882a3cd0 R08: 0000000000200000 R09: 0000000000000001
Jan 10 09:19:12 goliath kernel: R10: ffffa9cc882a3cf0 R11: 000182c8584ae580 R12: ffff90debc281240
Jan 10 09:19:12 goliath kernel: R13: 0000000000000002 R14: ffff90debc287eb0 R15: ffff90debc2815c0
Jan 10 09:19:12 goliath kernel: FS:  0000000000000000(0000) GS:ffff90e28e7c0000(0000) knlGS:0000000000000000
Jan 10 09:19:12 goliath kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 10 09:19:12 goliath kernel: CR2: 0000004000000038 CR3: 0000001822ef0000 CR4: 00000000003406e0
Jan 10 09:19:12 goliath kernel: Call Trace:
Jan 10 09:19:12 goliath kernel:  ? __switch_to_asm+0x41/0x70
Jan 10 09:19:12 goliath kernel:  ? __switch_to_asm+0x35/0x70
Jan 10 09:19:12 goliath kernel:  ? __switch_to_asm+0x41/0x70
Jan 10 09:19:12 goliath kernel:  ? __switch_to_asm+0x35/0x70
Jan 10 09:19:12 goliath kernel:  ? __switch_to_asm+0x41/0x70
Jan 10 09:19:12 goliath kernel:  ? __switch_to_asm+0x35/0x70
Jan 10 09:19:12 goliath kernel:  ? __switch_to_asm+0x41/0x70
Jan 10 09:19:12 goliath kernel:  ? __switch_to_asm+0x35/0x70
Jan 10 09:19:12 goliath kernel:  ? __switch_to_asm+0x41/0x70
Jan 10 09:19:12 goliath kernel:  __mutex_lock_slowpath+0x13/0x20
Jan 10 09:19:12 goliath kernel:  ? __mutex_lock_slowpath+0x13/0x20
Jan 10 09:19:12 goliath kernel:  mutex_lock+0x2f/0x40
Jan 10 09:19:12 goliath kernel:  zio_remove_child+0x52/0x140 [zfs]
Jan 10 09:19:12 goliath kernel:  zio_done+0x4f8/0xdf0 [zfs]
Jan 10 09:19:12 goliath kernel:  zio_execute+0x90/0xf0 [zfs]
Jan 10 09:19:12 goliath kernel:  taskq_thread+0x2ab/0x4e0 [spl]
Jan 10 09:19:12 goliath kernel:  ? wake_up_q+0x80/0x80
Jan 10 09:19:12 goliath kernel:  ? zio_reexecute+0x390/0x390 [zfs]
Jan 10 09:19:12 goliath kernel:  kthread+0x121/0x140
Jan 10 09:19:12 goliath kernel:  ? task_done+0xb0/0xb0 [spl]
Jan 10 09:19:12 goliath kernel:  ? kthread_create_worker_on_cpu+0x70/0x70
Jan 10 09:19:12 goliath kernel:  ret_from_fork+0x22/0x40
Jan 10 09:19:12 goliath kernel: Code: 00 31 c0 48 8b 5d c8 65 48 33 1c 25 28 00 00 00 0f 85 77 03 00 00 48 83 c4 50 5b 41 5a 41 5c 41 5d 41 5e 41 5f 5d 49 8d 62 f8 c3 <8b> 50 38 85 d2 0f 85 01 02 00 00 65 48 8b 04 25 00 5c 01 00 48 
Jan 10 09:19:12 goliath kernel: RIP: __mutex_lock.isra.5+0x176/0x4e0 RSP: ffffa9cc882a3c50
Jan 10 09:19:12 goliath kernel: CR2: 0000004000000038
Jan 10 09:19:12 goliath kernel: ---[ end trace 39f8b9bd5a158f93 ]---
Jan 17 14:55:04 goliath kernel: nfsd: last server has exited, flushing export cache
Jan 17 14:55:04 goliath kernel: br0: port 4(veth2) entered disabled state
Jan 17 14:55:04 goliath kernel: veth0fb35e8: renamed from eth0
Jan 17 14:55:04 goliath kernel: br0: port 4(veth2) entered disabled state
Jan 17 14:55:04 goliath kernel: device veth2 left promiscuous mode
Jan 17 14:55:04 goliath kernel: br0: port 4(veth2) entered disabled state
Jan 17 14:55:04 goliath kernel: br0: port 3(veth1) entered disabled state
Jan 17 14:55:04 goliath kernel: veth4738d80: renamed from eth1
Jan 17 14:55:04 goliath kernel: br0: port 3(veth1) entered disabled state
Jan 17 14:55:04 goliath kernel: device veth1 left promiscuous mode
Jan 17 14:55:04 goliath kernel: br0: port 3(veth1) entered disabled state

The odd thing is that I used the pool earlier today and didn’t notice any problems (apparently I’m also having some network issues, but that’s probably unrelated). I’ve been planning to do a fresh install on this server, although I doubt that will magically solve all my problems.

Once it’s all resilvered I’ll see if I can replace the power connectors and do a fresh install.

I’m not so sure there’s a direct correlation between the ‘link down’ messages and the issues. It hasn’t logged ‘SATA link up’ since I put in the HBAs (not since 21 Nov). I guess the kernel just logs SATA devices differently when they sit behind this type of HBA. /dev/disk/by-path now also talks about SAS and LUNs, which is not relevant in itself but is more evidence that these are handled differently from plain SATA disks, and perhaps the link ‘being down’ is a natural consequence of that.

I’m just guessing here, I could really use some help.

Edit: so now one of the disks is producing errors. I would have liked to say this was the cause, but it seems more likely that this disk (one of the Desktop-model disks I have in there) has simply failed under the resilvering load. I’ll take it out anyway, of course, but I don’t expect that to be the end of it.

So the resilvering finished, I took the disk offline, and now 2 disks are fucking resilvering AGAIN.

I am so losing my patience with ZFS.

So just now the resilvering seemed to be ongoing, judging by the trays’ HDD indicator lights. I decided to SSH into the machine. At the prompt I pressed the up arrow, expecting `sudo zpool status -v` to appear. Nothing happened. I looked over: all the lights had gone out. Keyboard unresponsive. I plugged in VGA and the tail end of a kernel panic talking about IRQs was showing.

FML. Why can’t this damn box just work.

Edit: alright, new theory. Perhaps the controllers are overheating. I replaced the massive stock fans that came with the case with smaller, less noisy ones. Those are fine for the HDDs (no airflow makes them unhappy, but they’re quickly satisfied by a small breeze). But I’ve read that the controllers also require active cooling, and they are quite hot to the touch. This doesn’t explain all the issues, but it could be a factor. So I’ll put the big fans back and try again.

So what I’ll do:

  1. Run memtest
  2. Put back the big fans
  3. Replace the power molex cables
  4. Fresh OS install

If needed I can put the old Xeon workstation board back in.