Storage server bugging out

Yeah, I’ve had the same messages when my SATA cable wasn’t up to snuff. It worked for a while, then I started getting link resets. I had to replace it, and then all was good.

The kernel only knows it can’t communicate with the device; if it’s not a driver/module problem, you have to troubleshoot hardware.

So this can be the backplane, a port on the backplane, a cable, or even a port on your controller.
That’s the main reason I switched to miniSAS: troubleshooting more than a few connections is a PITA.

BTW, do you run SMART monitoring? If not, check smartctl on all drives, and configure smartd after you fix this issue.
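
If it helps, a minimal smartd setup is just one DEVICESCAN line; the mail address below is a placeholder and the self-test schedule is only a suggestion:

    # /etc/smartd.conf - monitor all disks, short self-test daily at 02:00,
    # long self-test Saturdays at 03:00, mail on problems (address is a placeholder)
    DEVICESCAN -a -o on -S on -s (S/../.././02|L/../../6/03) -m admin@example.com

Then enable the daemon (the unit is called smartd on most distros, smartmontools on some Debian-based ones):

    sudo systemctl enable --now smartd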

Edit: Oh, and I would check the fanout cables first; they’re the most likely culprit and the easiest to replace. Even swap them around. But I would take all the drives out first and put in some spares.

Yeah, I could swap out the SATA HBAs for mini-SAS HBAs, but that gets expensive pretty quickly (like 600 euro), which is stupid because it’s the same tech with different connectors, but what can we do.

I just moved the cable connections around; it doesn’t change anything. If anything, the problem manifested sooner after boot. No smartmon; I wanted to install it, but the box was already dead before I could.

Edit: also, these cables were already pretty hard to come by. The “reverse” kind is not that widely available; it would suck if I had to find a different source.

I’ve completely bypassed one of the controllers and the bottom two rows on the backplane. So I am now running on motherboard SATA and one of the controllers. It is holding out a lot longer than before.

I removed one of the spares to make this possible. But there’s now a lot of resilvering going on, which I’d rather just stop so I can put the drives back as spares:

$ sudo zpool status -v
  pool: data
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Nov 19 13:37:26 2020
	455G scanned out of 15.6T at 150M/s, 29h20m to go
	83.7G resilvered, 2.85% done
config:

	NAME                                            STATE     READ WRITE CKSUM
	data                                            ONLINE       0     0     0
	  raidz2-0                                      ONLINE       0     0     0
	    wwn-0x5000c500b2b39f7b                      ONLINE       0     0     0
	    ata-ST4000DM000-1F2168_S300GPKC             ONLINE       0     0     0
	    ata-ST4000DM000-1F2168_Z300TQVA             ONLINE       0     0     0
	    ata-ST4000DM000-1F2168_Z300TMRG             ONLINE       0     0     0
	    spare-4                                     ONLINE       0     0     0
	      ata-ST4000VN008-2DR166_ZGY31ED3           ONLINE       0     0     0
	      ata-ST4000VN008-2DR166_ZGY7PJGC           ONLINE       0     0     0  (resilvering)
	      ata-ST4000VN008-2DR166_ZGY7PJKY           ONLINE       0     0     0  (resilvering)
	      ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K6UNP786  ONLINE       0     0     0  (resilvering)
	    ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K2JYANKV    ONLINE       0     0     0
	    ata-ST4000VN008-2DR166_ZM403X3Z             ONLINE       0     0     0
	    ata-ST4000VN008-2DR166_ZM40355X             ONLINE       0     0     0
	    ata-ST4000VN008-2DR166_ZGY3CCMN             ONLINE       0     0     0
	    ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K3CC3UZ5    ONLINE       0     0     0
	logs
	  wwn-0x5002538d702bc018-part3                  ONLINE       0     0     0
	cache
	  wwn-0x5002538d702bc018-part4                  ONLINE       0     0     0
	spares
	  ata-ST4000VN008-2DR166_ZGY7PJGC               INUSE     currently in use
	  ata-ST4000VN008-2DR166_ZGY7PJKY               INUSE     currently in use
	  ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K6UNP786      INUSE     currently in use
	  ata-ST4000DM000-1F2168_S300J2JV               UNAVAIL 

errors: Permanent errors have been detected in the following files:

        /data/Archive/home-2014/home/xxxxx/backup.tar.gz
        /data/Archive/xxxxxx.rar

I notice there are now also some files that are marked as permanently damaged. How permanent is that?

Edit: Yeah, never mind. It took a while longer but the exact same errors are back.

A few years ago I bought a couple of Supermicro controllers, new, for 100-150 EUR, and flashed them to IT mode.
https://www.supermicro.com/en/products/accessories/addon/AOC-USAS2-L8i.php?TYP=E
It’s UIO, not the ATX standard, but it’s pretty easy to fasten them inside a rack case.
Also, now you can get various used ones based on the LSI 2008 for like 30 EUR. They may ship as RAID controllers, but you can flash them to HBA/IT mode; just check that IT firmware is available beforehand.
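
For reference, the usual LSI cross-flash goes roughly like this; the firmware file names below are placeholders for whatever your exact card needs, so double-check before erasing anything:

    sas2flash -listall                         # note the adapter number and SAS address first
    sas2flash -o -e 6                          # erase the flash (keep the SAS address written down!)
    sas2flash -o -f 2118it.bin -b mptsas2.rom  # flash IT firmware plus the optional boot ROM

If the card loses its SAS address in the process, you can usually restore it with sas2flash -o -sasadd.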

So there’s no need to spend tons of money if you just have mechanical drives. A cheap previous-gen 6 Gb/s card is fine.

Well, there’s your clue. If moving the cables magnified the problem, it’s pretty obvious they’re probably the culprit.

You can check SMART on the disks one by one by connecting them to another box with a single SATA cable.
Or even just unplug the backplane, take out the controllers, and use onboard SATA if the mobo has it. Boot from USB if your boot partition was going through the backplane too.

Yeah, I know. I had the same problem like 10 years ago. Even when I found a proper fanout cable it was way too expensive, or the store couldn’t even specify whether it was normal or reverse.

On the other hand, miniSAS SFF-8087 is always the same, and now you can get a new one for 10-20 eurorubles.
Lack of headache - priceless :wink:

Don’t use partitions for the logs and cache, as that looks to be an HDD and you will quickly exhaust your IOPS.

While ZFS won’t scream at you if you do this, it works best with full access to a whole drive.

Actually, those are partitions of an SSD.

Just occurred to me: because it’s a few disks at once, I would also check the backplane power plug. If it’s 4-pin Molex, those aren’t the best and sometimes don’t connect properly, especially when old and oxidized. And the result would be pretty much the same as what you see in the journal.

Also, do you have a redundant PSU in your box? If yes, then checking one half of the PSU at a time would rule out the PSU dying. Just a few months ago I had a situation where the dying side of the PSU didn’t switch over to the good one, and strange things were happening.

Yeah, I was also considering that. All the connectors are brand new, but perhaps the splitters I’m using aren’t the best. There’s no redundant PSU; it’s a 1000 W Corsair. That should be enough juice, even with a 1950X and up to 24 HDDs, right?

Sure, 1 kW is plenty. Without a GPU you’re probably using less than 300 W.

Oh, for sure, splitters are the no. 1 problem for me when something’s wrong with power. In my desktop last month I went through 3 Molex-to-SATA splitters because one of my drives was constantly being thrown out of the RAID.

Alright, update on this:

  • I took the 3 SSDs out of the backplane and connected them with plain SATA to one of the controllers
  • Only the first 3 rows on the backplane are now populated (2 spares are not connected)
  • These 3 rows have 2 things in common:
    1. All three of them are connected using a shorter blue cable (I also have 3 longer grey cables that I had to source from somewhere else)
    2. 4 out of 5 molex sockets (don’t ask me why there are 5 for 3x6 drives) are connected using daisy-chained 1>2 molex splitters (the rest of the molex sockets use a 1>6 splitter).
  • One of the rows is connected to both controllers, so if this attempt succeeds, the problem is clearly with the longer, grey cables.
  • A slowly increasing number of files are reported by zpool status as having “permanent errors”. I hope this is not as bad as it sounds. It’s not constantly increasing; I mean, it has been increasing from 1 to 3 as I’ve been trying to debug this.
  • Ran for dev in /dev/sd?; do echo $dev; sudo smartctl -a $dev; done (a by-id variant is sketched after this list), which didn’t yield any interesting results, but one of the disks does have a “degraded” state with too many errors in the CKSUM column. But I imagine this is probably a hangover from the controller issues. dmesg has stayed clean for 20 minutes, which is much longer than before.
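
For completeness, here’s the by-id variant of that sweep, which skips partition entries and only prints the overall health verdict:

    # loop over whole disks by stable ID, skip partitions, print SMART health
    for dev in /dev/disk/by-id/ata-*; do
        case "$dev" in *-part*) continue ;; esac
        echo "== $dev"
        sudo smartctl -H "$dev"
    done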

I’ll leave it for a while longer, try to make sure it isn’t the molexes, and if it isn’t, I guess I have to either source 3 new, longer reverse breakout cables, or buy 2 of these:

and 3 or 4 of the appropriate cables…

Edit: running for about an hour now without any problems. The resilver rate is also double what it was. I’m reasonably sure now the problem is the grey cables, but just to be sure I’ll also switch out the molex splitters.

Well, glad you came to grips with it.
Finish scrubbing/resilvering; it will hopefully be OK.
If that degraded drive is OK, then you can clear the errors later.

I usually run smartctl -A for less output, and I’m mostly interested in Reallocated Sectors/Pending Sectors.
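
Something like this pulls out just those two counters for a disk:

    sudo smartctl -A /dev/sda | grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector'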

Remember to configure SMART monitoring after you finish fixing this, because it often warns you to replace a disk way before it actually dies.

Also, configure periodic automatic scrubbing. On my prod servers I usually go every weekend at night, because nobody is using them anyway. But some recommend more like once a month. So pick somewhere between those.
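
A cron entry for the weekend-night variant could look like this (pool name from this thread; adjust the day and hour to taste):

    # /etc/cron.d/zfs-scrub - scrub pool "data" every Sunday at 02:00
    0 2 * * 0  root  /usr/sbin/zpool scrub data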

So the resilver finished overnight. This is what I was looking at afterwards:

  pool: data
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: resilvered 4.45T in 10h45m with 10 errors on Sat Nov 21 04:07:19 2020
config:

	NAME                                            STATE     READ WRITE CKSUM
	data                                            DEGRADED     0     0    18
	  raidz2-0                                      DEGRADED     0     0   134
	    wwn-0x5000c500b2b39f7b                      ONLINE       0     0     0
	    ata-ST4000DM000-1F2168_S300GPKC             ONLINE       0     0     0
	    ata-ST4000DM000-1F2168_Z300TQVA             ONLINE       0     0     0
	    ata-ST4000DM000-1F2168_Z300TMRG             REMOVED      0     0     0
	    spare-4                                     ONLINE       0     0     0
	      ata-ST4000VN008-2DR166_ZGY31ED3           ONLINE       0     0     0
	      ata-ST4000VN008-2DR166_ZGY7PJGC           ONLINE       0     0     0
	      ata-ST4000VN008-2DR166_ZGY7PJKY           ONLINE       0     0     0
	      ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K6UNP786  ONLINE       0     0     0
	    ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K2JYANKV    ONLINE       0     0     0
	    ata-ST4000VN008-2DR166_ZM403X3Z             ONLINE       0     0     0
	    ata-ST4000VN008-2DR166_ZM40355X             ONLINE       0     0     0
	    15234179876330307149                        UNAVAIL      0     0     0  was /dev/disk/by-id/ata-ST4000VN008-2DR166_ZGY3CCMN-part1
	    ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K3CC3UZ5    DEGRADED     0     0     0  too many errors
	logs
	  wwn-0x5002538d702bc018-part3                  ONLINE       0     0     0
	cache
	  wwn-0x5002538d702bc018-part4                  ONLINE       0     0     0
	spares
	  ata-ST4000VN008-2DR166_ZGY7PJGC               INUSE     currently in use
	  ata-ST4000VN008-2DR166_ZGY7PJKY               INUSE     currently in use
	  ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K6UNP786      INUSE     currently in use
	  ata-ST4000DM000-1F2168_S300J2JV               UNAVAIL 

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x7e>
        /data/Archive/root-2020-08/usr/lib/jvm/java-11-openjdk-amd64/jmods/java.base.jmod

Interestingly, the metadata error stayed, 2 other files were removed, and a new one was added to the list of files with “permanent errors”.

During the resilver, one of the drives experienced some errors. It’s the one with state “REMOVED” here, ata4 in dmesg and /dev/sdd. This is most likely an unrelated problem though: probably this desktop-grade disk just didn’t like the resilver. Also, it’s one of the oldest ones in there, I think.

Anyway, this specific disk failing does not worry me, but since another one is absent, I currently don’t have any parity redundancy left. I need to be able to actually use 2 of the 3 spares currently “in use”.

Unfortunately though, I seem to be unable to detach them from their current use:

sudo zpool detach data ata-ST4000VN008-2DR166_ZGY31ED3
cannot detach ata-ST4000VN008-2DR166_ZGY31ED3: no valid replicas

To make matters worse, after running sudo zpool clear data ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K3CC3UZ5 to get rid of the vdev’s DEGRADED state, a big resilver started again:

  pool: data
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Nov 21 08:32:58 2020
	187G scanned out of 15.6T at 304M/s, 14h44m to go
	53.3G resilvered, 1.17% done
config:

	NAME                                            STATE     READ WRITE CKSUM
	data                                            DEGRADED     0     0    18
	  raidz2-0                                      DEGRADED     0     0   134
	    wwn-0x5000c500b2b39f7b                      ONLINE       0     0     0  (resilvering)
	    ata-ST4000DM000-1F2168_S300GPKC             ONLINE       0     0     0
	    ata-ST4000DM000-1F2168_Z300TQVA             ONLINE       0     0     0
	    ata-ST4000DM000-1F2168_Z300TMRG             REMOVED      0     0     0
	    spare-4                                     ONLINE       0     0     0
	      ata-ST4000VN008-2DR166_ZGY31ED3           ONLINE       0     0     0  (resilvering)
	      ata-ST4000VN008-2DR166_ZGY7PJGC           ONLINE       0     0     0  (resilvering)
	      ata-ST4000VN008-2DR166_ZGY7PJKY           ONLINE       0     0     0  (resilvering)
	      ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K6UNP786  ONLINE       0     0     0  (resilvering)
	    ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K2JYANKV    ONLINE       0     0     0
	    ata-ST4000VN008-2DR166_ZM403X3Z             ONLINE       0     0     0  (resilvering)
	    ata-ST4000VN008-2DR166_ZM40355X             ONLINE       0     0     0  (resilvering)
	    15234179876330307149                        UNAVAIL      0     0     0  was /dev/disk/by-id/ata-ST4000VN008-2DR166_ZGY3CCMN-part1
	    ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K3CC3UZ5    ONLINE       0     0     0
	logs
	  wwn-0x5002538d702bc018-part3                  ONLINE       0     0     0
	cache
	  wwn-0x5002538d702bc018-part4                  ONLINE       0     0     0
	spares
	  ata-ST4000VN008-2DR166_ZGY7PJGC               INUSE     currently in use
	  ata-ST4000VN008-2DR166_ZGY7PJKY               INUSE     currently in use
	  ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K6UNP786      INUSE     currently in use
	  ata-ST4000DM000-1F2168_S300J2JV               UNAVAIL 

What I am going to do now is shut her down and see if I can MacGyver some way to get all the disks hooked up without using the backplane. I probably should have done that in the first place.

Afterwards I still have to get rid of all of those in-use-but-not-really spares.

So that worked for about 5 minutes, and then the whole system became unresponsive again. I’m just gonna wait for the new parts to arrive, because I don’t think I’m making things better right now.

Yeah, when your system is touch and go, it’s a fool’s errand to resilver.

And detaching isn’t necessary; just pull it and do “zpool replace” with the numerical ID that shows up when a drive is missing.

Like in your last listing, you can add a new disk and just do:

zpool replace data 15234179876330307149 ata-YourNewDriveID

BTW, your logs are on a single drive??

	logs
	  wwn-0x5002538d702bc018-part3                  ONLINE       0     0   0

This is a tragedy waiting to happen if that’s not a mirror…
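
If you want to fix it later, log vdevs can be removed and re-added as a mirror while the pool stays online; the second SSD here is hypothetical, just to show the shape of it:

    # drop the single log device, then re-add the log as a two-way mirror
    sudo zpool remove data wwn-0x5002538d702bc018-part3
    sudo zpool add data log mirror \
        /dev/disk/by-id/wwn-0x5002538d702bc018-part3 \
        /dev/disk/by-id/ata-SECOND_SSD-part3   # hypothetical second device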

Yeah, when your system is touch and go, it’s a fool’s errand to resilver.

It took a while due to a shipping issue, but I have the 3 SAS HBAs installed, and it’s been stable with a nice, slowly increasing resilver rate. For about an hour so far. We’ll see tomorrow, but I am hopeful.

And detaching isn’t necessary; just pull it and do “zpool replace” with the numerical ID that shows up when a drive is missing.

The thing is, though, that currently almost all my spares are “in use”, while none of my drives are marked as failing. I’m talking about this bit:

	    spare-4                                     ONLINE       0     0     0
	      ata-ST4000VN008-2DR166_ZGY31ED3           ONLINE       0     0     0  (resilvering)
	      ata-ST4000VN008-2DR166_ZGY7PJGC           ONLINE       0     0     0  (resilvering)
	      ata-ST4000VN008-2DR166_ZGY7PJKY           ONLINE       0     0     0  (resilvering)
	      ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K6UNP786  ONLINE       0     0     0  (resilvering)

I only need one of those drives to resilver, and the rest to go back to the pool of spares.
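
If I understand the docs right, once the resilver completes and ata-ST4000VN008-2DR166_ZGY31ED3 is healthy again, each extra spare should go back to the spares list with a plain detach; presumably the earlier “no valid replicas” error only happened because the pool had no healthy replica to fall back on:

    # return the in-use hot spares to the spares list, one by one
    sudo zpool detach data ata-ST4000VN008-2DR166_ZGY7PJGC
    sudo zpool detach data ata-ST4000VN008-2DR166_ZGY7PJKY
    sudo zpool detach data ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K6UNP786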

I was talking in terms of “future endeavors”, because sometimes people don’t know how to do it.

More than a few times, someone has asked me for help because they removed a bad disk and then did “zpool add” :wink:

Hope everything will go well.

I’m no expert, but from what I’ve read, the SLOG does not have to be redundant, as it’s mostly there to improve performance and protect against corrupt data at, e.g., a power loss. It’s been a while though, so maybe I’m remembering wrong.

AFAIK, L2ARC is something you can put wherever; it’s just cache.
And “logs” in the status indicates that you made a “separate intent log”, and so far everything I’ve read suggested it should be redundant.

But I didn’t test it “on my own skin”, because I usually mirror everything anyway, so I didn’t dig into this very deeply.

I guess I’ll have to test this in practice when I have time, unless someone sets me straight.

It looks like I remembered correctly:

Terminology

Before we can begin, we need to get a few terms out of the way that seem to be confusing people on forums, blog posts, mailing lists, and general discussion. It confused me, even though I understood the end goal, right up to the writing of this post. So, let’s get at it:

  • ZFS Intent Log, or ZIL - A logging mechanism where all of the data to be written is stored, then later flushed as a transactional write. Similar in function to a journal for journaled filesystems, like ext3 or ext4. Typically stored on platter disk. Consists of a ZIL header, which points to a list of records, ZIL blocks and a ZIL trailer. The ZIL behaves differently for different writes. For writes smaller than 64KB (by default), the ZIL stores the write data. For writes larger, the write is not stored in the ZIL, and the ZIL maintains pointers to the synched data that is stored in the log record.

  • Separate Intent Log, or SLOG- A separate logging device that caches the synchronous parts of the ZIL before flushing them to slower disk. This would either be a battery-backed DRAM drive or a fast SSD. The SLOG only caches synchronous data, and does not cache asynchronous data. Asynchronous data will flush directly to spinning disk. Further, blocks are written a block-at-a-time, rather than as simultaneous transactions to the SLOG. If the SLOG exists, the ZIL will be moved to it rather than residing on platter disk. Everything in the SLOG will always be in system memory.

from: https://pthree.org/2012/12/06/zfs-administration-part-iii-the-zfs-intent-log/
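
As a side note, an easy way to see whether a log vdev is actually being exercised is to watch per-vdev I/O during a synchronous workload:

    # per-vdev I/O statistics every 5 seconds; watch the "logs" section
    zpool iostat -v data 5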

Yes, I was wrong. Sometimes when you see some mantra many times, you just stop thinking about it. Mea culpa.

And I was about to post to set this right, because I just checked it myself, to be sure. I was able to recover a zpool with the logs vdev missing:

[screenshot of the recovered pool]

Existing data seems intact, but some data in transit probably got lost. Doesn’t really matter, unless it’s some very important database or something.
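
In case anyone needs to reproduce this: a pool with a missing (non-mirrored) log device refuses a normal import, and the way around it is the -m flag, something like:

    # import the pool despite the missing log device
    sudo zpool import -m data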