ZFS scrub stalls?!

Dear forum,

Recently, every scrub on my RaidZ2 has started stalling a little below 85% completion.

The scan rate drops from about 150M/s to about 64K/s, and the issue rate goes to nil.
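(Those numbers are what zpool status -v reports while the scrub runs; checking it repeatedly with something like the line below is how the stall shows up, pool name as in the listing further down.)

watch -n 30 zpool status -v pool_10x4TB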

Dedup and compression are active, ashift is 12.

The RaidZ2 comprises ten 4TB WD Red (CMR) drives connected via an HPE 240 HBA, plus a 500GB WD Black SSD as L2ARC; the system has 64GB of RAM and runs Ubuntu 20.04.

I initialized the RaidZ2 some weeks ago and transferred a lot of very important research data for my PhD onto it (this data benefits greatly from dedup). After that transfer a scrub went through swiftly and completed without issues.

Next I transferred my personal photo library to the RaidZ2, and that's when the problems started. Since then, every scrub stalls at about 85%.

I checked the SMART reports from all drives and found no issues. I also checked the HBA and found no problems either.
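(In case it matters how I checked: roughly a loop like this over all ten disks, where sda through sdj are just placeholders for whatever names the drives get on the system.)

for d in /dev/sd{a..j}; do
    sudo smartctl -H -A "$d"
done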

The storage pool is reported as online with no errors.

$ zpool list
NAME          SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
pool_10x4TB  36,4T  17,9T  18,4T        -         -    53%    49%  7.47x    ONLINE  -

Yet, not being able to scrub, I ask:

Is my data still intact??

What does it mean for a ZFS pool to be online when scrubbing does not succeed?

Could there be a bug in the scrubbing routine?
Or is this a sign of bad things happening to my RaidZ2 soon?

What can I do? What shall I do?

Any suggestions?

Thank you very much!

Sincerely,

Ingomar

Maybe a drive is about to die

2 Likes

For how long do you let it sit there?

Are you able to access the pool while it’s in that “stuck” state?

What HBA? Is this a raid controller or JBOD?

Did you run a long SMART test?

Who knows.

Something not working correctly is a sign of bad things to come.


Are there any errors in dmesg?

Is any I/O-intensive task running? I had an rsync task do this to me once.

Also, what record size are you using?
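Something along these lines will show both, with the pool name taken from your zpool list output above:

dmesg -T | grep -iE 'error|fail|reset|timeout'
zfs get recordsize,compression,dedup pool_10x4TB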

1 Like

I did run the scrub several times and it always stalled at the same point.

The SMART tests did not reveal any errors. However, I noticed a problem with the UID in ZFS – one of the drives had an unusual one, differing from all the others in zpool status -v (only numbers, not alphanumeric).
I exported the zpool and reimported it by /dev/disk/by-uuid – then all disks showed up with the same UID and ZFS performance was trashed (though the pool was still online!).

I then imported it via /dev/disk/by-id and the whole situation resolved itself.
The scrubs are successful now and complete without any problems. :pray:
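(For anyone finding this later, the reimport was essentially just this, pool name as above:)

sudo zpool export pool_10x4TB
sudo zpool import -d /dev/disk/by-id pool_10x4TB
zpool status -v pool_10x4TB    # the vdevs now show up under their by-id names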

Well, the good thing is: the RaidZ2 is safe - but can anyone explain what happened? :thinking:

2 Likes

You're not supposed to set up pools via drive letter, as this is volatile. For example, plugging in a flash drive could mess it all up if it gets picked up as a drive letter the pool isn't expecting.
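A quick way to check which device nodes a pool is currently tied to is something like this (the -P flag shows full paths instead of just the shortened vdev names):

zpool status -P pool_10x4TB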

1 Like

@Dynamic_Gravity

Well, I did set up the RaidZ2 initially via drive letters, but then exported it and reimported it by UID, so I assumed I was on the safe side.
But curiously there seems to be a UID mix-up with my HBA, the HPE 240. It's in HBA mode, I've quadruple-checked :wink:

The UIDs of my drives are NOT unique, only the IDs are :thinking: …it's very strange

1 Like

The pool might have the same UID for every member of each vdev, or for the pool as a whole, depending on which tool you use.

/dev/disk/by-id/ should* be unique.
blkid [edit: meant blkid] should show all members of an array as having the same ref.

  • the by-id name combines the reported model and serial number. Some drives report funny strings as a serial number, so several drives of the same model can appear identical. In that case, one can use the wwn- number from within the /by-id/ folder, but it is much less intuitive.

The last digit of the wwn can sometimes differ from the drive's label. The correct wwn can be viewed with other tools like sgparm, iirc, will check on that.
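If you want to compare the two naming schemes and see the 'same ref' bit, something like this does it (the grep just filters out the per-partition entries):

# model+serial names and wwn- names for the whole disks, side by side
ls -l /dev/disk/by-id/ | grep -v -- -part

# every pool member shows up as zfs_member with the same pool-level label
sudo blkid | grep zfs_member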

2 Likes

@Trooper_ish this was very helpful. thank you! :+1:

Yes, indeed - as soon as I used the wwn numbers it worked flawlessly.

In my opinion it should be advised not to use UID for ZFS, at least with the HPE 240 HBA.

1 Like

I still use /by-id/, as the wwn does not always make sense / match the drive.
I only use the wwn if the main serial number does not get listed.

But you do you, boo. I don't have to swap your drives out of the machine.

You can label your drives with gpart or gdisk or something, and then make the array from those. I did that recently with a batch that did not wanna play nice…

1 Like

That’s the proper one, tbh.

1 Like

or do it the long way

# use gdisk to label drives
sudo gdisk /dev/sdX   # replace X with the drive you want to label
    o   # create a new GPT, if the disk is unpartitioned / still MBR
    d   # delete any existing partitions that don't fit the scheme
    n   # create a partition: start at sector 2048, end at 3907012607 for 2T
        # or end at 7814017024 for 4T drives
    t   # change partition type to "bf01" ("Solaris /usr & Mac ZFS")
    c   # name the partition, for use with /dev/disk/by-partlabel/
    w   # write changes to disk (or q to quit without saving)

because the ZFS Mastery series and Lucas's storage books talk about using GEOM to label disks.
And I cry in Linux…

but just the serial number from /by-id/ does me well enough, and ZFS auto-formats drives well enough that I don't really worry about them.
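If you do go the partlabel route, creating the pool is then just pointing zpool at those labels. A rough sketch, where the pool name tank and the label names d01 through d10 are made up for the example:

sudo zpool create -o ashift=12 tank raidz2 /dev/disk/by-partlabel/d{01..10}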