ZFS - replacing a single unhealthy but still operational disk in an offline RAIDz2 vdev. Resilver or dd?

I’ve got a disk in my pool (a single 6x 10TB RAIDz2 vdev) that spat out 8 Uncorrectable Sectors the other night. I shut the server down whilst I sorted out my backups and sourced and tested a replacement disk, and now it’s almost time to begin the replacement procedure.

The replacement disk is almost done with my burn-in / testing process, so I’m starting to think about how I’ll go about replacing the faulty disk. I’m just wondering… should I (or even can I) just dd a copy of the unhealthy disk onto the replacement disk, or should I resilver it?

[EDIT] Just a note to say that I updated the title from “ZFS - replacing an unhealthy but still operational disk. Resilver or dd?” to “ZFS - replacing a single unhealthy but still operational disk in an offline RAIDz2 vdev. Resilver or dd?”, for the sake of clarity and so that any further replies are more on topic.

1 Like

You can DD, but I would suggest the live replace.
DD would faithfully recreate all errors while copying, so after the copy you would have an unknown amount of bad data which might not be read for a while.
ZFS checks for good data on a read, write or scrub. It just checks the first copy it finds and uses that.
If some of the data is bad on the bad disc, ZFS might not use that bad copy for a while. If it only touches it when another drive has an error, it will only then realise it has 2 bad copies, and then it will lose the data.

If you live replace, all data written to the new disc will have been checked at least once, so it starts off good.

7 Likes

No need to panic about that. It might die soon, but it could also go on living for years.

+1, I’d say dd is an objectively worse alternative here.

4 Likes

Also, please consider using the /dev/disk/by-id/ identifier, like ata-wdc-20Dearx-djsgagah (you can leave the “/dev/disk/by-id/” bit out and just use the end bit), because it includes the serial / identifier, which makes working out which drive to change easier.

If you DD the drive (the data partition), the serial will no longer match.
If you use /dev/sda or just sda as the identifier, it might be harder to identify a failed drive if one actually dies. Also, the letters will change around on you.

The system will still pick up the active drives on boot, even if the letters have re-arranged (it uses the pool GUID on the data partition), but that might not help you too much.
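For example, you can see the mapping and what the pool is currently using with something like this (device names below are just placeholders):

ls -l /dev/disk/by-id/    # shows which ata-… / wwn-… names point to which sdX
zpool status -v           # shows which identifiers each pool was imported with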

Is this in Free/TrueNAS or something you’re maintaining directly?

The pool did have regularly scheduled scrubs before the old disk spat out the SMART errors. So surely if I replaced it with the dd’d disk and scrubbed the pool, it’d detect any errors that the bad sectors may have caused on that single disk?

I’m not quite sure I understand you… but then again, my knowledge of ZFS is quite limited.

Surely if the data has been corrupted by the Uncorrectable Sectors, and that corrupted data is then written to the new disk by the dd command, ZFS will detect it because the data in question won’t match its checksum. Therefore, as long as I scrub the pool after installing the dd’d disk, any corrupted data would be detected and then repaired from the rest of the pool / parity, would it not?
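Concretely, the check I have in mind after installing the dd’d disk would just be something like this (new_tank being the pool in question):

zpool scrub new_tank
zpool status -v new_tank    # any blocks that fail their checksum show up in the CKSUM column / error list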

I’d be grateful if you could go into more detail please.

I could understand how that scenario would happen on a mirrored vdev, but wouldn’t it be incredibly unlikely on a RAIDz2 vdev for errors to develop on the other disks in just the right places to result in the loss of the corrupted data that may have been dd’d to the replacement disk?

I’ll use this opportunity to explain my reasoning behind the question.

If I was to do a live replacement, it’d mean all the disks in the pool would be stressed, as the data for the new disk would have to be calculated from parity, whereas a dd would just be a read/write operation from one disk to another. I’d do the dd with just the two disks connected to either a desktop system or my backup server. I’d then install the newly dd’d disk into my primary server, boot it back up, and perform a scrub.
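For the record, the copy itself would be nothing fancier than something along these lines, with sdX and sdY standing in for however the two disks happen to enumerate on that machine:

dd if=/dev/sdX of=/dev/sdY bs=1M status=progress    # plain block-for-block copy of the old disk onto the new one

(I do realise dd stops at the first unreadable sector unless conv=noerror,sync is added, and that that option pads the bad blocks with zeros, which is exactly the sort of silent bad copy being warned about above.)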

I don’t get what you mean by “maintaining directly”. The 6x 10TB RAIDz2 pool is in a Proxmox system. It started life in a FreeNAS 11.1 system, which is where I performed a local replication of my old pool to a new set of disks in preparation for the build of my new (now primary) Proxmox server. I haven’t upgraded the ZFS version since the old pool was replicated to the new disks, so the ZFS version should still be compatible with TrueNAS.

But the disk burn-in is taking place in a TrueNAS Core based server and I was also planning to perform any dd operations on the TrueNAS server (which is why I’m asking on this subforum).

Was just wondering if you were doing this in the FreeNAS GUI (indirectly) or if you were issuing cli commands yourself (directly).

Bad sectors ≠ data corruption.

Each time ZFS does anything to data, it verifies the checksum. It will never propagate corrupted data through your pool.
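If you want to see whether ZFS has actually hit bad data on that disk (as opposed to the drive just reporting sectors to SMART), something like this will tell you, with the pool and device names swapped for yours:

zpool status -v new_tank                                      # per-device READ/WRITE/CKSUM error counts, plus any files with permanent errors
smartctl -A /dev/sdX | grep -Ei 'pending|uncorrect|realloc'   # the raw SMART sector counters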

Yeah, but that’s more steps, and ZFS replace would be checking the integrity of the data while it’s copying to the new drive. I just don’t see why you would prefer to use dd over zfs here. There’s a tool built exactly for the situation you’re in and it’s zpool replace.

1 Like

Ah, I get ya. I’ll explain in my answer to the last paragraph.

I know. That’s why when I was replying to Trooper_ish, I was careful to always couch it in ifs… I said if the bad sectors resulted in corruption… if there was corrupted data on the single disk I was replacing.

I understand that, which is why I don’t quite understand why dd’ing the disk would be a bad idea if it was going to get scrubbed afterward.

Perhaps I should explain in more detail why I’m looking into using dd.

I have / had three pools.

new_tank - the pool in my primary server, which consists of 6x 10TB in RAIDz2.

old_tank - the original pool. When I set up my new, primary server, I replicated old_tank to my new disks and created the new_tank pool.

dozer - my “backup” pool in my backup / old server.

All three pools had copies of all my really important data. My end goal was to destroy old_tank and add its disks to dozer in order to have the capacity to back up new_tank in its entirety.

When I had the Uncorrectable Sectors appear on a disk in new_tank, I didn’t have another pool big enough to perform a full and up-to-date replication, so I bought more disks in order to create a new, larger dozer… this is where things get complicated. Before doing anything else, I scrubbed old_tank to make sure it was still in good shape. Then, having burned in the new disks, I recreated the dozer pool with the new disks added to the existing ones… but in the process of performing the replication to the newly recreated dozer pool, one of the older disks had 1279 Uncorrectable Sectors appear. Cue another order for a replacement disk and days of burning in using the following procedure - Hard Drive Burn-in Testing | TrueNAS Community
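(For anyone curious, that burn-in procedure boils down to roughly the following per disk, from memory, with sdX as a placeholder. Note the badblocks pass is destructive:)

smartctl -t long /dev/sdX        # extended SMART self-test
badblocks -b 4096 -wsv /dev/sdX  # four-pattern destructive write/read test, wipes the disk
smartctl -A /dev/sdX             # check the SMART counters again afterwards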

So that left me with: new_tank with 8 errors on a single disk; dozer with 1279 errors on a single disk and the replacement disk ready to go; and old_tank, which I scrubbed but didn’t perform any SMART checks on.

The 10TB replacement disk destined for new_tank is almost finished burning in, and, not wanting to place any added stress (from a resilver / parity calculations) on new_tank, I began wondering whether dd might be an option… not least in regard to dozer, as its disks are older than those in new_tank.

The idea is that using dd would mean reading and writing from just the old and new disks for new_tank, and then performing a scrub, whereas a full resilver might be more stressful on the other disks, at a time when I’m worried that I’m dangerously close to having problems with my dozer pool too.

1 Like

As long as new_tank/new_tank doesn’t have any issues aside from the bad sectors on the one disk, I’d still use the replace function.

Or honestly, replacing a disk with 8 bad sectors has never been urgent in my experience. You could hold off until you have a good backup somewhere.

2 Likes

Would your recommendation also apply to replacing the faulty disk in the dozer pool (considering that its other disks are a few years old but not heavily used)?

My plan is to replace the disk in new_tank last, once I know that dozer is back up and healthy.

2 Likes

Is there any data that is only on new_tank or only on dozer? If dozer is simply a backup of new_tank, I would replace its drive first (only one drive with the errors, correct?). Once it’s done, replace the new_tank drive.

Yeah, exactly. I would only start being paranoid if there was data on dozer that wasn’t anywhere else.

2 Likes

Mr dots is right. Acting early is not a bad thing, but you might be prematurely removing a drive that is fine, and there may be another / intermittent issue.

You could hold off replacing a disk until the system starts rejecting it; you have a rz2, so the pool will continue.

The point I was trying to allude to is that ZFS scrubs at a pool level, not a disk (provider) level (or even vdev level), and it might have several copies of the data.
When doing a scrub, one might not know which copy has been used to check.
This is especially annoying on mirrors, but not restricted to them.

I do not think you should be concerned about the extra stress placed on a rz2 pool, unless you suspect the cooling might be insufficient? It’s not something one would want to do all the time, but it would be the same read stress as the monthly scrub, just with heavier write stress on the new drive?
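In the meantime you can keep half an eye on things with:

zpool status -x    # prints “all pools are healthy” unless something is degraded or faulted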

Assuming the replication from new_tank to dozer was successful despite one of dozer's disks crapping the bed with 1279 Uncorrectable Sectors, and I have no reason to believe it wasn’t successful, then the data on new_tank and dozer should be identical, given that I haven’t used either pool since performing the replication.

I can’t help worrying and getting stressed about this sort of thing, I’m afraid. It’s just the way my brain works. Worse still, it makes me second-guess everything I’m doing… even the simple stuff. Which is why I really appreciate the help you’re giving me, even if some of my questions can be pretty basic.

Thank you, and thank you too @Trooper_ish.

On the bright side… at least this whole ordeal is giving me a much-needed kick up the bum to sort out a proper third backup, so that I can finally fulfil the requirements of the 3-2-1 Rule.

2 Likes

You are already acting before the pool goes offline / degraded, so no drama, plenty of time.

If you were to have lost a provider entirely, then it gets more urgent.

Two providers, as you know, is when you start to be Very careful.

I’m mostly just glad that ZFS gives you the option to DD a disk and still have it work in a pool, and to transfer a whole pool to a different operating system, on different hardware, even virtual.

Not sure Storage Spaces could do that, nor even hardware RAID.

2 Likes

+1

You really aren’t in much of a precarious position. Fix up dozer and then swap out the drive in new_tank and you’ll be good to go. There’s no reason to think something went wrong with your replication or that any of your data is corrupted.

I would definitely stick with the zfs tools when possible. dd always opens you up to a lot more potential user error. I try to avoid using it in general because it’s so easy to catastrophically ruin something. That said, I am using it at this very moment to clone a drive, so it’s not like I never use it.

4 Likes

The problem is that DD will create an inconsistent pool, unless you’re taking the entire pool offline when doing it.

If you wind up with an inconsistent pool, checksum failures will likely scrub the inconsistencies on the fly, but there’s still danger lurking.

I highly recommend using zpool replace:

zpool replace tank old_disk new_disk

From what I can see, you’ve got a Z2 pool, right? In that case, you’ll be able to sustain another “failure” and be fine.

Moreover, a DD then scrub will be just as stressful as a resilver/replace. You’re still reading all the occupied sectors on the disks.

Resilvering is literally designed to handle this job, so why would you use DD?

Resilver will only sync the actual data on the disk, so it will be much faster (and it deals with incoming writes as they happen) too.

dd is, as above, fraught with danger if you fuck it up.

3 Likes

We already kinda went over all of this. I believe OP is onboard with zpool replace at this point instead of dd.

1 Like

@oO.o is correct, I do plan on using the native ZFS tools at this point, but I just want to pick up on a few things, Sarge, as I’m still interested in the question of whether there’s something to be said for using dd in a scenario like mine… even if I no longer plan to use it.

The pool is already offline in the sense that the server is powered off, but I didn’t offline it in the ZFS sense… if that makes a difference.
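By “the ZFS sense” I mean I never ran anything like the following against the device (disk id made up for illustration); the box was simply shut down:

zpool offline new_tank ata-XXXXXXXX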

I will be using replace but I was planning on just doing it through the TrueNAS GUI.

That last paragraph has got me thinking… I’ve always been under the impression that a resilver is more stressful on spinning rust than just a plain old scrub, because resilvers (in the case of RAIDz*) require rebuilding the data from parity. My theory for using dd is that the data wouldn’t have to be calculated from parity; it’d just be read from the one faulty disk and written to its replacement… it’d be more akin to resilvering a mirrored vdev. But perhaps my assumptions are flawed and I need to do some more research in regard to resilvering.

I just wish the documentation on the higher (or should that be lower) level ZFS stuff was more accessible.

ZFS scrub shares a large part of its logic with resilver. It visits every block in the pool and verifies the checksum. I believe you are confusing it with normal read behavior.

" Instead of a consistency check like fsck(8), ZFS has scrub . scrub reads all data blocks stored on the pool and verifies their checksums against the known good checksums stored in the metadata. A periodic check of all the data stored on the pool ensures the recovery of any corrupted blocks before they are needed."

That said, replace is still the way to go. It will scrub/resilver as it goes, keeping full advantage of any redundancy you have.
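For what it’s worth, the whole swap then boils down to something along these lines, with the pool and disk names as placeholders for whatever yours are actually called:

zpool replace new_tank ata-OLD_DISK_ID ata-NEW_DISK_ID
zpool status new_tank    # watch the resilver progress
zpool scrub new_tank     # optional extra pass once the resilver finishes

Every block written to the new disk during the resilver has already had its checksum verified against the rest of the vdev, which is the whole point of preferring replace over dd.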