IMPENDING FAILURE! Drive in ZFS pool doomed to fail

Hey y’all!

So the day has finally come… One of the drives in my Proxmox system is giving up the ghost. I got a “seek error rate too high” warning from Proxmox the other day. I’ve got a zpool of 5 x 8 TB spinning rust in a raidz1 and I’m looking for some guidance. I’ve been thinking I might copy everything to an external backup and redo my zpool as two 3-drive raidz1 vdevs, since I have a few extra drives I can throw in. It’s just for my media library and I have backups of the most important stuff. I ask that you bestow your knowledge upon me!

If you have extra drives, do the following (a rough command sketch follows the list):

  • mark the failing drive as failed, then remove it from the pool and the system
  • install 3 drives and create a new raidz1 pool on them
  • copy over all data from the old pool to the new one
  • delete the old pool
  • create a second raidz1 vdev from the freed-up drives and add it to the new pool
  • let ZFS stripe data across both vdevs
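
A minimal sketch of those steps, assuming the data lives on a pool called oldpool and the ata-* names are placeholder disk IDs (substitute your own from /dev/disk/by-id/):

# take the failing disk out of service
zpool offline oldpool ata-FAILING_DISK
# build the new pool from the three freshly installed drives
zpool create newpool raidz1 ata-NEW1 ata-NEW2 ata-NEW3
# copy everything over with a recursive snapshot + send/receive
zfs snapshot -r oldpool@migrate
zfs send -R oldpool@migrate | zfs receive -F newpool/media
# once the copy is verified, retire the old pool and reuse its good drives
zpool destroy oldpool
zpool add newpool raidz1 ata-OLD1 ata-OLD2 ata-OLD3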

HTH!

3 Likes

Thanks for the advice, @Dutch_Master! This seems like the best route. It might even improve performance by adding a second raid group.

now would be a good time to also consider mirrored pairs. while you lose more storage space, being able to add a mirror of 2 drives to the pool is easy and rebuild times become very fast. plus raidz1 is not really recommended for multi-TB drives anymore, let alone multiple raidz1 arrays in a pool.
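
growing a mirrored pool later really is a one-liner; a sketch, assuming a pool named tank and placeholder disk IDs:

# add another 2-disk mirror vdev to an existing pool
zpool add tank mirror /dev/disk/by-id/ata-DISK_A /dev/disk/by-id/ata-DISK_B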

1 Like

Okay, this is good to know! That will make this a whole lot easier then. Do you have any guides or documentation you can recommend to help me along?

Also, would a snapshot plus zfs send/recv be the proper way to go about doing this, or is there a better, more correct approach?

Please forgive my ignorance, I’m new to zfs and I’m still learning.

the process is weirdly cumbersome and nearly impossible via the GUI

there are several guides and youtube videos on the web. here is a short set of steps that worked for me, as all the guides do it a little differently.

zpool create tankname mirror sda sdb mirror sdc sdd

then i exported the pool and re-imported it using the by-id device paths

zpool export tankname
zpool import -d /dev/disk/by-id tankname
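
to confirm the import picked up the stable names, a quick check:

# the vdevs should now show up as by-id paths instead of sdX
zpool status tankname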

then if you did NONE of this in the gui, it might not show up in the storage section of the proxmox gui. you will have to google that as a separate todo item, i don’t remember those steps at all.
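
if memory serves, the pvesm CLI can register it as storage; something like this (the storage ID here is made up, double-check the syntax for your version):

# register the pool as a ZFS storage backend in proxmox
pvesm add zfspool tankstore -pool tankname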

1 Like

This should be done regardless of what route you take with your zfs pool!

3 Likes

This doesn’t help you now, but if you consider mirrors, it’s super easy to expand the pool, at a big cost in usable space (but reads are so responsive and noticeably faster for VMs, or maybe I just tell myself that).

BUT, when I had an 18 TB drive fail 2 weeks ago, I bought a replacement from server part deals, popped it in my 4U case, ran a long SMART test (30 hours), then added it to the degraded pool. Once it resilvered (18 hours), I removed the old drive from the pool. Fussed around a bit with it, sent the SMART results to serverpartdeals customer service. And because this thread reminded me, I just used pirate ship to print a label and schedule pickup of the drive.
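
For what it’s worth, that whole swap boils down to a few commands; a sketch with placeholder pool and disk names:

# burn in the new disk with a long SMART self-test first
smartctl -t long /dev/sdX
# swap it in for the failing disk; ZFS resilvers onto it automatically
zpool replace tank ata-FAILING_DISK ata-NEW_DISK
# watch the resilver progress
zpool status -v tank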

A mirrored array made dealing with 1 failure super simple. It would have been easy with raidz1 or 2 as well. My only downtime was due to not having hot-swap bays in the Rosewill RSV-L4500U case. The drive cages on that thing really are little finger pinchers. The wife still had Plex, photos still backed up to nextcloud, and nobody was the wiser. BUT the nice part was: while I was in there I shucked two 14 TB drives to add to the pool as mirrors, so the expansion was pretty simple too. Though my SAS card only holds 8 drives (all mine are SATA) and I’m now at a total of 7 (2x12, 2x14, 2x18, and 1x1 TB for a VM running docker and all my LXC containers). I have just enough room to add a mirror for that SSD. Maybe it’s time to go for the 14 TB dual-actuator drives in a raidz2, but my wallet says San Diego this year, more drives I don’t need next year.

It was also the first drive from server part deals to fail out of 6 18tb drives purchased summer of 2023.

Lots of rambling to say a hard drive failure was a non-issue. And easy to recover with no real downtime for the spouse.

1 Like

Okay, so export/import over snapshot? Seems pretty straightforward. I’ll do some more research just to be thorough, but this is super helpful! Thanks!!!

You’re right, it’s shameful of me that I hadn’t set up a script or cron job for backups sooner. I wasn’t sure that my current setup was going to be permanent, so the task fell by the wayside, but I guess it’s best practice to set up a backup plan no matter what.

1 Like

no, these are 2 totally different items. the export/import step is done on the newly created ZFS pool to convert it from ‘by-interface’ to ‘by-disk-id’ device references

to explain: in the original ‘by-interface’ configuration, if you were to add more disks to the host and the linux kernel shuffled around the /dev/sdX designations, your ZFS pool would fail. in the ‘by-id’ design it does not matter which /dev/sdX the disk comes up as; ZFS will still see it correctly.
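
you can see the mapping yourself; the by-id entries are just stable symlinks to whatever sdX the kernel handed out this boot:

# list persistent disk IDs and the sdX devices they currently point to
ls -l /dev/disk/by-id/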

1 Like

Okay, this makes a lot of sense. Thanks for clarifying that for me. I really appreciate your help, @Zedicus !!! I feel better equipped to tackle this now.

I’ll add my little story. I started out my ZFS pool with various 2 and 4 TB desktop drives in a mirrored configuration: a 2 TB mirror + a 4 TB mirror. After a while one disk in each mirror failed, but I could continue using the pool and back up the remaining data.
Later I replaced each disk with a bigger one, one by one. The pool is set to autoexpand to use the additional space. What I want to say is: you can expand a 2-way mirror setup by adding additional pairs of disks or simply by replacing the existing disks with bigger ones.
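
The replace-and-autoexpand route looks roughly like this; pool and disk names are placeholders:

# let the pool grow automatically once every disk in the vdev has been upsized
zpool set autoexpand=on tank
# replace each old disk with a larger one and wait for the resilver to finish
zpool replace tank ata-OLD_2TB ata-NEW_8TB
zpool status tank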

2 Likes