ZFS Data Recovery - Logan Got His Groove Back

@wendell Do you have the Perl scripts on GitHub already?
I'd looooove to take a peek.

Yep, there are a few copy-on-write filesystems out there, the main ones being Btrfs, ZFS, and ReFS. Of them, only ZFS works across multiple platforms and has been used and proven in production for over a decade.


Great video! Really shows how ZFS can get you out of a pinch. Generally when this type of thing happens to me, I take a couple of hours and restore from backup. Neither RAID nor ZFS is a substitute for backups. I have been down this road before, and however AMAZING it is to do what @wendell did, most people don't want to pay the kind of cash it takes to get this work done. Tek Syndicate is super lucky to have Wendell and Gillware at their disposal.

50% of small-to-medium-sized businesses don't have a regularly scheduled backup process. 50% of the ones who do back up never verify. This is why we have data recovery specialists.

One thing I forgot to mention: poking around the ZFS codebase, my impression is that it assumes you will always have redundancy and will never end up in a situation where insufficient parity info exists.

I don't think it handled our multi-disk failure as well as it could have. Maybe it would be reasonable, even with very large disk pools, for the pool to go into read-only mode once its redundancy is compromised.

As it is now, it just keeps on truckin' right into the ground in this type of failure.

Hey Wendell, I haven't had to do this type of recovery with ZFS, and I think you're smart enough not to let this happen again, but in hindsight, after watching how you did it, I think this was really tedious for you. It would have been much easier to physically replicate your damaged disk onto a new one and reconstruct the damaged directory structure from a sane environment, instead of recreating files sector by sector. Just my two cents, though.

@wendell Had you researched other data recovery experts prior to settling on Gillware, and if so, who?
I have one of those Seagate ST3000DM001 3 TB drives I want to send to a data recovery lab. Seagate has a flat rate of about $600. Gillware has a price range based on testing, as do many others, and most are double the price.

The documentation for ZFS does indeed discourage configurations with no redundancy.

ZFS is supposed to continue operation as long as sufficient replicas exist. With the first disk failure, the pool should have continued to function in a degraded state. Of course you don't want to be forced into read-only in this state, because you're essentially dead in the water until the failed disk is replaced, and that might not be acceptable for the majority of use cases. I don't think it's fair to say ZFS should force that decision for you. If you want to stop using the pool at this point, ZFS won't stop you. If you want to keep using the pool, it won't stop you either.

When I/O errors on the second failing disk started popping up, the pool should have entered the faulted state, which prevents further use of the failing device. At this point, you should be forced to recover by cloning the second drive to a new drive (most likely using something like ddrescue because of the I/O errors), then recovering the pool and resilvering to populate the replacement for the first failed drive. It's a crappy situation to be in when you have more faulted disks than your configuration can tolerate, but you should be able to actually recover, provided the second failing disk correctly reports I/O errors and can at least be coaxed into letting you read the data off it and onto a good drive.
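For anyone who ends up in that spot, a minimal sketch of that clone-then-resilver flow might look like the following. The device names, the pool name "tank", and the GUID placeholder are all hypothetical; adjust to your own layout.

```
# Clone the failing disk onto a fresh one, skipping bad areas on the
# first pass. The mapfile lets ddrescue resume and retry bad sectors.
ddrescue -f -n /dev/sdb /dev/sdc rescue.map
ddrescue -f -r3 /dev/sdb /dev/sdc rescue.map   # second pass: retry bad sectors up to 3 times

# With the clone standing in for the failing disk, bring the pool back
# and resilver a replacement for the drive that died outright.
zpool import tank
zpool replace tank <guid-of-failed-disk> /dev/sdd
zpool status -v tank                           # watch resilver progress
```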

It's very curious that the pool you were working with continued to function when I/O errors popped up. It's not supposed to keep on truckin' when the pool effectively doesn't have enough fully functional drives to actually work. The only reason ZFS would do that is if the drive blatantly lied about its health. ZFS does a lot to battle lying hardware, but there's only so much it can do. If the disk nefariously masks read errors by silently retrying until it gets a good read and the data passes the checksum verification, ZFS has nothing to go off of except that your drive seems slow.

FreeNAS has an email alert system that should have notified someone when either the pool status changed from HEALTHY or any SMART errors occurred. Unfortunately, that doesn't help if no errors are being reported by the disk. You would have been alerted when the first disk failed and the pool status changed to DEGRADED, at which point you had the opportunity to decide that the pool should be taken offline until the failed drive could be replaced (assuming FreeNAS had been properly configured to send those status messages).

What could be a useful addition to FreeNAS (or maybe it already exists; I dunno, I don't personally use FreeNAS) is a plugin or config setting somewhere that optionally and automatically forces the pool read-only or offline if the state ever becomes degraded. ZFS does have a zpool property, "failmode", that controls the behavior in the event of catastrophic failure, but that wouldn't seem to have helped in this case, unless perhaps it was set to continue rather than wait or panic, wait being the default.
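For reference, failmode can be queried and set per pool; a quick sketch, with "tank" as a stand-in pool name:

```
# failmode only kicks in on catastrophic failure, i.e. the pool loses
# the devices it needs to keep operating:
#   wait     - block I/O until the device returns (the default)
#   continue - return EIO to new writes, keep serving readable data
#   panic    - halt / crash dump the host
zpool get failmode tank
zpool set failmode=wait tank
```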

To be clear, my intent with this post is to clarify the behavior to expect of ZFS, not to place any blame. It is unusual that Logan's pool failed in such an interesting way. It sounds like the drives must have been lying their asses off. @wendell, did you happen to record the SMART errors of the second failing drive?

Yes, I feel like when things went sideways my snapshots disappeared unexpectedly. Once there was uncorrectable data, "ideally" the pool should go into read-only mode. I did not see anything like that kind of functionality anywhere in the codebase that I could find.
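Nothing automatic exists as far as I can tell, but you can at least force read-only by hand once things look shaky; a sketch, assuming a pool named "tank":

```
# Export and re-import the pool read-only so nothing new gets written
# while you figure out what state the disks are really in.
zpool export tank
zpool import -o readonly=on tank
```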

FreeNAS was set up to email me/Logan, but after remoting in immediately, I found one drive offline and thought "no problem, we were already planning to replace this anyway, it's only temporary, let's start rsyncing stuff"... and then cue drive #2 failure. They literally died within hours of each other. Now we've got RAIDZ2 with one hot spare that is in power state: sleep mode. Probably should have done RAIDZ3 with a hot spare... and we still may... the new NAS config is not 100% finalized.
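For anyone curious what that kind of layout looks like at creation time, here's a minimal sketch; the pool name, device names, and disk count are made up and just illustrative:

```
# Six-disk RAIDZ2 vdev plus a designated hot spare. The spare sits idle
# (and can spin down) until ZFS pulls it in to replace a faulted disk.
zpool create tank raidz2 da0 da1 da2 da3 da4 da5 spare da6
zpool status tank   # the spare shows up under its own "spares" section
```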

No SMART warning at all, though Logan did say "it's acting funny" about a week prior, and I actually ran the online SMART test, plus a zpool scrub with a manual zpool status check, and it was green across the board. I think the drives were lying their asses off. Thanks, Seagate.
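For anyone wanting to run the same checks themselves, this is roughly what they look like from a shell; the device and pool names here are hypothetical:

```
# Extended (offline) SMART self-test, then read back the results.
smartctl -t long /dev/ada0
smartctl -a /dev/ada0        # self-test log, error log, attributes

# Scrub the pool and check the per-device error counters afterward.
zpool scrub tank
zpool status -v tank
```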

Yeah, my system has mostly WD Green drives in it, plus one pool of three 3 TB Seagates, and they are my problem children. I will be replacing them as soon as I can. I was not able to do any proper testing of this, but I feel like my RAIDZ performance went down when I added them.

I've used Gillware for like a decade, with many cases covering various edge cases where I had to clean up someone else's mess. I have a lot of experience with those guys and they have always done well.

The question is: if I was using pool replication and the pool didn't go read-only after disk #2's failure, would the remote pool have also been corrupted? Not sure. I sync pools with rsync now too, ugh, which I feel better about.
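For contrast with rsync, snapshot-based replication would look something like the sketch below (pool, dataset, and host names are hypothetical). Since a send stream is built from checksummed blocks as of the snapshot, a send from a corrupted pool should fail with errors rather than silently ship garbage to the remote side, though I wouldn't swear to that for every failure mode.

```
# Take a snapshot and replicate it to a remote pool over SSH.
zfs snapshot tank/data@nightly
zfs send tank/data@nightly | ssh backuphost zfs receive -F backup/data

# Subsequent runs send only the delta between snapshots.
zfs snapshot tank/data@nightly2
zfs send -i tank/data@nightly tank/data@nightly2 | ssh backuphost zfs receive backup/data
```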

@wendell I just saw your video today, right after I accidentally deleted a directory on my ZFS pool. I did unmount it immediately. I would really like to know where to get a copy of your Perl script for file extraction. I would really appreciate it. Thank you!

@wendell Unfortunately, solarisinternals.com now hosts a parked site with advertising. It is no more.

This help you out any? https://web.archive.org/web/20150223043525/http://www.solarisinternals.com/wiki/index.php/Solaris_Internals_and_Performance_FAQ