ZFS Data Recovery - Logan Got His Groove Back | Tek Syndicate

wendell · July 23, 2015, 4:01pm

This video is about a disk crash in Logan's travel/editing NAS. It turns out this is also where he kept the latest copies of the new album!

So we were in a situation where two of our three disks had failed, the ZFS pool was totally unmountable, everything was on fire, and the data was gone. Fortunately we're nerds, and we turned it up to 11 to try and get everything back.

The first problem we had was a hardware problem: a dual disk crash so severe that we had to send the drives off to recovery specialists. Our friends at Gillware were able to get us going again, though one of our disks was a lost cause:

https://gillware.com?referral=17296

Big thank-you to them for their assistance. If you haven’t seen the burnisher in action, take a look: https://blog.gillware.com/data-recovery/data-recovery-101-burnishing-platters

Once we got the drives operational, it became a challenge to actually extract the data from ZFS.

If you have a damaged ZFS pool, stop everything right now. You may need to offline your pool and clone its member disks before you can safely attempt anything like this.

It is very unlikely that you have a pool that is actually damaged enough to warrant the hackery you see in this video. If your data is important, please consult experts.

The exercises you see in the video were performed on a clone of the ZFS pool made with ddrescue. Learn more about cloning the disks in your pool here:

https://www.youtube.com/watch?v=ddrPnuvFV6E

The fact that we’re using clones means that if (when) we screw up, we can go back to the original drives and try again. This is substantially less convenient if you have a large number of disks.
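Because ddrescue records which areas of each source disk it could and could not read in a mapfile, you can gauge how much of each clone is trustworthy before you start experimenting on it. The sketch below is a minimal Python reader for that file, assuming the layout described in the GNU ddrescue manual (`#` comment lines, one current-position/status line, then `pos size status` lines in hex, with `-` marking bad sectors). The function names and the sample mapfile are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MapEntry:
    pos: int     # byte offset on the source disk
    size: int    # length of the area in bytes
    status: str  # '+' finished, '-' bad sector(s), '?' non-tried, '*' non-trimmed, '/' non-scraped


def parse_mapfile(text: str) -> list[MapEntry]:
    """Return the data-block lines of a ddrescue mapfile (assumes the documented layout)."""
    entries: list[MapEntry] = []
    skipped_status_line = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if not skipped_status_line:
            # First non-comment line is "current_pos current_status [current_pass]".
            skipped_status_line = True
            continue
        pos, size, status = line.split()[:3]
        # int(..., 0) accepts the 0x-prefixed hex values ddrescue writes.
        entries.append(MapEntry(int(pos, 0), int(size, 0), status))
    return entries


def bad_areas(entries: list[MapEntry]) -> list[MapEntry]:
    """Areas ddrescue could not read; those bytes were never copied to the clone."""
    return [e for e in entries if e.status == "-"]


# Hypothetical mapfile contents, for illustration only.
SAMPLE = """\
# Mapfile. Created by GNU ddrescue
# current_pos  current_status
0x00200000     +
#      pos        size  status
0x00000000  0x00100000  +
0x00100000  0x00000200  -
0x00100200  0x000FFE00  +
"""

bad = bad_areas(parse_mapfile(SAMPLE))
print(f"bad areas: {len(bad)}, unreadable bytes: {sum(e.size for e in bad)}")
```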

Some of the commands we ran here appear to operate in a read-only mode, but from looking at the source code I can’t be sure that manipulating the transaction log pointers during a mount isn’t altering some of the metadata, so please be careful with that.

Our setup was a relatively simple three-drive RAIDZ1 pool where one drive had failed spectacularly and ddrescue was reporting about 46 read errors on one of the two remaining disks.
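To see why one dead disk plus read errors on a second is so dangerous for a three-drive RAIDZ1, it helps to model single parity. Conceptually, RAIDZ1's parity column is the XOR of the data columns in each stripe, so any one missing column can be rebuilt from the survivors, but a stripe with two unreadable columns cannot. This toy Python model assumes equal-length columns and ignores RAIDZ's variable stripe width, on-disk layout, and checksums; the function names are hypothetical.

```python
from __future__ import annotations

from functools import reduce
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length columns."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(data_columns: list[bytes]) -> bytes:
    """Single parity: XOR of all data columns in the stripe."""
    return reduce(xor_bytes, data_columns)


def reconstruct(stripe: list[Optional[bytes]]) -> list[bytes]:
    """Fill in at most one unreadable column (None) in a data+parity stripe."""
    missing = [i for i, col in enumerate(stripe) if col is None]
    if len(missing) > 1:
        raise ValueError(f"{len(missing)} columns unreadable; single parity can rebuild only one")
    survivors = [col for col in stripe if col is not None]
    if not missing:
        return survivors
    # XOR of every surviving column (data and parity) equals the missing column.
    result = list(stripe)
    result[missing[0]] = reduce(xor_bytes, survivors)
    return result


d0, d1 = b"ABCD", b"wxyz"
p = make_parity([d0, d1])
print(reconstruct([None, d1, p]) == [d0, d1, p])  # True
try:
    reconstruct([None, None, p])  # a dead disk plus a bad sector in the same stripe
except ValueError as err:
    print(err)
```

In the pool described above, the failed drive is a permanently missing column, so in this model any stripe whose column on the second disk falls inside a bad area has two unreadable columns.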

After numerous experiments, echoing values into /proc/ to turn off metadata verification, and a lot of other fun exercises, we decided that using zdb was the best course of action. Unfortunately, zdb crashes. A lot. Even with the -AAAAAAA flags (which are supposed to bypass safety checks), we had to modify the zdb source code to continue.

Once we were able to get zdb to dump a reasonably complete list of entries, we could spot entries located beneath directories that returned Input/output errors. For example, /mp3 was completely inaccessible, but we could use zdb to see files located under /mp3/whatever.
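One reason files can still be found under a directory that itself returns I/O errors is that each ZFS file object also records the object number of its parent directory. As a sketch of how that helps, the Python below rebuilds a path by following parent links, assuming you have already parsed zdb's per-object output into a hypothetical `{object_number: (name, parent_object)}` table (parsing not shown). Names that could not be recovered appear as placeholders, and the object numbers and names are made up.

```python
from __future__ import annotations

from typing import Optional


def resolve_path(
    obj: int,
    objects: dict[int, tuple[Optional[str], Optional[int]]],
    root_obj: int,
) -> str:
    """Walk parent pointers from obj up to the dataset root, building a path."""
    parts: list[str] = []
    seen: set[int] = set()
    cur = obj
    while cur != root_obj:
        if cur in seen:
            raise ValueError(f"parent-pointer loop at object {cur}")
        seen.add(cur)
        name, parent = objects.get(cur, (None, None))
        parts.append(name if name is not None else f"<obj {cur}>")
        if parent is None:
            parts.append("<unknown>")  # chain broken before reaching the root
            break
        cur = parent
    return "/" + "/".join(reversed(parts))


# Hypothetical table: object 8 is a directory whose own name was unreadable.
ROOT = 34
objects = {
    8: (None, ROOT),
    9: ("album", 8),
    10: ("track01.flac", 9),
}
print(resolve_path(10, objects, ROOT))  # /<obj 8>/album/track01.flac
```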

Once we could see them, extracting them was a problem. It turns out zdb allows you to dump file blocks, but not individual entries. Does anyone know a command for this? It seems weird that there wouldn't be one built in. Perhaps the problem is that I can’t read.

Anyway, with a good bit of help from a friend, we were able to write some Perl scripts to extract each chunk and then reassemble the chunks back into a usable file.
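The Perl scripts aren't included here, so as a rough illustration of the reassembly step only, here is a Python sketch. It assumes each recovered block has already been dumped (for example with zdb) and that you know its logical offset within the file and the file's size; the function names and inputs are hypothetical. Ranges with no recovered block are left zero-filled, and the output is truncated to the file size because the final block can extend past the end of the file.

```python
from __future__ import annotations


def reassemble(chunks: list[tuple[int, bytes]], file_size: int) -> bytes:
    """Place each (logical_offset, data) chunk into a buffer of file_size bytes."""
    buf = bytearray(file_size)  # starts zero-filled; unrecovered ranges stay zero
    for offset, data in chunks:  # order doesn't matter unless chunks overlap
        if offset >= file_size:
            continue  # chunk lies entirely past end of file
        end = min(offset + len(data), file_size)
        buf[offset:end] = data[: end - offset]  # drops padding past EOF
    return bytes(buf)


def write_file(path: str, chunks: list[tuple[int, bytes]], file_size: int) -> None:
    """Write the reassembled file to disk."""
    with open(path, "wb") as f:
        f.write(reassemble(chunks, file_size))


# Hypothetical chunks, deliberately out of order, last one padded past EOF.
chunks = [(4, b"WXYZ"), (0, b"ABCD"), (8, b"12\x00\x00")]
print(reassemble(chunks, 10))  # b'ABCDWXYZ12'
```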

Do you have a ZFS war story? If so, come over to the forums and let's swap stories of triumph and defeat. If anything, our experience in this instance only makes us love ZFS more.

This is a companion discussion topic for the original entry at https://teksyndicate.com/videos/zfs-data-recovery-logan-got-his-groove-back

Alexguitar · July 23, 2015, 6:19pm

I'm not sure why you pasted a link to a copy of your video. This one happens to be slowed down and flipped/rotated. The same channel has a few more of your videos like that; not sure what's up with that. You should probably report it, but that's your call.

Streetguru · July 23, 2015, 6:19pm

I'm not going to understand half your words, but I'll watch it anyways.

wendell · July 23, 2015, 7:27pm

lol, I am a dumbass. Thanks.

Krazyblackdragon · July 23, 2015, 8:50pm

Linux and Wendell are looking cooler and cooler, but all those commands are unkind to my brain!!!

Is there a list of commands you can get just by typing help? Does it work kinda like DOS? Sorry, I'm a Linux newb...

I installed it once in 1999 and I couldn't get my internet to work, lol, so I dumped it...

Windows 10 vs Linux vs Apple: who's gonna be more powerful!!!

Alexguitar · July 23, 2015, 9:17pm

Well, if you double-tap Tab at an empty prompt you'll get a list of all available commands. On my system that's 4k entries, so I wouldn't recommend doing that lol. Honestly, the best way is just trial and error; if the need arises for a specialized tool, just google it.

Here's a list of tools you may want to check out first, though (these will typically be installed by default on most distros):

cd
pwd
ls
lsblk
a text editor, such as vim, emacs, or nano
cat
grep
find
less / more
htop
kill / pkill
man -- this one will be very important once you get the hang of everything. It's basically a handy-dandy reference for the things installed on your system. An application can have its own manual entries there, so if you type man grep you'll be shown the options and switches you can use with grep.

sam_vde · July 23, 2015, 10:30pm

The most important takeaway from this video: invest in your data if you can't afford to lose it (given the HW resources available to TS: shame on you!)

If you are on a budget, buy slightly less expensive drives (e.g. WD Reds instead of HGST), but buy one more so you can mirror your data or do RAID6/RAIDZ2.
RAID does not replace backups: the cost of one drive gets you unlimited online backup for a year these days, and the cost of getting your drives repaired by a specialist company like this buys you a decent second NAS for in-house replication. Be proactive, guys, really.
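To put rough numbers on the "buy one more drive" advice, the sketch below estimates usable space as (drives - parity) × drive size for a RAIDZ vdev, and one drive's worth for an n-way mirror. It ignores ZFS metadata, padding, and reserved space, so real pools come out somewhat smaller; the 4 TB drive size is only an example and the function names are hypothetical.

```python
def raidz_usable_tb(drives: int, drive_tb: float, parity: int) -> float:
    """Approximate usable capacity of one RAIDZ vdev (parity = 1, 2 or 3)."""
    if parity not in (1, 2, 3):
        raise ValueError("RAIDZ supports 1, 2 or 3 parity drives")
    if drives <= parity:
        raise ValueError("need more drives than parity drives")
    return (drives - parity) * drive_tb


def mirror_usable_tb(drive_tb: float) -> float:
    """An n-way mirror stores one drive's worth of data, however many copies."""
    return drive_tb


# name -> (approximate usable TB, number of drive failures the layout survives)
layouts = {
    "3-drive RAIDZ1": (raidz_usable_tb(3, 4.0, 1), 1),
    "4-drive RAIDZ2": (raidz_usable_tb(4, 4.0, 2), 2),
    "3-way mirror": (mirror_usable_tb(4.0), 2),
}
for name, (usable, tolerated) in layouts.items():
    print(f"{name}: ~{usable:g} TB usable, survives {tolerated} failed drive(s)")
```

With these example numbers, the fourth drive buys tolerance for a second failure at the same usable capacity, which is exactly the failure count the three-drive RAIDZ1 in the video could not survive.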

Oh, and another takeaway: Linux rocks :-)

freqlabs · July 24, 2015, 12:14am

@wendell Did you find that you were able to refer to the most recent TXGs for the majority of the files and only had to dig further back for the most recently modified data? Or did the disk failure just thrash all over the place?

wendell · July 24, 2015, 1:22am

Some top-level entries were damaged, and older txgs did not help much. For some strange reason all the snapshots were gone, too, which I was hoping was itself not copy-on-write since it was metadata. However, the snapshots could not be found even with very old txgs, so idk what happened there.
See also:
http://www.solarisinternals.com/wiki/index.php/ZFS_forensics_scrollback_script
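For readers following the txg discussion: each ZFS vdev label holds a ring of uberblocks, each stamped with the transaction group (txg) that wrote it, and on import ZFS normally uses the valid uberblock with the highest txg. Rewinding (for example via `zpool import -F`, which discards the last few transactions) amounts to falling back to an older valid uberblock. The Python below is a toy model of that selection under those assumptions; the `Uberblock` class and its `checksum_ok` flag are hypothetical stand-ins for the real on-disk structure and its verification.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Uberblock:
    txg: int           # transaction group this uberblock was written in
    checksum_ok: bool  # stand-in for magic-number and checksum verification


def select_uberblock(
    ring: Sequence[Uberblock], max_txg: Optional[int] = None
) -> Optional[Uberblock]:
    """Pick the newest valid uberblock, optionally no newer than max_txg (a rewind)."""
    candidates = [
        ub for ub in ring
        if ub.checksum_ok and (max_txg is None or ub.txg <= max_txg)
    ]
    return max(candidates, key=lambda ub: ub.txg, default=None)


ring = [Uberblock(100, True), Uberblock(101, True), Uberblock(102, False)]
print(select_uberblock(ring))               # Uberblock(txg=101, checksum_ok=True)
print(select_uberblock(ring, max_txg=100))  # Uberblock(txg=100, checksum_ok=True)
```

Rewinding can only help if the blocks an older uberblock points to have not been reused since; without snapshots, space freed in later txgs is available for new writes, which is one plausible reason older txgs helped little here.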

Krazyblackdragon · July 24, 2015, 2:27am

Cool, Alexguitar.

I will probably install some Linux to learn some more.

Jeol · July 24, 2015, 4:58am

Actually, the video about the burnishing tool was cool, lol. The guy sounds like he knows what he's talking about, though.

Hacking to make things work is cool. Source code, especially for essential system components like this, seems very "stay away and hope not to get burned," but it's very interesting to see it modified in order to save stuff. Hopefully you won't run into any more issues with the additional 7-8 drives. If you do, then hey! More excitement digging around with zdb.

MegahurtZ · July 24, 2015, 7:15pm

Good to know that this is possible. Hopefully I will never have to do this with my ZFS though....lol

LordXenu · July 24, 2015, 8:47pm

K thanks, I'll stick to raid 0.

freqlabs · July 24, 2015, 9:28pm

Keep in mind, this recovery was only possible because of ZFS. The problem was disk failure, not ZFS failure. Because ZFS is a copy-on-write (COW) filesystem, there were older copies of blocks that could be used to recover the files. RAID doesn't do this, NTFS doesn't do this, and ext2/3/4 don't do this. With any regular RAID and a standard filesystem on top, the disk failures would have killed any file that had corrupted blocks, because there would be no older copies of the data to recover from. Even with mirrors, neither the RAID layer nor the filesystem knows what the data is supposed to be, so if one disk in the mirror is returning corrupted data, there's no check that fails and says "well, that can't be right, read from the other disk instead." ZFS, on the other hand, uses checksums, so when one disk gives back the wrong data, ZFS can say "well, that's clearly incorrect, read that block from the other disk." Mirroring instead of striping with parity would let you recover manually by removing the failed disk, but a three-way mirror gets pretty expensive, especially if you ever want to expand your storage.
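The checksum point above can be sketched in a few lines. ZFS keeps each block's checksum in the parent block pointer rather than next to the data, so a read can tell which mirror copy is wrong. In this Python toy, SHA-256 from `hashlib` stands in for ZFS's checksum (fletcher4 by default), a plain list stands in for the mirror's disks, and the function names are hypothetical. It also rewrites bad copies from the good one, roughly what ZFS's self-healing does on a redundant read.

```python
from __future__ import annotations

import hashlib


def checksum(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def naive_mirror_read(copies: list[bytes]) -> bytes:
    """Checksum-less mirror: whatever the first disk returns is accepted."""
    return copies[0]


def verified_mirror_read(copies: list[bytes], expected: bytes) -> bytes:
    """Return the first copy whose checksum matches, then repair the others in place."""
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        raise OSError("no mirror copy matches the stored checksum")
    for i, c in enumerate(copies):
        if checksum(c) != expected:
            copies[i] = good  # self-heal: overwrite the corrupted copy
    return good


original = b"album master, track 1"
expected = checksum(original)  # in real ZFS this lives in the parent block pointer
copies = [b"album mastXr, track 1", original]  # disk 0 silently returns bad data
print(naive_mirror_read(copies))  # b'album mastXr, track 1'
print(verified_mirror_read(copies, expected) == original)  # True
print(copies[0] == original)  # True: disk 0's copy was repaired
```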

MegahurtZ · July 24, 2015, 9:35pm

I have been using ZFS for a while now and it has been amazing. I am just spitballing here, but it would be kinda cool to see a RAID controller card that uses an ARM processor and laptop RAM to make a hardware/software hybrid. Probably not an easy feat, especially with the RAM requirements, but it would be cool.

freqlabs · July 24, 2015, 10:04pm

That's basically how some RAID cards work already.

MegahurtZ · July 24, 2015, 10:09pm

Not exactly; they use completely different data storage methods, and I am really liking how ZFS does it vs. traditional RAID. There would be a couple of big advantages to a hardware RAIDZ card. One, there would be better performance for the host system; that's a given. Two, if done right, it would allow you to access the array from a Windows install, which would be good for dual-boot systems. And three, it would give greater plug-and-play compatibility when moving RAID disks between machines, without having to worry about manufacturer limitations.

freqlabs · July 24, 2015, 11:12pm

Oh, you mean actually using ZFS? Well, then you are basically talking about a SAN. And sure, you can expose volumes as block devices like that, e.g. giving Windows a ZFS-backed storage volume, but you lose the filesystem part of ZFS that way, which is a shame. There are definitely hardware SANs on the market that use ZFS internally, though. They are a bit beefier than some pissant ARM chip and a few laptop RAM banks, of course. :) I do like the idea of somehow embedding a ZFS SAN on a PCIe card, though the practical applications for that might be a bit limited.

Eden · July 25, 2015, 12:00am

Yeah... maybe edit the link to your actual video... https://www.youtube.com/watch?v=ddrPnuvFV6E

I was so confused.

Eden · July 25, 2015, 12:02am

BTRFS does do this, though... I'm not sure of any others, apart from ZFS of course.