Extreme ZFS failure recovery help?

While trying to figure out why I couldn’t plug in three WD80 drives at once (only two are detected, and which two seems to be random), I tried formatting one of them with an LSI-9212-4i. I picked the wrong drive, but I stopped the process very early, in under a minute.
It’s an encrypted ZFS drive with no other pool members. I have the key for it.

Where do I start with recovering the data? I still don’t know why this drive or one other randomly won’t show up.

It sounds like you may have a hardware failure, which is never a good thing when you want to attempt data recovery. Attempt your recovery on another rig.

Are you seeing events in dmesg for the removed drives?

I’ve never done ZFS data recovery, but generally the first thing I’d try is to import the pool read-only. From there, see if your data is readable.

From there, you can use the zdb tool to inspect the pool and work out how bad the damage is. It can also let you look at an earlier transaction group (TXG), which may allow you to reconstruct older data with the help of a failed but still partially readable disk.

Yes, as @cowphrase suggests, connect the drives to a different rig, or at least a different LSI card. If they are SATA drives, you don’t need the card.

If you’re using RAIDZ, then you don’t need all the drives to work. You do want somewhere to save the data once you gain access, so buy some more drives.

Nonono, you misunderstand. The drive is fine: I backed up the full drive to an image and it didn’t make a fuss. I stupidly overwrote the first chunk of the disk, mistaking it for a different, non-functional disk.

There’s no RAIDZ; it’s just a single encrypted drive that’s missing… well, all the partition info and such, as well as probably a good chunk of data at the start of the drive. I’m not sure what I need to restore, or whether it can be restored. I was going to open the first gig or two in a hex editor and compare it with a working ZFS partition to see what it expects, but I really don’t know what I’m doing there.

I really wasn’t very clear in the opening post. I confirmed the non-working drive doesn’t spin up; I have no idea why its model sometimes showed up in place of a working drive.

Sorry about the confusing opening post; I’m trying to recover from some very big and dumb human error on my part.
It looks like the first 2-3 GB or so are missing. Is it possible to recover data from an encrypted volume in that state, say by copying over data from a working encrypted volume that uses a different key and then loading the correct key? I can get it to show up as another disk in gparted by copying the first 400 MB from another zpool, but I’m not sure where to look to change the names, or whether all the essential ZFS information for the pool is stored at the start of the disk and is therefore gone.

Okay, the first step is to restore the partition table. By default ZFS uses GPT, which should make this your lucky day: GPT keeps a backup partition table at the end of the disk!

Can you open the image in gdisk (gdisk /path/to/image), then run “p” to display the partition information? Hopefully gdisk will find the backup partition table and give you an option to restore it. IIRC you’ll need to enter the “r” (recovery) menu and find the right option. (FYI, you can type “?” to display the available options, and partition changes are only written when you run “w”.) ((Double FYI: unlike parted, gdisk only modifies the partition table; it doesn’t change any data. So if you restore the partition table, the data should still be there.))
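
To check whether the backup header survived before touching anything, here is a minimal Python sketch that reads the last sector of an image and parses the backup GPT header. It assumes 512-byte logical sectors (typical for 512e drives; pass 4096 for 4Kn drives), and `read_backup_gpt_header` is just a hypothetical helper name, not part of gdisk. It only reads, so it is safe to run against the image.

```python
import struct
import zlib

SECTOR_SIZE = 512  # assumption: 512-byte logical sectors (use 4096 for 4Kn drives)
GPT_HEADER_FMT = "<8sIIIIQQQQ16sQIII"  # 92-byte GPT header, little-endian


def read_backup_gpt_header(path, sector_size=SECTOR_SIZE):
    """Parse the backup GPT header stored in the last sector of a disk image."""
    with open(path, "rb") as f:
        f.seek(0, 2)  # seek to the end to learn the image size
        size = f.tell()
        f.seek(size - sector_size)  # the backup header lives in the last LBA
        sector = f.read(sector_size)

    (signature, _revision, header_size, header_crc, _reserved, my_lba,
     alternate_lba, first_usable, last_usable, _disk_guid, entries_lba,
     num_entries, entry_size, _entries_crc) = struct.unpack_from(
        GPT_HEADER_FMT, sector, 0)
    if signature != b"EFI PART":
        raise ValueError("no GPT signature in the last sector")

    # The header CRC32 is computed over header_size bytes with the CRC field zeroed.
    header = bytearray(sector[:header_size])
    header[16:20] = bytes(4)
    return {
        "my_lba": my_lba,              # should equal the last LBA of the disk
        "primary_lba": alternate_lba,  # where the primary header belongs (normally 1)
        "first_usable_lba": first_usable,
        "last_usable_lba": last_usable,
        "entries_lba": entries_lba,    # start of the backup partition entry array
        "num_entries": num_entries,
        "entry_size": entry_size,
        "header_crc_ok": zlib.crc32(bytes(header)) == header_crc,
    }
```

If `read_backup_gpt_header("/path/to/image")` reports `header_crc_ok: True`, gdisk’s recovery options should have an intact backup to rebuild from; a missing signature would mean the end of the disk was hit too.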

If you aren’t using GPT, you’ll need to use fdisk. Unfortunately, MBR has only one partition table, at the start of the drive, so you’ll need to create a new partition table (“o”) and create new partitions (“n”) with sizes and offsets that match the original exactly. As with gdisk, none of this will cause data loss as long as the partition offsets are correct.

(Subtle side note: GPT disks also carry a “shadow” (protective) MBR, so a GPT disk may appear to be an MBR disk to programs that don’t know about GPT. Generally you want to open the disk in gdisk first so it can find the GPT markers.)
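
A rough Python sketch of that check, assuming 512-byte sectors: it looks at sector 0, reports whether it holds a GPT protective MBR (partition type 0xEE) or a plain MBR, and lists each MBR entry’s type, start LBA and length, which are exactly the numbers you would need to recreate in fdisk. `classify_sector0` is a hypothetical name. On a wiped disk you would expect "none", which is why the backup GPT at the end matters.

```python
import struct


def classify_sector0(sector0):
    """Classify sector 0 as a GPT protective MBR, a plain MBR, or neither.

    Returns (kind, partitions), where partitions is a list of
    (type_byte, start_lba, num_sectors) for every non-empty MBR slot.
    """
    if len(sector0) < 512 or sector0[510:512] != b"\x55\xaa":
        # no 0x55AA boot signature: sector 0 was wiped or never written
        return "none", []
    partitions = []
    for i in range(4):  # four 16-byte entries starting at offset 446
        entry = sector0[446 + 16 * i: 446 + 16 * (i + 1)]
        part_type = entry[4]
        start_lba, num_sectors = struct.unpack_from("<II", entry, 8)
        if part_type != 0:
            partitions.append((part_type, start_lba, num_sectors))
    if any(p[0] == 0xEE for p in partitions):
        return "gpt-protective", partitions  # type 0xEE marks a GPT disk
    return ("mbr" if partitions else "none"), partitions
```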

Also, don’t quote me on this, but I believe ZFS stores copies of its metadata (the vdev labels) at the end of the drive as well as at the start, so hopefully those weren’t affected by the wipe.
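
For reference, my understanding of the OpenZFS on-disk layout (from its `vdev_label.c`, so treat it as an assumption) is four 256 KiB vdev labels: L0 and L1 at the very start of the ZFS partition and L2 and L3 at the very end, with the partition size first rounded down to a multiple of 256 KiB. This hypothetical helper computes those offsets relative to the start of the partition; add the partition’s own start offset to get a position on the disk. A wipe of the first couple of GB would take out L0 and L1 but leave L2 and L3 alone, and `zdb -l` on the partition should show which labels it can still read.

```python
LABEL_SIZE = 256 * 1024  # each ZFS vdev label is 256 KiB
NUM_LABELS = 4           # L0, L1 at the front; L2, L3 at the back


def zfs_label_offsets(partition_size):
    """Byte offsets of the four vdev labels, relative to the start of the ZFS partition."""
    psize = partition_size - partition_size % LABEL_SIZE  # round down to 256 KiB
    offsets = []
    for index in range(NUM_LABELS):
        offset = index * LABEL_SIZE
        if index >= NUM_LABELS // 2:
            # the back two labels are packed against the end of the aligned partition
            offset += psize - NUM_LABELS * LABEL_SIZE
        offsets.append(offset)
    return offsets
```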

It would also help to work out how much data was overwritten on the drive. If you were formatting it, hopefully it was writing zeros? If so, can you use hexdump to work out how far it got?
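
If you’d rather not scroll through hexdump, a small Python sketch can find the first non-zero byte for you. It assumes the format really did write zeros from the start; the result is only an upper bound on how far the wipe got, because any original data that happened to be zero right after the wiped region gets counted too. `leading_zero_bytes` is a hypothetical helper, and it only reads the file.

```python
def leading_zero_bytes(path, chunk_size=1 << 20):
    """Return the offset of the first non-zero byte in a file (its size if all zero)."""
    offset = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return offset  # reached EOF: everything read was zero
            stripped = chunk.lstrip(b"\x00")
            if stripped:
                # non-zero data starts partway into this chunk
                return offset + (len(chunk) - len(stripped))
            offset += len(chunk)
```

Run it against the image and divide the result by 1024**3 to get a figure in GiB.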

Another side note: ZFS encryption doesn’t use your passphrase to encrypt the data directly. Instead, your passphrase encrypts a master key stored on the ZFS partition, so we have to hope that the master key wasn’t stored at the start of the disk.
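
To illustrate the two-level idea, here is a toy Python sketch, assuming `keyformat=passphrase`. As I understand it, OpenZFS stretches the passphrase with PBKDF2-HMAC-SHA1 (350,000 iterations by default, with a random salt) into a 32-byte wrapping key, and that wrapping key encrypts the master key with AES-GCM. The XOR “wrap” below is a stand-in only and is NOT secure; the function names are hypothetical. What it does show: changing the passphrase just re-wraps the same master key, so the data itself never has to be re-encrypted.

```python
import hashlib


def derive_wrapping_key(passphrase, salt, iterations=350_000):
    # Stretch the passphrase into a 32-byte wrapping key (PBKDF2-HMAC-SHA1).
    return hashlib.pbkdf2_hmac("sha1", passphrase.encode(), salt, iterations, dklen=32)


def toy_wrap(master_key, wrapping_key):
    # TOY ONLY: XOR stands in for the real AES-GCM key wrap; not secure or authenticated.
    return bytes(a ^ b for a, b in zip(master_key, wrapping_key))


toy_unwrap = toy_wrap  # XOR is its own inverse
```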

You are the hero! It mounts and I can see folders. With any luck I can restore the important stuff and will only have lost unimportant, easily re-obtainable data.
I was already in the process of reacquiring what I could from other sources, but there’s a lot of old, obscure stuff I have very low confidence of ever finding again.
Looks like the encryption key was stored safely somewhere other than the start, or has a backup somewhere.

I used a hex editor and dd to determine that at least a couple of gigs were lost: more than expected, but not much more than that.

Good to hear! For the future, could you say what steps you needed to take? Did gdisk simply find the backup partition table and let you write it back to the start of the disk?

Yup, that was it. I loaded the backup partition table and the backup header from disk. After that, I could import the pool, and it mounted fine with my automount script, which just does import, load key, mount.
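
That import, load key, mount sequence looks roughly like this as a Python sketch. The pool name `tank` and the helper names are placeholders, and `readonly=True` adds `-o readonly=on` to the import, in line with the earlier advice to try a read-only import first after damage. `run` defaults to a dry run that only prints the commands; with `dry_run=False`, `zfs load-key` prompts for the passphrase on the terminal.

```python
import subprocess


def unlock_commands(pool, readonly=False):
    """Build the import / load-key / mount sequence for an encrypted pool."""
    import_cmd = ["zpool", "import"]
    if readonly:
        import_cmd += ["-o", "readonly=on"]  # safer first attempt after damage
    import_cmd.append(pool)
    return [
        import_cmd,
        ["zfs", "load-key", "-r", pool],  # load keys for encryption roots under the pool
        ["zfs", "mount", "-a"],           # mount every filesystem that is now available
    ]


def run(commands, dry_run=True):
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step
```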

I’m assuming there’s a bunch of corrupted files, but I’ll see what a scrub says after I get my data.

I have noticed, though, that where there used to be 8 MB of unknown data, there’s now 1.3 MB of unallocated space, and I don’t know why that changed.

I have not been in this position yet, but it’s great to hear some good ideas from @cowphrase, and well done @alkafrazin!
Might I ask, have you ever looked at using /dev/disk/by-id, which should give the serial number as well as the model number?

I sometimes have similar drives, and it can be a pig working out which is which when doing stuff like that.
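
A quick Python sketch of that idea: it walks /dev/disk/by-id, skips the `-partN` links, and groups the remaining names (which usually include the model and serial) by the kernel device they point to. `by_id_map` is a hypothetical helper name; the directory is a parameter so it can be pointed elsewhere for testing.

```python
import os


def by_id_map(by_id_dir="/dev/disk/by-id"):
    """Map kernel device paths (e.g. /dev/sdb) to their whole-disk by-id names."""
    mapping = {}
    for name in sorted(os.listdir(by_id_dir)):
        path = os.path.join(by_id_dir, name)
        if "-part" in name or not os.path.islink(path):
            continue  # keep only whole-disk symlinks
        target = os.path.realpath(path)  # resolve e.g. ../../sdb to /dev/sdb
        mapping.setdefault(target, []).append(name)
    return mapping
```

Printing `by_id_map()` on the affected machine and matching the serial against the sticker on the drive should make it much harder to wipe the wrong disk.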

Also, you backed up the drive to an image: was that a full drive image, like with dd / GNU ddrescue? Just wondering whether you could have pulled from that source if it was taken before the wipe, or, if it was taken after the wipe, whether you were able to recreate the partition table in the image as you did on the drive.

Just for posterity.

But good job lads

I backed up the partially wiped drive with dd, restored the partition table and header in the image, and then wrote the first 400 MB of the image back to the drive. After that, it just kinda worked.
I’m still in the process of collecting files, but I was able to retrieve the whole PC games folder, so my itch.io library is safe, as is my GOG library and some obscure older indie games that may or may not exist anymore.
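
The “write the first 400 MB back” step is essentially dd with a count. A minimal Python equivalent, assuming `dst` is the target drive (double-check the path, since this overwrites it): it copies the first `length` bytes of the repaired image over the start of the drive without truncating anything after that point. `copy_head` is a hypothetical helper name.

```python
import os


def copy_head(src, dst, length, chunk_size=4 << 20):
    """Copy the first `length` bytes of src over the start of dst, without truncating dst."""
    copied = 0
    with open(src, "rb") as fin, open(dst, "r+b") as fout:  # r+b never truncates
        while copied < length:
            chunk = fin.read(min(chunk_size, length - copied))
            if not chunk:
                break  # source is shorter than requested
            fout.write(chunk)
            copied += len(chunk)
        fout.flush()
        os.fsync(fout.fileno())  # make sure the data actually reaches the device
    return copied
```

For example, `copy_head("/path/to/image", "/dev/sdX", 400 * 1024**2)`, with `/dev/sdX` standing in for the real device. The backup GPT at the end is not touched, which is fine here because the end of the drive was never wiped.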

Just updating this to say that, after a scrub, it appears to be a 100% recovery. After copying the entire drive’s file contents to another drive and then scrubbing the repaired drive, ZFS could not find a single data error.

It seems GPT and ZFS really are quite good at covering for human error, even without RAIDZ.