Need help recovering a ZFS pool without its metadata vdev

TLDR:

  1. Is there a way to force ZFS to import the pool without the metadata device and rebuild the metadata on the data vdevs?
  2. Is there a software solution like UFS RAID recovery that I can use, but one that has a customer support number and contact info?
  3. Are there programs designed specifically for recovering ZFS arrays without the metadata?

Background:

On December 24th last year, a TrueNAS SCALE server that I set up for my parents died, and it took the array with it. When I checked for the backups they said they would set up, there were none.

Server configuration:

The server was set up in the following config: 8x 4 TB white-label drives (HGST and Toshiba, according to SMART data) in RAIDZ3, 1x Samsung 970 EVO for read cache, 2x 128 GB Lexar SATA SSDs mirrored for the boot pool, and 2x Silicon Power 128 GB SSDs in a mirror for the SLOG. It turns out that I set them up as a special metadata vdev instead (insert facepalm here).

Events:

The server was created in early 2021 using FreeNAS, with everything except the 2x Silicon Power SSDs. Around May of 2022 I upgraded to TrueNAS SCALE to standardize the operating systems I maintained. (I also ran a NAS for myself on TrueNAS SCALE; my server was the guinea pig for updates before I updated my parents' NAS.) In November 2023 I added the 2x Silicon Power SSDs to address large file uploads that started fast but then dropped to 20 MB/sec, and to address severe latency when working on files directly on the NAS.

Now, fast forward to December 24, about a day after I upgraded to 22.02.4, and the server dies. It started in the morning with a stick of RAM dying. I replaced the RAM, and then the server would not boot due to a BMC error caused by a dead drive. The drive that died was one of the Silicon Power SSDs. That is fine: it is in a mirror, and I have two 480 GB Intel Optanes that I was going to replace them with anyway.

The server then boots just fine, but the ZFS pool does not exist. Don't touch anything and reboot. Now it sees the pool, but the pool needs to be imported, and I get the error "cannot import 'pool1': destroy and re-create the pool from a backup source". Since all disks can be seen, I tell TrueNAS to perform a long SMART test on all drives (after I reconnect the dead SP drive); no SMART errors are detected. I also run smartmontools and get the same result. The only oddity is that the dead SP drive takes about 5 minutes just to read its SMART data.

After a week of trying various zpool and zdb commands (read-only; I did not do anything that would write to the array), I gave up on the ZFS build that TrueNAS ships, took the defective NAS back to my lair, converted the SC826 chassis to a disk shelf, and started over with Ubuntu's ZFS FUSE utilities and smartmontools to see if I could get any helpful info. Nope. All drives report back as normal.
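For reference, the kind of read-only checks I mean are along these lines (the device paths here are placeholders, not my exact invocations):

    # scan for importable pools without actually importing anything
    sudo zpool import -d /dev/disk/by-id
    # dump the ZFS label from one of the data disks (may need to point at the data partition, e.g. -part1)
    sudo zdb -l /dev/disk/by-id/ata-EXAMPLE_DISK
    # full SMART report for the same disk
    sudo smartctl -a /dev/disk/by-id/ata-EXAMPLE_DISK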

I remove all drives except the data drives and start running UFS RAID recovery (the Windows edition; I purchased the wrong version, but they would not respond to me when I asked to exchange it for a Linux license). Within a week (30 hours for the scan, which crashes after the 38 percent mark, and about 250 hours to build the file system) I get about 65-70% of the data back. However, I noticed that this tool does not seem to recover files that are more than 10 levels deep, and recovering the rest of the data also uses a lot of memory. I upgraded my memory from 96 GB to 320 GB thanks to a deal on eBay, and now it can get past 38%. However, between power outages and random Windows reboots it takes about 4 weeks to do a full scan (160 hours) and build a file system (250 hours), and only two attempts have yielded any data; the others have come back with the error "no files found so far". If wanted, I can elaborate further.

Questions:

My parents need files that are among the ones not yet recovered. According to zpool import, the data disks are fine; the pool is just missing the metadata drives. However, no matter what I have tried, I have not been able to force an import of this pool.

Thank you for reading this; I welcome any information you all have. There is a lot I have learned, done, and tried over the last couple of months, but I am already concerned that I have put too much text out here and I do not want to write a J.R.R. Tolkien-style post.

PS:
I learned a few days ago that I could have used a PC3K to image one of the metadata drives to a working drive. I am looking for the other metadata drive, but I am not hopeful it will help. TrueNAS reported ~11 GB of used space on the metadata vdev, but the working metadata device only reported ~2 GB of used space when I checked it with df.

On mobile with limited time, so I can't properly read everything right now; apologies if I misunderstand something.

When ZFS uses special vdevs for metadata, it moves all metadata writes to those vdevs. If the disks for those special vdevs are gone, then all of that metadata is lost. The metadata is what tells ZFS which blocks on the data vdevs belong to what, so if the metadata is lost then the pool is lost, with no way of recovery that I know of, as the data blocks are essentially nothing more than random numbers with no index or correlation.
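If it helps for next time, you can usually see whether a pool has a special allocation class and roughly how much lives on it with something like the following (pool name taken from your error message, so treat it as an assumption). It only works on a pool that imports, so on the broken box this is more of a reference:

    # per-vdev capacity breakdown; a special vdev shows up under its own "special" heading
    zpool list -v pool1
    # vdev layout and health, including any special mirror
    zpool status -v pool1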

If your special vdev drives were a mirror, then if you imaged one of those drives it should be possible to recover fully, provided the data vdevs are fine. With ZFS, you can give it disks or just straight-up files, so it's possible to dump the raw disk data into a file on another drive, hand ZFS the files, and import it as a working (albeit slower) pool, for safer recovery away from the surviving disks. Useful if it'll all fit on a spare TB-sized disk.
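A minimal sketch of that file-backed approach, assuming the surviving special SSD still reads and that /recovery sits on a disk big enough to hold the image (all device names and paths here are placeholders):

    # copy the surviving special-vdev SSD into an image file, continuing past read errors
    sudo dd if=/dev/disk/by-id/ata-SPCC_SSD_EXAMPLE of=/recovery/special-a.img bs=1M conv=noerror,sync status=progress
    # let ZFS search both the image directory and the real data disks,
    # and import read-only so nothing gets written during recovery
    sudo zpool import -d /recovery -d /dev/disk/by-id -o readonly=on pool1

The same dd pattern works for the data disks too, if you have the space to work entirely from copies.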

If the drives in the special vdev were "striped", and only one drive was imaged, I suppose it may be theoretically possible to recover some things, but this is far beyond my technical ability and experience, and you'll have to get guidance from someone experienced with this kind of thing. It may require building recovery tools that ZFS does not currently have, like when Wendell and Matt Ahrens helped Linus recover from his pool failure.

One thing that may help with understanding is making a picture/list of how the disks and vdevs in the pool were laid out, and how that has changed over time.

Another place to post is https://zfsonlinux.topicbox.com/groups/zfs-discuss, which is basically a browser-accessible mailing list for ZFS.


zpool import -m

By default, importing is disabled when a log device is missing, but you can force it...
As to how safe this will be for your data, I have no idea, so use with caution.
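Something roughly like this, with readonly added so nothing gets modified while you experiment (pool name taken from the error message in the original post):

    sudo zpool import -f -m -o readonly=on pool1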

I don't proclaim to be a ZFS expert, but aren't you referring to importing with a missing SLOG, as opposed to a missing metadata vdev?

I didn't think you could recover from the metadata vdev going down, hence why the special metadata device is very often mirrored.

Hi Log,
I apologize for the delay in responding; the internet was out due to the local storms.

The layout of the original pool is as follows:

SM X11SSL-F SATA HBA

Special metadata vdev (this was supposed to be the ZIL/SLOG, but I misclicked; see the command sketch after this list)
(this was created in November)
    2x Silicon Power 128 GB SSD (mirror)

Boot pool
(this was "OEM")
    2x Lexar 128 GB SSD (mirror)

2-port SM HBA
Data pool
    8x white-label 4 TB SATA 3 HDD (RAIDZ3)

Single-port NVMe adapter
Cache (read, L2ARC)
    1x Samsung 970 EVO 256 GB

ARC (system RAM, not a pool vdev)
    64 GB DDR4 2600 ECC
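For clarity, the shell equivalent of what I meant to do versus what I actually ended up with would look roughly like this (device names are placeholders; I did all of this through the TrueNAS GUI, not the command line):

    # what I intended: add the two SSDs as a mirrored SLOG
    zpool add pool1 log mirror /dev/sdX /dev/sdY
    # what I actually got: the same two SSDs as a mirrored special (metadata) vdev
    zpool add pool1 special mirror /dev/sdX /dev/sdY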

As for the fates of the metadata SSDs:
The one that was barely responding to SMART was destroyed as part of spring cleaning a few weeks ago. (I did not know that a PC3K could potentially extract data from "dead drives" until about 3 days ago.)

The other drive is MIA; I am not sure where it went. I put it on a shelf at the end of December "just in case", but then I had to move in March and now I can't find it...

Here is a screenshot of the output from sudo zpool list:
[screenshot: zpool list output]

Hi twin_savage,
No, I am not referring to a missing SLOG vdev. I intended to create a SLOG vdev, but I misclicked and did not double-check the config before I applied the changes. That is how I ended up with a metadata vdev instead of a SLOG vdev.

On my test server I have been able to re-import a zpool without a SLOG or cache, but you lose any data that was in flight.
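On the test box that was just a plain forced import along these lines (the test pool name here is made up):

    sudo zpool import -m testpool

The -m lets it come up with the log device missing, and a missing cache (L2ARC) device does not block the import at all.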

I tried that. The actual commands were sudo zpool import -f -F -m and also sudo zpool import -f -m. Neither one worked.
