TLDR:
- Is there a way to force ZFS to import the pool without the metadata device and rebuild the metadata on the data vdevs?
- Is there a software solution like UFS RAID Recovery that I can use, but one that has a customer support number and contact info?
- Are there programs designed for recovering ZFS pools without the metadata?
Background:
On December 24th last year, a TrueNAS SCALE server that I set up for my parents died, and it took the array with it. When I checked for the backups that they said they would set up, there were none.
Server configuration:
The server was set up in the following config:
- 8x 4TB white-label drives (HGST and Toshiba according to SMART data) in RAIDZ3
- 1x Samsung 970 EVO for read cache
- 2x 128GB Lexar SATA SSDs mirrored for the boot pool
- 2x Silicon Power 128GB SSDs in a mirror for the SLOG. It turns out I set them up as a special metadata vdev instead (insert facepalm here).
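In hindsight, the mistake comes down to one word on the command line (or the equivalent option in the UI). A rough sketch with made-up device names:

    # What I meant to add: an SSD mirror as a SLOG (separate intent log)
    zpool add pool1 log mirror /dev/sdX /dev/sdY

    # What I actually added: a special allocation class vdev, which stores pool
    # metadata (and optionally small blocks). Lose it and the whole pool goes with it.
    zpool add pool1 special mirror /dev/sdX /dev/sdY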
Events:
The server was created in early 2021 using FreeNAS with everything but the 2x Silicon Power SSDs. In about May of 2022, I upgraded to TrueNAS SCALE to standardize the operating systems I maintained. (I also ran a NAS for myself on TrueNAS SCALE; my own server was the guinea pig for updates before I updated my parents' NAS.) In November 2023, I added the 2x Silicon Power SSDs to address large file uploads that started fast but then dropped to 20 MB/s, and to address severe latency when working on files directly on the NAS.
Now, fast forward to Dec 24: about a day after I upgraded to 22.02.4, the server died. It started in the morning with a stick of RAM dying. I replaced the RAM, and then the server would not boot due to a BMC error caused by a dead drive. The drive that died was one of the Silicon Power SSDs. That seemed fine: it was in a mirror, and I had 2x 480GB Intel Optanes that I was going to replace them with anyway.

The server then booted just fine, but the ZFS pool did not exist. Touch nothing and reboot. Now it saw the pool, but the pool needed to be imported, and I got the error "cannot import 'pool1': destroy and re-create the pool from a backup source". Since all disks could be seen, I told TrueNAS to run a long SMART test on every drive (after I reconnected the dead SP drive); no SMART errors were detected. I also ran smartmontools directly and got the same result. The only oddity is that the dead SP drive takes about 5 minutes just to read its SMART data.

After a week of trying various zpool and zdb commands (read-only ones; I did not write anything to the array), I gave up on TrueNAS's build of ZFS, took the defective NAS back to my lair, converted the SC826 chassis into a disk shelf, and started over with Ubuntu, the ZFS FUSE utilities, and smartmontools to see if I could get any helpful info. NOPE. All drives reported back as normal.
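For the curious, the read-only poking was along these lines (device paths are illustrative; none of these should write to the pool):

    # scan disks by stable ID and show what pools/vdevs ZFS can see, without importing
    zpool import -d /dev/disk/by-id

    # dry-run a recovery-mode import; -n reports whether -F could work without doing it
    zpool import -F -n pool1

    # dump the on-disk ZFS labels from a single drive to confirm it still knows the pool
    zdb -l /dev/disk/by-id/ata-EXAMPLE-part1

    # read SMART health data from one drive
    smartctl -a /dev/sda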
I removed all drives except the data disks and started running UFS RAID Recovery (the Windows edition; I purchased the wrong version, but the vendor would not respond when I asked to exchange it for a Linux license). Within a week (30 hours for the scan, which crashed after the 38% mark, and about 250 hours to build the file system) I got about 65-70% of the data back. However, I noticed that this tool does not seem to recover files that are more than 10 levels deep, and recovering the rest of the data also uses a lot of memory. I upgraded my memory from 96GB to 320GB thanks to a deal on eBay, and now it can get past 38%. However, between power outages and random Windows reboots, it takes about 4 weeks to do a full scan (160 hours) and build a file system (250 hours), and only two attempts have yielded any data; the others have come back with the error "no files found so far". If wanted, I can elaborate further.
Questions:
My parents need files that are among the unrecovered data. The data disks are fine according to zpool import; the pool is just missing the metadata drives. However, no matter what I have tried, I have not been able to force an import of this pool.
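For context, these are the kinds of force-import variants I mean. As far as I understand, -m only covers a missing log (SLOG) device, not a missing special vdev, which is the heart of my problem:

    # force the import read-only so nothing is written to the disks
    zpool import -f -o readonly=on pool1

    # -m imports with a missing log device; it does not appear to help when the
    # missing vdev is a special metadata vdev
    zpool import -f -m pool1

    # extreme rewind (-X, undocumented and risky); -n dry-runs it first
    zpool import -F -X -n pool1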
Thank you for reading this; I welcome any information you all have. There is a lot I have learned, done, and tried over the last couple of months, but I am already concerned that I have put too much text out here, and I do not want to write a J.R.R. Tolkien-style post.
PS:
I learned a few days ago that I could have used a PC-3000 to image one of the metadata drives onto a working drive. I am looking for the other metadata drive, but I am not hopeful it will help. TrueNAS reported ~11GB of used space on the metadata vdev, but the working metadata device only reported ~2GB of used space when I checked it with the df tool.
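Two notes on that, in case it helps anyone in the same hole. A software imager can sometimes stand in for a PC-3000 if the drive still enumerates, and df is arguably the wrong tool for per-vdev usage anyway; on a pool that actually imports, zpool shows allocation per vdev. Illustrative device names again:

    # clone a failing drive to a healthy one, skipping the slow scraping phase;
    # the map file lets the copy resume after interruptions
    ddrescue -f -n /dev/sdX /dev/sdY rescue.map

    # on an importable pool, show allocation per vdev, including the special mirror
    zpool list -v pool1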