Moving ZFS array from FreeNAS to Proxmox: array appears degraded

Hi All,

First-time poster here. I’ve decided to move my file server from FreeNAS to Proxmox. I was under the impression that Proxmox’s ZFS implementation would work nicely with the existing array, but it appears to be showing the first array as degraded.

zpool import shows the below:

   pool: array1
     id: 15782512880016547313
  state: DEGRADED
 status: The pool was last accessed by another system.
 action: The pool can be imported despite missing or damaged devices.  The
        fault tolerance of the pool may be compromised if imported.
   see: <<URL REMOVED>>
 config:

        array1                                    DEGRADED
          raidz1-0                                DEGRADED
            sde                                   ONLINE
            sdd                                   ONLINE
            sdc                                   ONLINE
            7ab82888-1cc5-11ea-a43d-001b212010a0  UNAVAIL

The missing disk is sda on the FreeNAS installation and sdb in Proxmox.

lsblk shows the disk presenting correctly, but with no partitions (or a damaged partition table):

sdb                  8:16   0   3.7T  0 disk

sdc                  8:32   0   3.7T  0 disk
├─sdc1               8:33   0     2G  0 part
└─sdc2               8:34   0   3.7T  0 part

sdd                  8:48   0   3.7T  0 disk
├─sdd1               8:49   0     2G  0 part
└─sdd2               8:50   0   3.7T  0 part

sde                  8:64   0   3.7T  0 disk
├─sde1               8:65   0     2G  0 part
└─sde2               8:66   0   3.7T  0 part

I understand that FreeNAS moves its own boot information onto the first disk after the first pool is created? Could this be the cause of what I’m seeing here?

Should I format /dev/sdb and rebuild the array in Proxmox, or return to the old FreeNAS installation (still on a USB stick on my desk) and make some changes there?

I wasn’t able to find much information when searching beforehand, which I thought was odd considering this seems like it would be a fairly common move.

P.S. I used
dd if=/dev/sdb of=/dev/zero bs=100M count=100
to confirm I’m able to read from the disk, and it works correctly. I’m not sure how to do a write test without potentially corrupting the data.
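
Would a read-only surface scan be a sensible next step? I’m thinking of something like the below (as far as I know badblocks defaults to a non-destructive read-only pass, so it shouldn’t touch the data, but please correct me if that’s wrong):

badblocks -sv /dev/sdb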

Well, if there were remnants of the old boot sector before, zeroing the first 100 MB might have removed that as an option…
Did the pool export from the FreeNAS install okay?
I would say yes, go back into FreeNAS, rebuild the array, then export it.
The system might refuse to online the array, because FreeNAS will also see it as degraded, but you should be able to force it if it won’t resilver.

I thought FreeNAS put the boot partition on every drive in the first pool if you don’t have a separate drive/USB for the install, but I guess it might have only been one?

You might try recovering the partition table on the cleared drive, but it might simply be easier to resilver it, if you can boot into FreeNAS and fix the pool.
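
If you go the partition-table route, something roughly like this might do it (untested, and the device letters are just taken from your lsblk output, so double-check which is which before running anything — the target comes straight after -R):

sgdisk -R /dev/sdb /dev/sdc    # replicate sdc’s GPT layout onto sdb
sgdisk -G /dev/sdb             # randomise the copied GUIDs so they don’t clash

ZFS keeps its own labels inside the data partition, so once the partition boundaries are back an import scan may pick the missing member up again. No promises, though.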

Edit: got FreeNAS and Proxmox the wrong way round.

Hi Trooper,

Thanks for the reply here. I only read from the drive; I haven’t written over it.
In FreeNAS the array appears completely healthy, which is the odd part. It also means I can’t repair the array in FreeNAS first.

FreeNAS is installed on a separate USB drive; the storage drives were not installed in the machine when FreeNAS was configured.

As stupid as it sounds, could you try using a different SATA cable or port, if you’re not using an HBA card or backplane?


You could also try moving the system dataset off that pool, if it’s on there.

https://www.ixsystems.com/documentation/freenas/11.2/system.html#system-dataset


Funnily enough, I’ve already swapped the HBA after a failure. I just assumed that the fault was with FreeNAS writing odd data to the disk, as the pool works perfectly in FreeNAS but not in Proxmox.

Unfortunately it appears I’ve completely borked the network config on the FreeNAS install after trying to move it onto a different subnet, so I’m no longer able to reach the web portal to investigate further.

I might try a fresh install of FreeNAS and see if the pool imports there cleanly.

Is there a way to do a “soft” resilver that only repairs the damaged sectors and just confirms the rest of the drive is as expected, without a full rewrite of the drive?

Can you run a long SMART test on that drive?

I believe resilvering behaves the way you’ve described by default, but since Debian isn’t seeing any partitions at all, it will probably overwrite the whole thing.
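
If it turns out the pool can be brought up with all members present, the closest thing to the “soft” resilver you describe is a scrub, which reads every block and repairs anything damaged from parity in place:

zpool scrub array1
zpool status array1    # shows scrub progress and any repaired or errored blocks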

I don’t think FreeNAS does anything with the boot sector on the storage drives. That should be contained to whatever the OS is installed on.

Also, just in case you or anyone else was wondering, those 2 GB partitions are FreeNAS swap. They won’t be used by Debian, but they shouldn’t cause any problems either.


smartctl --all /dev/sdb output is below

=== START OF INFORMATION SECTION ===
Vendor: HPT
Product: VD4-0
Revision: 4.00
Compliance: SPC-3
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Logical block size: 512 bytes
Physical block size: 1048576 bytes
Lowest aligned LBA: 5161
Formatted with type 2 protection
4 protection information intervals per logical block
32 bytes of protection information per logical block
Logical block provisioning enabled, LBPRZ=0
Logical Unit id: 0x00193c0000000000
Serial number:
Device type: disk
Local Time is: Wed Jan 1 19:22:42 2020 AEST
SMART support is: Unavailable - device lacks SMART capability.

=== START OF READ SMART DATA SECTION ===
Current Drive Temperature: 0 C
Drive Trip Temperature: 0 C

Error Counter logging not supported

[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
Device does not support Self Test logging

The drive appears to be healthy on reads, but I believe the HighPoint HBA is making SMART tests impossible; none of the other drives behind the HBA appear to support SMART testing either.
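
I might also try telling smartctl what sits behind the controller, e.g. smartctl -a -d sat /dev/sdb, or the HighPoint-specific -d hpt,L/M/N form the smartctl man page describes, but I’m not sure either will get through this card.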

I’m really keen to get this array up and running. Can somebody give me an idea of a good method to try, even if it’s not guaranteed to be the perfect one?

I’m leaning towards creating a new FreeNAS install, importing the array, and then exporting it and trying again. If I do a resilver in Proxmox, how do I configure it to keep the 2 GB partitions exactly the same as on the other drives from FreeNAS?

The data on this array is only TV media, but it’s also my only copy of it. So data loss would mean a few months of re-downloading, but nothing permanently lost.

Yeah, try plugging the drive directly into the motherboard?

Took a while, but I got a chance to have a go at this.
I connected the entire array directly to the motherboard.
All drives present as expected in Proxmox.
Moving over to FreeNAS, the array still presents correctly and I was able to navigate through it and read data. As far as FreeNAS is concerned there’s nothing wrong with the array, whether it’s connected to the HBA or to the motherboard.

Is my best move to format the drive in Proxmox and rebuild it?
What should I do to ensure the partitions are configured the same as on the other disks?


Okay, hold on a sec.
When you say all drives present as expected, does this mean that
zpool status
reports the pool as online, and all disks available?
Or is the pool in a degraded state, but accessible?

If online and available, then the drive does not need formatting.
If degraded (and it lists one device as unavailable), then maybe just use zpool to “replace” the “missing” drive with the one you want?

zpool can handle all the formatting etc. needed.
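
So the replace would look roughly like this (the by-id name here is a made-up example; use whatever ls -l /dev/disk/by-id shows for the blank disk):

zpool replace array1 7ab82888-1cc5-11ea-a43d-001b212010a0 /dev/disk/by-id/ata-EXAMPLE_MODEL_EXAMPLESERIAL

ZFS partitions and labels the new device itself and kicks off the resilver. It won’t recreate the FreeNAS 2 GB swap partition, but as mentioned above that isn’t needed under Proxmox anyway.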


Oh, and rather than using /dev/sdb or whatever, check out using /dev/disk/by-id/.

If you do an ls -l you can see the drives with their current location, so switching systems avoids shuffled drive letters.

The format is bus-model-serial-partition.
Ignore the wwn-* entries; they’re a duplicate list of the same drives.

When giving zpool a device by id, you can often use just the id: instead of zpool create tank /dev/sda, you could put zpool create tank ata-wdc-wd40000abx-ahdhdhdjjs.

And then the drive can be identified by its serial number (often on a sticker on the end, or on the top).
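
For example (names made up, output trimmed):

ls -l /dev/disk/by-id/ | grep -v wwn-
  ata-WDC_WD40EXAMPLE-00AAAA0_WD-WX00X00X0000 -> ../../sdc
  ata-WDC_WD40EXAMPLE-00AAAA0_WD-WX00X00X0000-part1 -> ../../sdc1
  ata-WDC_WD40EXAMPLE-00AAAA0_WD-WX00X00X0000-part2 -> ../../sdc2

Those names stay the same no matter which port or system the drive ends up on.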


In FreeNAS, zpool status says there are no degraded disks and there is no fault.
zpool import in Proxmox says that the array is degraded, with a single disk missing.
This missing disk is always 7ab82888-1cc5-11ea-a43d-001b212010a0.

The missing disk was on channel one of the SAS controller in FreeNAS.

The drive also presents with a valid partition table in FreeNAS, but not in Proxmox.

Should I import the array with zpool import -f array1 and then repair it with zpool replace? Do you think this will require a full resilver that could potentially put me at risk of data loss? If the drive were to fail mid-resilver, are the remaining drives still in a state that would allow for recovery?


Well, if FreeNAS can see all the drives, I would try a zpool export and then a zpool import.
If the import complains about the pool being in a degraded state, I would remove the zpool cache file, then try the import again.
If it still doesn’t like it, then import with -f and, yes, “replace” the 7ab8xxxxxxxxxxxxxx drive with the ata-WDC-wd4000efra-serialnumber device in Proxmox.
It might be that the system just needs a cleaned cache, and I don’t think zpool clear array1 would clear that bit.
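
In rough order, something like this (the cache file path is the usual one on Proxmox/Linux, and the ata- name is a placeholder for the real by-id entry):

zpool export array1                        # on FreeNAS: cleanly release the pool
rm /etc/zfs/zpool.cache                    # on Proxmox: drop the stale cache, if present
zpool import -d /dev/disk/by-id array1     # add -f only if it still complains
zpool replace array1 7ab82888-1cc5-11ea-a43d-001b212010a0 ata-EXAMPLE_MODEL_EXAMPLESERIAL    # only if the old gptid still shows UNAVAIL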

Presuming you have raidz1, your data currently has no redundancy, so if another data/parity drive goes, the pool is hosed/unrecoverable, but it is still currently safer than just having a single drive.

If you do have to replace, and one of the other drives develops a fault mid-resilver, it might take the whole lot with it, but that is the same as if a drive had completely died, so nothing especially dangerous, but not a situation you want to stay in.

The cache is a list of known pool configurations and member disks.
It exists to speed up boot times, so the system doesn’t have to poll every partition to see if it is a member of any pool.

If you dual-boot distros and come back to one after a pool change, that system can expect a certain set of drives, get confused because they were changed on the other system, and offline the whole pool.
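
For what it’s worth, if you end up switching between the two systems a lot, I believe you can sidestep the stale-cache problem by exporting cleanly before each switch, or by importing without writing a cache entry at all:

zpool import -o cachefile=none array1    # import without recording the pool in the cache file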
