Proxmox ZFS Fails to Boot

I have been running Proxmox on ext4 for years, but have decided to attempt to set it up again as a ZFS mirror.

In order to make sure I understand how to recover, I have a test system set up, fresh install (I haven’t even logged in). I unplugged a drive to see what would happen… and it doesn’t boot. It fails to load the array and waits for user input.

I am not new to ZFS (TrueNAS user for almost a decade), but I am new to booting in a ZFS environment. Looking through the documentation, I am not really sure how you are supposed to fail gracefully if the thing won’t boot on its own with a lost drive, nor am I able to figure out how to get it booted to resilver onto a new drive (in this case my “new” drive would just be me wiping one of my original boot drives as a test).

What am I missing here?


It should boot without any issues. Is it a UEFI installation or GRUB in legacy mode?
Is the remaining disk configured as a boot disk in your BIOS?
You can boot from the Proxmox ISO (debug mode) and use grub-install if the system is in GRUB legacy mode.
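Roughly, and assuming the default rpool layout, the rescue path from the ISO looks like this (a sketch, not verified on your hardware; /dev/sdX stands in for the surviving disk):

zpool import -f -R /mnt rpool    # import the root pool with its mounts relocated under /mnt
mount --bind /dev /mnt/dev       # bind mounts so the chroot can see the live system
mount --bind /proc /mnt/proc
mount --bind /sys /mnt/sys
chroot /mnt
grub-install /dev/sdX            # reinstall GRUB on the surviving disk
update-grub
exit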

I am not sure how it installed, GRUB or the Proxmox boot environment (forgetting the name; I’ve done too much googling in the past 3 hours).

I get this if I unplug either of the two drives. If both are plugged in, it boots without issue.

I have tried “zpool import rpool” and that doesn’t work either.

I see a Linux boot option in the BIOS for one of the drives only, but I do have UEFI boot options for both, which seems curious.

Perhaps the pool has a different name?

/sbin/zpool list

^ Curious if you see it listed.
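If nothing has been imported yet, it is also worth running zpool import with no pool name; it scans the disks and lists any importable pools (and their names) without actually importing anything:

zpool import    # no arguments: just scan and list importable pools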

At the initramfs prompt, try:

zpool import -N rpool
exit

If that gets you booted, the pool is probably just not ready in time during boot; you can give it a head start by adding delays around the module load:

nano /etc/default/zfs

ZFS_INITRD_PRE_MOUNTROOT_SLEEP='5'
ZFS_INITRD_POST_MODPROBE_SLEEP='5'

Then rebuild the initramfs so the change takes effect:

update-initramfs -u

Hmm… Well, when in doubt, nuke it and start over I guess.

I am wondering if I had selected RAID 0 instead of 1… I just reformatted the test setup, unplugged one of the drives, and it booted just fine.

But since this is all about learning how to do this… stick with me here, my actual reasons for wanting to do this are:

  1. learning so I can recover in the future if I need to
  2. I am trying to go from an ext4 boot drive to a ZFS mirror boot drive (I store all VMs on the boot drive; it’s a homelab setup). I figure if I can spin up this test system and add homelab + test system to a cluster, I can then live migrate all my VMs and containers over (sketched below). Assuming that all works, I could then remove one of the mirror drives in the test system, move that single test-system drive to my homelab, and resilver it with the current homelab boot drive. So I can effectively make this migration once, and then resilver onto the current boot drive.
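For the cluster/migration leg, a hedged sketch of the commands involved; the cluster name, node names, IP, and VMIDs here are all hypothetical, and VMs living on local storage need the extra flag:

pvecm create homelab-cluster                          # on the existing homelab node
pvecm add <homelab-node-ip>                           # on the test node, to join the cluster
qm migrate 100 testnode --online --with-local-disks   # live migrate a VM that sits on local storage
pct migrate 200 testnode --restart                    # containers can't live migrate; restart mode moves them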

A little more info… I have 2 Samsung 980 NVMe drives I plan to have in the mirror, but one of them is the current ext4 homelab boot drive. I have a spare 500 GB SATA SSD, so I am trying to get this set up as a mirror between an NVMe and a 500 GB SATA drive. Then I would remove the SATA drive, add the current boot drive, resilver, and done. Make sense?
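For that swap, a hedged sketch assuming the default Proxmox partition layout (partition 2 is the ESP, partition 3 the ZFS member) and hypothetical device names (/dev/sda = the SATA SSD, /dev/nvme0n1 = the NVMe already in the mirror, /dev/nvme1n1 = the old ext4 drive joining it); use whatever member names zpool status actually shows:

sgdisk /dev/nvme0n1 -R /dev/nvme1n1                # clone the partition table from the healthy member
sgdisk -G /dev/nvme1n1                             # randomize GUIDs so the clone doesn't collide
zpool attach rpool /dev/nvme0n1p3 /dev/nvme1n1p3   # add the old NVMe as another mirror member
zpool status rpool                                 # wait here until the resilver completes
zpool detach rpool /dev/sda3                       # then drop the SATA drive out of the mirror
proxmox-boot-tool format /dev/nvme1n1p2            # make the newcomer bootable as well
proxmox-boot-tool init /dev/nvme1n1p2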

So, the reason for this explanation: how do I actually go about resilvering this? I just created a mirrored boot drive and removed one of the drives (I can now wipe it); how would I go about re-adding it? Again, just to make sure I actually know how to do this…


Proxmox is intended for the data center, so it is worthwhile to check the documentation:

https://pve.proxmox.com/wiki/ZFS_on_Linux#sysadmin_zfs_change_failed_dev

https://pve.proxmox.com/pve-docs/pve-admin-guide.html#chapter_zfs

Unfortunately, that’s my issue half the time. I have found a lot of the documentation assumes the reader has more experience than I do.

Step 1 would be:
proxmox-boot-tool status

From there:
sgdisk <healthy bootable device> -R <new device>
sgdisk -G <new device>
zpool replace -f <pool> <old zfs partition> <new zfs partition>

What are the devices here? Drive names, /dev/sda for example?
And what are the old and new ZFS partitions?

Then, assuming I am using the proxmox-boot-tool:
proxmox-boot-tool format <new disk’s ESP>
proxmox-boot-tool init <new disk’s ESP>

What is the new disk’s ESP?

Well, what else can I say other than that you have a lot to read 🙂
But you have the right attitude; other people just build and don’t think about the fact that they will have to support it when there are problems.

To format and initialize a partition as synced ESP, e.g., after replacing a failed vdev in an rpool
https://pve.proxmox.com/wiki/Host_Bootloader#sysboot_proxmox_boot_setup
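To make the placeholders concrete, a hedged walk-through with hypothetical names (/dev/sda = the surviving disk, /dev/sdb = the replacement; on the default layout, partition 2 is the ESP and partition 3 is the ZFS member):

sgdisk /dev/sda -R /dev/sdb          # clone the partition table from the healthy disk
sgdisk -G /dev/sdb                   # randomize the new disk's GUIDs
zpool status rpool                   # note how the failed member is listed (a name or a bare numeric GUID)
zpool replace -f rpool <failed member from zpool status> /dev/sdb3
proxmox-boot-tool format /dev/sdb2   # the "new disk's ESP" is partition 2 on the default layout
proxmox-boot-tool init /dev/sdb2
proxmox-boot-tool status             # confirm both ESPs are now synced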


Yea, some of this can just be difficult. I will figure it out… but I have got this homelab set up on the backs of Lawrence Systems, Wendell, a few friends, and a lot of trial and error and just figuring it out.

Some documentation I find easier to understand, some less so. Such is life. I have had various forms of the lab up and running since 2015, but I’m trying to be a little smarter with mirrored boot drives. I have great backups of all VMs from PBS to my TrueNAS array, but that doesn’t protect the actual Proxmox host. And considering my firewall is virtual, it’s a real PITA if the entire system goes down, thus the mirrored boot drive plan.

Since Proxmox 7.x, for booting from ZFS you should be using the proxmox-boot-tool 🙂
And after each kernel install, just for safety, always do a “proxmox-boot-tool status”; it will save you 🙂

Some details:
https://pve.proxmox.com/wiki/Host_Bootloader
https://pve.proxmox.com/wiki/ZFS:_Switch_Legacy-Boot_to_Proxmox_Boot_Tool