Help me finetune my understanding of 'metadata vdev' in ZFS and disaster recovery of a stripe mirror vdev

Looking for some advice and knowledge validation about ZFS metadata and disaster recovery for my new build below.

Hardware:

  • 1TB PCIe 4.0 NVMe (3,500 MB/s read/write, passed through to the TrueNAS VM)
  • 4x 18TB drives
  • TrueNAS Scale (running as a Proxmox VM)
  • SAS3008 HBA (passed through from Proxmox to the VM)

My workload looks like:

  • Proxmox VMs
  • File storage

My primary goal is to leverage the NVMe drive to its full potential to speed up my entire storage array and workloads. As I understand it, as long as my data vdevs are mirrors or stripes (no raidz), I can add and remove special (metadata) vdevs as I please, and I was able to validate this on a test VM (see the sketch below).
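For reference, this is roughly what I tested (pool and device names are just placeholders from my test VM):

```
# Add a special (metadata) vdev to an existing pool
# (-f because a single disk doesn't match the pool's mirror redundancy)
zpool add -f tank special /dev/nvme0n1

# Remove it again: ZFS evacuates the metadata back onto the data vdevs.
# Top-level vdev removal only works while the pool contains no raidz vdevs.
zpool remove tank nvme0n1

# Watch the evacuation progress
zpool status tank
```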

However, I am not sure what a disaster scenario looks like when the metadata NVMe drive dies unexpectedly. Would my data be safe? I am going to set up a stripe of mirror vdevs, so logically it would be:

data (striped across the two mirrors)

  • vdev1 (mirror)
    —> 18TB
    —> 18TB
  • vdev2 (mirror)
    —> 18TB
    —> 18TB
  • nvme_vdev
    —> metadata + SLOG/L2ARC (possibly split the 1TB into partitions to get separate vdevs out of a single physical NVMe device; see the sketch below)
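If it helps, the creation commands for that layout would look something like this (device names are hypothetical; in practice I would use /dev/disk/by-id paths):

```
# Two mirrored pairs, striped together at the pool level
zpool create tank \
  mirror /dev/sda /dev/sdb \
  mirror /dev/sdc /dev/sdd

# Single NVMe split into partitions, one per role
zpool add -f tank special /dev/nvme0n1p1   # metadata (-f: no redundancy)
zpool add tank log /dev/nvme0n1p2          # SLOG
zpool add tank cache /dev/nvme0n1p3        # L2ARC
```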

Thanks in advance!

My thoughts

Please do go ahead and test on your VM set-up, and let us know…
But…

I’m pretty sure that when the NVMe drive dies, the pool dies.

And even though the NAND chips might each be able to write petabytes before locking read-only, there are other components that break, fail, or wear out.

But it’s a measured risk, and just having another pool as a backup might mean you only lose a couple of days’ files when you lose the entire pool.
The surviving data drives can be placed into a new pool, and the backup copied over to them to start again (sketched below).
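That rebuild would look something like this (hypothetical pool, device, and snapshot names):

```
# New pool from the surviving 18TB drives
zpool create tank2 mirror /dev/sda /dev/sdb mirror /dev/sdc /dev/sdd

# Copy the backup over to it
zfs send -R backup/tank@latest | zfs receive -F tank2/tank
```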

The SLOG can die and the pool will keep operating; you would only lose the last few seconds of in-flight sync writes. An L2ARC cache can die and not kill the pool. But losing the special (metadata) vdev brings the whole pool down.
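You can see the difference in how ZFS treats them: log and cache devices can be pulled from a live pool at any time, while the special vdev is a permanent member. A sketch with hypothetical names:

```
# Auxiliary vdevs: removable (and survivable) at any time
zpool remove tank nvme0n1p2   # drop the SLOG; pool keeps running
zpool remove tank nvme0n1p3   # drop the L2ARC; pool keeps running

# The special vdev holds real pool metadata. If that device is lost
# with no redundancy behind it, the pool cannot be imported.
```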

But a measured risk might still be acceptable; your phone only has a single flash chip for its OS storage.
Laptops, even desktops, typically don’t have redundancy.
Redundancy does not equal backup, and any device has a measured risk of failing. Even a redundant set-up might lose a backplane, PSU, or motherboard.

Redundancy is more about up-time while broken components are swapped out


Thanks - I had missed that bit.

Since I don’t have more M.2 NVMe SSDs, perhaps I should change my strategy: configure TrueNAS to back up hourly and keep a clone pool of my primary pool.

That makes my primary pool spinning-rust drives with zero redundancy, plus an NVMe special (metadata) vdev, SLOG (the separate ZIL device), and L2ARC. The backup pool is then 2x 18TB disks, fed by something like the sketch below.
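Under the hood that hourly backup is just snapshot replication; a TrueNAS replication task does the equivalent of this (pool and snapshot names hypothetical):

```
# First run: recursive snapshot, full stream to the 2x 18TB backup pool
zfs snapshot -r tank@hourly-0000
zfs send -R tank@hourly-0000 | zfs receive -F backup/tank

# Every hour after: new snapshot, incremental stream from the previous one
zfs snapshot -r tank@hourly-0100
zfs send -R -i tank@hourly-0000 tank@hourly-0100 | zfs receive -F backup/tank
```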
