ZFS - RaidZ3 VDev size

Thinking about a 24 drive enclosure and RaidZ3 (4TB ish drives). 3 x 7 drive vdevs or 2 x 11 drive vdevs?

Three ideas

Personally I would do (4) z2 vdevs of 6 drives, or a combination of pools, and not do one super volume. I would probably keep them as 4 separate vols, so it's easier to manage, move, or upgrade.

(4) 6 drive z2

(2) pools, each 2x 6 drive z2

or, a cold and hot storage idea
(4) 4 drive z1 (hot)
(1) 8 drive z2 (cold)
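
To compare these layouts with the two RaidZ3 options from the original question, here is a rough Python sketch that counts data drives and leftover bays. It assumes a 24 bay enclosure and that each raidz vdev gives up exactly its parity count in drives; real usable space will be lower because of padding and metadata overhead. The `Vdev` class and `summarize` function are made-up names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vdev:
    drives: int  # total drives in the vdev
    parity: int  # 1 for raidz1, 2 for raidz2, 3 for raidz3


def summarize(vdevs, bays=24):
    """Count drives used, data drives, and free bays for a list of vdevs."""
    used = sum(v.drives for v in vdevs)
    data = sum(v.drives - v.parity for v in vdevs)
    return {"used": used, "data": data, "free_bays": bays - used}


# Layouts mentioned in the thread (simplified: drive counts only).
layouts = {
    "3 x 7-drive z3": [Vdev(7, 3)] * 3,
    "2 x 11-drive z3": [Vdev(11, 3)] * 2,
    "4 x 6-drive z2": [Vdev(6, 2)] * 4,
    "4 x 4-drive z1 + 1 x 8-drive z2": [Vdev(4, 1)] * 4 + [Vdev(8, 2)],
}

for name, vdevs in layouts.items():
    s = summarize(vdevs)
    print(f"{name}: {s['data']} data drives, "
          f"{s['used']} bays used, {s['free_bays']} bays free")
```

Multiply the data drive count by the drive size to get a ballpark raw capacity for each option.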

I will have an NVMe drive for IOPS.
8 x 600 GB Intel S3500 for fast storage.
The ZFS disks are for big, archival, must-not-lose data.

For archival stuff, your backup strategy is more important than your spare disks in raidzX... I would put the effort into managing and documenting your archive strategy, and into manual replication to other sets of drives. One big vol is a single point of failure. Having files replicated across z1s would be safer than one copy on a z2.

I read somewhere that Google's backup strategy follows a 3-2-1 model, which is usually stated as:
3 copies
2 different media
1 copy offsite

Also I have no beef with raidz1. In my practice, I wrote a shell script that emails me zpool status and SMART monitoring output every day, so I get a heads up if a drive drops out. Some sort of automated monitoring should be part of your archive, if your device is always on.
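
The daily check described above was a shell script; as a rough equivalent, here is a Python sketch of the same idea. It assumes `zpool status -x` is on the PATH (it prints "all pools are healthy" when nothing is wrong) and that a mail relay is listening on localhost. The function names and addresses are placeholders, not taken from the poster's script, and the SMART part is left out.

```python
import smtplib
import subprocess
from email.message import EmailMessage

HEALTHY_LINE = "all pools are healthy"


def pools_healthy(status_x_output: str) -> bool:
    """True only if `zpool status -x` reported every pool as healthy."""
    return status_x_output.strip() == HEALTHY_LINE


def build_report(status_output: str, sender: str, recipient: str) -> EmailMessage:
    """Wrap the zpool output in an email whose subject flags problems."""
    msg = EmailMessage()
    msg["Subject"] = "zpool OK" if pools_healthy(status_output) else "zpool PROBLEM"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(status_output)
    return msg


def collect_status() -> str:
    """Run `zpool status -x` and return its text output."""
    result = subprocess.run(
        ["zpool", "status", "-x"], capture_output=True, text=True, check=False
    )
    return result.stdout


def send_report(sender: str, recipient: str, smtp_host: str = "localhost") -> None:
    """Collect status and mail it; assumes an SMTP relay on smtp_host."""
    msg = build_report(collect_status(), sender, recipient)
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
```

Calling `send_report("nas@example.com", "me@example.com")` from a daily cron job would give the same kind of heads up; the addresses are placeholders.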

My issue with RaidZ1 is the risk of failure while rebuilding a disk. In practice, I am interacting with my machine on a daily basis, so the window for two bad drives simultaneously is pretty small, but RaidZ3 lets me have 2 bad drives and still survive a 3rd dying during the rebuild. Belt and suspenders. Having a good offsite backup would be a nice addition. Will have to have a look at what can be done with the cloud, encryption, and incremental backup. Some of the Chinese sites hand out 10+ terabytes of free storage per account. Restore time might be impossible at 150 Mbps, however.
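
To put a number on the restore time worry, here is a quick back-of-the-envelope sketch. It assumes the link actually sustains the full 150 Mbps with no protocol overhead or throttling, and uses decimal terabytes; `transfer_days` is just an illustrative helper.

```python
def transfer_days(terabytes: float, megabits_per_second: float) -> float:
    """Days needed to move `terabytes` (decimal TB) at a sustained link rate."""
    bits = terabytes * 1e12 * 8
    seconds = bits / (megabits_per_second * 1e6)
    return seconds / 86400


# 10 TB is one free cloud account from the post above; 48 TB is the data
# capacity of 3 x 7-drive RaidZ3 with 4 TB drives (3 vdevs x 4 data drives x 4 TB).
for tb in (10, 48):
    print(f"{tb} TB at 150 Mbps: {transfer_days(tb, 150):.1f} days")
```

So one 10 TB account is roughly 6 days of continuous transfer, and a full 48 TB array is on the order of a month.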

IMO rebuilding an array with new disks is basically never going to happen... If you lose one disk, you're likely to lose another. Keeping the whole array running in full production mode after drives start dropping is bad practice, and realistically (or IMO) if you can't afford to replace a moderate set of drives within a decent window, your server or vol just needs to sit idle until you can save your data. Backups are going to do more to save your data than multiples of drive parity.

Another consideration is your CPU utilization: drive parity multiplies CPU utilization by your zX factor. If you have z3, you're multiplying your parity calculations 3 times per vol.

To this last post I pretty much disagree. Replacing an entire vdev because of a bad drive seems wasteful as heck. Replacing 7 x 4TB at ~$100/drive because 1 drive has gone bad seems like a poor financial decision. Replacing 7 x 8TB drives at $300 a pop seems even less responsible. If you are spending someone else's money maybe this makes sense, but at $1-2k per vdev I will replace drives as required rather than prophylactically.

My point is to have smaller arrays (vdevs): management, backups, and replacing or rebuilding your storage nodes get easier, with more redundancy and less chance of the whole thing falling apart.

If you have 2x7 vdevs in z2, you have 4 parity calculations, plus it's one single array, so bitrot could take out the whole thing. If you copy your files manually between two or three 4-5 drive arrays, you'll lose more storage replicating crucial things across multiple modules, but you'll have more copies, and fewer drives overall will spin up for a single file operation.

I would personally do multiple smaller vdevs.

Two pools of 10 drives each: a 5 drive Z1 mirrored with another 5 drive Z1.

This should give you 8 drives' worth of storage + 4 drives for hot swap (you can also do two 12 drive pools for 10 drives' worth of storage).

This way you don't need to use Z2 or Z3 (mirrored vdevs rebuild fast enough, and you can survive a whole Z1 + 1 drive failure anyway).

I thought this was a pretty thoughtful discussion on mirrors vs raidz2

If it's for backup, 2 vdevs.

If it’s for primary use, then you’re gonna want 3 vdevs for those precious few iops.