Stripe or RAIDZ1

So… I may have gone too far this time…

The plan is to have one drive as OS + images, etc., five on ZFS, and the SSD for backups.

Here is the question (I’m new to ZFS):
I am optimizing for IO, as this is meant for a devops/ML workload and for me IO is super important. Given that I plan to continuously back up the VMs and the raw data gets backed up externally, should I go RAIDZ1 or stripe?

Mind you, I will likely burn through a drive every 12-24 months… from experience.

Also, I plan to add more drives in the future as needed

If IOPS are what you want, then mirrors are the choice over RAIDZ1. In this case you’d have three vdevs of mirrored pairs.

That said, be warned that ZFS continues to have issues properly utilizing high-performance flash, due to being heavily optimized and hard-coded for HDDs. Things are being actively worked on, but it’ll take years for the devs to hunt down all the bottlenecks and split the system-wide HDD-oriented tunings out to per-vdev settings. So ultimately you’ll still be trading performance for data safety, though only you can tell if that’s tolerable for your workload or not.


Have you looked at higher-endurance drives? Or do you mean you’ll wear out a 2 TB drive every 1-2 years?

I’m OK replacing a $200 drive every 1-2 years. It’s a concession I made.


Sounds like mirrors aren’t the move then until the bugs/approach are ironed out. Would RAIDZ1 be good enough to make sure I can tolerate a single drive failure? I’d prefer high read over high write as a concession.

RAIDZ is worse in every aspect when it comes to IOPS. Mirrors are the fastest while still having redundancy against drive failure, as well as the ability to self-heal.

Stripe, maybe mirror for even higher read iops, put OS stuff on the QVO.

Backup over network if you need to.


So I’m hearing a contradiction, since @Log mentioned that mirrors aren’t the most stable for NVMe.

This is for ZFS in general, not specific to mirrors.

Mind educating me on why RAIDZ is worse in every aspect?
In my understanding, on 6 drives I get a theoretical 35-42 GB/s read and ~6-7 GB/s write with single-drive redundancy.

point taken on moving the OS to an SSD, just ordered a 1tb for that purpose.
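For what it’s worth, the back-of-envelope math behind figures like those can be sketched out. The per-drive speed below is my assumption, not from the thread, and real numbers depend heavily on record size, queue depth, and CPU overhead:

```python
# Rough ceiling math for 6 NVMe drives in a single RAIDZ1 vdev.
# Assumption (mine, for illustration): ~7 GB/s sequential read per drive,
# large sequential I/O, no CPU/checksum bottleneck.
PER_DRIVE_GBPS = 7           # GB/s per drive, assumed
DRIVES = 6
DATA_DRIVES = DRIVES - 1     # RAIDZ1 spends one drive's worth of space on parity

# Large sequential reads can stream from the data portion of every drive,
# so roughly 5-6 drives' worth is a plausible best case:
seq_read_low = DATA_DRIVES * PER_DRIVE_GBPS    # 35
seq_read_high = DRIVES * PER_DRIVE_GBPS        # 42
print(f"sequential read ceiling: {seq_read_low}-{seq_read_high} GB/s")

# Small random I/O does NOT scale the same way: every record update also
# rewrites parity, and a RAIDZ vdev delivers roughly the IOPS of a
# single member disk, which is the point made above about mirrors.
```

That’s why the sequential numbers look great while the IOPS story is poor.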

I never mentioned anything about stability. Only that performance is unlikely to reach its full potential compared to a setup with a different filesystem (non-checksumming, non-CoW, without hardcoded spinning-disk optimizations and assumptions). You can probably get pretty good performance, but you’ll have to test to be sure it’s doing what you expect/need. If you have lots of RAM and require a lot of repetitive reads, then ZFS’s ARC caching can be a serious win (that’s me basically benchmarking my ARC/RAM, in an unoptimized VM).

Basically more VDEVs = More Throughput and IOPS

A mirrored pair VDEV:
-Maximizes the number of VDEVs you have.
-Can lose one disk and still detect, but not correct, errors.
-Lose the second disk in the pair and your pool dies.
-If ALL VDEVs in a pool are mirrors (not RAIDZ or DRAID) and share the same ashift, then the (mirrored) vdevs can be removed. As always, any kind of VDEV can be added at any time.

A RAIDZ1 VDEV:
-Maximizes the amount of space available to use after parity is accounted for.
-Can lose one disk and still detect, but not correct, errors.
-Lose a second disk and your pool dies.
-The more disks in a RAIDZ vdev, the more disks there are to fail. Because RAIDZ1 can still only lose 1 disk before the next failure kills the pool, mirrors are actually statistically safer, since each pair has the fewest parts to rely on.
-More computation is required for parity. This is usually not a big deal.
-While RAIDZ throughput can somewhat scale with the number of data disks, the IOPS will basically be that of a single disk.
-All additions of RAIDZ VDEVs are permanent and cannot be removed (and from then on they prevent removal of mirror VDEVs as well). As always, any kind of VDEV can be added at any time.
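To make the trade-off concrete for this build, here’s the raw math for 6 × 2 TB drives in each layout. Rough numbers only; real usable space is somewhat lower (metadata, padding, slop space):

```python
# Compare 6 x 2 TB drives as three 2-way mirrors vs two 3-disk RAIDZ1 vdevs.
DRIVE_TB = 2

mirror_vdevs = 3
mirrors_usable = mirror_vdevs * DRIVE_TB           # each pair stores one copy: 6 TB

raidz1_vdevs = 2
raidz1_usable = raidz1_vdevs * (3 - 1) * DRIVE_TB  # 2 data disks per 3-disk vdev: 8 TB

# IOPS scale with vdev count, and mirrors can additionally serve
# reads from either side of each pair.
print(f"mirrors: {mirrors_usable} TB usable, {mirror_vdevs} vdevs of IOPS")
print(f"raidz1:  {raidz1_usable} TB usable, {raidz1_vdevs} vdevs of IOPS")
```

So you give up 2 TB of usable space going from 2×RAIDZ1 to 3×mirrors, and buy IOPS with it.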

It should also be noted that there is no such thing as a “stripe” in ZFS in the way people normally use the term with standard RAID levels. The term exists, but it’s a different sort of concept. If you are referring to the ZFS equivalent of RAID0, that would be making vdevs with only a single disk each. Data in this setup can be checked for errors, but not corrected. This is actually viable, but only if one sets up a solid, tested snapshot system paired with a backup, and has a workload that can graciously roll back to a previous point and begin work again. Otherwise it’s very much not recommended.

Basically, you probably want 3 VDEVs of mirrored pairs rather than 2 VDEVs of 3-disk RAIDZ1, because more VDEVs means more IOPS.
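The “mirrors are statistically safer” point above can be sketched with a toy model. The failure probability below is made up purely for illustration, the model assumes independent failures inside one rebuild window, and it ignores rebuild stress, so treat it as a sketch, not a reliability analysis:

```python
# Toy model: probability of losing the whole pool during one window in
# which each drive independently fails with probability p (p is invented).
p = 0.05

# Three mirror pairs: the pool dies only if BOTH disks of some pair fail.
pair_loss = p ** 2
mirror_pool_loss = 1 - (1 - pair_loss) ** 3

# Two 3-disk RAIDZ1 vdevs: a vdev dies if 2 or more of its 3 disks fail.
vdev_loss = 3 * p**2 * (1 - p) + p**3
raidz_pool_loss = 1 - (1 - vdev_loss) ** 2

print(f"mirrors: {mirror_pool_loss:.4%}  raidz1: {raidz_pool_loss:.4%}")
```

With any plausible p, the RAIDZ1 layout comes out roughly twice as likely to lose the pool, because each 3-disk vdev has more ways for a fatal second failure to land.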


this helps a lot, thank you.

Then again, assuming you don’t care about data safety/durability, the 3 mirrored vdevs should give you similar random-read IOPS to 6 single-disk vdevs.

Thinking about CXL and remote RAM and big, big caches… @sectorix, how big is your training dataset, and how big is your model (rough layer count or param count)?

Run striped and keep a good backup is what pappy always said! Life’s too short and I wanna go fast!!

I agree. But if you want ZFS with data integrity and all its features, you gotta make compromises. It’s a tradeoff. I run both ZFS on my server and BTRFS RAID0 for faster needs and less important stuff. Striped drives just can’t repair any corruption.

Parity RAID (RAID5 or RAIDZ) is always chosen for the storage efficiency aspect while still having redundancy. For IOPS, mirrors or RAID0 are the way to go. All hail the magic triangle of storage 🙂
