A (*totally unofficial*) Conversation about Relative ZPool Performance

I do use ZFS. It’s the plain OpenZFS on top of Arch Linux and I only run an offline pool for data backup. :slightly_smiling_face:

1 Like

So if you wanted to HELP people understand ZFS speed, you would replace your post completely with something true. The only good thing about it is the comments section. It makes people more confused, especially if they are new to ZFS.

If people want to understand ZFS pool performance, they can read FreeBSD Mastery: ZFS by Michael W. Lucas and Allan Jude, or something similar. The book has a whole chapter on ZFS performance.

A two-disk mirror vdev, following your “assumed” speed, will have 200% sequential read performance and 100% write performance. The read speed of a ZFS mirror doubles.

When you have an HDD that can do 100 MB/s and you put two of them in a mirrored vdev, you get 200 MB/s reads because ZFS can read from both disks simultaneously, so the read speed doubles. Write speed is that of a single disk, since ZFS has to write the same data to each disk.

Now if you have a pool with 4 vdevs consisting of 2-disk mirrors, your sequential read speed will be 800 MB/s and your write speed will be 400 MB/s. The write speed quadruples because ZFS “stripes” the writes across the vdevs: a zpool of 4 mirrored vdevs is essentially a stripe across those 4 mirrors.

Now if you have a 3-disk mirror, the read speed will be 300 MB/s (100 MB/s from each disk) and the write speed that of a single disk, which is 100 MB/s.

So following your “assumptions”, where a single stripe is at 100% performance, an all-mirror zpool of 8 disks has 400% write performance and 800% read performance.

Mirrors increase performance in ZFS.
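
To make that arithmetic easy to play with, here is a minimal sketch of the same naive model (my own illustration, not from the book or the post above): every disk is assumed to do a flat 100 MB/s, mirror reads scale with mirror width, mirror writes stay at single-disk speed, and a pool of mirrors stripes across its vdevs.

```python
# Naive back-of-envelope throughput model from the discussion above.
DISK_MBPS = 100  # assumed per-disk sequential throughput

def mirror_vdev(width, disk=DISK_MBPS):
    """Sequential (read, write) MB/s of a single mirror vdev of `width` disks."""
    return width * disk, disk  # reads hit every member, writes move one copy's worth

def pool_of_mirrors(n_vdevs, width, disk=DISK_MBPS):
    """A pool of n mirror vdevs: ZFS stripes I/O across all of them."""
    read, write = mirror_vdev(width, disk)
    return n_vdevs * read, n_vdevs * write

print(pool_of_mirrors(1, 2))  # (200, 100) -> one 2-disk mirror
print(pool_of_mirrors(4, 2))  # (800, 400) -> 8 disks as 4x 2-disk mirrors
print(pool_of_mirrors(1, 3))  # (300, 100) -> one 3-disk mirror
```

Real pools won’t hit these exact numbers, but the ratios are what the argument above is about.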

’Tis why this is a thread, my dude, and ’tis why I mentioned infinite disclaimers.

Practically, for an HDD pool, what are the benefits of a Mirror VDEV having more than two HDDs? I can’t see any.

IMO, it’s throwing away money for diminishing gains in disk fault protection. 1/3 effective storage capacity hurts to even think about. At the same time, the increased read throughput and IOPS aren’t beneficial however you look at it.

Using your example of an HDD with 100 MB/s read/write throughput: 300 MB/s reads from a 3-disk Mirror VDEV is nothing special by today’s standard for local access. For access over the network, Ethernet/WiFi is the bottleneck in most home environments.

A typical home (or perhaps most homes) has a 1 Gbit/s network, which tops out at ~110 MB/s transfer speed. Also, a typical home won’t have multiple 1 Gbit/s read streams hitting the NAS at the same time. A Mirror VDEV wider than 2 disks serves no meaningful purpose.
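
For reference, here’s the rough arithmetic behind that ~110 MB/s figure (my ballpark, not a benchmark; the 94% efficiency factor is an assumption for TCP over Ethernet with a standard 1500-byte MTU):

```python
# 1 Gbit/s is 125 MB/s on the wire; Ethernet/IP/TCP framing overhead
# trims that to roughly 110-118 MB/s of payload in practice.
LINK_GBITS = 1.0
EFFICIENCY = 0.94  # assumed protocol efficiency, standard 1500-byte MTU

raw_mb_per_s = LINK_GBITS * 1000 / 8      # 125 MB/s raw
practical = raw_mb_per_s * EFFICIENCY     # ~117 MB/s; ~110 is a safe round number

print(f"raw: {raw_mb_per_s:.0f} MB/s, practical: {practical:.0f} MB/s")
```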

1 Like

This is me. I use stripes of two-drive mirrors and add two drives at a time for expansion if required, prior to my disks aging out.

Is it optimal?
No
Do I get unbalanced vdevs?
Yes
Does it really matter?
No
Does it work perfectly fine for home archive?
Yes

Anecdotal reliability? I haven’t had data loss in 13-15 years. I did have to change my pool ashift by breaking the mirrors and creating a new pool of single drives to copy to, until I added the pairs back (4 drive bays at the time). During that risky period I kept the old second drive of each mirror out of the box as a backup.

edit:
I’m on my fourth set of drives in this pool; I have had 3 drive failures without loss.

1 Like

That’s a very solid testimonial for the Two-Drive Mirror VDEV strategy.

RAIDZ VDEV expansion will be possible very soon. Given this new reality, if you could start over again, would you still pick Two-Drive Mirror VDEVs? Or opt for starting with one 3-Drive raidz1 VDEV, and later adding a fourth drive to the VDEV?

50% effective storage capacity seems a bit low for the 2-Drive Mirror VDEV. That’s my only nitpick.

Mirrors every time for me. Why? Because, as mentioned, expansion can be as simple as popping out one half of each mirror and installing new drives, and the removed drives are a full backup I could spin the pool up from if something were to go wrong.

I can upgrade 2 drives at a time. The pool will be faster.

Yes, I throw away 50% of the space instead of 33%, but space is cheap.

Edit:
To be abundantly clear: this is not enterprise best practice advice. This is what I do in my HOME.

If it was work, I’d be running multiple raidz2 or even raidz3 vdevs, but my dinky little home NAS is too small for that.

1 Like

Also to be clear: in an extreme case of unbalanced VDEVs (or adding a new empty VDEV to the pool) almost all new data will hit that VDEV (ZFS tries to get things back in balance). So performance on said data will be that of the single VDEV. I get around this by upgrading well before I’m out of storage. And I do try to do all drives at the same time.

But I have upgraded only half my VDEVs before (I was at 70% full on 500 GB drives and upgraded half of them), and it was nothing to write home about.

Home? Doesn’t matter. 1 VDEV is faster than gig Ethernet anyway.

If you’re using the storage for VMs or whatever of course things change. But I’m not.

I think unbalanced data across two VDEVs isn’t a concern for home use. For a 4-bay custom NAS, if the user minds, they could choose to start with all 4 bays filled and replace all 4 drives at each expansion cycle, every couple of years.

I don’t have a plan to do a new & bigger pool any time soon. Far into the future, if I need one, I probably will prefer a 4-bay raidz1. I will:

  • start with all 4 bays filled
  • pick effective capacity >= 3 times the size of current total data size
  • plan to expand the pool by replacing all four HDDs around every 5-9 years, depending on one’s rate of data growth. Mine is pretty slow.

I prefer 4-drive raidz1 over two 2-HDD Mirror VDEVs because 75% effective storage size is appealing. I won’t need the better read/write throughput of Mirror VDEVs. I also don’t need the better protection provided by Mirror VDEVs.
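
For what it’s worth, the capacity comparison is just this arithmetic (a sketch of mine, assuming equal-sized disks and ignoring ZFS metadata and slop-space overhead):

```python
# Usable-capacity fractions behind the 50% vs 75% comparison.
def mirror_usable_fraction(n_disks, width=2):
    # one copy of the data per mirror vdev counts as usable space
    return (n_disks // width) / n_disks

def raidz_usable_fraction(n_disks, parity=1):
    # raidz gives up roughly `parity` disks' worth of space to parity
    return (n_disks - parity) / n_disks

print(f"4 disks, two 2-way mirrors: {mirror_usable_fraction(4):.0%}")  # 50%
print(f"4 disks, raidz1:            {raidz_usable_fraction(4):.0%}")   # 75%
```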

Now here, I think, is the fun part. I look forward to hearing from folks in the trade what your “proper way” of doing the math is. Below is what I got using high-school probability, with nevertheless quite interesting results.

Assume a single HDD with an annualised failure rate of 5%. This number is pretty high if you look up Backblaze’s numbers; a good HDD model is usually much lower, like less than 1%. But let’s assume our cheap HDD has a 5% annual failure rate for now.

After some very quick & dirty calculation while on my commute (there’s a sketch of the arithmetic after the lists below), I found that for a 4-bay setup of two 2-HDD Mirror VDEVs:

  • the risk of losing the pool in a year is ~0.5%
  • the risk of losing the pool while resilvering (assuming the HDD being replaced is already dead) is ~10%

And for a 4-bay raidz1 VDEV setup:

  • the risk of losing the pool in a year is ~1.4%
  • the risk of losing the pool while resilvering (assuming the HDD being replaced is already dead) is ~20%
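
Here’s a Python sketch that reproduces those figures. To be clear about the assumptions (mine, reverse-engineered from the quoted numbers, not an authoritative model): disk failures are independent, the AFR is 5%, the resilver window is treated as a full year of exposure, and any further failure in the degraded vdev, including the replacement disk itself, is counted as pool loss. That last pair of assumptions is exactly why the resilver numbers come out so pessimistic.

```python
# Quick reconstruction of the "high-school probability" numbers above.
AFR = 0.05  # assumed annualised failure rate per disk

def p_any_fail(n, p=AFR):
    """Probability that at least one of n disks fails within the window."""
    return 1 - (1 - p) ** n

# Pool loss within a year
# Two 2-disk mirrors: the pool dies if either mirror loses both of its disks.
p_mirrors_year = 1 - (1 - AFR ** 2) ** 2                       # ~0.5%
# 4-disk raidz1: the pool dies if two or more of the four disks fail.
p_raidz1_year = 1 - (1 - AFR) ** 4 - 4 * AFR * (1 - AFR) ** 3  # ~1.4%

# Pool loss during resilver (one disk already dead, replacement installed)
# Rough reading: any further failure in the degraded vdev is counted as fatal,
# including the replacement disk, over a full-year window.
p_mirrors_resilver = p_any_fail(2)   # ~10% (surviving disk + replacement)
p_raidz1_resilver = p_any_fail(4)    # ~20% (three survivors + replacement)

print(f"mirrors: {p_mirrors_year:.2%} / year, {p_mirrors_resilver:.1%} during resilver")
print(f"raidz1:  {p_raidz1_year:.2%} / year, {p_raidz1_resilver:.1%} during resilver")
print(f"ratios:  {p_raidz1_year / p_mirrors_year:.1f}x yearly, "
      f"{p_raidz1_resilver / p_mirrors_resilver:.1f}x during resilver")
```

Running it also gives the ratios discussed further down: roughly 2.8x for annual pool loss and about 1.9x for the resilver case.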

Given my intended use of the pool for data backup, meaning there will be another copy somewhere, or even a third copy, the risk of running a 4-bay raidz1 is more than acceptable to me.

What’s your take, people?

Thinking it over again: the resilvering risk seems very high and way over-estimated. I think the implicit assumption is that the resilver takes 365 days to finish, which is far from realistic.

The comparative aspect is perhaps more interesting. Between these two 4-bay setups, the raidz1 setup is nearly three times as risky to operate as the Mirror VDEV setup, and its resilvering process is likely about twice as risky.

It doesn’t change my hypothetical decision. I’ll still go with my preferred choice.

When I have had disk failures, ZFS has handled them gracefully -

I’ve seen scrub errors, poor performance, and repeated small resilvers for days/weeks (as a disk temporarily dropped out, went non-responsive, returned garbage, etc.) while/before I bothered to get a replacement disk, before the drive finally, actually dropped. And as above, I have never lost data.

Again, this is not enterprise or best-practice advice (and of course YMMV; you could always get a bad batch of drives, so this is totally anecdotal), but it’s what has happened for me. My super important stuff is synced elsewhere anyway.

Drives can fail in different ways, but because ZFS is aware of the filesystem, a partial rebuild/repair is possible (to the failing, but not yet fully failed, disk), unlike traditional RAID, which needs to do a full-disk sync (which is much more stress on all the drives).

If you’re paranoid maybe keep a hot (or even cold) spare to buy you some more time before needing to buy a disk replacement.

Now… if you don’t have any other backup of your important stuff, you’d maybe want to be more paranoid.

BUT… if that’s the case, you’re doing it wrong, because your house may burn down or get robbed and you’ll lose everything anyway, just not to hardware failure.

thro, your points & experience with ZFS are noted and will be kept in mind.

If there’s anything further to take away from this thread for home users, I would say that for a 4-bay ZFS NAS there are two strategies:

  1. two 2-HDD Mirror VDEVs; or
  2. a 4-HDD raidz1 VDEV

Need better read/write throughput and excellent data protection? Go with #1, at the cost of 50% effective capacity. Otherwise, #2 is reasonably good and adequate, with the advantage of 75% effective capacity.

2 Likes

I guess it has no meaning to you. It actually doesn’t have much meaning to me either. I was arguing mathematics and you were arguing economics; I was being hypothetical and you were being practical, so we were comparing apples and oranges.

Some people still find a use for that, which I have seen: setting up a 4-disk root on ZFS because the machine gets shipped off to a co-location data center, and when a disk fails someone has to drive there and change it; you can’t just have an employee of the data center do it for you unless you pay them a lot of money.

I wouldn’t do it for myself either. I use HDDs myself; I just have one SSD for the OS. Spending a thousand dollars on four 4 TB Samsung SSDs is not something I would want to do at all. It’s better to buy two tiny Intel Optane SSDs for a SLOG if you need NFS or synchronous writes. But other people probably do different things.