ZFS RaidZ1: Bad performance in fio benchmark with 4MB blocksize

Hi,

I had planned to use a RaidZ1 as storage for Seafile (an alternative to Nextcloud / Owncloud) and maybe 2-3 other VMs with fairly low load.

The performance of the RaidZ1 also seems very bad to me: when testing the individual disks, a single disk gave me about 50% more performance in the same test.

But according to this formula, I should have seen a factor of 3 higher performance:
Streaming write speed: (N - p) * Streaming write speed of single drive
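With my setup that is N = 4 disks and p = 1 parity disk, so (4 - 1) = 3 times the streaming write speed of a single drive.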

This is my test script:

#!/bin/bash
# 4M random writes, single job, queue depth 16, 15-minute run
IODEPTH=16
NUMJOBS=1
BLOCKSIZE=4M
RUNTIME=900

TEST_DIR=/testZFS/fiotest
mkdir -p "$TEST_DIR" && rm -rf "${TEST_DIR:?}"/*

fio --name=write_throughput --directory=$TEST_DIR --numjobs=$NUMJOBS \
--size=50G --time_based --runtime=$RUNTIME --ramp_time=2s --ioengine=libaio \
--direct=1 --bs=$BLOCKSIZE --iodepth=$IODEPTH --rw=randwrite \
--group_reporting=1 --iodepth_batch_submit=$IODEPTH \
--iodepth_batch_complete_max=$IODEPTH

I created the ZFS pool and the VM with default settings, once with ashift=9 and once with ashift=12, but that didn't make much difference.
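
For reference, this is roughly how the pool was created (the device names are placeholders for my four NVMe drives):

zpool create -o ashift=12 testZFS raidz1 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1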

These are my test devices, which are plugged into an M.2 PCIe 4.0 x4 slot / M.2 slots on a PCIe 4.0 x16 expansion card (all SSDs are correctly detected with PCIe 4.0 x4 lanes each):

2 x Lexar SSD NM790 1TB
2 x KIOXIA-EXCERIA PLUS G3 SSD 1TB

These are the results with ZFS RaidZ1:

WRITE: bw=944MiB/s (990MB/s), 944MiB/s-944MiB/s (990MB/s-990MB/s), io=830GiB (891GB), run=900009-900009msec


Because of the poor performance, I took a step back and looked more closely at the performance of the individual disks.

And there I noticed that the performance with ext4 is significantly better than with ZFS.

This is the result of the fio benchmark with the same settings as above.
The disks are formatted with ext4 here; nvme2 and nvme3 are the two Kioxia and nvme0 and nvme1 are the Lexar drives.

And here is the result with ZFS.
The first block is the Lexar, the second the Kioxia, and the third the Kioxia again, but this time with a 1MB / 4MB recordsize.

Even when I increased the recordsize from the default 128KB to 1MB or 4MB, I only got about 1200MB/s with the Kioxia SSDs.
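
This is roughly what I ran for those tests, setting the property on the pool's root dataset:

zfs set recordsize=1M testZFS
# recordsize=4M needs the large_blocks feature and, depending on the
# release, a raised zfs_max_recordsize module parameter:
echo $((4*1024*1024)) > /sys/module/zfs/parameters/zfs_max_recordsize
zfs set recordsize=4M testZFS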

Here are the values from the ext4 test:
Kioxia: approx. 3400MB/s
Lexar: Short peak at approx. 4500MB/s and then approx. 1400MB/s


Any ideas why the SSDs with ZFS have such significantly lower performance?


Edit:
Screenshots replaced; I figured out how to display the individual results instead of the sum in Netdata.

Welcome!
What CPU are you running? You may be reaching the maximum performance for raidz1 with the CPU being the bottleneck.

You could try running a striped mirror to test this; you'll lose storage efficiency, but you should pick up performance.
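
Something like this, with placeholder pool and device names:

zpool create -o ashift=12 tank mirror /dev/nvme0n1 /dev/nvme1n1 mirror /dev/nvme2n1 /dev/nvme3n1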


Hi,
thanks for the answer.

I am using a Ryzen 5 5600 (6 cores / 12 threads).
I think this CPU should have enough performance.

But my second question was also why the ZFS performance in the single-disk benchmarks is significantly worse than ext4.
I think this is also the reason for the bad RaidZ1 performance.

That's a fairly strong CPU; I would have expected somewhat higher numbers. You're not running compression, are you?
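
You can check it with, for example (assuming the pool is named testZFS):

zfs get compression,compressratio testZFS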

Honestly those single disk performance numbers looked pretty good to me.
ZFS is not a performant filesystem/vol manager; at least version 2 isn't. Supposedly version 3 is going to have a bunch of optimizations for NVMe drives merged in. ZFS still seems to be tuned mostly to run best on spinning disks, where these limits are rarely reached.

The Lexar’s performance is okay, as it drops to 1300MB/s after a few minutes with ext4.

But the Kioxia reaches 3400MB/s almost continuously, and with ZFS it is only about 1000MB/s on average.

Is it really “normal” that ZFS only reaches 30% of the performance?

Take what I say with a grain of salt; I don't have experience running a setup exactly like yours. But from what I've seen in the past, and extrapolating from that, the single-disk performance numbers look very reasonable for ZFS.

I’ve seen the numbers go much lower than 30% on some of the lower clocked CPUs with very fast storage.

Yes, but it is understandable that performance drops when the CPU is not fast enough.
And in my case, the CPU is clearly not a (significant) bottleneck, nor should it be.

Sheer CPU computational throughput isn't the only bottleneck ZFS has. ZFS loves to shuffle data back and forth between different places, which compounds storage and I/O request latencies on top of each other; this becomes very apparent with NVMe.

Brian Atkinson did a talk on completely bypassing ARC and in some of his benchmarks he got a 12 wide PM1725a setup to triple in performance:
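
(Not the same as the Direct IO work, but as a quick approximation you can keep file data out of ARC for a test, e.g. with a placeholder pool name:)

zfs set primarycache=metadata tank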

Thanks for the hint.

This feature still hasn't been merged after two years…
So I built the ZFS version from the pull request myself.
Unfortunately, this didn't really improve the performance for me.

Might be a dumb question since you already knew enough to build from the pull request, but: did you get the direct dataset property set to always so that the new feature would be active?
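
That is, something like (placeholder pool name):

zfs set direct=always tank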

Well, my knowledge is not that great.
I just followed the ZFS documentation :sweat_smile:

However, I had already reported the problem in the corresponding ZFS pull request and received the answer that something was wrong with my ZFS version.

I then tried it again on Debian Bookworm instead of Proxmox (which, in its latest version, is actually also based on Debian Bookworm).

After that, the ZFS compilation seems to have worked correctly.
At least the new property was then displayed to me and I was able to set it.
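
For example, checking the property (pool name from my test setup):

zfs get direct testZFS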

With the new ZFS version, however, my performance has become even worse…

I have also commented on this again in the pull request; let's see if a solution can be found.
Perhaps consumer SSDs are just not particularly suitable for ZFS.