I’m new to the ZFS party, and I’m not really sure how to configure the thing to get full performance.
I’ve got a low-power box (AMD GX-424CC) with a RAID controller and a bunch of disks attached, and 8GB of RAM inside. I didn’t configure any cache devices, just the spinning rust.
I have a RAID 10 pool configured with 4 HDDs, and I can’t get past 100MB/s write speed; usually it’s more like 60MB/s. I’m copying from 3 HDDs set up as a RAID 0 on the controller itself. How do I check what the bottleneck is here?
If you want to test for a particularly bad drive, you could run
zpool iostat -yv 0.5
(Ctrl+C to stop)
and it’ll report how much each drive is reading/writing. One drive might be noticeably more or less busy than the others.
As Exard mentioned, test just one thing at a time, a read or a write, if you want to check performance.
Bear in mind how ZFS writes data: it accumulates dirty data in memory (a transaction group), then flushes it out to disk once the group reaches a certain size or age, so you should see waves of more and less data being transferred.
It’s the peaks you want to look at. Running with 3-second intervals should smooth things out.
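For example, with a 3-second interval (the pool name "tank" is a placeholder, substitute your own):

```shell
# Per-vdev throughput averaged over 3-second windows.
# -y skips the since-boot summary line, -v breaks the
# numbers down per vdev and per disk.
zpool iostat -yv tank 3
```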
Compression can also speed up (or slow down) a transfer, depending on the data and the CPU.
fio is basically useless as a metric for drive performance if caching or compression are influencing the results. I set up a special fio dataset on my pools for testing purposes, with all the interfering ZFS magic disabled. Even things like atime, sync or special_small_blocks can turn fio results into gibberish. ZFS is great, but is a bitch when it comes to setting up a benchmark.
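For what it’s worth, a sketch of such a dataset (pool and dataset names are placeholders, and which properties you disable depends on what you’re trying to measure):

```shell
# Hypothetical fio benchmark dataset -- adjust names and values to taste.
zfs create tank/fiotest
zfs set compression=off tank/fiotest        # no compression skewing results
zfs set atime=off tank/fiotest              # no access-time metadata writes
zfs set primarycache=metadata tank/fiotest  # keep file data out of the ARC
zfs set sync=standard tank/fiotest          # or =always / =disabled, per test
```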
The use of “RAID-10” is confusing. Are you running hardware RAID with a ZFS pool on top? ZFS really just wants the disks passed through JBOD and you surrender the advantages of the filesystem if you use a hardware RAID solution. Please include the output of “zpool status -v” when you get a chance.
So if you are testing 4K writes with fio, you’ll generally want a 4K recordsize on the dataset.
This is more important for read tests, but you also want the test to read/write double the amount of memory that ZFS is using for ARC; otherwise your read speeds can end up just being a test of your RAM speed. Creating a dataset with compression set to off or zle (so long as you don’t write zeros, as zle compresses those) is another possible solution. Don’t ever set compression=off outside of testing and debugging, though; at the very least use zle on incompressible datasets, so the slack space in the last (usually partially filled) block of each file is still compressed.
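Putting both points together, a run might look like this (dataset name and sizes are examples; with 8GB of RAM a 16GB test file comfortably exceeds twice the default ARC limit):

```shell
# Example only: 4K random writes against a dataset with matching recordsize.
zfs create -o recordsize=4k -o compression=off tank/fiotest

# --size well above 2x ARC, --end_fsync so buffered writes are flushed
# before fio reports a number.
fio --name=randwrite --directory=/tank/fiotest \
    --rw=randwrite --bs=4k --size=16g \
    --ioengine=psync --end_fsync=1
```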
@zviratko You are seeing high write amplification when doing random writes, as well as slower performance than expected. The write amplification is likely due to indirect blocks. E.g. with recordsize=4k, every random 4k write will need to write 4KB of user data, plus at least 2x 128K (before compression) level-1 indirect blocks. Depending on your file size, level-2 indirect block writes may not “amortize out” over the txg either. So 32x inflation would be expected. In addition to this, if the indirect blocks are not all cached, then you will need to read the indirect block before making the modification, which has a very adverse effect on write latency.
…I’m not gonna pretend to know what that means.
Also make sure atime=off on the dataset (or even better the whole pool). This stops the completely unnecessary changes to metadata every time anything even looks at a file.
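That’s a one-liner (pool name is a placeholder):

```shell
# Disable access-time updates pool-wide; child datasets inherit it.
zfs set atime=off tank
# Verify what each dataset ended up with:
zfs get -r atime tank
```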
Matt Ahrens, the pope of ZFS.
Whenever I tune into one of the more in-depth lectures on the OpenZFS YT channel, I usually feel the same.
But regarding the OP…
It’s difficult to help pinpoint the source of your problem. There are too many variables: the RAID controller, some other mysterious RAID 0 array on the same server that isn’t ZFS (?), a Samba share (?), an unknown VM configuration, and an unknown network speed.
It might just as well be that the VM is limited to 1Gbit/s of network bandwidth; that would also explain the ~100MiB/s maximum write speeds. Or your embedded AMD processor can’t keep up with gzip-9 compression on your dataset. Or your hardware RAID controller is messing everything up. Or you are expecting too much from async random writes (with possible write amplification, as mentioned by @Log).
Sometimes it’s as simple as the problem being the program that measures the speeds. If I just check the KDE file transfer window (not sure if that’s a KDE Plasma, Dolphin or NFS problem) on my laptop, it always shows about 50-70MiB/s when copying a big file.
But I know for a fact that the file is actually copied at ~90MiB/s (which is what you can expect from an old 1Gbit/s NIC writing to an NFS share on the network), backed up by htop, zpool iostat and my good old physical stopwatch plus basic maths.
If you want to test things, you don’t start out complicated.
You eliminate points of failure one at a time. If fio doesn’t show anything wrong (tbd), I’d continue by ruling out problems/bottlenecks with the VM, starting with iperf, then testing Samba performance and the VM disks after that.
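A minimal network check, assuming iperf3 on both ends (the address is a placeholder for the VM’s IP):

```shell
# Inside the VM:
iperf3 -s

# On the client machine, pointing at the VM, for 10 seconds:
iperf3 -c 192.168.1.50 -t 10
```

Anything well below ~940Mbit/s on a 1Gbit/s link points at the network or the VM’s virtual NIC rather than the disks.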
Thanks for the tips, I’ve investigated a little more. iperf gives me ~945Mbit/s, so the network is OK. I’ve disabled compression on my dataset, and the CPU does not seem to be an issue. With fio and sequential writes directly on the host I now get around 180MB/s; I guess that’s all I’ll get from old HDDs unless I add a ton of drives.
That leaves the VM or the Samba config as the issue. After enabling asynchronous writes and receivefile size, I got up to 60MB/s from the host system. Is hosting the share inside a VM even a good idea? I wanted it nicely contained inside a disk image, but maybe it’s more trouble than it’s worth.
I have my recordsize at the default, 128K. Is that appropriate for a VM storage pool? Also, is splitting the pool into many datasets with different recordsizes a thing?
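What I have in mind would be something like this (dataset names and values are just examples):

```shell
# One dataset per workload, each with its own recordsize.
# Note: recordsize only affects blocks written after the change.
zfs create -o recordsize=64k tank/vmdisks   # qcow2/raw images
zfs create -o recordsize=1M  tank/media     # big sequential files
zfs get recordsize tank/vmdisks tank/media
```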
VM disks are stored as a single file on the host system (e.g. qcow2 or raw) and presented to the guest as virtual, but fully functioning, devices. This usually gives the worst performance, but it’s the easiest option and sometimes the only one. My VM disks are mostly boot drives; the VMs that need real storage get an iSCSI or NFS share from my NAS.
Hypervisors also let you hand a full drive to a VM, which bypasses the troublesome VM disk file situation, but that drive isn’t available to the host anymore.
The tried and trusted option, however, is to pass through an entire controller to the VM, usually your RAID controller/HBA or the on-board SATA controller. The guest gets full control with minimal virtualization overhead, because the storage isn’t virtual disks or virtual SCSI controllers but actual hardware.
I’m personally running a TrueNAS VM with no performance problems whatsoever (I had it running on bare metal before and didn’t notice any major difference). Shares and storage in a fileserver/NAS VM are totally viable, but performance was atrocious until I passed my SSDs and SATA controllers through to the VM. NIC passthrough helped my 10Gbit/s network throughput quite a bit too.