How to improve NFS/ZFS performance (using SLOG, L2ARC)

Hi,

this is gonna be a long one. Today’s topic is my server, which runs a broad spectrum of services - GitLab (CI), Jellyfin, websites and also quite a few game servers (and that’s just an excerpt). The specs are (roughly):

  • AMD EPYC 7601
  • Supermicro board
  • 128GB RAM
  • ZFS “raid_01”, 3x 2TB WD Red HDDs (RAIDZ1)
  • ZFS “raid_02”, 3x 1TB WD Red SSDs (RAIDZ1)
  • Everything runs Debian 10

The host acts only as the ZFS system and hypervisor, and everything else runs inside its own libvirt VM. All VMs can access the ZFS datasets over NFS (via an internal virtio network). I’ve got different VMs for different purposes: two for separate isolated Docker environments (trusted and public), two game server VMs (one with NFS storage, one that stores everything on its own disk) and some others (probably not relevant today).

The VMs’ root disks are all located on the raid_02 pool (inside a shared dataset); raid_01 is my general data pool (that’s also where the game server files live).

All pools have lz4 compression enabled and are tuned to pass through TRIM events from the VMs (yes, there are also two minor data disks located on raid_01 - not relevant here).

What’s the problem?
I use Pterodactyl for hosting my game servers - it also provides a handy service to create backups of them. I currently have about 8 active Minecraft servers on the VM called “Gameserver-1”. Until recently they all ran their backups (which basically copy the whole server directory) simultaneously, starting every full hour. As one can imagine, this caused all the game servers to lag terribly and freeze for several seconds - often timing out the players on them. I have since spread the backup schedules out over the hour and that problem is now “gone” - but the performance of the NFS-attached datasets is still not impressive. I only have simple copy-paste benchmarks for now, but peak write performance feels slow (my PC, 1 Gbit/s → SSH → VM → NFS → host → ZFS: ~50 MB/s), while the direct path is much faster (my PC, 1 Gbit/s → SSH → host → ZFS: ~120 MB/s).
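
For more reproducible numbers than copy-paste tests, I could run something like fio once inside a VM against the NFS mount and once directly on the host against the dataset; the paths below are just placeholders for my actual mounts:

    # inside a VM, against the NFS mount (path is a placeholder)
    fio --name=nfs-write --directory=/mnt/gameserver --rw=write --bs=1M --size=2G --end_fsync=1

    # the same test directly on the host, against the dataset (path is a placeholder)
    fio --name=zfs-write --directory=/raid_01/gameserver --rw=write --bs=1M --size=2G --end_fsync=1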

What I tried…
I assume something is going wrong on the ZFS side, as the NFS exports are managed by it. All NFS shares are mounted with “defaults,_netdev,x-systemd.automount,x-systemd.requires=network-online.target” as options. Adding “async” to the client mount options does not affect the performance.
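
For completeness: as far as I understand (not verified on my setup), async on the client side is the NFS default anyway; the sync behaviour that matters is decided on the server, either in the export options or by the ZFS sync property. A rough sketch with placeholder host name, paths and subnet:

    # client side (fstab) - async is already the default for NFS client mounts
    server:/raid_01/gameserver  /mnt/gameserver  nfs  defaults,_netdev,x-systemd.automount,x-systemd.requires=network-online.target  0  0

    # server side - an export can be switched to async (replies before data hits disk)
    /raid_01/gameserver  10.0.0.0/24(rw,async,no_subtree_check)

    # or keep the export sync and check/control commit behaviour per dataset in ZFS
    zfs get sync raid_01/gameserver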

All VMs are connected to the host via a virtual network with virtio devices - so the only limit should be the CPU, right?!

Help…
So here are my questions:

  • Do you see any general problems that need to be addressed?
  • Any idea how I could improve the NFS performance inside my VMs?
  • I’ve thought about shrinking raid_02 and adding a SLOG / L2ARC on the freed SSD space - any thoughts on this?
  • I use Netdata for all monitoring, so in case you have an idea which graphs I should look at… (a few command-line checks I plan to run alongside it are sketched below)
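
Besides Netdata, I plan to watch a few things directly on the host while a backup runs (pool name as in my setup, the rest is generic):

    # per-vdev latency and queue statistics, refreshed every 5 seconds
    zpool iostat -vl raid_01 5

    # NFS server-side operation counters (look for high write/commit counts)
    nfsstat -s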

Anyway, thanks for reading, for every response and for your ideas!

L2ARC only benefits special workloads, which most consumers simply don’t have. Unless you are willing to dig into analyzing your ARC hit/miss ratio, ignore it. The real solution to the problems L2ARC solves is generally to max out RAM first.
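
For reference, a quick way to eyeball the ARC hit/miss ratio (tool names can differ slightly between ZFS releases):

    # rolling ARC hit/miss counters every 5 seconds
    # (may be installed as arcstat.py on older ZFS releases)
    arcstat 5

    # raw counters straight from the kernel if the helper tools are missing
    grep -E '^(hits|misses) ' /proc/spl/kstat/zfs/arcstats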

NFS by default makes sync writes, which is where the awful performance comes from when writing to it. If you temporarily turn off sync writes and lie about it, you can determine whether a SLOG would actually benefit you under real conditions. Do note that even the fastest possible storage will not make sync writes as fast as non-sync writes; there is just an unavoidable and rather large performance cost for that data safety. There are still lots of cheap 16GB Intel Optane sticks on eBay which work great for basic needs. Sync writes care about latency, not throughput (which is awful on those sticks), so it’s the opposite of what consumer NVMe drives built for bulk storage provide.
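
A minimal sketch of that test, assuming the game server data lives in its own dataset (the name here is made up):

    # lie about sync writes on the affected dataset only (trades crash safety for speed)
    zfs set sync=disabled raid_01/gameserver

    # ...run the normal workload or a benchmark, then restore the default behaviour
    zfs inherit sync raid_01/gameserver

    # if the test shows a big win, a dedicated log device can be added later,
    # e.g. one of those Optane sticks (device path is a placeholder)
    zpool add raid_01 log /dev/disk/by-id/nvme-OPTANE-EXAMPLE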

I strongly recommend you do NOT add either L2ARC or SLOG to your consumer drives.

2 Likes

I’m in a similar situation where everything runs “fine”, but if anything on the server takes up too much IO (such as a ZFS backup using syncoid), everything else suffers. SLOG, L2ARC, special metadata device - I bolted everything onto it to see if that would change the situation.

What ended up working for me was to identify which workloads are sensitive to latency (database instance, iSCSI target) and set them up on an SSD-based pool instead. I know that I will need good and reliable performance for these workloads. ARC is great, but not for every use case.
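
Roughly, moving such a dataset over to an SSD pool looks like this (dataset names and mountpoint are made up for the example):

    # snapshot the latency-sensitive dataset and copy it to the SSD pool
    zfs snapshot raid_01/db@migrate
    zfs send raid_01/db@migrate | zfs recv raid_02/db

    # point the workload at the new location, then retire the old copy
    zfs set mountpoint=/srv/db raid_02/db
    zfs destroy -r raid_01/db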

Whatever you do with your pool, I recommend measuring the results, because otherwise how can you be sure that the change had an effect at all?

2 Likes

@Log Thanks for your insights, I guess no SLOG/L2ARC for me then (according to Netdata I’ve got consistently very high cache hit rates anyway). But I’ll try sync=disabled for the game server pool and other high-throughput datasets. Mind that I have a UPS connected to the server, so data loss should not be a big problem.

@hddherman

What ended up working for me was to identify which workloads are sensitive to latency (database instance, iSCSI target) and set them up on an SSD-based pool instead.

As my SSDs still have some space, I’ll try that if ↑ does not help. Also thanks!

Whatever you do with your pool, I recommend measuring the results, because otherwise how can you be sure that the change had an effect at all?

Of course! I’ll keep an eye on my graphs over the next few days! :slight_smile:

Thanks for all your input - I now have some things to test!

1 Like
