Subpar performance on my TrueNAS pool

Currently I have 8 × 4TB CMR disks from 2014 configured as striped mirrors (4 vdevs), for a total of 16TB, or about 14.5TiB. The drives are rated at 180MB/s, so I would expect up to roughly 1440MB/s reads (8 disks) and 720MB/s writes (4 vdevs). My network can move beyond 8Gb/s (10G fiber), which I validated with iperf. In practice I’m not getting anywhere near this!

PRIMARY_POOL        0	0	0	ONLINE
  MIRROR            0	0	0	ONLINE
    da7p2           0	0	0	ONLINE
    da3p2           0	0	0	ONLINE
  MIRROR            0	0	0	ONLINE
    da6p2           0	0	0	ONLINE
    da1p2           0	0	0	ONLINE
  MIRROR            0	0	0	ONLINE
    da5p2           0	0	0	ONLINE
    da2p2           0	0	0	ONLINE
  MIRROR            0	0	0	ONLINE
    da0p2           0	0	0	ONLINE
    da4p2           0	0	0	ONLINE

Transfers start off fast (usually 480MB/s to 680MB/s, probably because repeated transfers of the same file get cached in the 32GB ARC), but after a while they drop to the best I can sustain: around 250MB/s to 300MB/s for both reads and writes. The numbers come simply from watching GNOME System Monitor while using dd to read and write big files to and from /dev/zero, with bs set between 16M and 128M (playing with this doesn’t seem to make a big difference).
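For reference, the tests look roughly like this (the mount point and file name are placeholders, and note that zeros compress to nothing if lz4 is on, which can flatter the write numbers):

  # write ~10GiB of zeros over NFS, flushing to disk at the end
  dd if=/dev/zero of=/mnt/nfs/testfile bs=16M count=640 conv=fdatasync
  # read it back, discarding the data
  dd if=/mnt/nfs/testfile of=/dev/null bs=16M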

I’m using NFS (I can’t get CIFS/SMB to work in Linux, which is a whole separate issue…) with the following mount options: -o rsize=65535,wsize=65535,async
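For completeness, the full mount is along these lines (server name and export path are placeholders):

  mount -t nfs -o rsize=65535,wsize=65535,async truenas:/mnt/PRIMARY_POOL/share /mnt/nfs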

I resolved some minor networking issues I was having with my hardware:

  • replaced a flaky SFP+ to LC fiber transceiver
  • configured a 9000 MTU through the whole network path
  • replaced a (damaged?) Brocade 1020 1/10G NIC that, for some odd reason, was only getting 2 PCIe lanes in my desktop
  • rebooted my server to fix a weird issue where transfer speeds one way were capped at 4.7Gbps while the other way got 9Gbps
  • verified 8Gbps+ in short iperf tests from my desktop to my TrueNAS VM after a reboot of the Proxmox server
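The iperf validation was roughly this (iperf3 shown, host name is a placeholder):

  # on the TrueNAS VM
  iperf3 -s
  # on the desktop: 30-second run toward the server
  iperf3 -c truenas -t 30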

I found that switching from Scale to Core 13 gave a noticeable bump in performance; Scale was even slower! This is unfortunate, since I like Scale’s UI and its virtualization/containerization features. I would probably have used Scale instead of Proxmox had it existed 3 years ago when I built this server.

I don’t know what else to try. I want to saturate my 10G network so I can justify its existence! What can I do to further debug my setup?

Bare metal hardware:

  • Ryzen 5700G
  • Asus X570-E motherboard
  • 64GB 3200MT/s CL22 ECC DDR4 UDIMM dual channel
  • 1/10G Brocade 1020 and USR 100m fiber SFP+ adapter
  • 2.5GbaseT Realtek PHY
  • 1GbaseT Intel PHY
  • LSI 2308 HBA in IT mode (ROM disabled in UEFI)
  • Proxmox hypervisor OS booting from two PNY CS900 SSDs in a ZFS mirror

Allocated to TrueNAS:

  • 32GiB memory
  • 8 CPU cores, 1 socket, all relevant extensions enabled (I think?)
  • VirtIO networking
  • LSI 2308 HBA passed through via PCIe

Perhaps my HBA isn’t getting the PCIe bandwidth it deserves? Well, I checked with lspci -vv and it appears to be fine; it is getting all 8 lanes while installed in the top slot:

  LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
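(That LnkSta line came from something like the following; LSI/Broadcom devices sit under PCI vendor ID 1000:)

  lspci -vv -d 1000: | grep -E 'LnkCap|LnkSta'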

Previous posts about this setup:

I too have poor NFS performance. But my setup of 3 x stripes with very old 4TB SAS disks stays pegged at 1GB/s writes over SMB for a long time, then drops down to around 600MB/s. Your setup should be even faster.

A dumb thing to try, but can you disable sync on the dataset and try NFS performance again?
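That’s a per-dataset property, so something like this (dataset name assumed):

  # disable synchronous write semantics on the shared dataset
  zfs set sync=disabled PRIMARY_POOL/share
  # revert when done testing
  zfs set sync=standard PRIMARY_POOL/share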

Writing zeros, it averaged around 480MB/s pretty much the whole time. A bit faster, but still slower than it should be. For reading, nothing changed: it starts out peaking slightly above 570MB/s and then it’s a rollercoaster, still averaging around 300MB/s or so.

I set up an SMB share to determine whether this is the fault of NFS or of my pool. I performed the same transfer and saw nearly 1GiB/s reads for the first bit, and then it fell back to an underwhelming 200MB/s. So… both? Clearly the first bit is saturating what SMB over my fiber can do while the data is cached in ARC. The second bit is throttled by the pool… presumably.
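One way I could take both protocols out of the picture entirely is to run the same dd test locally in the TrueNAS shell (dataset path is a guess):

  # write 64GiB straight to the pool, no network involved (larger than the
  # 32GiB ARC, so the read-back below cannot be served purely from cache)
  dd if=/dev/zero of=/mnt/PRIMARY_POOL/share/testfile bs=16M count=4096
  dd if=/mnt/PRIMARY_POOL/share/testfile of=/dev/null bs=16M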

So how can I further optimise NFS? (As I understand it, it’s more secure, more “Linux-first,” and is supposed to perform better.)

Also why is my pool so slow???


There are a few utils you can use. Does gstat show 100% utilization on any disk during these operations?
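On Core (FreeBSD) that would be something like:

  # physical disks only, refreshed every second; watch the %busy column
  gstat -p -I 1s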