TrueNAS Scale ZFS Bottleneck - The Quest For 1000 MB/sec Reads

I’m trying to figure out what could be causing the bottleneck on my NAS. By my calculations, I should easily be able to exceed 1000 MB/sec on sustained reads, but I’m not getting there. Any troubleshooting advice would be appreciated.

System:

OS: TrueNAS-SCALE-22.12.2

CPU: Threadripper 1950X

RAM: 128 GB DDR4

Network: 10 Gbit (Intel X710-DA2 → Mikrotik CRS317-1G-16S+RM → AQUANTIA AQC107)

Controller: Adaptec PMC ASR-72405

Drives: 8 Seagate Exos X20

ZFS Pool Config: 2 VDevs (4 drives each) in RAIDZ1

Benchmark:

You should. Did you test directly on the storage server itself? A slow result measured at the end of the pipeline can have multiple causes.

If the results are similar locally on TrueNAS, we can rule out Windows, the network, and everything else downstream.

FIO Sequential Write Test Command:

sync;fio --randrepeat=1 --ioengine=libaio --direct=1 --name=test --filename=test --bs=4M --size=4G --readwrite=write --ramp_time=4

Write Results:

test: (g=0): rw=write, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=libaio, iodepth=1
fio-3.25
Starting 1 process

test: (groupid=0, jobs=1): err= 0: pid=3788029: Sat Nov 25 07:10:59 2023
  write: IOPS=571, BW=2286MiB/s (2397MB/s)(4096MiB/1792msec); 0 zone resets
  cpu          : usr=7.59%, sys=58.35%, ctx=4628, majf=0, minf=8
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,1024,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=2286MiB/s (2397MB/s), 2286MiB/s-2286MiB/s (2397MB/s-2397MB/s), io=4096MiB (4295MB), run=1792-1792msec

FIO Sequential Read Test Command:

sync;fio --randrepeat=1 --ioengine=libaio --direct=1 --name=test --filename=test --bs=4M --size=4G --readwrite=read --ramp_time=4

Read Results:

test: (g=0): rw=read, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB, (T) 4096KiB-4096KiB, ioengine=libaio, iodepth=1
fio-3.25
Starting 1 process

test: (groupid=0, jobs=1): err= 0: pid=3785278: Sat Nov 25 07:10:29 2023
  read: IOPS=1145, BW=4582MiB/s (4804MB/s)(4096MiB/894msec)
  cpu          : usr=0.78%, sys=96.30%, ctx=102, majf=0, minf=521
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=1024,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=4582MiB/s (4804MB/s), 4582MiB/s-4582MiB/s (4804MB/s-4804MB/s), io=4096MiB (4295MB), run=894-894msec

Looks good. Some ZFS magic is at work, but that applies to file transfers too. So that’s fine.

Yeah, cache doing its work.

So your pool is fine and running well. Always a good thing.
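
As a rough sanity check on the "cache" remark, the fio figures above can be compared with what the eight spindles could physically deliver. This is only a sketch: the 250 MB/sec per-drive figure is the one quoted later in this thread, the ceiling deliberately ignores parity to be generous, and `mib_to_mb` and `exceeds_disk_ceiling` are hypothetical helper names, not part of fio or ZFS.

```python
MIB = 1024 ** 2  # bytes in one MiB (fio's first bandwidth figure)
MB = 1000 ** 2   # bytes in one MB (fio's figure in parentheses)


def mib_to_mb(mib_per_sec: float) -> float:
    """Convert a MiB/s rate to MB/s."""
    return mib_per_sec * MIB / MB


def exceeds_disk_ceiling(measured_mb: float, drives: int, per_drive_mb: float) -> bool:
    """True if the measured rate is above what the drives alone could stream.

    The ceiling counts every drive as a data drive (no parity deduction),
    so anything above it cannot be coming purely from the platters.
    """
    return measured_mb > drives * per_drive_mb


read_mb = mib_to_mb(4582)   # fio read result above
write_mb = mib_to_mb(2286)  # fio write result above

# Assumption: 8 drives at roughly 250 MB/s each.
print(f"read : {read_mb:.0f} MB/s, above disk ceiling: "
      f"{exceeds_disk_ceiling(read_mb, 8, 250)}")
print(f"write: {write_mb:.0f} MB/s, above disk ceiling: "
      f"{exceeds_disk_ceiling(write_mb, 8, 250)}")
```

Both results land above the 2000 MB/sec ceiling, which fits the cache explanation: a 4 GiB test file is small next to 128 GB of RAM. A test file considerably larger than RAM would be needed to see what the disks themselves sustain.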

So the problem lies somewhere in the network, the type of file sharing (iSCSI, SMB, NFS), or the client configuration (a Windows thing).

The first post looks like what I would expect from an iSCSI share with a small block size (especially with TrueNAS Scale), or from metadata trouble caused by issuing too many ops/sec over SMB.

The hard drives you have are capable of 250 MB/sec each.
I assume you have them in a pair of 4-drive RAIDZ1 vdevs, so an uncached read or write to a single vdev would give you (4 - 1 for RAIDZ1 parity) × 250 MB/sec = 750 MB/sec, which is what you got.

Those numbers add up for a single vdev, but shouldn’t the pool be reading and writing across both vdevs, giving me a potential of 1500 MB/sec? What would be limiting it to a single vdev? It’s a brand new pool with all of the same drives.
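
The arithmetic in the last two posts can be sketched as follows. It is a back-of-the-envelope estimate only, assuming 250 MB/sec per drive as stated above and that streaming throughput scales with the number of data (non-parity) drives; `raidz_streaming_mb_per_sec` is a hypothetical helper, not a ZFS tool.

```python
def raidz_streaming_mb_per_sec(vdevs: int, drives_per_vdev: int,
                               parity_drives: int, per_drive_mb_per_sec: float) -> float:
    """Rough streaming estimate: only the data drives in each vdev contribute."""
    data_drives = drives_per_vdev - parity_drives
    return vdevs * data_drives * per_drive_mb_per_sec


single_vdev = raidz_streaming_mb_per_sec(1, 4, 1, 250)  # (4 - 1) * 250
both_vdevs = raidz_streaming_mb_per_sec(2, 4, 1, 250)   # 2 * (4 - 1) * 250

# 10 Gbit/s expressed in MB/s, before any protocol overhead.
ten_gbit_mb_per_sec = 10_000 / 8

print(f"single vdev : {single_vdev:.0f} MB/s")
print(f"both vdevs  : {both_vdevs:.0f} MB/s")
print(f"10 Gbit link: {ten_gbit_mb_per_sec:.0f} MB/s")
```

On these assumptions the two-vdev pool sits above what a 10 Gbit link can carry, so a result stuck near 750 MB/sec points at something other than the disks.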

I ran an iperf3 test and noticed that my network connection seemed to be maxing out at 6 Gb/sec (750 MB/sec). So I changed NICs on the client machine to use the onboard 10 Gbit NIC (Marvell AQtion) instead of the Cisco/Intel X710-DA. It seems like it was a NIC issue with the X710-DA because now I’m getting way faster speeds. So at this point I’m wondering if it’s a driver issue with the X710-DA, the SFP+ module, or the NIC itself. But at least I know now what was causing the bottleneck.

Before (Cisco/Intel X710-DA):

[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-10.00  sec   297 MBytes   249 Mbits/sec                  sender
[  4]   0.00-10.00  sec   297 MBytes   249 Mbits/sec                  receiver
[  6]   0.00-10.00  sec   305 MBytes   256 Mbits/sec                  sender
[  6]   0.00-10.00  sec   304 MBytes   255 Mbits/sec                  receiver
[  8]   0.00-10.00  sec   455 MBytes   382 Mbits/sec                  sender
[  8]   0.00-10.00  sec   455 MBytes   382 Mbits/sec                  receiver
[ 10]   0.00-10.00  sec   305 MBytes   256 Mbits/sec                  sender
[ 10]   0.00-10.00  sec   305 MBytes   256 Mbits/sec                  receiver
[ 12]   0.00-10.00  sec   304 MBytes   255 Mbits/sec                  sender
[ 12]   0.00-10.00  sec   304 MBytes   255 Mbits/sec                  receiver
[ 14]   0.00-10.00  sec   442 MBytes   371 Mbits/sec                  sender
[ 14]   0.00-10.00  sec   442 MBytes   371 Mbits/sec                  receiver
[ 16]   0.00-10.00  sec   457 MBytes   383 Mbits/sec                  sender
[ 16]   0.00-10.00  sec   457 MBytes   383 Mbits/sec                  receiver
[ 18]   0.00-10.00  sec   296 MBytes   248 Mbits/sec                  sender
[ 18]   0.00-10.00  sec   296 MBytes   248 Mbits/sec                  receiver
[ 20]   0.00-10.00  sec   456 MBytes   382 Mbits/sec                  sender
[ 20]   0.00-10.00  sec   456 MBytes   382 Mbits/sec                  receiver
[ 22]   0.00-10.00  sec   454 MBytes   381 Mbits/sec                  sender
[ 22]   0.00-10.00  sec   454 MBytes   381 Mbits/sec                  receiver
[ 24]   0.00-10.00  sec   304 MBytes   255 Mbits/sec                  sender
[ 24]   0.00-10.00  sec   304 MBytes   255 Mbits/sec                  receiver
[ 26]   0.00-10.00  sec   304 MBytes   255 Mbits/sec                  sender
[ 26]   0.00-10.00  sec   304 MBytes   255 Mbits/sec                  receiver
[ 28]   0.00-10.00  sec   305 MBytes   256 Mbits/sec                  sender
[ 28]   0.00-10.00  sec   304 MBytes   255 Mbits/sec                  receiver
[ 30]   0.00-10.00  sec   455 MBytes   382 Mbits/sec                  sender
[ 30]   0.00-10.00  sec   455 MBytes   381 Mbits/sec                  receiver
[ 32]   0.00-10.00  sec   440 MBytes   369 Mbits/sec                  sender
[ 32]   0.00-10.00  sec   440 MBytes   369 Mbits/sec                  receiver
[ 34]   0.00-10.00  sec   455 MBytes   382 Mbits/sec                  sender
[ 34]   0.00-10.00  sec   455 MBytes   382 Mbits/sec                  receiver
[ 36]   0.00-10.00  sec   295 MBytes   248 Mbits/sec                  sender
[ 36]   0.00-10.00  sec   295 MBytes   248 Mbits/sec                  receiver
[ 38]   0.00-10.00  sec   304 MBytes   255 Mbits/sec                  sender
[ 38]   0.00-10.00  sec   303 MBytes   254 Mbits/sec                  receiver
[ 40]   0.00-10.00  sec   303 MBytes   254 Mbits/sec                  sender
[ 40]   0.00-10.00  sec   303 MBytes   254 Mbits/sec                  receiver
[ 42]   0.00-10.00  sec   304 MBytes   255 Mbits/sec                  sender
[ 42]   0.00-10.00  sec   303 MBytes   254 Mbits/sec                  receiver
[SUM]   0.00-10.00  sec  7.07 GBytes  6.07 Gbits/sec                  sender
[SUM]   0.00-10.00  sec  7.07 GBytes  6.07 Gbits/sec                  receiver

After NIC Switch (Marvell AQtion):

[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-10.00  sec   570 MBytes   478 Mbits/sec                  sender
[  4]   0.00-10.00  sec   570 MBytes   478 Mbits/sec                  receiver
[  6]   0.00-10.00  sec   574 MBytes   481 Mbits/sec                  sender
[  6]   0.00-10.00  sec   574 MBytes   481 Mbits/sec                  receiver
[  8]   0.00-10.00  sec   568 MBytes   476 Mbits/sec                  sender
[  8]   0.00-10.00  sec   568 MBytes   476 Mbits/sec                  receiver
[ 10]   0.00-10.00  sec   571 MBytes   479 Mbits/sec                  sender
[ 10]   0.00-10.00  sec   571 MBytes   479 Mbits/sec                  receiver
[ 12]   0.00-10.00  sec   556 MBytes   466 Mbits/sec                  sender
[ 12]   0.00-10.00  sec   556 MBytes   466 Mbits/sec                  receiver
[ 14]   0.00-10.00  sec   572 MBytes   479 Mbits/sec                  sender
[ 14]   0.00-10.00  sec   572 MBytes   479 Mbits/sec                  receiver
[ 16]   0.00-10.00  sec   554 MBytes   465 Mbits/sec                  sender
[ 16]   0.00-10.00  sec   554 MBytes   465 Mbits/sec                  receiver
[ 18]   0.00-10.00  sec   570 MBytes   478 Mbits/sec                  sender
[ 18]   0.00-10.00  sec   570 MBytes   478 Mbits/sec                  receiver
[ 20]   0.00-10.00  sec   569 MBytes   478 Mbits/sec                  sender
[ 20]   0.00-10.00  sec   569 MBytes   478 Mbits/sec                  receiver
[ 22]   0.00-10.00  sec   566 MBytes   475 Mbits/sec                  sender
[ 22]   0.00-10.00  sec   566 MBytes   475 Mbits/sec                  receiver
[ 24]   0.00-10.00  sec   570 MBytes   478 Mbits/sec                  sender
[ 24]   0.00-10.00  sec   570 MBytes   478 Mbits/sec                  receiver
[ 26]   0.00-10.00  sec   565 MBytes   474 Mbits/sec                  sender
[ 26]   0.00-10.00  sec   565 MBytes   474 Mbits/sec                  receiver
[ 28]   0.00-10.00  sec   568 MBytes   477 Mbits/sec                  sender
[ 28]   0.00-10.00  sec   568 MBytes   477 Mbits/sec                  receiver
[ 30]   0.00-10.00  sec   552 MBytes   463 Mbits/sec                  sender
[ 30]   0.00-10.00  sec   552 MBytes   463 Mbits/sec                  receiver
[ 32]   0.00-10.00  sec   568 MBytes   477 Mbits/sec                  sender
[ 32]   0.00-10.00  sec   568 MBytes   477 Mbits/sec                  receiver
[ 34]   0.00-10.00  sec   550 MBytes   461 Mbits/sec                  sender
[ 34]   0.00-10.00  sec   550 MBytes   461 Mbits/sec                  receiver
[ 36]   0.00-10.00  sec   562 MBytes   472 Mbits/sec                  sender
[ 36]   0.00-10.00  sec   562 MBytes   472 Mbits/sec                  receiver
[ 38]   0.00-10.00  sec   566 MBytes   475 Mbits/sec                  sender
[ 38]   0.00-10.00  sec   566 MBytes   475 Mbits/sec                  receiver
[ 40]   0.00-10.00  sec   561 MBytes   470 Mbits/sec                  sender
[ 40]   0.00-10.00  sec   561 MBytes   470 Mbits/sec                  receiver
[ 42]   0.00-10.00  sec   566 MBytes   474 Mbits/sec                  sender
[ 42]   0.00-10.00  sec   566 MBytes   474 Mbits/sec                  receiver
[SUM]   0.00-10.00  sec  11.0 GBytes  9.48 Gbits/sec                  sender
[SUM]   0.00-10.00  sec  11.0 GBytes  9.48 Gbits/sec                  receiver
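
To compare the iperf3 totals with the MB/sec figures used elsewhere in this thread, the `[SUM]` lines can be pulled out and converted. This is a small sketch that only handles lines shaped like the ones above (a rate in Gbits/sec); `sum_gbits` and `gbits_to_mb_per_sec` are hypothetical helper names.

```python
import re

# Matches the rate on an iperf3 "[SUM]" line that is reported in Gbits/sec.
SUM_RATE = re.compile(r"\[SUM\].*?([\d.]+)\s+Gbits/sec")


def sum_gbits(iperf_text: str) -> list[float]:
    """Return every [SUM] rate (in Gbit/s) found in the given iperf3 output."""
    return [float(m.group(1)) for m in SUM_RATE.finditer(iperf_text)]


def gbits_to_mb_per_sec(gbits: float) -> float:
    """Convert Gbit/s to MB/s (1 Gbit = 1000 Mbit, 8 bits per byte)."""
    return gbits * 1000 / 8


before = "[SUM]   0.00-10.00  sec  7.07 GBytes  6.07 Gbits/sec                  sender"
after = "[SUM]   0.00-10.00  sec  11.0 GBytes  9.48 Gbits/sec                  sender"

for label, text in (("before", before), ("after", after)):
    for rate in sum_gbits(text):
        print(f"{label}: {rate} Gbit/s = {gbits_to_mb_per_sec(rate):.0f} MB/s")
```

The "before" total works out to roughly the 750 MB/sec ceiling seen in the file transfers, while the "after" total is close to line rate for 10 Gbit.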

Maybe the bottleneck is related to the Adaptec PMC ASR-72405 controller, or more precisely its limit of 6 Gb/s per port?
Do you run it in RAID mode (caching enabled?) or in HBA mode (passthrough, IT mode)?

And the Exos X20 disks and cabling, are they of the SATA (6 Gb/s) or the SAS (12 Gb/s) type?

The problem was the Intel X710 NIC: I had it plugged into a PCIe slot that was only giving 1x PCIe 3.0 per SFP+ port on the card. I moved the NIC to another PCIe slot and was then able to max out at 10 Gbit.
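
Why a single PCIe 3.0 lane cannot feed a 10 Gbit port can be shown with a quick calculation. The sketch uses the published PCIe 3.0 figures (8 GT/s per lane, 128b/130b encoding) and ignores packet and protocol overhead, so real throughput is lower still; `pcie3_gbit_per_sec` is a hypothetical helper.

```python
PCIE3_GT_PER_SEC = 8.0      # raw transfer rate per PCIe 3.0 lane
PCIE3_ENCODING = 128 / 130  # 128b/130b line encoding efficiency


def pcie3_gbit_per_sec(lanes: int) -> float:
    """Upper bound on usable PCIe 3.0 bandwidth in Gbit/s, before protocol overhead."""
    return lanes * PCIE3_GT_PER_SEC * PCIE3_ENCODING


for lanes in (1, 2, 4, 8):
    rate = pcie3_gbit_per_sec(lanes)
    verdict = "enough" if rate >= 10 else "NOT enough"
    print(f"x{lanes}: {rate:5.2f} Gbit/s -> {verdict} for one 10 Gbit port")
```

One lane tops out just under 8 Gbit/sec even before overhead, which is consistent with the roughly 6 Gbit/sec iperf3 result and with the fix of moving the card to a wider slot.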


Welp, that would definitely do it hah. Reading through this I was about to start suggesting network throttling, nice catch!
