ZFS NFS share significantly more performant than on native pool?!?

So here is a super weird one. I have a TrueNAS SCALE server set up with a ZFS pool, and an NFS share on that pool is mounted on my Proxmox Backup Server (PBS). When I run an fio test from the PBS server against the share, the results are dramatically better than when I run the same fio test on the same dataset directly on the TrueNAS server.

59.4k IOPS and 232MiB/s vs 164 IOPS and 660KiB/s

Anyone have any idea what is going on here?

Both servers have 2x 10GbE in a LAG. MTU is 9000 everywhere.

The rust1 pool on the TrueNAS server is configured with the following:

  • Data VDEVs - 5 x MIRROR | 2 wide | Mixed capacity (2x 14TB Seagates, 4x 4TB WD Reds, 4x 2TB WD Reds and Greens)
  • Metadata VDEVs - 2 x MIRROR | 2 wide | 110.28 GiB (4x Intel Optane P1600x NVMe)
  • Log VDEVs - 1 x DISK | 1 wide | 7.99 GiB (Radian RMS-200 8GB PCIe "NVMe")
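
For reference, the pool layout and live per-vdev activity can be inspected with standard OpenZFS commands (pool name taken from above; this is just a diagnostic sketch, run on the TrueNAS box):

```shell
# Show the pool layout, including the special (metadata) and log vdevs
zpool status rust1

# Watch per-vdev I/O once per second while a test runs; synchronous
# writes should show up as activity on the log vdev (the Radian RMS-200)
zpool iostat -v rust1 1
```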

PBS:

root@pbs:/mnt/backups# fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --numjobs=1 --size=4g --iodepth=1 --runtime=60 --time_based --end_fsync=1
random-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.33
Starting 1 process
Jobs: 1 (f=1): [F(1)][100.0%][eta 00m:00s]                          
random-write: (groupid=0, jobs=1): err= 0: pid=10796: Sun May  5 01:53:22 2024
  write: IOPS=59.4k, BW=232MiB/s (243MB/s)(14.6GiB/64411msec); 0 zone resets
    slat (nsec): min=729, max=4024.8k, avg=1330.19, stdev=2207.14
    clat (nsec): min=271, max=1775.5M, avg=12354.60, stdev=1376207.66
     lat (usec): min=6, max=1775.5k, avg=13.68, stdev=1376.21
    clat percentiles (usec):
     |  1.00th=[    8],  5.00th=[    8], 10.00th=[    8], 20.00th=[    9],
     | 30.00th=[    9], 40.00th=[    9], 50.00th=[    9], 60.00th=[   10],
     | 70.00th=[   10], 80.00th=[   12], 90.00th=[   13], 95.00th=[   15],
     | 99.00th=[   24], 99.50th=[   30], 99.90th=[   46], 99.95th=[   55],
     | 99.99th=[  113]
   bw (  KiB/s): min=    8, max=382928, per=100.00%, avg=283251.44, stdev=105311.02, samples=107
   iops        : min=    2, max=95734, avg=70812.92, stdev=26327.78, samples=107
  lat (nsec)   : 500=0.01%, 750=0.01%, 1000=0.01%
  lat (usec)   : 2=0.01%, 4=0.01%, 10=70.56%, 20=27.79%, 50=1.57%
  lat (usec)   : 100=0.05%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
  lat (msec)   : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2000=0.01%
  cpu          : usr=16.82%, sys=21.01%, ctx=3840883, majf=0, minf=24
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,3826246,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=232MiB/s (243MB/s), 232MiB/s-232MiB/s (243MB/s-243MB/s), io=14.6GiB (15.7GB), run=64411-64411msec

TrueNAS:

root@thanos[/mnt/rust1/share/backups]# fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --numjobs=1 --size=4g --iodepth=1 --runtime=60 --time_based --end_fsync=1
random-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.33
Starting 1 process
Jobs: 1 (f=1): [w(1)][100.0%][w=944KiB/s][w=236 IOPS][eta 00m:00s]
random-write: (groupid=0, jobs=1): err= 0: pid=3050516: Sun May  5 02:12:11 2024
  write: IOPS=164, BW=660KiB/s (676kB/s)(38.7MiB/60009msec); 0 zone resets
    slat (nsec): min=832, max=164944, avg=4833.47, stdev=2263.04
    clat (usec): min=18, max=96567, avg=6053.59, stdev=7739.00
     lat (usec): min=22, max=96571, avg=6058.43, stdev=7739.04
    clat percentiles (usec):
     |  1.00th=[   34],  5.00th=[   45], 10.00th=[  106], 20.00th=[  392],
     | 30.00th=[  449], 40.00th=[  506], 50.00th=[ 2966], 60.00th=[ 5735],
     | 70.00th=[ 8455], 80.00th=[11469], 90.00th=[16319], 95.00th=[20055],
     | 99.00th=[34341], 99.50th=[39060], 99.90th=[54789], 99.95th=[65799],
     | 99.99th=[96994]
   bw (  KiB/s): min=  344, max= 1088, per=99.73%, avg=658.89, stdev=126.42, samples=119
   iops        : min=   86, max=  272, avg=164.72, stdev=31.60, samples=119
  lat (usec)   : 20=0.01%, 50=7.36%, 100=2.60%, 250=3.85%, 500=25.56%
  lat (usec)   : 750=6.05%, 1000=0.18%
  lat (msec)   : 2=1.27%, 4=6.67%, 10=22.05%, 20=19.35%, 50=4.90%
  lat (msec)   : 100=0.15%
  cpu          : usr=0.23%, sys=0.16%, ctx=9902, majf=0, minf=22
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,9898,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=660KiB/s (676kB/s), 660KiB/s-660KiB/s (676kB/s-676kB/s), io=38.7MiB (40.5MB), run=60009-60009msec

It could be because NFS is using synchronous writes, so it's benefiting from the SLOG, while the test may be running asynchronously when run locally. Or the remote numbers could be inflated by RAM caching on the PBS side.
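
One way to narrow this down (a sketch; dataset and mount paths are taken from the posts above) is to compare the dataset's sync setting with the actual NFS mount options, then re-run fio with an fsync after every write so both sides measure synchronous behavior rather than page-cache absorption:

```shell
# On TrueNAS: check whether the dataset forces or disables sync writes
zfs get sync,recordsize,compression rust1/share/backups

# On PBS: check how the share is actually mounted (sync vs async, wsize, etc.)
findmnt -t nfs,nfs4 /mnt/backups

# Same benchmark as above, but with --fsync=1 to flush after every write;
# run it on both machines and the gap should shrink if caching is the cause
fio --name=random-write-sync --ioengine=posixaio --rw=randwrite --bs=4k \
    --numjobs=1 --size=4g --iodepth=1 --runtime=60 --time_based --fsync=1
```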

Remote machine being the PBS server?
