Here's my best case with a Linux client talking to KSMBD over RDMA (SMB Direct):
CIFS mount options:
vers=3.1.1,rdma
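For anyone reproducing this, the full mount command looks something like the line below (server, share, mount point, and credentials file are placeholders for my setup; the client kernel needs CONFIG_CIFS_SMB_DIRECT enabled for the rdma option to work):
mount -t cifs //server/share /mnt/smb -o vers=3.1.1,rdma,credentials=/root/.smbcred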
READ
Server CPU at ~4.8 to 5% over the 2-minute run for reads
Switch reported 69379 Mbits/sec, 2094465 packets/sec, 69% of line rate
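Sanity check on those numbers: fio reports 8455 MB/s of payload, which is about 67.6 Gbit/s, so the switch's 69379 Mbit/s (69% of the 100 Gbit link) lines up once you add framing and protocol overhead.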
raw data
fio --name READ --filename=/mnt/smb/temp/temp.file --rw=read --size=100g --bs=1024k --ioengine=libaio --iodepth=256 --direct=1 --runtime=120 --time_based --group_reporting --numjobs=64
READ: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=256
...
fio-3.28
Starting 64 processes
Jobs: 30 (f=0): [E(1),f(2),E(1),f(1),E(1),_(2),f(2),E(1),_(1),f(1),_(1),f(1),E(1),f(1),_(1),f(2),E(1),f(1),_(1),f(1),_(1),f(1),E(1),_(2),f(1),_(1),f(1),_(1),f(2),_(1),f(1),_(1),f(4),_(2),E(2),f(3),_(1),f(2),_(1),f(1),_(2),E(2),_(1),E(1),_(1),f(1),_(1),f(1)][100.0%][r=23.0GiB/s][r=23.5k IOPS][eta 00m:00s]
READ: (groupid=0, jobs=64): err= 0: pid=9810: Mon Dec 6 19:55:27 2021
read: IOPS=8063, BW=8063MiB/s (8455MB/s)(945GiB/120011msec)
slat (usec): min=28, max=283264, avg=7935.08, stdev=15810.73
clat (usec): min=698, max=4200.3k, avg=2004114.61, stdev=500509.44
lat (msec): min=4, max=4206, avg=2012.05, stdev=501.60
clat percentiles (msec):
| 1.00th=[ 751], 5.00th=[ 1167], 10.00th=[ 1368], 20.00th=[ 1603],
| 30.00th=[ 1754], 40.00th=[ 1888], 50.00th=[ 2022], 60.00th=[ 2140],
| 70.00th=[ 2265], 80.00th=[ 2433], 90.00th=[ 2635], 95.00th=[ 2802],
| 99.00th=[ 3138], 99.50th=[ 3239], 99.90th=[ 3507], 99.95th=[ 3641],
| 99.99th=[ 3977]
bw ( MiB/s): min= 1914, max=23362, per=100.00%, avg=8064.85, stdev=53.53, samples=15038
iops : min= 1914, max=23362, avg=8064.72, stdev=53.53, samples=15038
lat (usec) : 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.02%
lat (msec) : 100=0.04%, 250=0.13%, 500=0.22%, 750=0.56%, 1000=1.64%
lat (msec) : 2000=46.27%, >=2000=51.09%
cpu : usr=0.02%, sys=30.02%, ctx=48693931, majf=0, minf=124845722
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.2%, >=64=99.6%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
issued rwts: total=967664,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=256
Run status group 0 (all jobs):
READ: bw=8063MiB/s (8455MB/s), 8063MiB/s-8063MiB/s (8455MB/s-8455MB/s), io=945GiB (1015GB), run=120011-120011msec
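Note that the multi-second completion latencies above are an artifact of queue depth, not the network: 64 jobs × iodepth 256 is up to 16,384 outstanding 1 MiB I/Os, and 16,384 MiB in flight ÷ ~8,063 MiB/s ≈ 2 s, which matches the ~2,000 ms average clat almost exactly.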
WRITE
Server CPU at ~60 to 70% over the 2-minute run for writes (mostly ZFS z_wr_iss write-issue threads)
Switch reported 40032 Mbits/sec, 1208070 packets/sec, 40% of line rate
– this is about the theoretical sequential-write limit of my pool's spinning disks
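If you want to see where that CPU goes yourself, a quick look at kernel threads during the run shows the ZFS write-issue taskqs dominating; a minimal sketch (thread names are standard OpenZFS, this wasn't part of my capture):
top -b -H -n 1 -o %CPU | grep z_wr | head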
raw data
fio --name WRITE --filename=/mnt/smb/temp/temp.file --rw=write --size=100g --bs=1024k --ioengine=libaio --iodepth=256 --direct=1 --runtime=120 --time_based --group_reporting --numjobs=64
WRITE: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=256
...
fio-3.28
Starting 64 processes
Jobs: 64 (f=64): [W(64)][100.0%][w=4376MiB/s][w=4375 IOPS][eta 00m:00s]
WRITE: (groupid=0, jobs=64): err= 0: pid=10383: Mon Dec 6 20:02:45 2021
write: IOPS=4744, BW=4744MiB/s (4975MB/s)(557GiB/120132msec); 0 zone resets
slat (usec): min=43, max=692942, avg=13413.56, stdev=28613.19
clat (usec): min=476, max=6221.5k, avg=3382036.28, stdev=809330.73
lat (usec): min=940, max=6221.8k, avg=3395450.12, stdev=811011.94
clat percentiles (msec):
| 1.00th=[ 852], 5.00th=[ 2022], 10.00th=[ 2433], 20.00th=[ 2802],
| 30.00th=[ 3004], 40.00th=[ 3239], 50.00th=[ 3406], 60.00th=[ 3608],
| 70.00th=[ 3809], 80.00th=[ 4044], 90.00th=[ 4329], 95.00th=[ 4597],
| 99.00th=[ 5134], 99.50th=[ 5269], 99.90th=[ 5604], 99.95th=[ 5738],
| 99.99th=[ 5940]
bw ( MiB/s): min= 642, max=14378, per=99.89%, avg=4738.86, stdev=35.92, samples=14892
iops : min= 642, max=14377, avg=4737.93, stdev=35.92, samples=14892
lat (usec) : 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 4=0.01%, 10=0.02%, 20=0.02%, 50=0.05%
lat (msec) : 100=0.05%, 250=0.15%, 500=0.24%, 750=0.29%, 1000=0.41%
lat (msec) : 2000=3.54%, >=2000=95.23%
cpu : usr=0.16%, sys=16.26%, ctx=28707074, majf=0, minf=68196247
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.2%, 32=0.4%, >=64=99.3%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
issued rwts: total=0,569928,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=256
Run status group 0 (all jobs):
WRITE: bw=4744MiB/s (4975MB/s), 4744MiB/s-4744MiB/s (4975MB/s-4975MB/s), io=557GiB (598GB), run=120132-120132msec
Here's my best case with a Linux client talking to NFS over RDMA:
NFS mount options: rsize=1048576,wsize=1048576,vers=4.2,proto=rdma,port=20049,noatime,nodiratime
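For reference, the corresponding mount command would look something like this (server name and export path are placeholders):
mount -t nfs -o rsize=1048576,wsize=1048576,vers=4.2,proto=rdma,port=20049,noatime,nodiratime server:/tank/nas /mnt/nfs/nas
The server side has to be listening for RDMA as well, e.g. echo 'rdma 20049' > /proc/fs/nfsd/portlist (the exact mechanism depends on your distro and nfs-utils version).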
READ
Server CPU at ~11.5 to 11.8% over the 2-minute run for reads
Switch reported 99911 Mbits/sec, 3010898 packets/sec, 99% of line rate
raw data
READ: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=256
...
fio-3.28
Starting 64 processes
READ: Laying out IO file (1 file / 102400MiB)
Jobs: 64 (f=64): [R(64)][20.7%][r=223MiB/s][r=223 IOPS][eta 07m:46s]
READ: (groupid=0, jobs=64): err= 0: pid=11272: Mon Dec 6 20:11:42 2021
read: IOPS=11.3k, BW=11.0GiB/s (11.8GB/s)(1334GiB/121408msec)
slat (usec): min=24, max=23176, avg=61.64, stdev=241.72
clat (msec): min=118, max=5353, avg=1452.38, stdev=361.26
lat (msec): min=121, max=5353, avg=1452.45, stdev=361.40
clat percentiles (msec):
| 1.00th=[ 1401], 5.00th=[ 1401], 10.00th=[ 1401], 20.00th=[ 1401],
| 30.00th=[ 1401], 40.00th=[ 1401], 50.00th=[ 1401], 60.00th=[ 1401],
| 70.00th=[ 1401], 80.00th=[ 1401], 90.00th=[ 1401], 95.00th=[ 1418],
| 99.00th=[ 4279], 99.50th=[ 4665], 99.90th=[ 5067], 99.95th=[ 5201],
| 99.99th=[ 5336]
bw ( MiB/s): min= 4948, max=11776, per=100.00%, avg=11587.24, stdev= 9.02, samples=14912
iops : min= 4948, max=11776, avg=11587.24, stdev= 9.02, samples=14912
lat (msec) : 250=0.04%, 500=0.03%, 750=0.11%, 2000=97.93%, >=2000=1.89%
cpu : usr=0.03%, sys=1.10%, ctx=1387859, majf=0, minf=7724180
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=99.7%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
issued rwts: total=1366234,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=256
Run status group 0 (all jobs):
READ: bw=11.0GiB/s (11.8GB/s), 11.0GiB/s-11.0GiB/s (11.8GB/s-11.8GB/s), io=1334GiB (1433GB), run=121408-121408msec
WRITE
Server CPU at ~10%, with regular spikes to 35% when ZFS flushed, over the 2-minute run for writes
Switch reported 98213 Mbits/sec, 2948446 packets/sec, 98% of line rate
– Not sure how this is possible with spinning disks; my guess is that ZFS was caching the whole 100 GB file in RAM and the same blocks were being rewritten faster than they could be flushed
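If you want to test that theory, watching transaction groups and actual vdev throughput during the run should show dirty data being absorbed and rewritten before it ever hits disk; a rough sketch, assuming an OpenZFS pool named tank:
watch -n 1 'tail -n 5 /proc/spl/kstat/zfs/tank/txgs'
zpool iostat -v tank 1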
raw data
fio --name write --filename=/mnt/nfs/nas/temp.file --rw=write --size=100g --bs=1024k --ioengine=libaio --iodepth=256 --direct=1 --runtime=120 --time_based --group_reporting --numjobs=64
write: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=256
...
fio-3.28
Starting 64 processes
Jobs: 48 (f=48): [W(33),_(10),W(1),_(3),E(1),_(1),E(1),W(14)][20.9%][w=4076MiB/s][w=4075 IOPS][eta 07m:43s]
write: (groupid=0, jobs=64): err= 0: pid=11787: Mon Dec 6 20:17:49 2021
write: IOPS=11.4k, BW=11.1GiB/s (11.9GB/s)(1351GiB/121445msec); 0 zone resets
slat (usec): min=34, max=23737, avg=87.29, stdev=70.26
clat (msec): min=20, max=2846, avg=1435.77, stdev=134.87
lat (msec): min=20, max=2846, avg=1435.86, stdev=134.84
clat percentiles (msec):
| 1.00th=[ 953], 5.00th=[ 1418], 10.00th=[ 1418], 20.00th=[ 1418],
| 30.00th=[ 1418], 40.00th=[ 1418], 50.00th=[ 1435], 60.00th=[ 1452],
| 70.00th=[ 1452], 80.00th=[ 1469], 90.00th=[ 1485], 95.00th=[ 1485],
| 99.00th=[ 1552], 99.50th=[ 2165], 99.90th=[ 2635], 99.95th=[ 2702],
| 99.99th=[ 2802]
bw ( MiB/s): min= 8727, max=15632, per=100.00%, avg=11393.26, stdev= 9.59, samples=15360
iops : min= 8727, max=15620, avg=11389.15, stdev= 9.58, samples=15360
lat (msec) : 50=0.01%, 100=0.09%, 250=0.27%, 500=0.22%, 750=0.22%
lat (msec) : 1000=0.23%, 2000=98.33%, >=2000=0.63%
cpu : usr=0.69%, sys=0.93%, ctx=1393157, majf=0, minf=5328390
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=99.7%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
issued rwts: total=0,1383709,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=256
Run status group 0 (all jobs):
WRITE: bw=11.1GiB/s (11.9GB/s), 11.1GiB/s-11.1GiB/s (11.9GB/s-11.9GB/s), io=1351GiB (1451GB), run=121445-121445msec