So I went ahead and tried various tweaks and changes last night. No luck, still seeing really slow speeds on the host and the VMs. I was super frustrated at this point… I even ran out to MicroCenter today and grabbed an IronWolf 125 SSD to add as a SLOG (I wanted this anyway for my ZFS NAS VM, but figured I'd try it on the host first to see). That made zero impact on the host, same issues.
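For reference, the SLOG attempt was just the standard zpool commands, roughly like this; the pool name here is Proxmox's default rpool and the device path is a placeholder for wherever the IronWolf shows up on your system:
root@proxmox:~# zpool add rpool log /dev/disk/by-id/<ironwolf-ssd>      # attach the SSD as a dedicated log device
root@proxmox:~# zpool status rpool                                      # verify it appears under "logs"
root@proxmox:~# zpool remove rpool /dev/disk/by-id/<ironwolf-ssd>       # pull it back out after testing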
Today I backed up all the VMs and decided to try a different filesystem so I could run some tests to compare against my ZFS struggles. I thought perhaps I had a bad drive or a config issue.
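The backups themselves were just plain vzdump runs, one per VM, along these lines (the VMID and dump directory are examples, not my actual values):
root@proxmox:~# vzdump 100 --mode snapshot --compress zstd --dumpdir /mnt/backup    # one archive per VM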
I set up RAID 1 via mdadm with XFS as the filesystem for the two NVMe drives and re-installed Proxmox (rough commands below). Re-imported all the VMs and was up and running again. First test was pulling data off my TrueNAS Scale VM to another VM via NFS. Before I'd see maybe 150-200 MiB/s; now I am seeing a consistent 350-400+ MiB/s, ex: 14.6% (9.6 GiB / 65.4 GiB) 420.6 MiB/s remaining 0:02:15. Same goes for copying a file within a VM as well as copying files on the host; everything is as fast as it should be, ex: 32.6% (12.0 GiB / 36.8 GiB) 915.9 MiB/s remaining 0:00:27.
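For the curious, the array and filesystem build was nothing fancy, roughly the following (device names match the disk stats further down; the dm-1 you'll see there is the LVM layer Proxmox put on top, and the restore filename/VMID are placeholders):
root@proxmox:~# mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1    # mirror the two NVMe drives
root@proxmox:~# mkfs.xfs /dev/md0                                                               # XFS on top of the mirror
root@proxmox:~# qmrestore /mnt/backup/vzdump-qemu-100-<timestamp>.vma.zst 100                   # re-import each VM from its vzdump archive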
I re-ran some of those fio tests on the host as well.
Random write:
root@proxmox:~# fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --numjobs=1 --size=4g --iodepth=1 --runtime=60 --time_based --end_fsync=1
random-write: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=posixaio, iodepth=1
fio-3.12
Starting 1 process
random-write: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [F(1)][100.0%][w=53.0MiB/s][w=13.8k IOPS][eta 00m:00s]
random-write: (groupid=0, jobs=1): err= 0: pid=1474: Mon Mar 15 18:55:27 2021
write: IOPS=117k, BW=458MiB/s (481MB/s)(27.7GiB/61761msec); 0 zone resets
slat (nsec): min=250, max=1905.6k, avg=893.55, stdev=761.48
clat (nsec): min=550, max=1028.1k, avg=5132.49, stdev=1666.20
lat (usec): min=3, max=1907, avg= 6.03, stdev= 1.94
clat percentiles (nsec):
| 1.00th=[ 2544], 5.00th=[ 2640], 10.00th=[ 2672], 20.00th=[ 2768],
| 30.00th=[ 5472], 40.00th=[ 5792], 50.00th=[ 5920], 60.00th=[ 5984],
| 70.00th=[ 6112], 80.00th=[ 6240], 90.00th=[ 6432], 95.00th=[ 6624],
| 99.00th=[ 7136], 99.50th=[ 7520], 99.90th=[ 8896], 99.95th=[ 9664],
| 99.99th=[11328]
bw ( KiB/s): min=20792, max=1090624, per=100.00%, avg=585781.51, stdev=238168.11, samples=98
iops : min= 5198, max=272656, avg=146445.40, stdev=59542.06, samples=98
lat (nsec) : 750=0.01%
lat (usec) : 4=28.68%, 10=71.28%, 20=0.03%, 50=0.01%, 100=0.01%
lat (usec) : 250=0.01%, 500=0.01%, 750=0.01%
lat (msec) : 2=0.01%
cpu : usr=14.35%, sys=25.54%, ctx=7392187, majf=0, minf=45
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,7248968,0,1 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=458MiB/s (481MB/s), 458MiB/s-458MiB/s (481MB/s-481MB/s), io=27.7GiB (29.7GB), run=61761-61761msec
Read:
root@proxmox:~# fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=testio --filename=testio --bs=4k --iodepth=64 --size=4G --readwrite=randread
testio: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=64
fio-3.12
Starting 1 process
testio: Laying out IO file (1 file / 4096MiB)
Jobs: 1 (f=1): [r(1)][100.0%][r=1153MiB/s][r=295k IOPS][eta 00m:00s]
testio: (groupid=0, jobs=1): err= 0: pid=1850: Mon Mar 15 18:56:55 2021
read: IOPS=296k, BW=1158MiB/s (1214MB/s)(4096MiB/3538msec)
bw ( MiB/s): min= 1144, max= 1166, per=100.00%, avg=1158.00, stdev= 7.91, samples=7
iops : min=293086, max=298528, avg=296448.86, stdev=2024.47, samples=7
cpu : usr=19.45%, sys=77.83%, ctx=12705, majf=0, minf=71
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued rwts: total=1048576,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=64
Run status group 0 (all jobs):
READ: bw=1158MiB/s (1214MB/s), 1158MiB/s-1158MiB/s (1214MB/s-1214MB/s), io=4096MiB (4295MB), run=3538-3538msec
Disk stats (read/write):
dm-1: ios=1036134/0, merge=0/0, ticks=143644/0, in_queue=143644, util=97.38%, aggrios=1048576/0, aggrmerge=0/0, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00%
md0: ios=1048576/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=527403/3117, aggrmerge=3118/3118, aggrticks=73058/186, aggrin_queue=0, aggrutil=96.12%
nvme0n1: ios=412398/6232, merge=0/6236, ticks=73881/373, in_queue=0, util=96.12%
nvme1n1: ios=642408/2, merge=6236/0, ticks=72235/0, in_queue=0, util=96.12%
Super happy with how things are now, seeing as I'm getting the speeds and performance I should be. Not ruling out that there was something up with my config, but I feel like I gave it a solid try and something wasn't adding up.
Thanks!
John