Been doing some fio testing on a large NAS for a business. This machine has 16 8TB Micron 5300 Pro SATA SSDs in it and has been an absolute monster, but they now need more specific random 4k read IOPS numbers.
8 vdevs, so 8 x 2 drive mirrors, all in a single pool. System has 256GB of RAM and an EPYC 7281.
I’ve been doing a lot of testing with fio, but the numbers aren’t where I would expect them to be. I’m thinking there’s something I’m just not understanding and maybe this is totally fine, but I’m curious if these feel insanely low to anyone else.
According to the spec sheets these drives should each be capable of nearly 90k IOPS for 4k random reads on their own; reading from 16 of them simultaneously should, in theory, be at least that high.
I’m running fio with a test file of 1TB (to keep the majority of reads out of ARC), queue depth of 32, 4k block size, random reads, 8 threads (100GB of reads per thread), and letting this run for half an hour. Results are roughly 20k IOPS. I believe this is enough for the specific needs on this machine anyway, but it feels low to me considering what a single drive should be able to do.
Is this possibly ZFS related or something? It just seems odd since I can get about half a million IOPS from the ARC, so the system itself should be capable of pretty high numbers.
For added info, this is the specific command I am running: fio --name=1T100GoffsetRand4kReadQ32 --filename=test1T.dat --filesize=1T --size=100G --iodepth=32 --numjobs=8 --rw=randread --bs=4k --group_reporting --runtime=30M --offset_increment=100G --output=1T100GoffsetRand4kReadQ32-2.txt
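For a sense of the gap here, a quick back-of-envelope comparing the spec-sheet ceiling to the measured result (this assumes the optimistic case where all 16 drives can serve random reads, since ZFS mirrors can read from either side; numbers are the 90k/drive spec and the 20k fio result above):

```shell
#!/bin/sh
# Back-of-envelope: spec-sheet aggregate vs. measured 4k random read IOPS.
DRIVES=16
SPEC_IOPS_PER_DRIVE=90000   # Micron 5300 Pro 4k random read spec (per post above)
MEASURED=20000              # fio result from the test above

THEORETICAL=$((DRIVES * SPEC_IOPS_PER_DRIVE))   # optimistic ceiling
PCT=$((MEASURED * 100 / THEORETICAL))           # integer percent of ceiling

echo "theoretical ceiling: ${THEORETICAL} IOPS"
echo "measured: ${MEASURED} IOPS (~${PCT}% of ceiling)"
```

That ceiling is of course never reachable in practice, but landing around 1% of it is what makes the result feel surprising.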
Are the drives connected to an HBA or to the motherboard?
Did you check the motherboard’s block diagram for how the drives connect to the CPU?
There are several other potential unknowns here, but start with the hardware first.
Are you using other options like --ioengine=libaio --direct=1 to get as much async I/O as the host kernel will allow?
K3n.
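On a Linux host, that suggestion applied to the earlier command would look something like this (a hypothetical variant: the job name is changed and the command is printed rather than executed, so the sketch doesn’t require fio to be installed; pipe it to sh to actually run it):

```shell
#!/bin/sh
# Sketch: the earlier fio command with the suggested async/direct flags
# added. libaio gives true async submission on Linux; direct=1 requests
# O_DIRECT to bypass the page cache. Printed, not executed.
CMD="fio --name=rand4k --filename=test1T.dat --filesize=1T --size=100G \
--ioengine=libaio --direct=1 --iodepth=32 --numjobs=8 --rw=randread \
--bs=4k --group_reporting --runtime=30M --offset_increment=100G"
echo "$CMD"
```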
libaio isn’t supported on BSD (this is TrueNAS Core, I should’ve mentioned that, sorry). I am using direct=1, but that’s not supported on BSD either, so it doesn’t really matter.
I’m looking at sync I/O either way, but it should still be much faster than this, at least as far as I can theorize.
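For reference, on FreeBSD the nearest async engine fio offers is posixaio. A hypothetical variant of the command using it (again printed rather than executed, so it stands alone without fio installed; whether it helps here is untested):

```shell
#!/bin/sh
# Sketch: FreeBSD-friendly variant using fio's posixaio engine, the
# closest async option where libaio is unavailable. Printed, not executed.
CMD="fio --name=rand4kaio --filename=test1T.dat --filesize=1T --size=100G \
--ioengine=posixaio --iodepth=32 --numjobs=8 --rw=randread --bs=4k \
--group_reporting --runtime=30M --offset_increment=100G"
echo "$CMD"
```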
This is a proper enterprise server; it’s set up with two HBAs so you can get full bandwidth from each disk.
The hardware really should be very solid. I don’t have numbers on hand right now, but sequentials are insanely fast (even past what the ARC would cache, for both reads and writes).
I could be missing the mark here; it’s possible 4k randoms from a ZFS system like this are just this slow. 20k IOPS sustained against large files isn’t anything to laugh at, for sure, I just feel like it’s slower than it should be.
To be completely clear, this machine has been in production use for about 2 years and has been an absolute beast without a single issue the entire time, handling high-throughput workloads at the same time, etc. I just never benchmarked it, but since we plan to put a database load on it (on top of everything else), I wanted to get some exact figures for the vendor making the software that’ll use this.
All in all, I think it’s going to meet what we need, but I’m a little surprised by the low numbers so I’m trying to figure out if I’m missing something obvious.
Just to add a little more info for @k3ninho @Zedicus
I just ran a test on this with the identical fio command except with sequential reads instead of random, and it sustained 431K IOPS the entire time.
So maybe this is just expected/where it should be for randoms for this machine.
I just really want to understand the root cause of this “low” performance lol.
OK, another update: I’ve also taken the time to double-check, and both HBAs in use here are running at 8 GT/s (PCIe 3.0) x8, so that’s not the limit either.
Unless you did some customization and tuning for a specific block size and data type, by default ZFS aims for acceptable all-round performance, with peak performance at a fairly generic file size (around 1GB, depending).
Short of going through every piece of the setup individually, I would say your build is ‘normal’.
Yeah, I’ve been getting feedback from a few other places (Reddit, etc.) and am starting to come to the conclusion that this is about as expected performance-wise.
I am going to mess around with recordsize a bit to see if I can improve things, since the default is 128K, but yeah, this seems about normal now.
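Recordsize is a plausible lever here because of read amplification: with the default 128K records, a cache-missed 4k random read has to fetch (and checksum) a full 128K record. A rough sketch of what that implies in the worst case, assuming no compression and fully random, ARC-missing reads (a simplified model, not a measurement):

```shell
#!/bin/sh
# Read amplification when ZFS recordsize >> application I/O size.
RECORDSIZE_K=128   # default dataset recordsize
IO_K=4             # fio block size
MEASURED_IOPS=20000

AMP=$((RECORDSIZE_K / IO_K))                        # records fetched are 32x the I/O size
DISK_MB_S=$((MEASURED_IOPS * RECORDSIZE_K / 1024))  # implied pool read throughput, MB/s

echo "worst-case amplification: ${AMP}x"
echo "20k 4k reads/s => ~${DISK_MB_S} MB/s actually read from the pool"
```

Seen that way, ~2.5 GB/s of real disk reads across 8 mirrors is much closer to what the hardware sustains, which is why the common advice for database datasets is to set recordsize near the DB page size (e.g. `zfs set recordsize=16k pool/dataset` for a 16k-page database; the pool/dataset name here is a placeholder).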