My system has 8x 16GB DDR4 sticks running at 3200MT/s. The CPU/X99 chipset has quad channel support.
Hmm.
So my system is
DDR3 1600:
800 x 2 x 64 = 102.4 Gbps per channel, 4 memory channels gives 409.6 Gbps, and two CPUs gives 819.2 Gbps.
Yours is
DDR4 3200:
1600 x 2 x 64 = 204.8 Gbps per channel, and 4 memory channels gives the same 819.2 Gbps.
Not a memory bottleneck.
CPU-wise you are at a PassMark score of 10306 and a single-thread rating of 2056.
Mine is a PassMark score of 23659 and a single-thread rating of 1744.
Since the fio test is only running 12 jobs, I don’t think it’s a CPU bottleneck; running the test confirms I’m at less than 50% usage.
If I change my fio parameters and do 24 jobs instead of 12, overall CPU utilization actually goes down, at least according to the UI in TrueNAS,
and my score is actually lower:
If it’s not memory and it’s not CPU, perhaps you are being bottlenecked somewhere on the PCIe bus by a contention issue.
Are you maybe in an x8 slot? That’s my only guess, but your MD results seem to indicate otherwise…
Or maybe iX did some tuning of the CPU scheduler or of ZFS to improve I/O?
Well, let’s get into it…
First, let’s set the performance governor. A long time ago I tried this and couldn’t find significant performance differences from the default governor, and regular tests on phoronix.com show only small benefits from the performance governor.
But it should at least give more consistency and better comparability…
[test]# echo "performance" | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
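To verify it took effect on all cores (optional sanity check, assuming the usual cpufreq sysfs layout):
[test]# cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
Every line should now read "performance".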
Next, let’s find out if the PLX is the bottleneck.
Let’s check on usage and speed of PCIe lanes:
[test]# lspci -vv
[...]
LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L1, Exit Latency L1 <4us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
[...]
That’s all OK. By the way, all SSDs connect at 8GT/s, Width x4.
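For anyone wanting to replicate this check, grepping the link capability and the actually negotiated link state per device works; the bus address below is just a placeholder, substitute your own from a plain lspci listing:
[test]# lspci -s 04:00.0 -vv | grep -E "LnkCap|LnkSta"
LnkSta shows the speed and width that were actually negotiated, which is what matters when hunting for a downgraded slot.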
Next, I will run fio on a single SSD, then I’ll run it on all four in parallel to see if there is a significant slowdown indicating a bottleneck.
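I didn’t paste the exact command here; it’s essentially the same invocation used later in this thread, just pointed at a single mount, along these lines:
[test]# fio --bs=128k --direct=1 --gtod_reduce=1 --ioengine=posixaio --iodepth=32 --group_reporting --name=randrw --numjobs=12 --ramp_time=10 --runtime=60 --rw=randrw --size=256M --time_based --directory=/mnt/nvme0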
Single NVMe (in the interest of space, just the summary):
Run status group 0 (all jobs):
READ: bw=1345MiB/s (1410MB/s), 1345MiB/s-1345MiB/s (1410MB/s-1410MB/s), io=78.8GiB (84.7GB), run=60037-60037msec
WRITE: bw=1345MiB/s (1410MB/s), 1345MiB/s-1345MiB/s (1410MB/s-1410MB/s), io=78.9GiB (84.7GB), run=60037-60037msec
Disk stats (read/write):
nvme0n1: ios=752890/753400, merge=0/28, ticks=736131/82479, in_queue=818661, util=99.88%
Next, I mount each SSD in a separate folder and run fio on all of them in parallel.
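One way to kick off the four runs in parallel (a sketch; I may just as well have used separate terminals, and the paths and log locations are illustrative):
[test]# for i in 0 1 2 3; do fio --bs=128k --direct=1 --gtod_reduce=1 --ioengine=posixaio --iodepth=32 --group_reporting --name=randrw --numjobs=12 --ramp_time=10 --runtime=60 --rw=randrw --size=256M --time_based --directory=/mnt/nvme$i > /tmp/fio-nvme$i.log & done; wait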
nvme0:
Run status group 0 (all jobs):
READ: bw=1329MiB/s (1393MB/s), 1329MiB/s-1329MiB/s (1393MB/s-1393MB/s), io=77.9GiB (83.7GB), run=60045-60045msec
WRITE: bw=1329MiB/s (1394MB/s), 1329MiB/s-1329MiB/s (1394MB/s-1394MB/s), io=77.9GiB (83.7GB), run=60045-60045msec
Disk stats (read/write):
nvme0n1: ios=744652/745186, merge=0/28, ticks=738464/80839, in_queue=819355, util=99.87%
nvme1:
Run status group 0 (all jobs):
READ: bw=1303MiB/s (1367MB/s), 1303MiB/s-1303MiB/s (1367MB/s-1367MB/s), io=76.4GiB (82.1GB), run=60041-60041msec
WRITE: bw=1304MiB/s (1367MB/s), 1304MiB/s-1304MiB/s (1367MB/s-1367MB/s), io=76.4GiB (82.1GB), run=60041-60041msec
Disk stats (read/write):
nvme1n1: ios=730371/730887, merge=0/28, ticks=744813/75752, in_queue=820617, util=99.89%
nvme2:
Run status group 0 (all jobs):
READ: bw=1318MiB/s (1382MB/s), 1318MiB/s-1318MiB/s (1382MB/s-1382MB/s), io=77.3GiB (82.9GB), run=60040-60040msec
WRITE: bw=1318MiB/s (1382MB/s), 1318MiB/s-1318MiB/s (1382MB/s-1382MB/s), io=77.3GiB (83.0GB), run=60040-60040msec
Disk stats (read/write):
nvme2n1: ios=738720/739259, merge=0/28, ticks=740512/80201, in_queue=820763, util=99.87%
nvme3:
Run status group 0 (all jobs):
READ: bw=1308MiB/s (1372MB/s), 1308MiB/s-1308MiB/s (1372MB/s-1372MB/s), io=76.7GiB (82.4GB), run=60042-60042msec
WRITE: bw=1309MiB/s (1372MB/s), 1309MiB/s-1309MiB/s (1372MB/s-1372MB/s), io=76.7GiB (82.4GB), run=60042-60042msec
Disk stats (read/write):
nvme3n1: ios=732698/733257, merge=0/28, ticks=744700/76400, in_queue=821150, util=99.89%
CPU utilization observed:
avg-cpu: %user %nice %system %iowait %steal %idle
4.13 0.00 10.02 85.22 0.00 0.63
Looks good to me. I think the test is valid.
In aggregate, the total throughput of all four SSDs is ~5500 MB/s, and performance for each SSD is comparable to that of a single SSD.
I conclude that the PLX card is not the bottleneck.
Well, I have yet to find a satisfying memory benchmark for Linux.
Here are the passmark scores:
Memory Mark: 2698
Database Operations 4769 Thousand Operations/s
Memory Read Cached 26588 MB/s
Memory Read Uncached 12111 MB/s
Memory Write 10336 MB/s
Available RAM 103305 Megabytes
Memory Latency 49 Nanoseconds
Memory Threaded 44613 MB/s
My machine is somewhat overclocked, with the CPU scoring higher in PassMark than what you looked up:
CPU Mark: 11459
Integer Math 39425 Million Operations/s
Floating Point Math 19881 Million Operations/s
Prime Numbers 50.9 Million Primes/s
Sorting 25463 Thousand Strings/s
Encryption 3754 MB/s
Compression 170204 KB/s
CPU Single Threaded 2239 Million Operations/s
Physics 901 Frames/s
Extended Instructions (SSE) 8869 Million Matrices/s
Sysbench results:
Random read: 1917.06 MiB/sec
Seq read: 8177.03 MiB/sec
Random write: 1914.93 MiB/sec
Seq write: 6541.37 MiB/sec
Man, those numbers look way worse than PassMark. Please let me know if you have a tool more like fio that helps with understanding memory performance, rather than just reporting a single overall score.
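For reference, sysbench memory numbers like the above typically come from invocations along these lines (my exact block size, total size and thread count may have differed):
[test]# sysbench memory --memory-block-size=1M --memory-total-size=100G --memory-oper=read --memory-access-mode=rnd --threads=12 run
Swapping --memory-oper between read/write and --memory-access-mode between rnd/seq gives the four numbers above.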
I have quite an interesting update.
The PassMark scores did not sit well with me once I compared them to tests of identical memory sticks.
Then I remembered fiddling in the BIOS when I couldn’t get the system to POST with the set of PCIe cards I had installed at the time. Sure enough, the memory timing was off and the RAM was running at the default 2166 MT/s.
I updated the memory timings in the BIOS and here are the new numbers:
Drumroll… going from the numbers above to:
Memory Mark: 3025
Database Operations 5511 Thousand Operations/s
Memory Read Cached 27849 MB/s
Memory Read Uncached 13401 MB/s
Memory Write 11368 MB/s
Available RAM 100069 Megabytes
Memory Latency 41 Nanoseconds
Memory Threaded 50155 MB/s
This also had a significant impact on the CPU scores:
CPU Mark: 13378
Integer Math 45994 Million Operations/s
Floating Point Math 23222 Million Operations/s
Prime Numbers 58.2 Million Primes/s
Sorting 29604 Thousand Strings/s
Encryption 4377 MB/s
Compression 199381 KB/s
CPU Single Threaded 2577 Million Operations/s
Physics 1040 Frames/s
Extended Instructions (SSE) 10610 Million Matrices/s
Leaving no stone unturned, I also changed some ZFS parameters to tune the pool more towards SSD use:
zpool:
| Property | Value |
|---|---|
| ashift | 12 |
zfs:
| Property | Value |
|---|---|
| atime | off |
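For completeness, these are applied roughly as follows; the pool name "tank" and the device list are placeholders, and note that ashift can only be set at pool creation time:
[test]# zpool create -o ashift=12 tank mirror nvme0n1 nvme1n1 mirror nvme2n1 nvme3n1
[test]# zfs set atime=off tank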
Here are the updated fio test results:
fio-3.26
Starting 12 processes
Jobs: 12 (f=12): [m(12)][100.0%][r=4951MiB/s,w=4983MiB/s][r=39.6k,w=39.9k IOPS][eta 00m:00s]
randrw: (groupid=0, jobs=12): err= 0: pid=167065: Sat Aug 13 17:41:06 2022
read: IOPS=38.4k, BW=4805MiB/s (5038MB/s)(282GiB/60009msec)
bw ( MiB/s): min= 2850, max= 7262, per=100.00%, avg=4805.69, stdev=71.70, samples=1416
iops : min=22804, max=58098, avg=38443.06, stdev=573.64, samples=1416
write: IOPS=38.5k, BW=4810MiB/s (5044MB/s)(282GiB/60009msec); 0 zone resets
bw ( MiB/s): min= 2860, max= 7378, per=100.00%, avg=4810.96, stdev=71.84, samples=1416
iops : min=22878, max=59026, avg=38484.51, stdev=574.82, samples=1416
cpu : usr=4.23%, sys=0.87%, ctx=2134685, majf=0, minf=700
IO depths : 1=0.1%, 2=0.1%, 4=0.2%, 8=11.9%, 16=63.7%, 32=24.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=95.9%, 8=1.0%, 16=1.3%, 32=1.8%, 64=0.0%, >=64=0.0%
issued rwts: total=2306471,2309171,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32
Run status group 0 (all jobs):
READ: bw=4805MiB/s (5038MB/s), 4805MiB/s-4805MiB/s (5038MB/s-5038MB/s), io=282GiB (302GB), run=60009-60009msec
WRITE: bw=4810MiB/s (5044MB/s), 4810MiB/s-4810MiB/s (5044MB/s-5044MB/s), io=282GiB (303GB), run=60009-60009msec
So, @NicKF your initial hunch to look into memory performance was spot on!
Second observation: the numbers mostly match yours. Same zfs 2 mirror config, no Optane, but PCIe Gen3.
I kinda expected to see higher performance assuming your rig was bottlenecked by PCIe Gen2.
Here is some data collected by iostat observed during the run:
avg-cpu: %user %nice %system %iowait %steal %idle
5.06 0.00 94.44 0.03 0.00 0.47
Device r/s rMB/s rrqm/s %rrqm r_await rareq-sz w/s wMB/s wrqm/s %wrqm w_await wareq-sz d/s dMB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
nvme0n1 4.00 0.00 0.00 0.00 1.85 0.00 16115.40 1995.53 0.00 0.00 0.11 126.80 0.00 0.00 0.00 0.00 0.00 0.00 4.00 1.85 1.86 89.56
nvme1n1 4.00 0.00 0.00 0.00 1.80 0.00 15392.20 1904.96 0.00 0.00 0.11 126.73 0.00 0.00 0.00 0.00 0.00 0.00 4.00 1.85 1.77 87.76
nvme2n1 4.00 0.00 0.00 0.00 1.80 0.00 16107.40 1995.30 0.00 0.00 0.12 126.85 0.00 0.00 0.00 0.00 0.00 0.00 4.00 1.75 1.92 89.58
nvme3n1 4.00 0.00 0.00 0.00 1.80 0.00 15324.60 1898.08 0.00 0.00 0.15 126.83 0.00 0.00 0.00 0.00 0.00 0.00 4.00 1.80 2.30 91.96
Here I see that my rig is clearly bottlenecked by the CPU, specifically system time spent processing all these 128k blocks.
The second observation is that I can only see write operations, although the test clearly is a “randrw” test and the results clearly show both read and write numbers.
All data seems to be read from the ZFS ARC, so there are no read operations hitting the disks. In addition, there seems to be some write caching, because the write bandwidth of two of the four SSDs (one per mirror vdev) totals around 4GB/s, not the 5GB/s that fio reports as a result.
The last indicator that we’re mostly looking at memory performance is that read and write performance is almost identical, which is not to be expected from these NAND SSDs (writes should be somewhat slower).
The only conclusion I can draw is that we’ve barely tested the SSDs at all; what we’ve really been testing is the memory performance of our rigs.
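One way to back up the ARC interpretation is to watch hit rates during a run with arcstat (ships with OpenZFS); a hit rate near 100% while fio reports multiple GB/s of “reads” means the disks are barely being read at all:
[test]# arcstat 1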
Hmmm.
I think this may be inline with some of the information @wendell has shared in some of his testing over the years.
Record Breaker: Toward 20 million i/ops on the desktop with Threadripper Pro - YouTube
When he is pushing PCIe Gen4 on Threadripper, I think it’s pretty obvious that memory and CPU are really what’s holding us back.
I’m going to order this:
Linkreal PCIe 3.0 X16 to Quad M.2 NVMe SSD Swtich Adapter Card for Servers LRNV9547L 4I|Add On Cards| - AliExpress
To see if I can squeeze some more performance out.
These are the two different PLX chips.
I don’t know if an improvement in PCIe alone will help, but it’s not going to hurt (beyond my wallet) to try… I will update this once I receive them in a few weeks; obviously I have to wait for shipping from China.
I reviewed the content again and, with the knowledge gained from our testing, understood it better than the first time.
I could not find the actual fio command Wendell ran during these tests, but later in the thread Chuntzu documented his test cases well.
They were going for max IO (instead of max bandwidth as in our test). So the main differences I saw were that they used the smallest block size (4k) and, more importantly, Chuntzu used a different ioengine.
Let’s try changing the ioengine first. I start by formatting a partition on each 970 Pro to ext4 and mounting them.
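Roughly like this; device names are illustrative, adjust to your own layout, and repeat for nvme1 through nvme3:
[test]# mkfs.ext4 /dev/nvme0n1p1
[test]# mkdir -p /mnt/nvme0
[test]# mount /dev/nvme0n1p1 /mnt/nvme0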
First, test with our current command and setup:
[test]# fio --bs=128k --direct=1 --gtod_reduce=1 --ioengine=posixaio --iodepth=32 --group_reporting --name=randrw --numjobs=12 --ramp_time=10 --runtime=60 --rw=randrw --size=256M --time_based --directory=/mnt/nvme0:/mnt/nvme1:/mnt/nvme2:/mnt/nvme3
randrw: (g=0): rw=randrw, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=posixaio, iodepth=32
...
fio-3.26
Starting 12 processes
Jobs: 12 (f=12): [m(12)][100.0%][r=3329MiB/s,w=3342MiB/s][r=26.6k,w=26.7k IOPS][eta 00m:00s]
randrw: (groupid=0, jobs=12): err= 0: pid=19615: Sat Aug 13 22:37:10 2022
read: IOPS=26.9k, BW=3364MiB/s (3527MB/s)(197GiB/60022msec)
bw ( MiB/s): min= 3077, max= 3762, per=100.00%, avg=3366.47, stdev=12.98, samples=1430
iops : min=24617, max=30102, avg=26931.42, stdev=103.81, samples=1430
write: IOPS=27.0k, BW=3370MiB/s (3533MB/s)(198GiB/60022msec); 0 zone resets
bw ( MiB/s): min= 2987, max= 3863, per=100.00%, avg=3372.22, stdev=15.08, samples=1430
iops : min=23902, max=30911, avg=26977.45, stdev=120.64, samples=1430
cpu : usr=1.42%, sys=0.36%, ctx=804426, majf=0, minf=701
IO depths : 1=0.0%, 2=0.0%, 4=0.1%, 8=25.0%, 16=50.0%, 32=25.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=97.5%, 8=0.0%, 16=0.0%, 32=2.5%, 64=0.0%, >=64=0.0%
issued rwts: total=1615001,1617829,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32
Run status group 0 (all jobs):
READ: bw=3364MiB/s (3527MB/s), 3364MiB/s-3364MiB/s (3527MB/s-3527MB/s), io=197GiB (212GB), run=60022-60022msec
WRITE: bw=3370MiB/s (3533MB/s), 3370MiB/s-3370MiB/s (3533MB/s-3533MB/s), io=198GiB (212GB), run=60022-60022msec
Using Wendell’s command to monitor performance:
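The monitoring output below is dstat-style; something along these lines should reproduce the same columns (this is my guess at the flags, not necessarily Wendell’s exact invocation):
[test]# dstat --proc --cpu --mem --io --time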
---procs--- ----total-usage---- ------memory-usage----- --io/total- ----system----
run blk new|usr sys idl wai stl| used free buf cach| read writ| time
0 11 3.0| 3 4 20 71 0| 22G 101G 187M 1726M|26.9k 27.0k|13-08 22:36:54
2.0 10 0| 3 4 21 71 0| 22G 101G 187M 1726M|26.4k 26.4k|13-08 22:36:55
1.0 10 0| 3 4 19 72 0| 22G 101G 187M 1726M|25.9k 25.8k|13-08 22:36:56
3.0 10 0| 4 4 20 72 0| 22G 101G 187M 1726M|26.1k 26.4k|13-08 22:36:57
1.0 11 0| 3 4 19 72 0| 22G 101G 187M 1726M|26.2k 26.4k|13-08 22:36:58
1.0 11 0| 3 4 20 71 0| 22G 101G 187M 1726M|26.4k 26.5k|13-08 22:36:59
0 12 0| 3 4 18 73 0| 22G 101G 188M 1726M|26.0k 26.4k|13-08 22:37:00
0 12 0| 3 4 20 72 0| 22G 101G 188M 1726M|26.2k 26.1k|13-08 22:37:01
3.0 10 0| 3 4 19 73 0| 22G 101G 188M 1726M|26.7k 26.5k|13-08 22:37:02
1.0 11 0| 3 4 17 74 0| 22G 101G 188M 1726M|26.7k 26.6k|13-08 22:37:03
0 13 0| 3 4 20 72 0| 22G 101G 188M 1726M|26.8k 26.8k|13-08 22:37:04
0 12 0| 3 4 19 72 0| 22G 101G 188M 1726M|26.4k 26.9k|13-08 22:37:05
0 12 0| 3 4 19 72 0| 22G 101G 188M 1726M|26.5k 26.4k|13-08 22:37:06
0 12 0| 3 4 20 71 0| 22G 101G 188M 1726M|26.4k 26.5k|13-08 22:37:07
2.0 11 0| 3 4 20 72 0| 22G 101G 188M 1726M|26.4k 26.6k|13-08 22:37:08
1.0 12 0| 3 4 20 71 0| 22G 101G 188M 1726M|26.8k 26.8k|13-08 22:37:09
Observation: pretty consistent 26k io, 71% wait time, 20% idle.
Let’s switch ioengine from “posixaio” to “aio”
[test]# fio --bs=128k --direct=1 --gtod_reduce=1 --ioengine=aio --iodepth=32 --group_reporting --name=randrw --numjobs=12 --ramp_time=10 --runtime=60 --rw=randrw --size=256M --time_based --directory=/mnt/nvme0:/mnt/nvme1:/mnt/nvme2:/mnt/nvme3
randrw: (g=0): rw=randrw, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=libaio, iodepth=32
...
fio-3.26
Starting 12 processes
Jobs: 12 (f=12): [m(12)][100.0%][r=5244MiB/s,w=5320MiB/s][r=41.9k,w=42.6k IOPS][eta 00m:00s]
randrw: (groupid=0, jobs=12): err= 0: pid=19694: Sat Aug 13 22:42:23 2022
read: IOPS=42.6k, BW=5331MiB/s (5590MB/s)(312GiB/60015msec)
bw ( MiB/s): min= 4932, max= 5745, per=100.00%, avg=5339.46, stdev=15.05, samples=1428
iops : min=39457, max=45960, avg=42714.58, stdev=120.43, samples=1428
write: IOPS=42.7k, BW=5338MiB/s (5598MB/s)(313GiB/60015msec); 0 zone resets
bw ( MiB/s): min= 4895, max= 5739, per=100.00%, avg=5346.25, stdev=15.19, samples=1428
iops : min=39159, max=45915, avg=42768.93, stdev=121.54, samples=1428
cpu : usr=5.30%, sys=16.02%, ctx=4366045, majf=0, minf=700
IO depths : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=100.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
issued rwts: total=2559394,2562837,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32
Run status group 0 (all jobs):
READ: bw=5331MiB/s (5590MB/s), 5331MiB/s-5331MiB/s (5590MB/s-5590MB/s), io=312GiB (335GB), run=60015-60015msec
WRITE: bw=5338MiB/s (5598MB/s), 5338MiB/s-5338MiB/s (5598MB/s-5598MB/s), io=313GiB (336GB), run=60015-60015msec
Wow! That’s much more performance! And the utilization?
---procs--- ----total-usage---- ------memory-usage----- --io/total- ----system----
run blk new|usr sys idl wai stl| used free buf cach| read writ| time
7.0 0 0| 11 17 66 0 0| 22G 101G 193M 1726M|42.6k 42.5k|13-08 22:42:14
4.0 0 0| 12 17 66 0 0| 22G 101G 193M 1726M|42.9k 43.1k|13-08 22:42:15
1.0 0 0| 11 17 66 0 0| 22G 101G 193M 1726M|43.5k 43.3k|13-08 22:42:16
3.0 0 0| 12 17 66 0 0| 22G 101G 193M 1726M|42.9k 43.3k|13-08 22:42:17
3.0 0 5.0| 12 17 66 0 0| 22G 101G 193M 1726M|43.6k 43.2k|13-08 22:42:18
2.0 0 0| 11 16 66 0 0| 22G 101G 193M 1726M|42.7k 42.9k|13-08 22:42:19
4.0 0 0| 11 16 67 0 0| 22G 101G 193M 1726M|42.4k 42.4k|13-08 22:42:20
3.0 0 0| 11 16 67 0 0| 22G 101G 193M 1726M|41.8k 42.0k|13-08 22:42:21
5.0 0 0| 11 17 67 0 0| 22G 101G 193M 1726M|42.3k 42.3k|13-08 22:42:22
4.0 0 0| 11 16 67 0 0| 22G 101G 193M 1726M|41.9k 41.9k|13-08 22:42:23
0 0 0| 5 9 83 0 0| 22G 101G 193M 1721M|21.4k 21.7k|13-08 22:42:24
No wait time, about 30% CPU utilization (hmm - a few percent are missing…). IOs are up to ~42k reads and writes.
Nice!
Man! The ioengine used in our tests was a serious bottleneck.
Just for giggles: what’s the max io we can get from this setup?
Changing to block size 4k
]# fio --bs=4k --direct=1 --gtod_reduce=1 --ioengine=aio --iodepth=32 --group_reporting --name=randrw --numjobs=12 --ramp_time=10 --runtime=60 --rw=randrw --size=256M --time_based --directory=/mnt/nvme0:/mnt/nvme1:/mnt/nvme2:/mnt/nvme3
randrw: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
...
fio-3.26
Starting 12 processes
Jobs: 12 (f=6): [m(2),f(1),m(1),f(2),m(1),f(2),m(1),f(2)][100.0%][r=2915MiB/s,w=2920MiB/s][r=746k,w=748k IOPS][eta 00m:00s]
randrw: (groupid=0, jobs=12): err= 0: pid=20126: Sat Aug 13 23:07:54 2022
read: IOPS=748k, BW=2920MiB/s (3062MB/s)(171GiB/60001msec)
bw ( MiB/s): min= 2728, max= 3031, per=100.00%, avg=2922.61, stdev= 3.89, samples=1434
iops : min=698602, max=775935, avg=748185.33, stdev=996.78, samples=1434
write: IOPS=748k, BW=2921MiB/s (3063MB/s)(171GiB/60001msec); 0 zone resets
bw ( MiB/s): min= 2732, max= 3039, per=100.00%, avg=2923.01, stdev= 3.86, samples=1434
iops : min=699577, max=778066, avg=748289.12, stdev=987.63, samples=1434
cpu : usr=19.39%, sys=51.05%, ctx=38317315, majf=0, minf=700
IO depths : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=100.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
issued rwts: total=44857655,44863973,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32
Run status group 0 (all jobs):
READ: bw=2920MiB/s (3062MB/s), 2920MiB/s-2920MiB/s (3062MB/s-3062MB/s), io=171GiB (184GB), run=60001-60001msec
WRITE: bw=2921MiB/s (3063MB/s), 2921MiB/s-2921MiB/s (3063MB/s-3063MB/s), io=171GiB (184GB), run=60001-60001msec
That’s 1.5M iops: 750k read and 750k write. Total bandwidth: about 3GB/s read and write.
A quick look at the resources:
---procs--- ----total-usage---- ------memory-usage----- --io/total- ----system----
run blk new|usr sys idl wai stl| used free buf cach| read writ| time
14 0 0| 39 60 1 0 0| 22G 101G 216M 1726M| 750k 750k|13-08 23:06:58
14 0 0| 38 61 1 0 0| 22G 101G 216M 1726M| 749k 750k|13-08 23:06:59
15 0 3.0| 38 60 2 0 0| 22G 101G 216M 1726M| 747k 746k|13-08 23:07:00
16 0 0| 38 61 1 0 0| 22G 101G 216M 1726M| 752k 752k|13-08 23:07:01
13 0 0| 38 60 2 0 0| 22G 101G 216M 1726M| 747k 748k|13-08 23:07:02
16 0 0| 38 60 1 0 0| 22G 101G 216M 1726M| 749k 751k|13-08 23:07:03
14 1.0 0| 38 60 2 0 0| 22G 101G 217M 1726M| 747k 747k|13-08 23:07:04
15 0 0| 38 60 2 0 0| 22G 101G 217M 1726M| 750k 748k|13-08 23:07:05
13 0 0| 38 61 1 0 0| 22G 101G 217M 1726M| 751k 752k|13-08 23:07:06
Now all the CPU resources are used: only 1-2% idle, no wait time.
Can we wring even more performance out of the SSDs? Can we use that brand new, more efficient io_uring kernel interface? Sure!
[test]# fio --bs=4k --direct=1 --gtod_reduce=1 --ioengine=io_uring --iodepth=32 --group_reporting --name=randrw --numjobs=12 --ramp_time=10 --runtime=60 --rw=randrw --size=256M --time_based --directory=/mnt/nvme0:/mnt/nvme1:/mnt/nvme2:/mnt/nvme3
randrw: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=32
...
fio-3.26
Starting 12 processes
Jobs: 12 (f=12): [m(12)][100.0%][r=3180MiB/s,w=3177MiB/s][r=814k,w=813k IOPS][eta 00m:00s]
randrw: (groupid=0, jobs=12): err= 0: pid=20034: Sat Aug 13 23:03:01 2022
read: IOPS=820k, BW=3202MiB/s (3358MB/s)(188GiB/60001msec)
bw ( MiB/s): min= 2988, max= 3328, per=100.00%, avg=3204.54, stdev= 5.03, samples=1440
iops : min=764945, max=852100, avg=820359.95, stdev=1288.78, samples=1440
write: IOPS=820k, BW=3203MiB/s (3358MB/s)(188GiB/60001msec); 0 zone resets
bw ( MiB/s): min= 2985, max= 3323, per=100.00%, avg=3205.01, stdev= 5.01, samples=1440
iops : min=764213, max=850812, avg=820481.23, stdev=1283.09, samples=1440
cpu : usr=14.24%, sys=53.69%, ctx=40252736, majf=0, minf=699
IO depths : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=100.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
issued rwts: total=49185925,49193186,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32
Run status group 0 (all jobs):
READ: bw=3202MiB/s (3358MB/s), 3202MiB/s-3202MiB/s (3358MB/s-3358MB/s), io=188GiB (201GB), run=60001-60001msec
WRITE: bw=3203MiB/s (3358MB/s), 3203MiB/s-3203MiB/s (3358MB/s-3358MB/s), io=188GiB (201GB), run=60001-60001msec
That’s >1.6M iops: 820k read and 820k write. Over 3.3GB/s read and write.
---procs--- ----total-usage---- ------memory-usage----- --io/total- ----system----
run blk new|usr sys idl wai stl| used free buf cach| read writ| time
14 0 0| 33 64 3 0 0| 22G 101G 210M 1726M| 824k 824k|13-08 23:01:58
18 0 3.0| 34 63 3 0 0| 22G 101G 210M 1726M| 823k 824k|13-08 23:01:59
15 0 0| 32 63 4 0 0| 22G 101G 210M 1726M| 817k 818k|13-08 23:02:00
15 0 0| 33 64 3 0 0| 22G 101G 210M 1726M| 821k 821k|13-08 23:02:01
14 0 0| 34 63 3 0 0| 22G 101G 210M 1726M| 817k 813k|13-08 23:02:02
14 0 0| 33 63 3 0 0| 22G 101G 210M 1726M| 824k 823k|13-08 23:02:03
14 0 0| 34 62 3 0 0| 22G 101G 210M 1726M| 816k 817k|13-08 23:02:04
14 0 0| 34 63 3 0 0| 22G 101G 210M 1726M| 826k 824k|13-08 23:02:05
12 0 0| 33 63 3 0 0| 22G 101G 211M 1726M| 819k 818k|13-08 23:02:06
15 0 0| 34 63 2 0 0| 22G 101G 211M 1726M| 823k 823k|13-08 23:02:07
15 0 0| 33 63 3 0 0| 22G 101G 211M 1726M| 822k 825k|13-08 23:02:08
15 0 0| 33 63 3 0 0| 22G 101G 211M 1726M| 818k 818k|13-08 23:02:09
Still all CPU resources consumed.
What happens when we change our test to read-only - same as Wendell did?
]# fio --bs=4k --direct=1 --gtod_reduce=1 --ioengine=io_uring --iodepth=32 --group_reporting --name=randrw --numjobs=12 --ramp_time=10 --runtime=60 --rw=randread --size=256M --time_based --directory=/mnt/nvme0:/mnt/nvme1:/mnt/nvme2:/mnt/nvme3
randrw: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=io_uring, iodepth=32
...
fio-3.26
Starting 12 processes
Jobs: 12 (f=11): [r(11),f(1)][100.0%][r=8118MiB/s][r=2078k IOPS][eta 00m:00s]
randrw: (groupid=0, jobs=12): err= 0: pid=20215: Sat Aug 13 23:15:04 2022
read: IOPS=2076k, BW=8108MiB/s (8502MB/s)(475GiB/60001msec)
bw ( MiB/s): min= 8031, max= 8164, per=100.00%, avg=8115.97, stdev= 1.79, samples=1439
iops : min=2056007, max=2090046, avg=2077685.78, stdev=459.40, samples=1439
cpu : usr=16.58%, sys=53.25%, ctx=11217210, majf=0, minf=701
IO depths : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=100.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
issued rwts: total=124546903,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32
Run status group 0 (all jobs):
READ: bw=8108MiB/s (8502MB/s), 8108MiB/s-8108MiB/s (8502MB/s-8502MB/s), io=475GiB (510GB), run=60001-60001msec
Wow! over 2M read iops, over 8GB/s bandwidth. Now, your PCIe Gen2 slot would be bottlenecked.
---procs--- ----total-usage---- ------memory-usage----- --io/total- ----system----
run blk new|usr sys idl wai stl| used free buf cach| read writ| time
10 0 0| 37 51 5 0 0| 22G 101G 224M 1726M|2076k 0 |13-08 23:14:28
12 0 0| 36 52 5 0 0| 22G 101G 224M 1726M|2076k 0 |13-08 23:14:29
12 0 0| 36 52 5 0 0| 22G 101G 224M 1726M|2077k 0 |13-08 23:14:30
10 0 0| 36 52 5 0 0| 22G 101G 224M 1726M|2078k 0 |13-08 23:14:31
11 0 0| 36 52 5 0 0| 22G 101G 224M 1726M|2076k 0 |13-08 23:14:32
9.0 0 0| 36 52 5 0 0| 22G 101G 224M 1726M|2078k 0 |13-08 23:14:33
12 0 0| 36 52 5 0 0| 22G 101G 224M 1726M|2079k 0 |13-08 23:14:34
11 0 0| 36 52 5 0 0| 22G 101G 224M 1726M|2076k 1.00 |13-08 23:14:35
Interesting: still almost full CPU utilization, but with a higher ratio in user space.
There you have it: inefficiencies in the ioengine (posixaio) and the file system (ZFS) limit performance.
There is a known bottleneck within the ARC that prevents NVMe drives from utilizing all their bandwidth. There is a feature update planned to bypass the ARC on NVMe pools, but don’t expect that feature anytime soon… it seems the team ran into some trouble.
Hi All,
For shits and giggles, I tested 2x 512GB Samsung 9A1s (980 Pro OEM) in an Alder Lake box with an i5-12600T and 2x8GB of DDR5-4800, in a ZFS mirror.
These are Gen4 SSDs and are generally very fast. It’s probable that the 9A1 isn’t quite as fast as the 980 Pro, but here’s a snippet from Tom’s Hardware.
This is on a TrueNAS SCALE Bluefin nightly (because kernel 5.15 vs 5.10 on current).
randrw: (groupid=0, jobs=12): err= 0: pid=18063: Mon Aug 15 17:19:06 2022
read: IOPS=16.4k, BW=2047MiB/s (2147MB/s)(120GiB/60043msec)
bw ( MiB/s): min= 509, max= 5340, per=100.00%, avg=2056.25, stdev=120.31, samples=1428
iops : min= 4078, max=42723, avg=16449.22, stdev=962.42, samples=1428
write: IOPS=16.4k, BW=2049MiB/s (2149MB/s)(120GiB/60043msec); 0 zone resets
bw ( MiB/s): min= 603, max= 5435, per=100.00%, avg=2058.03, stdev=120.31, samples=1428
iops : min= 4828, max=43487, avg=16463.45, stdev=962.40, samples=1428
cpu : usr=1.13%, sys=0.16%, ctx=608171, majf=1, minf=706
IO depths : 1=0.1%, 2=0.1%, 4=0.2%, 8=17.5%, 16=57.8%, 32=24.3%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=96.9%, 8=0.4%, 16=0.5%, 32=2.2%, 64=0.0%, >=64=0.0%
issued rwts: total=983260,984131,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32
Run status group 0 (all jobs):
READ: bw=2047MiB/s (2147MB/s), 2047MiB/s-2047MiB/s (2147MB/s-2147MB/s), io=120GiB (129GB), run=60043-60043msec
WRITE: bw=2049MiB/s (2149MB/s), 2049MiB/s-2049MiB/s (2149MB/s-2149MB/s), io=120GiB (129GB), run=60043-60043msec
root@truenas[~]#
Actually quite slow, surprisingly. I had expected to at least match, if not exceed, my previous testing. Hmmmm. ZFS is fun.
CPU utilization is very high:
Additional testing yields worse results:
Disk temps are fine:
Performance of the processor is relatively high, higher than yours in both single-threaded and multithreaded performance.
RAM is 2400x2x64 = 307.2 Gbps per channel, and with two memory channels that’s 614.4 Gbps, less than our servers but nothing to sneeze at. Maybe that DDR5 latency is biting me in the butt at these speeds.
Did another test today. The Alder Lake performance had me questioning my sanity.
I put 2x Intel DC P4610s in U.2 carriers (mirrored) and installed them in a Dell R720XD with 2x E5-2667 v2s and 192GB of DDR3-1333.
And I get more IOPS in this particular test and more R/W bandwidth. I’m not sure why… lol
At Work today I had a little bit of time to do some testing on some higher end hardware. We just got a Dell R7525 with two Epyc 74F3s and 512GB of ram, and two Kioxia CM6 MU Gen4 U.2 NVME drives.
I installed SCALE under ESXI, gave it 64GB of RAM, 12 cores, and passed through the NVME drives to TrueNAS.
This benchmark is really tough on CPUs:
I received the first of my Chinese PCIE switch chip cards, the 4-port PEX 8747-based HHHL card.
For laughs, I added it to the same pool I have been testing that has the other card in it already, and I added 4 additional 970 pros.
Enter the new problem: I’ve come across a strange occurrence where the system is freaking out, saying that the PCIe port the switch chip sits on is throwing errors.
spewing errors in the console:
Output from lspci:
Some indications suggest that the problem is not what it seems, and may just be a power-saving feature?
I know SCALE is using a custom bootloader built in-house for ZFS. Any idea if it’s possible to test this out?
Needless to say, performance is not so nice:
Yep - that’s likely it. I have the same with my Highpoint PEX card. Also, the additional power draw is miniscule.
Sorry about the multitude of posts.
I would recommend first doing a series of tests to validate the general throughput without using ZFS. ZFS does all kinds of things that may mask the true hardware performance capacity, e.g. you may find that your Optane log device is holding the array of NVMe devices back.
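A minimal way to sanity-check raw device throughput outside of ZFS is to point fio at the block device directly; read-only, so it is non-destructive, and the device name is a placeholder:
[test]# fio --name=rawread --filename=/dev/nvme0n1 --direct=1 --rw=randread --bs=4k --iodepth=32 --numjobs=4 --ioengine=io_uring --runtime=60 --time_based --group_reporting
Running that per drive, and then against all drives at once, gives a ZFS-free baseline to compare the pool numbers against.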
Following my last post I had another series of tests that I didn’t post here in detail because I felt it would move the discussion too far away from the initial topic.
I tested max IOPS with as many NVMe drives as I could connect to my X570-based system. Connecting 2x Samsung 980 Pro, 2x WD 850, and 2x Optane 905, mounting each individually, I was able to achieve >5M 4k IOPS (same fio test setup as in the last test posted above). I noticed bottlenecks in a few configurations.
My recommendation: try the same with all your nvme drives so you know what your hw is capable of. Then try different zfs setups and configurations that will give you the benefits of zfs with the least performance impacts.
Disabling via middleware on TN SCALE works.
I guess our missions are different here. There are plenty of benchmarks of folks driving SSDs really fast in a straight line in Windows or on Linux with MD. I’m more interested in seeing how far I can push performance in ZFS on used enterprise ‘garbage tier’ hardware.
The fact I can crank out more IOPs in this test than on a brand new production ready server (albeit with more drives, and I’m cheating with PEX chips) is pretty neat.
After applying the fix for the power management thing, I am now getting over 40k IOPS, still with one card based on the PEX 8632 PCIe Gen 2.0 switch chip. I’m interested to see whether it’ll scale a little higher if I replace that card.
This is still way faster than I could push with SATA drives in the same test.
Hmmm
Still doing some testing. Got my Caecent adapter, which is the PLX 8724-based card. I guess over-driving the lanes on that card performs worse on average than the older-generation PEX 8632 switch chip.
Performance is actually lower than with the PCIe Gen 2.0-based card.
I don’t recommend this product for that reason:
Over the past year or so I have been obsessively exploring various aspects of ZFS performance, from large SATA arrays with multiple HBA cards to testing NVMe performance. In my previous testing I was leveraging castoff enterprise servers built on Westmere, Sandy Bridge and Ivy Bridge platforms. There were some interesting performance variations between those platforms, and I was determined to see what a more modern platform would do. Most of my testing indicated that ZFS was being bottlenecked by the platform it was running on, with high CPU usage present during testing.
I recently picked up an AMD EPYC 7282, a Supermicro H12SSL-I, and 256GB of DDR4-2133 RAM. While the RAM is certainly not the fastest, I now have a lot of PCIe lanes to play with and I don’t have to worry as much about which slot goes to which CPU.
For today’s adventure, I tested 4 and 8 Samsung 9A1 512GB SSDs (PCIe Gen4) in 2 PLX PEX8747 (PCIe Gen3) Linkreal quad M.2 adapters as well as 2 bifurcation-based Linkreal quad M.2 adapters on that new platform. My goal was to determine the performance differences between relying on motherboard bifurcation and a PLX chip. I also wanted to test the performance impact of compression and deduplication on NVMe drives in both configurations. Testing was done using fio, in a mixed read-and-write workload:
fio --bs=128k --direct=1 --directory=/mnt/newprod/ --gtod_reduce=1 --ioengine=posixaio --iodepth=32 --group_reporting --name=randrw --numjobs=16 --ramp_time=10 --runtime=60 --rw=randrw --size=256M --time_based
I hope this helps some folks.
The first set of tests was done on a single card with (4) 9A1s in a 2-vdev mirrored configuration, on each of the different cards.
Test Setup | Read BW Min | Read BW max | Read BW Ave | Read BW Std Deviation | Read IOPs Min | Read IOPs max | Read IOPs Ave | Read IOPs Std Deviation | Write BW Min | Write BW max | Write BW Ave | Write BW Std Deviation | Write IOPs Min | Write IOPs max | Write IOPs Ave | Write IOPs Std Deviation |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Bifurcation 4x9A1 2xMirrors with Dedupe and Compression | 142 | 7679 | 1163 | 103.88 | 1140 | 61437 | 9305 | 830.99 | 176 | 7642 | 1160 | 103.45 | 1414 | 61139 | 9282 | 827.56 |
PLX 4x9A1 2xMirrors with Dedupe and Compression | 87 | 10065 | 1110 | 116.32 | 698 | 80264 | 8885 | 930.54 | 116 | 10064 | 1109 | 116.32 | 928 | 80264 | 8874 | 930.65 |
Bifurcation 4x9A1 2xMirrors with Compression No Dedupe | 952 | 1967 | 1334 | 13.6 | 7621 | 15739 | 10655 | 95.55 | 1043 | 1931 | 1332 | 11.95 | 8346 | 15452 | 10655 | 95.55 |
PLX 4x9A1 2xMirrors with Compression No Dedupe | 693 | 2031 | 1114 | 14.38 | 5548 | 16252 | 8918 | 115.05 | 777 | 2033 | 1112 | 13.29 | 6216 | 16264 | 8898 | 106.33 |
Bifurcation 4x9A1 2xMirrors No Compression No Dedupe | 835 | 2471 | 1578 | 21.13 | 6686 | 6686 | 19770 | 168.97 | 857 | 2387 | 1579 | 20.02 | 6856 | 19098 | 12632 | 160.15 |
PLX 4x9A1 2xMirrors No Compression No Dedupe | 692 | 1654 | 1091 | 13.01 | 5542 | 13232 | 8734 | 104.04 | 764 | 1574 | 1089 | 11.58 | 6114 | 12598 | 8716 | 92.66 |
The second set of tests was done with two matching cards and (8) 9A1s in a 4-vdev mirrored configuration. The mirrors span the two cards, so if one entire card were to fail, the pool would remain intact.
Test Setup | Read BW Min | Read BW max | Read BW Ave | Read BW Std Deviation | Read IOPs Min | Read IOPs max | Read IOPs Ave | Read IOPs Std Deviation | Write BW Min | Write BW max | Write BW Ave | Write BW Std Deviation | Write IOPs Min | Write IOPs max | Write IOPs Ave | Write IOPs Std Deviation |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Bifurcation 8x9A1 4xMirrors Cross Cards with Dedupe and Compression | 131 | 7641 | 1207 | 106.91 | 1055 | 61131 | 9658 | 855.27 | 171 | 7661 | 1204 | 106.68 | 1372 | 61294 | 9636 | 853.4 |
PLX 8x9A1 4xMirrors Cross Cards with Dedupe and Compression | 285 | 6266 | 1273 | 89.59 | 2910 | 50290 | 10169 | 713.47 | 363 | 6286 | 1271 | 89.19 | 2910 | 50290 | 10169 | 89.19 |
Bifurcation 8x9A1 4xMirrors Cross Cards with Compression no Dedupe | 1063 | 2092 | 1496 | 14.9 | 8506 | 16743 | 11968 | 108.81 | 1187 | 1979 | 1494 | 13.6 | 9500 | 15834 | 11959 | 108.81 |
PLX 8x9A1 4xMirrors Cross Cards with Compression no Dedupe | 1074 | 2152 | 1519 | 14.8 | 8594 | 17217 | 12155 | 118.4 | 1241 | 2009 | 1518 | 13.29 | 9930 | 16075 | 12147 | 106.31 |
Bifurcation 8x9A1 4xMirrors Cross Cards no Compression no Dedupe | 1664 | 3476 | 2412 | 22.98 | 13316 | 27809 | 19298 | 183.85 | 1741 | 3384 | 2415 | 172.6 | 13926 | 27077 | 19323 | 172.6 |
PLX 8x9A1 4xMirrors Cross Cards no Compression no Dedupe | 2010 | 3718 | 2811 | 23.32 | 16082 | 29747 | 22490 | 186.51 | 2073 | 3594 | 2815 | 21.73 | 16588 | 28758 | 22524 | 173.81 |
Some Bar Graphs:
Some interesting conclusions can be drawn. The narrower 4-disk pools seem to perform better with the bifurcation-based solution, likely because these are PCIe Gen 4 drives. However, as the pool gets wider, the overhead of relying on the mainboard to do the switching seems to grow, and the PLX-chip solution seems to deliver better performance.
In general, this is a very interesting test series. Thanks for making all the effort and sharing the results.
Unfortunately, the only conclusion that I see is that ZFS is severely bottlenecking the NVMe drives. Almost all of the 4x and 8x results show lower performance than what a single drive is capable of (it would be interesting to see results for a single drive with ZFS and with ext4 or XFS for context).
Yes, ZFS offers many benefits, but the main takeaway for me is to avoid ZFS with NVMe storage until the bottlenecks are fixed. At that point I would be more interested in these tests.
For you, my friend.
Using my workstation (EPYC 7302, 128GB DDR4-3200) on Windows 11, set up with Storage Spaces (default settings) in a three-way mirror configuration (not exactly apples-to-apples), this is what I get:
So yes, ZFS does present some overhead in this test, but it’s not that large. This is an extremely stressful test designed to tease out the worst possible case scenario.
Further comparisons yield additional information. Two Intel P4610 1.6TB drives in a mirrored Storage Spaces configuration perform equally poorly. I also threw in my boot drive for the lulz.
Here is the updated full chart
Test Setup | Read BW Min | Read BW max | Read BW Ave | Read BW Std Deviation | Read IOPs Min | Read IOPs max | Read IOPs Ave | Read IOPs Std Deviation | Write BW Min | Write BW max | Write BW Ave | Write BW Std Deviation | Write IOPs Min | Write IOPs max | Write IOPs Ave | Write IOPs Std Deviation |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Bifurcation 4x9A1 2xMirrors with Dedupe and Compression | 142 | 7679 | 1163 | 103.88 | 1140 | 61437 | 9305 | 830.99 | 176 | 7642 | 1160 | 103.45 | 1414 | 61139 | 9282 | 827.56 |
PLX 4x9A1 2xMirrors with Dedupe and Compression | 87 | 10065 | 1110 | 116.32 | 698 | 80264 | 8885 | 930.54 | 116 | 10064 | 1109 | 116.32 | 928 | 80264 | 8874 | 930.65 |
Bifurcation 4x9A1 2xMirrors with Compression No Dedupe | 952 | 1967 | 1334 | 13.6 | 7621 | 15739 | 10655 | 95.55 | 1043 | 1931 | 1332 | 11.95 | 8346 | 15452 | 10655 | 95.55 |
PLX 4x9A1 2xMirrors with Compression No Dedupe | 693 | 2031 | 1114 | 14.38 | 5548 | 16252 | 8918 | 115.05 | 777 | 2033 | 1112 | 13.29 | 6216 | 16264 | 8898 | 106.33 |
Bifurcation 4x9A1 2xMirrors No Compression No Dedupe | 835 | 2471 | 1578 | 21.13 | 6686 | 6686 | 19770 | 168.97 | 857 | 2387 | 1579 | 20.02 | 6856 | 19098 | 12632 | 160.15 |
PLX 4x9A1 2xMirrors No Compression No Dedupe | 692 | 1654 | 1091 | 13.01 | 5542 | 13232 | 8734 | 104.04 | 764 | 1574 | 1089 | 11.58 | 6114 | 12598 | 8716 | 92.66 |
Bifurcation 8x9A1 4xMirrors Cross Cards with Dedupe and Compression | 131 | 7641 | 1207 | 106.91 | 1055 | 61131 | 9658 | 855.27 | 171 | 7661 | 1204 | 106.68 | 1372 | 61294 | 9636 | 853.4 |
PLX 8x9A1 4xMirrors Cross Cards with Dedupe and Compression | 285 | 6266 | 1273 | 89.59 | 2910 | 50290 | 10169 | 713.47 | 363 | 6286 | 1271 | 89.19 | 2910 | 50290 | 10169 | 89.19 |
Bifurcation 8x9A1 4xMirrors Cross Cards with Compression no Dedupe | 1063 | 2092 | 1496 | 14.9 | 8506 | 16743 | 11968 | 108.81 | 1187 | 1979 | 1494 | 13.6 | 9500 | 15834 | 11959 | 108.81 |
PLX 8x9A1 4xMirrors Cross Cards with Compression no Dedupe | 1074 | 2152 | 1519 | 14.8 | 8594 | 17217 | 12155 | 118.4 | 1241 | 2009 | 1518 | 13.29 | 9930 | 16075 | 12147 | 106.31 |
Bifurcation 8x9A1 4xMirrors Cross Cards no Compression no Dedupe | 1664 | 3476 | 2412 | 22.98 | 13316 | 27809 | 19298 | 183.85 | 1741 | 3384 | 2415 | 172.6 | 13926 | 27077 | 19323 | 172.6 |
PLX 8x9A1 4xMirrors Cross Cards no Compression no Dedupe | 2010 | 3718 | 2811 | 23.32 | 16082 | 29747 | 22490 | 186.51 | 2073 | 3594 | 2815 | 21.73 | 16588 | 28758 | 22524 | 173.81 |
Bifurcation 8x9A1 Windows Three-way Mirror Storage Spaces | 3103 | 3558 | 3336.07 | 5.53 | 24533 | 28501 | 26713 | 46.87 | 3067 | 3564 | 3339 | 5.86 | 24533 | 28501 | 26713 | 46.87 |
U.2 direct 2xIntel P4610 Mirror Windows Storage Spaces | 1292 | 1700 | 1498 | 5.06 | 10333 | 13592 | 11982 | 40.43 | 1296 | 1686 | 1497 | 4.79 | 10369 | 13488 | 11976 | 38.33 |
Single Samsung 970 Pro 1TB Standard NTFS | 66 | 920 | 486 | 103.6 | 508 | 7186 | 3969 | 80.94 | 57 | 908 | 509 | 102.91 | 438 | 7091 | 3969 | 80.94 |