Benchmarking Storage & SSDs 2021 Template

This is a DRAFT and a template for future storage reviews. I’m thinking I’d like to organize my thoughts into a sort of checklist I can run through for each SSD review. Some of it I want to totally automate.

What product is being tested?

# output of nvme list
/dev/nvme1n1     10serialnum8L         KBG40ZNS1T02 KIOXIA MEMORY              1           1.02  TB /   1.02  TB    512   B +  0 B   AEGA0101

… tells us the drive is formatted with 512-byte sectors (which may cause alignment issues; many such drives can be reformatted to 4K native, etc.)
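
To see which LBA formats a drive supports and (destructively) switch it to 4K sectors, nvme-cli can be used; the LBA format index varies by drive, so verify it before formatting (the device path is just the example drive above):

# List the supported LBA formats (look for the 4K entry and its index)
nvme id-ns -H /dev/nvme1n1 | grep "LBA Format"

# DESTRUCTIVE: reformat the namespace to that LBA format (index 1 is only an example)
nvme format /dev/nvme1n1 --lbaf=1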

Does this device require any kind of warm-up?

Some Samsung drives need a partition table to perform correctly, probably because they snoop a bit on the filesystem and its contents. Some may need to be filled and cleared a few times. Others may need to …
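
As a concrete example, a warm-up pass might be nothing more than a GPT label and/or a full sequential fill; whether either step is actually needed is drive-specific, so treat these as independent sketches rather than a required sequence (device path is the example drive from above):

# DESTRUCTIVE: give the drive a partition table (some firmware behaves differently without one)
parted --script /dev/nvme1n1 mklabel gpt

# DESTRUCTIVE: precondition by filling the whole device once with sequential writes
fio --name=precondition --filename=/dev/nvme1n1 --rw=write --bs=1m \
    --iodepth=32 --ioengine=io_uring --direct=1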

Read-Only Testing

Tested on Linux _____
with Kernel version ________

Test with 1-4 CPU threads (how many are needed depends on CPU clock speed).
An iodepth of 32 may not make sense for all devices.

# Use fio to read 220 GiB, or run for up to half an hour, and output the stats.
fio --readonly --name=tempstorage \
    --filename=$2 \
    --filesize=220g --rw=randread --bs=$3 --direct=1 --overwrite=0 \
    --numjobs=$1 --iodepth=32 --time_based=1 --runtime=1800 \
    --ioengine=io_uring \
    --registerfiles --fixedbufs --hipri \
    --gtod_reduce=1 --group_reporting
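
The $1/$2/$3 placeholders assume the command is wrapped in a small script; a hypothetical wrapper (the script name and argument order here are my assumption) would be invoked like this:

# $1 = numjobs, $2 = device or file to read, $3 = block size
# e.g. 4 threads of 4k random reads against the drive from the nvme list output:
./randread-test.sh 4 /dev/nvme1n1 4k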

Mixed Read and Write Testing

Numbers will be worse than read-only. fio command TODO.
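
Until that command is settled, a plausible sketch (the 70/30 mix and the reuse of the read-only parameters are my assumptions, not part of the template) would be:

# Sketch: 70% random reads / 30% random writes
# WARNING: this writes to $2, so it is destructive on a raw device
fio --name=mixedtest --filename=$2 \
    --filesize=220g --rw=randrw --rwmixread=70 --bs=$3 --direct=1 \
    --numjobs=$1 --iodepth=32 --time_based=1 --runtime=1800 \
    --ioengine=io_uring --gtod_reduce=1 --group_reporting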

Queue Depth 1 Testing

# Use fio again, but with more parameters and an iodepth of just one.
# Based on Intel's recommendations for Optane.
[global]
name= OptaneFirstTest
ioengine=io_uring
hipri
direct=1
size=100%
randrepeat=0
time_based
ramp_time=0
norandommap
refill_buffers
log_avg_msec=1000
log_max_value=1
group_reporting
percentile_list=1.0:25.0:50.0:75.0:90.0:99.0:99.9:99.99:99.999:99.9999:99.99999:99.999999:100.0
filename=$2


[rd_rnd_qd_1_4k_1w]
bs=4k
iodepth=1
numjobs=1
rw=randread
runtime=300
write_bw_log=bw_rd_rnd_qd_1_4k_1w
write_iops_log=iops_rd_rnd_qd_1_4k_1w
write_lat_log=lat_rd_rnd_qd_1_4k_1w
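
Note that fio will not expand a shell-style positional parameter like $2 inside a job file; the filename has to be substituted by a wrapper or passed through an environment variable. A sketch (the job-file name qd1-randread.fio is my assumption):

# Option 1: let fio expand an environment variable
#           (after changing filename=$2 to filename=${TARGET_DEV} in the job file)
TARGET_DEV=/dev/nvme1n1 fio qd1-randread.fio

# Option 2: substitute the placeholder with sed and feed the result to fio on stdin
sed 's|\$2|/dev/nvme1n1|' qd1-randread.fio | fio -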






About the test platform

CPU
Motherboard
RAM Configuration & Clock Speed
Measured RAM Latency
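
Most of this can be pulled automatically; a sketch using standard tools (dmidecode needs root, and measured RAM latency still needs a dedicated benchmark):

uname -r                                              # kernel version
lscpu | grep -E 'Model name|Socket|NUMA'              # CPU
cat /sys/devices/virtual/dmi/id/board_{vendor,name}   # motherboard
sudo dmidecode -t memory | grep -E 'Size|Speed|Part Number'   # RAM configuration & clock speed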

Re-create CrystalDiskMark numbers, but in Linux, via automation

This assumes there is a filesystem and that argument $1 is the path to which it will write (a test -d check should be added if $1 is specified, as sketched below …)
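
A sketch of that missing check (to be dropped into the top of the script below; this is my addition, not part of the original gist):

# Bail out early if a target was given but isn't a directory we can write into
if [ -n "$1" ] && [ ! -d "$1" ]; then
    echo "Error: '$1' is not a directory" >&2
    exit 1
fi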

# From  https://gist.github.com/i3v/99f8ef6c757a5b8e9046b8a47f3a9d5b


#!/bin/bash

# This script is based on https://unix.stackexchange.com/revisions/480191/9 .
# The following changes proved to be necessary to make it work on CentOS 7:
#  * removed disk info (model, size) - not very useful, might not work in many cases.
#  * using "bw" instead of "bw_bytes" to support fio version 3.1 (those availible through yum @base)
#  * escaping exclamation mark in sed command
#  * the ".fiomark.txt" is not auto-removed 

LOOPS=5 #How many times to run each test
#SIZE=1024 #Size of each test, multiples of 32 recommended for Q32 tests to give the most accurate results.
SIZE=128 #Size of each test, multiples of 32 recommended for Q32 tests to give the most accurate results.
WRITEZERO=0 #Set whether to write zeroes or randoms to testfile (random is the default for both fio and crystaldiskmark); dd benchmarks typically only write zeroes which is why there can be a speed difference.

QSIZE=$(($SIZE / 32)) #Size of Q32Seq tests
SIZE+=m
QSIZE+=m

if [ -z "$1" ]; then
    TARGET=$HOME
    echo "Defaulting to $TARGET for testing"
else
    TARGET="$1"
    echo "Testing in $TARGET"
fi



echo "Configuration: Size:$SIZE Loops:$LOOPS Write Only Zeroes:$WRITEZERO
Running Benchmark,  please wait...
"

fio --loops=$LOOPS --size=$SIZE --filename="$TARGET/.fiomark.tmp" --stonewall --ioengine=libaio --direct=1 --zero_buffers=$WRITEZERO --output-format=json \
  --name=Bufread --loops=1 --bs=$SIZE --iodepth=1 --numjobs=1 --rw=readwrite \
  --name=Seqread --bs=$SIZE --iodepth=1 --numjobs=1 --rw=read \
  --name=Seqwrite --bs=$SIZE --iodepth=1 --numjobs=1 --rw=write \
  --name=512kread --bs=512k --iodepth=1 --numjobs=1 --rw=read \
  --name=512kwrite --bs=512k --iodepth=1 --numjobs=1 --rw=write \
  --name=SeqQ32T1read --bs=$QSIZE --iodepth=32 --numjobs=1 --rw=read \
  --name=SeqQ32T1write --bs=$QSIZE --iodepth=32 --numjobs=1 --rw=write \
  --name=4kread --bs=4k --iodepth=1 --numjobs=1 --rw=randread \
  --name=4kwrite --bs=4k --iodepth=1 --numjobs=1 --rw=randwrite \
  --name=4kQ32T1read --bs=4k --iodepth=32 --numjobs=1 --rw=randread \
  --name=4kQ32T1write --bs=4k --iodepth=32 --numjobs=1 --rw=randwrite \
  --name=4kQ8T8read --bs=4k --iodepth=8 --numjobs=8 --rw=randread \
  --name=4kQ8T8write --bs=4k --iodepth=8 --numjobs=8 --rw=randwrite > "$TARGET/.fiomark.txt"

QUERY='def read_bw(name): [.jobs[] | select(.jobname==name+"read").read.bw] | add / 1024 | floor;
       def read_iops(name): [.jobs[] | select(.jobname==name+"read").read.iops] | add | floor;
       def write_bw(name): [.jobs[] | select(.jobname==name+"write").write.bw] | add / 1024 | floor;
       def write_iops(name): [.jobs[] | select(.jobname==name+"write").write.iops] | add | floor;
       def job_summary(name): read_bw(name), read_iops(name), write_bw(name), write_iops(name);
       job_summary("Seq"), job_summary("512k"), job_summary("SeqQ32T1"),
       job_summary("4k"), job_summary("4kQ32T1"), job_summary("4kQ8T8")'
read -d '\n' -ra V <<< "$(jq "$QUERY" "$TARGET/.fiomark.txt")"

echo -e "
Results:  
\033[0;33m
Sequential Read: ${V[0]}MB/s IOPS=${V[1]}
Sequential Write: ${V[2]}MB/s IOPS=${V[3]}
\033[0;32m
512KB Read: ${V[4]}MB/s IOPS=${V[5]}
512KB Write: ${V[6]}MB/s IOPS=${V[7]}
\033[1;36m
Sequential Q32T1 Read: ${V[8]}MB/s IOPS=${V[9]}
Sequential Q32T1 Write: ${V[10]}MB/s IOPS=${V[11]}
\033[0;36m
4KB Read: ${V[12]}MB/s IOPS=${V[13]}
4KB Write: ${V[14]}MB/s IOPS=${V[15]}
\033[1;33m
4KB Q32T1 Read: ${V[16]}MB/s IOPS=${V[17]}
4KB Q32T1 Write: ${V[18]}MB/s IOPS=${V[19]}
\033[1;35m
4KB Q8T8 Read: ${V[20]}MB/s IOPS=${V[21]}
4KB Q8T8 Write: ${V[22]}MB/s IOPS=${V[23]}
"

# rm "$TARGET/.fiomark.txt"
rm "$TARGET/.fiomark.tmp"

How fast is main memory?

Note: this is not directly comparable to tools like AIDA64…

 sysbench --test=memory --memory-block-size=8M --memory-total-size=400G --num-threads=32 run
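
The --test= form is the legacy (pre-1.0) syntax; on sysbench 1.0+ the rough equivalent (my translation, not from the original post) is:

 sysbench memory --memory-block-size=8M --memory-total-size=400G --threads=32 run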

TODO continued

  • Draft Status

One very interesting reference I’d like to throw out there: for pure latency purposes (like for SLOGs), this TrueNAS thread has about every drive you can think of benchmarked somewhere in it with diskinfo -wS /dev/XXX

The fact that they are all done the same makes for good comparisons.


Got a Linux equivalent with the same output? Or is that a bash script with nvme-cli and hdparm pasted together?

Not yet, unfortunately. At some point when I have free time I’m going to set up multi-boot on my spare system and see if I can find an equivalent hdparm command. It’s aphid season, so I’m going to be working weekends for a bit. :confused:

Another thing I’m curious about is the performance impact of disk location, NUMA nodes/CCXs, and RAM in relation to each other, which my old Threadripper will be perfect for.
This video (about 6 minutes in) indicates that data can move around in completely silly ways if thought isn’t given to it.

@wendell

Could you add something that logs if PCIe errors happen during the tests (with NVMe SSDs)?

In theory these should reduce performance numbers (even though my Samsung PM1733 SSDs still perform slightly above their specifications even when the WHEA 17 errors I pestered you with occur)
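
One possible way to do that on Linux (a rough sketch, assuming a kernel new enough to expose AER statistics in sysfs) would be to snapshot the AER counters or watch the kernel log around each run:

# Per-device corrected/uncorrected PCIe AER counters (kernel 4.17+ with AER enabled)
grep -H . /sys/bus/pci/devices/*/aer_dev_{correctable,nonfatal,fatal} 2>/dev/null

# Or watch the kernel log for AER messages while the benchmark runs
journalctl -k -f | grep -i 'AER\|PCIe Bus Error'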


I saw 6 PCIe errors, all corrected, during testing (~8 hours of pounding the drives, but I haven’t posted that yet). I think if you aren’t getting dozens of errors per second it is OK.


It would also be nice to create a dataset of hardware configurations. If enough users post their results, including PCIe errors, you could find patterns showing which components handle PCIe (Gen 4) well and which might cause issues in the future.
