Hey all,
I got my hands on eight 500GB Crucial MX300 SSDs for cheap.
“Great, let's try improving my free (company-wants-to-scrap-server) server with these bad boys.”
The server has the following specs:
CPU: Intel Core i5-4590 (3.3 GHz)
Motherboard: Gigabyte GA-H97-D3H
Memory: 22GB DDR3 @ 1333 MT/s
HBA: Dell PERC H310 flashed to IT mode
Boot disk: Samsung 980 (the villain, a cheap 1TB cache-less NVMe SSD)
Data disks: 8x Crucial MX300 500GB SSDs, all connected to the Dell PERC H310
I am running Rocky Linux 9.2 with kernel 5.14.0.
I wanted to see if my idea was actually any good: create a single ZFS pool of 4 mirrored vdevs for Postgres (the whole point of this server). Surely 8 SSDs, in the layout that is optimal for random 4k performance, will outperform a single Samsung 980.
The NVMe SSD uses XFS with default settings in /etc/fstab.
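For reference, a minimal sketch of the kind of fstab entry I mean (the UUID is a placeholder and the mount point may differ):

```
# XFS on the Samsung 980 with stock mount options (UUID is a placeholder)
UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx  /  xfs  defaults  0 0
```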
The zpool is configured via the following command:
zpool create \
-O compression=lz4 \
-O atime=off \
-O relatime=off \
-O recordsize=8k \
-O primarycache=metadata \
-o ashift=12 \
-o autotrim=on \
-m none \
tank \
mirror /dev/disk/by-id/ata-Crucial_CT525MX300SSD1_0 /dev/disk/by-id/ata-Crucial_CT525MX300SSD1_1 \
mirror /dev/disk/by-id/ata-Crucial_CT525MX300SSD1_2 /dev/disk/by-id/ata-Crucial_CT525MX300SSD1_3 \
mirror /dev/disk/by-id/ata-Crucial_CT525MX300SSD1_4 /dev/disk/by-id/ata-Crucial_CT525MX300SSD1_5 \
mirror /dev/disk/by-id/ata-Crucial_CT525MX300SSD1_6 /dev/disk/by-id/ata-Crucial_CT525MX300SSD1_7
zfs create -o mountpoint=none tank/postgres
zfs create -o mountpoint=/postgres/data tank/postgres/data
zfs create -o mountpoint=/postgres/wal tank/postgres/wal
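To double-check what those datasets actually inherited, something like the following can be used (the property list here is just an example):

```
# confirm the dataset and pool properties that the benchmarks run against
zfs get compression,recordsize,atime,primarycache tank/postgres/data tank/postgres/wal
zpool get ashift,autotrim tank
```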
zpool status

  pool: tank
 state: ONLINE
config:

        NAME                                         STATE     READ WRITE CKSUM
        tank                                         ONLINE       0     0     0
          mirror-0                                   ONLINE       0     0     0
            ata-Crucial_CT525MX300SSD1_174819DFFEE5  ONLINE       0     0     0
            ata-Crucial_CT525MX300SSD1_174819E018E8  ONLINE       0     0     0
          mirror-1                                   ONLINE       0     0     0
            ata-Crucial_CT525MX300SSD1_174819E026BF  ONLINE       0     0     0
            ata-Crucial_CT525MX300SSD1_174819E03626  ONLINE       0     0     0
          mirror-2                                   ONLINE       0     0     0
            ata-Crucial_CT525MX300SSD1_174819E03A29  ONLINE       0     0     0
            ata-Crucial_CT525MX300SSD1_174819E03D1A  ONLINE       0     0     0
          mirror-3                                   ONLINE       0     0     0
            ata-Crucial_CT525MX300SSD1_174819E03D43  ONLINE       0     0     0
            ata-Crucial_CT525MX300SSD1_174819E03D96  ONLINE       0     0     0

errors: No known data errors
After a bunch of benchmarking with fio I saw the following numbers (a representative fio invocation is sketched below the table):
fio settings | zpool read (IOPS) | single NVMe read (IOPS) | zpool write (IOPS) | single NVMe write (IOPS) |
---|---|---|---|---|
4k rand read, fsync, io-depth 1, direct | 119000 | 196000 | - | - |
4k rand r/w, fsync, io-depth 1, direct | 502 | 625 | 502 | 626 |
4k rand write, end_fsync, io-depth 1 | - | - | 9280 | 101000 |
4k rand r/w, io-depth 1, direct, no fsync | 1424 | 11700 | 1423 | 11600 |
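For context, the mixed read/write rows correspond roughly to an invocation like the one below; this is only a sketch, and the file path, size, and runtime are placeholders rather than the exact values I used (for the NVMe runs, --filename simply points at a file on the XFS filesystem instead of /postgres/data):

```
# 4k random read/write, queue depth 1, O_DIRECT, fsync after every write
# (filename, size and runtime are placeholders)
fio --name=randrw-4k \
    --filename=/postgres/data/fio-testfile \
    --rw=randrw --bs=4k \
    --iodepth=1 --direct=1 --fsync=1 \
    --ioengine=libaio --numjobs=1 \
    --size=4G --runtime=60 --time_based \
    --group_reporting
```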
WHAT?! How can 8 SATA SSDs lose to a single cache-less NVMe SSD???
- Is my Dell PERC H310 an IO bottleneck?
  - I guess not, because the first benchmark shows the pool can push almost 120K IOPS (see the zpool iostat sketch after this list).
- Is ZFS badly configured, or is ZFS holding me back for some reason?
  - I don't know; is 4k random IO just poor on ZFS?
  - Is a recordsize of 8k really that bad?
- Am I benchmarking wrong?
  - Maybe; I copied some fio commands from the web.
- Are SATA SSDs really that slow?
  - I think they are not that bad… right?
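One extra data point that may help with the HBA question is watching per-device stats live during a run. A minimal sketch (the pool name matches my setup; the 1-second interval and the NVMe device name are assumptions):

```
# per-vdev IOPS and bandwidth on the pool, refreshed every second, while fio runs elsewhere
zpool iostat -v tank 1

# same idea for the NVMe side (device name is an assumption)
iostat -x nvme0n1 1
```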
Sometime next week I will upgrade this system to an Intel Xeon E3-1275 v6 with 32GB of memory. I will then repeat the same tests and see if that changes anything.
TL;DR:
8 SATA SSDs in a ZFS pool of 4 mirrors should be faster than a single cheap NVMe SSD, right? My benchmarks show the NVMe winning. Why?
In the meantime I hope somebody can share their experience with 8 SSDs in a quad-mirror zpool. Or maybe you can spot the mistake that I am currently not seeing! XD