I’m hoping someone can give me some pointers on this; I’m totally at a loss.
I’m trying to troubleshoot what I perceive to be disk performance issues on a few servers:
Xeon v4 server with loads of RAM and Samsung F320s in RAID 0 (Debian Bookworm)
Dual Epyc with 1TB RAM and a mix of Samsung PM9A3/Optane drives (Debian Bookworm)
Single Epyc with 256GB RAM and Optane (Ubuntu 22.04)
Debian kernel version: 6.9.7-1~bpo12+1
Ubuntu kernel version: 6.8.0-40-generic
They all exhibit this behavior: a single-file dd copy with bs=1M maxes out at ~1.5-2GB/s, even on my PCIe Gen 4 Epyc server with 1TB of RAM.
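One thing worth ruling out: a plain dd with bs=1M is a single buffered writer at queue depth 1, so the number it reports depends heavily on page-cache writeback rather than raw device speed. A minimal sketch (the test path is hypothetical; point it at the NVMe filesystem under test) comparing buffered, flushed, and direct writes:

```shell
#!/bin/sh
# Hypothetical test path -- point this at the NVMe filesystem under test.
TESTFILE=/tmp/dd_test.bin

# 1) Buffered write: lands in the page cache first, so the reported rate
#    can be far above (small files) or below (sustained writeback) the
#    device's real speed.
dd if=/dev/zero of="$TESTFILE" bs=1M count=64 2>&1 | tail -n 1

# 2) Same write, but flushed to stable storage before dd reports a rate.
dd if=/dev/zero of="$TESTFILE" bs=1M count=64 conv=fdatasync 2>&1 | tail -n 1

# 3) Direct I/O bypasses the page cache entirely (commented out because
#    it fails on tmpfs, which doesn't support O_DIRECT).
# dd if=/dev/zero of="$TESTFILE" bs=1M count=64 oflag=direct

rm -f "$TESTFILE"
```

Even with these flags, dd is still one writer at queue depth 1, which is a small fraction of the parallelism a Gen 4 NVMe drive needs to reach its datasheet number.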
***TO BE CLEAR: fio benchmarks indicate that all the NVMe drives perform within spec, but in real-world throughput testing they are scarcely better than 12G SAS drives. That makes me question fio as a benchmarking tool: if you can’t achieve the stated spec in a simple file copy, what good is the benchmark other than making the manufacturers look good?
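For what it’s worth, the spec-sheet numbers that fio reproduces usually come from deep-queue, multi-job, direct-I/O runs, while cp/dd are effectively one synchronous stream at queue depth 1. A fio job-file sketch (the filename is hypothetical) that runs both shapes back-to-back on the same drive, so the gap is visible in one tool:

```ini
; nvme-compare.fio -- hypothetical path, adjust before running
[global]
filename=/mnt/nvme/fio_test.bin
size=4G
rw=write
bs=1M

; copy-like: one synchronous writer, QD1, through the page cache
[copy-like]
ioengine=sync
direct=0

; spec-sheet style: async, deep queue, many jobs, direct I/O
[spec-style]
stonewall
ioengine=libaio
iodepth=32
numjobs=4
direct=1
group_reporting
```

If [copy-like] lands near the dd numbers while [spec-style] hits the datasheet, fio isn’t lying; it’s answering a different question than a file copy asks.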
To test whether multi-threading is required to achieve max throughput, I installed MSSQL in a Docker container on the dual 64-core Epyc server (1TB RAM).
The NVMe setup on the dual 64-core machine is as follows:
PM9A3 (ext4) on U.2-to-PCIe adapters, housing the MDF files
Optane P4800X (ext4), also on a U.2-to-PCIe adapter, housing the LDF files
To test multi-threaded throughput, I did a SELECT INTO of every table from one DB into the other concurrently, using as many threads as there are tables (I think ~20). Both source and destination databases use the simple recovery model. The source database is 110GB in size.
Copying the entire 110GB source database to the new database (on the same drive) takes about 2-2.5 minutes, which works out to roughly 0.75-0.9GB/s total throughput. This is abysmal. Running it as a single transaction yields similar results.
Breaking the destination database into multiple files yields the same results. This does not appear to be a multi-threading issue.
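To take MSSQL out of the picture entirely, the same question can be asked at the filesystem level: if aggregate throughput grows with the number of concurrent writers, the per-stream queue depth is the limit rather than the drive. A rough sketch (the directory path is hypothetical):

```shell
#!/bin/sh
# Hypothetical directory -- point at the NVMe filesystem under test.
DIR=/tmp/parallel_write_test
mkdir -p "$DIR"

# Launch 4 concurrent writers and wait for all of them; compare the
# wall-clock time of this loop against 4x the single-stream dd time.
for i in 1 2 3 4; do
    dd if=/dev/zero of="$DIR/writer_$i" bs=1M count=64 conv=fdatasync 2>/dev/null &
done
wait

ls "$DIR"
rm -rf "$DIR"
```

If four writers finish in roughly the time one does, the drive has headroom that no single stream is using.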
I’ve done numerous tests on other servers using cp, dd, rsync, etc. No real change in throughput.
What makes things interesting: I took two Samsung PM863 SATA drives, put them in a striped ZFS pool, and got proper per-drive scaling on file copies. It behaves as I would expect (i.e., a two-drive stripe delivers 1GB+/s throughput on copies).
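For reference, the ZFS stripe that did scale was nothing exotic. A sketch of the setup (sdX/sdY are placeholder device names; the commands are printed rather than executed so the script is safe to run anywhere):

```shell
#!/bin/sh
# Placeholder device names -- these commands are echoed, not executed.
CMDS=$(cat <<'EOF'
zpool create -o ashift=12 tank /dev/sdX /dev/sdY   # two-disk stripe (RAID-0)
zfs set recordsize=1M tank                         # favor large sequential I/O
EOF
)
echo "$CMDS"
```

ZFS batches buffered writes into transaction groups and issues them asynchronously across the stripe, which may be why a plain cp scales there while a single-stream copy to one NVMe device does not.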
I’m starting to think this is a general Linux issue, and that Linux is severely under-optimized for NVMe performance.
Does anyone else have experiences like this? Please tell me I’m missing something; I really don’t want to have to switch out my drives.