Samsung 990 PRO running at less than half of rated speed?

For my recent Threadripper build, I added two 2TB 990 PROs. Today I finally got around to benchmarking them, and something is clearly wrong.

Block read: ~2.7 GB/s
Filesystem (BTRFS RAID1) read: ~1.7 GB/s
Filesystem (BTRFS RAID1) write: ~3.8 GB/s

I immediately checked lspci -vvv and confirmed that the drives were connected at PCIe Gen 4 x4.
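For anyone wanting to replicate the check, this is roughly what I ran (a sketch; the grep assumes the controller shows up as "Non-Volatile memory controller"):

# Grab the PCI address of the first NVMe controller, then inspect its link.
# "LnkCap" is the maximum the device supports; "LnkSta" is the negotiated link.
addr=$(lspci | grep -i 'non-volatile' | awk '{print $1}' | head -n1)
sudo lspci -vvv -s "$addr" | grep -E 'LnkCap:|LnkSta:'
# A healthy Gen 4 x4 link reports: Speed 16GT/s, Width x4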

The SSDs were not used as root or home; they were only used for testing.

What else could be causing such a weird slowdown? I was expecting numbers greater than 7 GB/s for the reads.

Heat maybe?

Personally I’d return them for something else and not pay the Samsung tax.

Good thought, but no. They seem to peak at ~45-50 °C.

How full are the drives?

Have you checked the installed firmware version against the bug fixes listed for newer firmware releases?


How are you testing?


How full? ~30%
Firmware? Yes, booting the firmware update ISO from Samsung was the first thing I did on these drives.
Testing?
For block reads, I used dd and hdparm -tT (they agree with each other once I drop caches; commands below).
For filesystem reads/writes, I used KDiskMark (which uses fio).
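Roughly like this (a sketch; /dev/nvme0n1 stands in for whichever drive is under test):

# Drop the page cache so reads hit the drive, not RAM.
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches

# Raw block reads via hdparm (-t = buffered disk reads, -T = cached reads).
sudo hdparm -tT /dev/nvme0n1

# Equivalent dd read: 1 MiB blocks, direct I/O to bypass the page cache.
sudo dd if=/dev/nvme0n1 of=/dev/null bs=1M count=8192 iflag=direct status=progress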

You are not the first to notice this; have a look here: How to copy files in Linux on nvme SSD?.

Try running kdiskmark for reference, then fio.

For reference, this is a single 990 PRO 2 TB (Fedora 39, btrfs?).

Also an interesting observation from one more methodical user:
https://www.reddit.com/r/btrfs/comments/r6i119/btrfs_read_performance_issues/

Try using a multi-threaded implementation of the cp utility, e.g. this one:

1.7 GB/s is exactly the same limit I'm hitting on both the 990 PRO and FireCuda 520 when using standard cp, while a fio benchmark gives me something like 13 GB/s.
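If you don't have a dedicated multi-threaded cp handy, one generic way to get a similar effect for a directory full of files is GNU parallel (hypothetical paths; this only helps when the bottleneck is one single-threaded copy stream per file):

# Copy files from /src to /dst with 8 concurrent cp processes.
# A single huge file still runs as one stream; this parallelizes across files.
find /src -maxdepth 1 -type f | parallel -j 8 cp {} /dst/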

Does btrfs support TRIM?
If so, try enabling it.
If not, overprovision the drives.
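For reference, checking and running TRIM looks roughly like this (a sketch; device and mount point are examples):

# Non-zero DISC-GRAN/DISC-MAX means the device advertises discard support.
lsblk --discard /dev/nvme0n1

# One-shot TRIM of a mounted filesystem:
sudo fstrim -v /mnt/test

# Or mount btrfs with asynchronous discard (the default on recent kernels):
sudo mount -o discard=async /dev/nvme0n1 /mnt/test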

@greatnull Huh, interesting. Yeah, I know real-world use will be lower. The fact that synthetics are bad too is what's really throwing me off here.

@lapsio: I'm not using cp, I'm using fio/KDiskMark, which is showing me 1/10th of your fio benchmark.

@jxdking Yes, btrfs supports TRIM, the filesystem is mounted with the "ssd" option, and dmesg shows:
[ 10.218408] BTRFS info (device dm-2): auto enabling async discard
[ 11.522325] BTRFS info (device dm-5): auto enabling async discard

Synthetics are correct though; they are much closer to real-world values than the default high-queue-depth, multi-layered workloads are.

Q1T1 is still king of the desktop use case. I understand why Samsung marketing lies by omission, but reviewers should be smarter.

They have more than a decade of know-how available, and yet they publish default, imprecise benchmark values.

KDiskMark, like CrystalDiskMark, sets up its default measurements like this; you have to select the real-world profile to measure the data I posted above.
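If you want to reproduce that subtest outside KDiskMark, the underlying fio job is approximately this (a sketch; the file path and size are examples):

# Sequential 1 MiB reads at queue depth 1, one thread, direct I/O:
# roughly what the SEQ1M Q1T1 real-world subtest measures.
fio --name=seq1m-q1t1 --filename=/mnt/test/fio.bin --size=4G \
    --rw=read --bs=1M --iodepth=1 --numjobs=1 \
    --ioengine=libaio --direct=1 --runtime=30 --time_based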

Nice values, but at 8x and 32x larger queue depths, respectively.

Hmm… my synthetics look nothing like that:

(I was testing with a 2 GiB test size, but there's not much difference with 1 GiB or 5 GiB.)

So I just tried this on Ubuntu 22.04 with the 6.5 HWE kernel, also a 2TB Samsung 990, ext4 filesystem; this is what I get. I selected the real-world performance and NVMe SSD settings.

And with the real-world mix, this is what I get:

What’s this Samsung tax you’re referring to?

Just to be clearer, this is my originally posted data, just two measurements with different settings side by side. Important values are highlighted:

  • Drive is Samsung 990 PRO 2 TB
  • OS is Fedora 39 with default BTRFS FS (probable impact)
  • Benchmarks were run with KDiskMark; the number of runs and dataset size were left at default (this won't impact the read part of the testing anyway)

If you compare your results, make sure that:

  • you are comparing the same subtests
  • you are on a similar FS
  • the drive is not overfull (should affect writes, not reads; see the quick check after this list)
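A quick way to check actual allocation on btrfs, since plain df can mislead there (mount point is an example):

# btrfs-aware usage report: allocated vs. free space, per profile.
sudo btrfs filesystem usage /mnt/test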

I know it's not an NVMe drive, just a SATA SSD that I have, but when using the Microsoft driver compared to the motherboard driver… the two produced vastly different results.

Might be worth a try

I think with btrfs, if you have lots of snapshots with lots of files across lots of blocks, that can dramatically impact write performance, even with the drive only half full.
btrfs isn't a good filesystem to benchmark anyway: while it performs well in the real-world use cases people actually hit most of the time, its performance can be highly variable, especially in benchmark tests. It also has a lot of overhead and doesn't really show drive performance well, even on a ramdisk or similar.
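If you want to rule snapshots out, listing them and spot-checking fragmentation on a test file is cheap (paths are examples):

# List subvolumes and snapshots on the filesystem.
sudo btrfs subvolume list /mnt/test

# Count a file's extents; heavily CoW-fragmented files show many extents.
filefrag /mnt/test/fio.bin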

XFS and BTRFS numbers compared

all on the same drive
xfs: [screenshot]
btrfs: [screenshot]
btrfs again: [screenshot]
Both of these btrfs numbers are much higher than a previous result, which showed closer to 70 on writes and ~20k IOPS.

BTRFS writes should be taken with a grain of salt, as they vary by the alignment of the stars.

Samsung drives are quite bad due to the tech they implement and their many problems… that's also the reason many datacenters have shifted to other brands.
As for speed on consumer drives… you are only testing the cache of the drive. The actual NAND chips are the same everywhere, so you get simple USB 3.1 drive speeds.
If you want true high speed, go with the ADATA Legend 960M, which provides the highest speed. Or better, go with Solidigm, which will give you the top speed. But that is enterprise gear, and the extra money will be quickly recouped after you throw maybe two Samsungs in the garbage… the enterprise drive will still be going.

@LinuxNoob1 Good to see your writes are similar to mine, though it is on ext4.

@greatnull

you are comparing same subtests

Yes, yours are at least 2x mine on the shared subtests, except for RND4K Q1T1, which is close enough.

are on similar FS

new BTRFS

drive is not overfull (should affect writes, not reads)

I don’t think 30% is too full?

@Necrosaro I'm not sure there is a Microsoft driver for Linux 🙂

@alkafrazin Interesting! Yeah, again it seems like read is suffering more than write, but this FS does have 3 subvolumes. That was one reason I started looking at dd/hdparm block reads, but the slow reads there too were why I made this post.