Slow Performance with Corsair MP600 PRO NH PCIe-4

Hi All,

I am running a Linux server with the following specs:

NVMe Drives: 2x Corsair MP600 PRO NH 8 TB PCIe4 NVMe in Software (mdadm) RAID-1
Motherboard: AsRockRack B650D4U
CPU: AMD Ryzen 7950X

I notice a significant slowdown, regardless of the disk benchmarking tool (I tried both dd and fio), after the server has been powered on for a few hours. If I reboot the server, the disk I/O results are back up in the 2-3 GB/s range, but if I try again a few hours later, it's down to the 300 MB/s range.

To illustrate, here's what it's supposed to look like:

fio Disk Speed Tests (Mixed R/W 50/50) (Partition /dev/md126):

Block Size | 4k (IOPS) | 64k (IOPS)
------ | ---- | ----
Read | 903.14 MB/s (225.7k) | 2.44 GB/s (38.2k)
Write | 905.52 MB/s (226.3k) | 2.45 GB/s (38.4k)
Total | 1.80 GB/s (452.1k) | 4.90 GB/s (76.6k)
| |
Block Size | 512k (IOPS) | 1m (IOPS)
------ | ---- | ----
Read | 3.08 GB/s (6.0k) | 3.14 GB/s (3.0k)
Write | 3.24 GB/s (6.3k) | 3.35 GB/s (3.2k)
Total | 6.33 GB/s (12.3k) | 6.49 GB/s (6.3k)

And here’s what it looks like after the server has been powered on for more than a few hours:

fio Disk Speed Tests (Mixed R/W 50/50) (Partition /dev/md126):

Block Size | 4k (IOPS) | 64k (IOPS)
------ | ---- | ----
Read | 85.42 MB/s (21.3k) | 328.25 MB/s (5.1k)
Write | 85.65 MB/s (21.4k) | 329.98 MB/s (5.1k)
Total | 171.07 MB/s (42.7k) | 658.23 MB/s (10.2k)
| |
Block Size | 512k (IOPS) | 1m (IOPS)
------ | ---- | ----
Read | 412.86 MB/s (806) | 414.53 MB/s (404)
Write | 434.80 MB/s (849) | 442.14 MB/s (431)
Total | 847.67 MB/s (1.6k) | 856.67 MB/s (835)
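For anyone who wants to reproduce numbers like these, the yabs script drives fio roughly along these lines. This is a sketch only: the exact flags yabs uses are an assumption, and `TESTFILE` is a placeholder that should point at the filesystem sitting on /dev/md126.

```shell
# Sketch of a mixed 50/50 random read/write run, similar in spirit to the
# yabs 4k pass. Flags are assumptions, not the exact yabs invocation.
TESTFILE=${TESTFILE:-./fio-test.bin}
if command -v fio >/dev/null 2>&1; then
  fio --name=rw_4k --filename="$TESTFILE" --size=256M \
      --rw=randrw --rwmixread=50 --bs=4k --iodepth=64 --numjobs=2 \
      --ioengine=libaio --direct=1 --runtime=15 --time_based \
      --group_reporting
  rm -f "$TESTFILE"
else
  echo "fio not installed; install it first (e.g. dnf install fio)"
fi
```

Repeating the same bs with 64k, 512k, and 1m gives the other columns; `--direct=1` matters here, since buffered runs mostly measure the page cache.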

At first I thought it might be an issue with the NVMes filling up, since the I/O tests were consistently good when I first deployed this server. However, some of our other servers with the exact same build are at 80-90% capacity and still have decent disk I/O performance, so I'm not convinced that's the case here. I also noticed the issue occurs on this server when the NVMes are only at 30% usage, so I don't think it's related to how much storage capacity is used.

iotop shows less than 50-100 MB/s of disk usage at any given time, and CPU usage is low.

Total DISK READ : 768.05 K/s | Total DISK WRITE : 11.91 M/s
Actual DISK READ: 768.05 K/s | Actual DISK WRITE: 12.09 M/s

Firmware looks to be up to date according to nvme list:

[root@server ~]# nvme list
Node          SN              Model                 Namespace  Usage              Format       FW Rev
------------- --------------- --------------------- ---------- ------------------ ------------ --------
/dev/nvme0n1  A5LIB340001QRC  Corsair MP600 PRO NH  1          8.00 TB / 8.00 TB  512 B + 0 B  EIFM51.3
/dev/nvme1n1  A5LIB340001PT7  Corsair MP600 PRO NH  1          8.00 TB / 8.00 TB  512 B + 0 B  EIFM51.3

Here are the temperature reading/smart log data:

[root@server ~]# nvme smart-log /dev/nvme0n1
Smart Log for NVME device:nvme0n1 namespace-id:ffffffff
critical_warning : 0
temperature : 63 C (336 Kelvin)
available_spare : 100%
available_spare_threshold : 5%
percentage_used : 0%
endurance group critical warning summary: 0
data_units_read : 115,067,351
data_units_written : 36,177,997
host_read_commands : 925,472,295
host_write_commands : 831,088,874
controller_busy_time : 2,688
power_cycles : 3
power_on_hours : 1,185
unsafe_shutdowns : 1
media_errors : 0
num_err_log_entries : 4
Warning Temperature Time : 0
Critical Composite Temperature Time : 0
Thermal Management T1 Trans Count : 0
Thermal Management T2 Trans Count : 0
Thermal Management T1 Total Time : 0
Thermal Management T2 Total Time : 0
[root@server ~]# nvme smart-log /dev/nvme1n1
Smart Log for NVME device:nvme1n1 namespace-id:ffffffff
critical_warning : 0
temperature : 61 C (334 Kelvin)
available_spare : 100%
available_spare_threshold : 5%
percentage_used : 0%
endurance group critical warning summary: 0
data_units_read : 137,448,840
data_units_written : 20,550,570
host_read_commands : 1,113,104,387
host_write_commands : 810,468,119
controller_busy_time : 2,616
power_cycles : 3
power_on_hours : 1,185
unsafe_shutdowns : 1
media_errors : 0
num_err_log_entries : 4
Warning Temperature Time : 0
Critical Composite Temperature Time : 0
Thermal Management T1 Trans Count : 0
Thermal Management T2 Trans Count : 0
Thermal Management T1 Total Time : 0
Thermal Management T2 Total Time : 0
[root@server ~]# dmesg | grep nvme
[ 1.234403] nvme nvme0: pci function 0000:0c:00.0
[ 1.234415] nvme nvme1: pci function 0000:09:00.0
[ 1.257469] nvme nvme1: Shutdown timeout set to 10 seconds
[ 1.260143] nvme nvme0: Shutdown timeout set to 10 seconds
[ 1.535500] nvme nvme1: 32/0/0 default/read/poll queues
[ 1.538258] nvme1n1: p1 p2 p3 p4 p5
[ 1.584838] nvme nvme0: 32/0/0 default/read/poll queues
[ 1.587866] nvme0n1: p1 p2 p3 p4 p5
[root@server ~]# cat /proc/mdstat
Personalities : [raid1]
md123 : active raid1 nvme1n1p4[0] nvme0n1p4[1]
52160 blocks super 1.0 [2/2] [UU]
bitmap: 0/1 pages [0KB], 65536KB chunk

md124 : active raid1 nvme1n1p5[0] nvme0n1p5[1]
7661665280 blocks super 1.2 [2/2] [UU]
bitmap: 19/58 pages [76KB], 65536KB chunk

md125 : active raid1 nvme0n1p3[1] nvme1n1p3[0]
1047552 blocks super 1.2 [2/2] [UU]
bitmap: 0/1 pages [0KB], 65536KB chunk

md126 : active raid1 nvme0n1p1[1] nvme1n1p1[0]
83885056 blocks super 1.2 [2/2] [UU]
bitmap: 0/1 pages [0KB], 65536KB chunk

md127 : active raid1 nvme0n1p2[1] nvme1n1p2[0]
67107840 blocks super 1.2 [2/2] [UU]

unused devices: &lt;none&gt;
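Since the mdstat snapshot above only shows one moment in time, it may also be worth confirming that none of the arrays kicks off a scheduled resync or check later on (the periodic raid-check job can depress I/O long after boot). A quick sketch:

```shell
# Sketch: report the current sync_action for every md array.
# "idle" means no resync/check/repair is running; array names
# are taken from the mdstat output above.
for md in /sys/block/md*/md/sync_action; do
  [ -e "$md" ] || { echo "no md arrays visible"; break; }
  printf '%s: ' "$md"
  cat "$md"
done
```

Running this during a slow period would rule the resync angle in or out definitively.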

As you can see above, temperatures look fine for both NVMes as well, so I don't think it's a thermal throttling issue (unless I'm missing something here).

I've already tried updating to kernel-lt (5.x) as well as kernel-ml (6.x); the same symptoms persist.

What am I missing here? I've already verified there's nothing crazy in terms of resource usage (iotop and top look fine), the RAID array is not rebuilding, and pcie_aspm is already set to performance:

[root@server ~]# cat /sys/block/nvme0n1/queue/scheduler
[none] mq-deadline kyber bfq
[root@server ~]# cat /sys/module/pcie_aspm/parameters/policy
default [performance] powersave powersupersave
[root@server ~]# cat /sys/block/nvme0n1/queue/write_cache
write back
[root@server ~]#
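One thing the checks above don't cover is NVMe APST (Autonomous Power State Transitions). That's power management done by the drive itself, separate from the PCIe ASPM policy, and it's a fairly common cause of NVMe slowdowns that only show up after the machine has been up for a while. A sketch of how to inspect it with nvme-cli (device name taken from the post above):

```shell
# Sketch: dump the APST table (NVMe feature 0x0c) -- drive-side power
# management, independent of the pcie_aspm policy shown above.
if command -v nvme >/dev/null 2>&1; then
  nvme get-feature -f 0x0c -H /dev/nvme0 2>/dev/null \
    || echo "could not read APST feature (device missing or permission denied)"
else
  echo "nvme-cli not installed"
fi
```

If APST does turn out to be implicated, booting with `nvme_core.default_ps_max_latency_us=0` keeps the drives out of the deeper power states; that's a commonly used workaround, at the cost of some idle power.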

Thanks in advance for any help or guidance here.

I have no experience with the Corsair MP600 PRO, but my WD SN850s throttle above 55C.

I have an MP600 Pro (1TB, heatsink).

It's filled to roughly one third. If you give me the fio command, I can run the test here for comparison.

Could be as simple as that.

Hi MazeFrame, thanks for the help. Here is the command:

curl -sL yabs.sh | bash -s -- -ig

You can also run a simple dd test:

dd if=/dev/zero of=test bs=64k count=16k conv=fdatasync;unlink test

Let me know what you see.


Thanks for the feedback. I'll look into some angles for reducing the heat on the NVMes and see if that helps.

That said, Corsair's website lists the operating temperature range as "0°C to +65°C", so I'm not sure why it would be throttling at 63C/61C respectively.

fio Disk Speed Tests (Mixed R/W 50/50) (Partition /dev/nvme0n1p2):
---------------------------------
Block Size | 4k            (IOPS) | 64k           (IOPS)
  ------   | ---            ----  | ----           ---- 
Read       | 1.38 GB/s   (346.2k) | 1.86 GB/s    (29.1k)
Write      | 1.38 GB/s   (347.2k) | 1.87 GB/s    (29.2k)
Total      | 2.77 GB/s   (693.5k) | 3.73 GB/s    (58.3k)
           |                      |                     
Block Size | 512k          (IOPS) | 1m            (IOPS)
  ------   | ---            ----  | ----           ---- 
Read       | 2.51 GB/s     (4.9k) | 2.73 GB/s     (2.6k)
Write      | 2.64 GB/s     (5.1k) | 2.91 GB/s     (2.8k)
Total      | 5.16 GB/s    (10.0k) | 5.64 GB/s     (5.5k)
maze@TheFrame ~ % dd if=/dev/zero of=test bs=64k count=16k conv=fdatasync;unlink test
16384+0 records in
16384+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.477096 s, 2.3 GB/s

Hmm, yeah, that’s the expected result.

What temperature is your NVMe running at when you check with nvme smart-log?

What sort of heatsink are you using? (I don’t have one at the moment).

nvme smart-log /dev/nvme0n1

sensors has composite temp at 30°C

It is the MP600 with the heatsink and it sits in the airflow from the CPU cooler (can be seen here).

Heh yeah, you're getting significantly lower temperatures. I'll look into that, thanks!

Tried installing M.2 heatsinks, but I'm still facing the same symptoms. After a couple of days, the server's disk I/O throttles again to the 500 MB/s level. Upon rebooting, it's fine again and can hit 2-3 GB/s in an I/O test, until a couple of days later when it drops back down, even with the NVMes at 45C and 51C respectively.
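Since the drop takes days to appear, one way to narrow down when it happens is to log a cheap throughput sample alongside the drive temperature every so often (from cron, say) and look for the inflection point. A sketch; the log path, sample size, and device name are arbitrary placeholders:

```shell
#!/bin/sh
# Sketch: append a timestamped dd throughput sample (plus drive temps,
# when nvme-cli is available) to a log, to correlate the slowdown with
# uptime rather than guessing.
LOG=${LOG:-./io-samples.log}
{
  date '+%Y-%m-%d %H:%M:%S'
  # Small sequential write sample (16 MiB), cheap enough to run hourly.
  dd if=/dev/zero of=./ddtest.bin bs=64k count=256 conv=fdatasync 2>&1 | tail -n 1
  rm -f ./ddtest.bin
  command -v nvme >/dev/null 2>&1 \
    && nvme smart-log /dev/nvme0n1 2>/dev/null | grep -i '^temperature' \
    || true
} >> "$LOG"
```

If throughput in the log falls off at a consistent uptime regardless of temperature, that would point away from thermals and toward something time-dependent like power management.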

Any other ideas here?
