Intel P4500/P4510 4 TB NVMe U.2 SSD - Abysmal sequential results with Broadcom HBA 9400 and low read with AMD NVMe RAID0 - But Optane works fine?!

Hi,

I got two of the Intel P4500 4 TB U.2 SSDs that @wendell has recommended in a few videos as a potentially great deal when found used.

Both units have the most recent firmware installed (checked with Intel Memory And Storage Tool 1.0.5) and work fine with the full expected performance when connected via PCIe 3.0 x4 from the X570 chipset, so I doubt I’ve purchased defective lemons.
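
In case anyone wants to script the same firmware check: here is a minimal sketch, assuming the intelmas command-line tool that ships with the Memory And Storage Tool is installed and on the PATH (the subcommand is quoted from memory, so treat it as an assumption and check the tool’s help output if it doesn’t match):

```python
# Minimal sketch: list the Intel NVMe SSDs the Intel MAS CLI can see,
# including model, serial number and firmware version.
# Assumes the "intelmas" CLI is on the PATH and that "show -intelssd"
# is the enumeration subcommand (verify against the tool's help output).
import subprocess


def list_intel_ssds() -> str:
    result = subprocess.run(
        ["intelmas", "show", "-intelssd"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(list_intel_ssds())
```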

I also got two Broadcom HBA 9400-8i8e controllers with the latest firmware and drivers installed (P14 package).

One HBA is connected with PCIe 3.0 x8 directly to the CPU, the other with PCIe 3.0 x8 via the motherboard’s X570 chipset, which in turn is connected to the Zen 2 CPU (Ryzen 9 PRO 3900) via a PCIe 4.0 x4 link.

Nothing else is using the system (test setup). Windows 10 1909 was reinstalled a few days ago on an erased SATA SSD, uses only the latest drivers, and is at Windows Update patch level 2020-04.

The Broadcom HBA 9400 can directly connect two NVMe SSDs to its internal SAS HD ports, with a PCIe 3.0 x4 link to each drive.

This works okay with a couple of Intel Optane drives (tested a 905P 480 GB and a DC P4801X 100 GB; a bit slower than when connected directly to the motherboard, but not that bad). When connecting a P4500, however, the sequential speeds are abysmal (via HBA: 400-600 MB/s read, 1200-1300 MB/s write; directly via motherboard: ca. 3300 MB/s read, ca. 2000 MB/s write).
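
For context: a PCIe 3.0 x4 link is good for roughly 8 GT/s × 4 lanes × 128/130 encoding ÷ 8 ≈ 3.9 GB/s, so the ~3300 MB/s reads on the motherboard ports are about what the drive should deliver, while 400-600 MB/s through the HBA is nowhere near any link limit.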

I’ve done various CrystalDiskMark tests and A-B tested the HBAs, U.2 cables and SSDs.

I doubt that there is a hardware defect since…

  • Both HBAs and P4500 SSDs show similar results

  • It doesn’t matter whether an HBA is connected via CPU or chipset PCIe lanes (as expected, since the ASUS Pro WS X570-Ace motherboard can connect an AIC to the chipset PCIe slot with an x8 link)

  • There isn’t a bandwidth bottleneck since I’m only connecting two NVMe SSDs to each HBA and am not using any additional drives via the external SAS connectors.

  • All SSDs were erased, initialized with GPT, and formatted with a single large NTFS partition (4 kB per sector)

  • Open-air test bench setup, the HBAs are cooled with a 140 mm Noctua fan (reported temperature around 50 °C), same goes for the SSDs (reported temperature below 40 °C), so there is no thermal throttling.

  • Latest BIOS 1302 is installed on the ASUS Pro WS X570-Ace

  • The CPU + motherboard + RAM (128 GB ECC DDR4-2666) platform performs as expected as far as CPU performance is concerned.

Here are the benchmark results (I’ve only listed the runs with the HBA connected directly to CPU PCIe lanes, since those should be the best the HBA can offer; the results for the HBA in the X570 chipset slot are very similar):

1) P4500 4 TB results with X570 chipset:

[Read]
Sequential 1MiB (Q= 8, T= 1): 3371.715 MB/s [ 3215.5 IOPS] < 2486.20 us>
Sequential 1MiB (Q= 1, T= 1): 971.740 MB/s [ 926.7 IOPS] < 1078.45 us>
Random 4KiB (Q= 32, T=16): 1793.357 MB/s [ 437831.3 IOPS] < 1168.26 us>
Random 4KiB (Q= 1, T= 1): 24.793 MB/s [ 6053.0 IOPS] < 164.99 us>

[Write]
Sequential 1MiB (Q= 8, T= 1): 2051.279 MB/s [ 1956.3 IOPS] < 4081.96 us>
Sequential 1MiB (Q= 1, T= 1): 1974.290 MB/s [ 1882.8 IOPS] < 530.66 us>
Random 4KiB (Q= 32, T=16): 1961.160 MB/s [ 478798.8 IOPS] < 1068.35 us>
Random 4KiB (Q= 1, T= 1): 228.774 MB/s [ 55853.0 IOPS] < 17.79 us>

Profile: Default
Test: 1 GiB (x5) [Interval: 5 sec] <DefaultAffinity=DISABLED>
Date: 2020/04/22 1:02:39
OS: Windows 10 Enterprise [10.0 Build 18363] (x64)
Comment: Intel DC P4500 4 TB/PCIe 3.0 x4 (U.2) via X570 Chipset

2) P4500 4 TB results on port 0 of a Broadcom HBA 9400 8i8e:

[Read]
Sequential 1MiB (Q= 8, T= 1): 394.730 MB/s [ 376.4 IOPS] < 21179.58 us>
Sequential 1MiB (Q= 1, T= 1): 1284.536 MB/s [ 1225.0 IOPS] < 815.78 us>
Random 4KiB (Q= 32, T=16): 1467.239 MB/s [ 358212.6 IOPS] < 1428.24 us>
Random 4KiB (Q= 1, T= 1): 23.763 MB/s [ 5801.5 IOPS] < 172.15 us>

[Write]
Sequential 1MiB (Q= 8, T= 1): 1283.654 MB/s [ 1224.2 IOPS] < 6520.17 us>
Sequential 1MiB (Q= 1, T= 1): 1146.157 MB/s [ 1093.1 IOPS] < 914.24 us>
Random 4KiB (Q= 32, T=16): 1915.610 MB/s [ 467678.2 IOPS] < 1093.78 us>
Random 4KiB (Q= 1, T= 1): 167.853 MB/s [ 40979.7 IOPS] < 24.30 us>

Profile: Default
Test: 1 GiB (x5) [Interval: 5 sec] <DefaultAffinity=DISABLED>
Date: 2020/04/22 1:59:41
OS: Windows 10 Enterprise [10.0 Build 18363] (x64)
Comment: Intel DC P4500 4 TB/HBA 9400 8i8e P0 (x8 via CPU)

3) P4500 4 TB results on port 1 of a Broadcom HBA 9400 8i8e:

[Read]
Sequential 1MiB (Q= 8, T= 1): 565.320 MB/s [ 539.1 IOPS] < 14798.60 us>
Sequential 1MiB (Q= 1, T= 1): 1328.433 MB/s [ 1266.9 IOPS] < 788.93 us>
Random 4KiB (Q= 32, T=16): 1481.024 MB/s [ 361578.1 IOPS] < 1414.98 us>
Random 4KiB (Q= 1, T= 1): 24.026 MB/s [ 5865.7 IOPS] < 170.27 us>

[Write]
Sequential 1MiB (Q= 8, T= 1): 1187.603 MB/s [ 1132.6 IOPS] < 7046.48 us>
Sequential 1MiB (Q= 1, T= 1): 1103.407 MB/s [ 1052.3 IOPS] < 949.69 us>
Random 4KiB (Q= 32, T=16): 1816.169 MB/s [ 443400.6 IOPS] < 1153.77 us>
Random 4KiB (Q= 1, T= 1): 166.331 MB/s [ 40608.2 IOPS] < 24.51 us>

Profile: Default
Test: 1 GiB (x5) [Interval: 5 sec] <DefaultAffinity=DISABLED>
Date: 2020/04/22 1:02:06
OS: Windows 10 Enterprise [10.0 Build 18363] (x64)
Comment: Intel DC P4500 4 TB/HBA 9400 8i8e P1 (x8 via CPU)

Contrasting results with Optane 905P SSDs:

4) 905P 480 GB results with X570 chipset:

[Read]
Sequential 1MiB (Q= 8, T= 1): 2783.138 MB/s [ 2654.2 IOPS] < 3012.56 us>
Sequential 1MiB (Q= 1, T= 1): 2553.412 MB/s [ 2435.1 IOPS] < 410.43 us>
Random 4KiB (Q= 32, T=16): 2434.875 MB/s [ 594451.9 IOPS] < 860.40 us>
Random 4KiB (Q= 1, T= 1): 228.289 MB/s [ 55734.6 IOPS] < 17.83 us>

[Write]
Sequential 1MiB (Q= 8, T= 1): 2486.836 MB/s [ 2371.6 IOPS] < 3367.93 us>
Sequential 1MiB (Q= 1, T= 1): 2318.443 MB/s [ 2211.0 IOPS] < 451.93 us>
Random 4KiB (Q= 32, T=16): 2478.988 MB/s [ 605221.7 IOPS] < 845.04 us>
Random 4KiB (Q= 1, T= 1): 206.504 MB/s [ 50416.0 IOPS] < 19.72 us>

Profile: Default
Test: 1 GiB (x5) [Interval: 5 sec] <DefaultAffinity=DISABLED>
Date: 2020/04/22 2:00:29
OS: Windows 10 Enterprise [10.0 Build 18363] (x64)
Comment: Intel Optane 905P 480GB/PCIe 3.0 x4 (U.2) via X570 Chipset

5) 905P 480 GB results on port 0 of a Broadcom HBA 9400 8i8e:

[Read]
Sequential 1MiB (Q= 8, T= 1): 2748.844 MB/s [ 2621.5 IOPS] < 3049.99 us>
Sequential 1MiB (Q= 1, T= 1): 2004.216 MB/s [ 1911.4 IOPS] < 522.90 us>
Random 4KiB (Q= 32, T=16): 2429.355 MB/s [ 593104.2 IOPS] < 836.27 us>
Random 4KiB (Q= 1, T= 1): 175.765 MB/s [ 42911.4 IOPS] < 23.19 us>

[Write]
Sequential 1MiB (Q= 8, T= 1): 2232.759 MB/s [ 2129.3 IOPS] < 3750.86 us>
Sequential 1MiB (Q= 1, T= 1): 1619.994 MB/s [ 1544.9 IOPS] < 646.81 us>
Random 4KiB (Q= 32, T=16): 2128.465 MB/s [ 519644.8 IOPS] < 984.38 us>
Random 4KiB (Q= 1, T= 1): 155.891 MB/s [ 38059.3 IOPS] < 26.15 us>

Profile: Default
Test: 1 GiB (x5) [Interval: 5 sec] <DefaultAffinity=DISABLED>
Date: 2020/04/22 1:00:54
OS: Windows 10 Enterprise [10.0 Build 18363] (x64)
Comment: Intel Optane 905P 480 GB/HBA 9400 8i8e P0 (x8 via CPU)

6) 905P 480 GB results on port 1 of a Broadcom HBA 9400 8i8e:

[Read]
Sequential 1MiB (Q= 8, T= 1): 2746.736 MB/s [ 2619.5 IOPS] < 3052.52 us>
Sequential 1MiB (Q= 1, T= 1): 1988.874 MB/s [ 1896.7 IOPS] < 526.92 us>
Random 4KiB (Q= 32, T=16): 2429.357 MB/s [ 593104.7 IOPS] < 835.73 us>
Random 4KiB (Q= 1, T= 1): 175.590 MB/s [ 42868.7 IOPS] < 23.21 us>

[Write]
Sequential 1MiB (Q= 8, T= 1): 2229.133 MB/s [ 2125.9 IOPS] < 3756.48 us>
Sequential 1MiB (Q= 1, T= 1): 1615.661 MB/s [ 1540.8 IOPS] < 648.56 us>
Random 4KiB (Q= 32, T=16): 2123.493 MB/s [ 518430.9 IOPS] < 986.68 us>
Random 4KiB (Q= 1, T= 1): 155.849 MB/s [ 38049.1 IOPS] < 26.16 us>

Profile: Default
Test: 1 GiB (x5) [Interval: 5 sec] <DefaultAffinity=DISABLED>
Date: 2020/04/22 1:59:11
OS: Windows 10 Enterprise [10.0 Build 18363] (x64)
Comment: Intel Optane 905P 480 GB/HBA 9400 8i8e P1 (x8 via CPU)

Any ideas? I’m dumbfounded.

Thanks for any advice!

Regards,
aBavarian Normie-Pleb

1_CDM_X570_U2(3.0 x4)_NVMe_DC_P4500_4TB.txt (2.5 KB)

2_CDM_HBA9400_P0_NVMe_DC_P4500_4TB.txt (2.5 KB)

3_CDM_HBA9400_P1_NVMe_DC_P4500_4TB.txt (2.5 KB)

4_CDM_X570_U2(3.0 x4)_NVMe_Optane_905P_480GB.txt (2.5 KB)

5_CDM_HBA9400_P0_NVMe_Optane_905P_480GB.txt (2.5 KB)

6_CDM_HBA9400_P1_NVMe_Optane_905P_480GB.txt (2.5 KB)


Minor update, Broadcom’s Tech Support confirms the issue:

“Our experts know of these issues and will not investigate this matter further. We recommend you use other NVMe drives.”

Even their brand-new PCIe 4.0 x8 HBA won’t fix this:

"We cannot confirm that this issue does not appear with an HBA 95xx model. It is recommended to use different drives.”

Since they already knew about these issues and won’t do anything about them, even with their new high-end HBAs, I think it’s reasonable to suspect that Intel’s firmware is at fault here.

And of course: you cannot downgrade the P4500 from its current firmware version.

I’ll try to get a few more technical details out of Broadcom to take to Intel; maybe I can get them to replace the drives with another hardware revision (or something like that).


Minor update 2, it’s getting Monty Python-esque:

In my previous message I diplomatically asked whether it would make sense to contact Intel about this; after all, why else would Broadcom refuse to investigate the issue further and leave even their most recent HBAs hanging?

Tech Support’s response:

“I cannot confirm your inquiry! YOU YOURSELF found out about the different behaviors between the two drive models (Intel P4500/Intel Optane 905P)”

I then thought I had been beating around the bush a bit too much, so I repeated what they had told me and pointed out how it gives the impression that only Intel can fix this.

Tech Support’s response 2:

“I cannot confirm your inquiry! YOU YOURSELF found out about the different behaviors between the two drive models (Intel P4500/Intel Optane 905P)”

The strange thing:
All messages from me and from them are included in an email chain with time stamps. My recent, more direct question about Intel’s role in this is missing, and they simply re-sent the response they had sent previously.


What does Intel say?

@Trooper_Ish

Will contact them next. @wendell is also kind enough to try to recreate another scenario where I have been experiencing abysmal results with the P4500 drives (AMD X570 NVMe RAID 0 arrays: sequential read values below those of a single drive, writes are OK).

I was hoping to get something with a little more substance out of Broadcom before contacting Intel, but that doesn’t seem to be happening, as you can see from their latest reaction (from today) quoted in my previous response.


But the drive works okay when connected to an adapter in an M.2 slot? Like, the issue is using it through a Broadcom card?

Yes, both drives deliver the expected performance from 0 to 99 % used space when used individually and connected directly to PCIe lanes from the CPU or the X570 chipset, even when hitting both at the same time.

As soon as I move them to the HBA or enable AMD NVMe RAID in the UEFI, the drives slow down significantly.

Huh, the raid thing on the mobo is interesting. I’ll be keeping an eye out in case Intel deigns to admit anything/offer advice

Finally had a chance to build a Windows system for testing this. The C drive is a P4500, and there are two other drives that are also P4500s.

Here is one instance of CDM running from the C drive and testing the D drive (P4500), just the one drive:

Here are two drives running two copies of CDM simultaneously:

And the resulting score:

… okay, let’s make a soft RAID of them in Windows:

The Q32T16 number is perhaps a bit suspect? But otherwise this looks much better.

These are connected via the x8 slot, set up as x4/x4 straight to the CPU.

I am not sure what to make of this. I should probably try to set up a system a little closer to what @aBav.Normie-Pleb has next, I think.


Thanks for following up on this!

I can confirm the Windows built-in Software RAID 0 results. Can you also check the drives’ behavior when configuring them as an AMD NVMe RAID 0 array?

As a reminder, here are the results of my two P4500s in RAID 0 (AMD NVMe RAID via UEFI):

Contrasting Optane 905Ps, using the same PCIe connections:

[Screenshot: X570_Chipset_NVMe_RAID0_2xOptane]


Sure, will do that now. I changed the threads to 4 (from default)

[Screenshot: CrystalDiskMark settings]

This should be totally bone-stock defaults except the thread count changed from 1 to 8.

results of that tweak:


So that other users aren’t confused, since two scenarios are now being mixed:

At first I tested the setup from the OP (ASUS Pro WS X570-Ace with two Broadcom HBA 9400-8i8e).

Then I went on to test AMD NVMe RAID 0 with an ASRock X570 Taichi (BIOS P3.00; I’ve already been using it for a few weeks, it went public a few days ago).

On the Taichi, I’m using M2_1 (PCIe via CPU) and M2_2 (PCIe via X570) for the P4500s. That’s where I’m having the issues Wendell is currently trying to recreate:

Since the drives act this weird in two quite different scenarios, and given that funny response from Broadcom’s Tech Support, I’d say I’m 75 % sure there is something up with the P4500’s current firmware.

Yeah, something perf-wise is not right in the AMD RAID scenario with the P4500:

However, it can’t just be the firmware on the P4500, since Linux is fine, and Windows in at least one other scenario is also fine.

It “smells like” a sector size/alignment problem. But I did some quick experiments there and couldn’t really seem to make it better or worse. Windows wouldn’t let me create an intentionally misaligned partition, though, which might reveal that type of problem.

Sector-size misalignment on the AMD RAID but not on Windows native RAID could explain the anomaly.
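
Roughly the kind of check I have in mind, as a quick sketch (the offsets at the bottom are made-up examples; you would plug in the partition’s actual starting offset, e.g. from PowerShell’s Get-Partition):

```python
# Quick sketch: given a partition's starting offset in bytes, report which
# common sector/stripe boundaries it lands on. A partition that is 4 KiB
# aligned but not stripe aligned would fit the "slot machine" behavior.

def alignment_report(start_offset_bytes: int) -> None:
    boundaries = {
        "4 KiB sector": 4 * 1024,
        "64 KiB stripe": 64 * 1024,
        "128 KiB stripe": 128 * 1024,
        "256 KiB stripe": 256 * 1024,
        "1 MiB (Windows default)": 1024 * 1024,
    }
    print(f"starting offset: {start_offset_bytes} bytes")
    for name, size in boundaries.items():
        remainder = start_offset_bytes % size
        status = "aligned" if remainder == 0 else f"off by {remainder} bytes"
        print(f"  {name:>24}: {status}")


if __name__ == "__main__":
    alignment_report(1_048_576)  # 1 MiB offset: aligned to all of the above
    alignment_report(32_256)     # old 63-sector offset: misaligned to all of the above
```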

I reformatted the array, disabled the cache during the RAID setup, and set a sector size of 64 kB. I could claw back some of the expected performance:

Maybe worth trying 256/128 kB with the cache disabled. It’s interesting that disabling the cache in the AMD RAID thing makes the apparent speed in CDM that much faster (2 threads running at once).


Thank you, at least now I know I’m not an idiot doing something wrong.

As I wrote in our DMs over the past weeks, for me the performance is like a “slot machine” and is determined the moment the RAID array is created in the UEFI. The best results I ever saw were around 5000 MB/s SEQ1M Q8T1, the worst in the 2000 MB/s range (read). And the Optanes always blast by, even though they should have lower SEQ1M Q8T1 results.

I couldn’t establish a pattern between 64, 128 and 256 kB stripe size and cache options, but maybe I was just a bit too frustrated after the first rather disappointing experiences with the Broadcom HBAs.

Do you have “someone” to contact at AMD to check on this further (maybe a RAID firmware bug in the UEFI module?)? Or might it be a symptom of the same underlying issue that slows the drives down on the HBA, or that causes the low RND4K Q32T16 results with Windows’ built-in software RAID 0?

I contacted ASRock about this since I first noticed it on the Taichi, but they outright said they have no clue how to troubleshoot it.

I can try, but it probably won’t be a priority. The “slot machine” aspect really REALLY makes it sound like it’s a simple sector-misalignment type problem, something like that. With overhead, about 5 GB/s is about what I’d expect, so if you create the array a bunch of times and about 1 out of 8 times it’s fine, then I would expect it to for sure be the sector alignment problem.

I bet the Intel drives report their sector size in a funny way and that’s the issue, since things do actually look for that. And if not sector size, something like that.
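
One quick way to compare what Windows thinks the two models report is to dump fsutil’s sector info for a P4500 volume and an Optane volume side by side; a rough sketch (run it from an elevated prompt; the exact field names vary between Windows builds, so the filter is deliberately loose):

```python
# Sketch: shell out to "fsutil fsinfo sectorinfo <volume>" and keep only the
# sector-size and alignment related lines, so a P4500 volume can be compared
# against an Optane volume. Windows only, needs an elevated prompt.
import subprocess
import sys


def sector_info(volume: str) -> None:
    output = subprocess.run(
        ["fsutil", "fsinfo", "sectorinfo", volume],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    for line in output.splitlines():
        if "BytesPerSector" in line or "Alignment" in line:
            print(" ", line.strip())


if __name__ == "__main__":
    # e.g.  python sectorinfo.py D: E:
    for volume in sys.argv[1:]:
        print(f"--- {volume} ---")
        sector_info(volume)
```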

What I’m struggling with is why using the AMD NVMe RAID array in Linux then seems to be OK (or did I misunderstand you there?). That sounds more like there’s something wrong with AMD’s NVMe RAID driver?

I’m still a bit confused when I also think about Broadcom’s statements regarding the P4500; they didn’t say that a different OS would solve the underlying performance issue with the drive (on an HBA).

AMD doesn’t have a native RAID driver for Linux. If the drives work in Windows with the Windows driver and in Linux with the Linux driver, that suggests it’s not hardware. But the AMD driver has this “roulette” speed behavior with one piece of hardware and is solid on another. It’s baffling.

Does it make sense for me to open a support request with Intel? I got the impression that you already have a connection to them regarding this drive model and its firmware versions.

I, on the other hand, as a mere end-customer pleb, would most likely need some time to get through the first-level support drones reading from their scripts before reaching an actual engineer.

I’m not in it to point fingers at Intel or AMD, but I would like to have properly functioning drives some time in the not too distant future.

After what I’ve learned about this model with the Broadcom HBAs and the AMD NVMe RAID arrays, I’d even have a bad gut feeling about re-selling the drives to someone else.

I have no special access to Intel

They didn’t like the beginning of the Ryzen 3000 launch video?
