Critiquing: Really shitty AMD X570 (also B550) SATA SSD RAID1/10 performance - Sequential Write Speed Merely A Fraction Of What It Could Be :(

Hi,

I’m creating this thread to warn other users and to vent my own frustration about the abysmal performance of AMD’s X570 motherboard software RAID for SATA SSDs.

While I understand that SATA/AHCI isn’t getting much love anymore and NVMe is the future, I still highly value this technology for being very reliable at this point in time.

I made these observations while setting up a new daily-driver system that was supposed to use SATA RAID1 for its operating system (Windows 10). After many frustrating hours, SSD erases and operating system reinstallations, I’m currently also trying out RAID10, which performs similarly badly.

Simplified overview of my expectations:

  • RAID1 (2 SSDs): 2n read performance, 1n write performance (only looking at sequential speeds);

  • RAID10 (4 SSDs): 4n read performance, 2n write performance (only looking at sequential speeds);

Reality with AMD’s software RAID1 and 10:

  • RAID1 (2 SSDs): <2n read performance, 0.5n write performance (only looking at sequential speeds);

  • RAID10 (4 SSDs): <4n read performance, 0.5n write performance (only looking at sequential speeds);
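To put rough absolute numbers on that (assuming roughly 500 MB/s sequential per drive, which is typical for SATA SSDs of this class - an assumption, not a measurement of these exact units): RAID1 should manage about 1,000 MB/s read and 500 MB/s write, and RAID10 about 2,000 MB/s read and 1,000 MB/s write. Instead, writes keep collapsing to half of a single drive, somewhere in the 250-300 MB/s region.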

The performance is so bad that when I use the motherboard’s 2.5 GbE NIC to copy data to the RAID array, application responsiveness tanks (reminder: the operating system lives on it), because the copy process at a little over 200 MB/s maxes out the RAID array at 100 % load according to Task Manager and HWiNFO.

This is absurd.

The source of these issues seems to be bugs, but I haven’t been able to recognize a repeatable pattern yet.

Switching through the available RAID array cache settings in AMD’s RAID management software, I can sometimes see write speeds reach the normally expected 1n/2n, but only temporarily.

I have absolutely no idea what is causing the bottleneck:

  • Tried Windows 10 20H2 and 21H1 with all Windows Updates, device firmwares and drivers up-to-date;

  • Checked that Windows is not doing something in the background (also physically disconnected ethernet adapter connections to be 99 %-sure);

  • Erased all SSDs and tried again;

  • Created the RAID arrays with completely default settings, with changed stripe sizes (AMD’s documentation warns against this), with and without initialization;

  • I’m using Micron 5300 PRO 3.84 TB SSDs (tested 6 units); while not the fastest SATA SSDs on the planet, they are enterprise-class drives that can sustain their performance even when filled up and, more importantly, have complete power-loss protection to ensure a RAID array’s data integrity in PSU/motherboard failure scenarios;

  • All SSDs get a clean SATA 6 Gb/s connection to the motherboard, and all C7 (UDMA CRC error count) SMART values are still unchanged at their factory-new values, implying there are no electrical communication issues between the SSDs and the motherboard’s SATA controller;

  • Tested all 8 of the motherboard’s X570 chipset SATA ports;

Observations I’m not sure about regarding their impact on the situation:

  • Even when choosing all member drives of an array to be “SSDs”, the AMD RAID drivers are presenting the RAID array to Windows as a mechanical HDD;
    (Fixed with new driver package released on May 25th, 2021)

  • TRIM is not supported by AMD’s RAID drivers, which would be pretty bad when using consumer-grade NAND SSDs;

System hardware configuration:

  • ASRock X570 Taichi Razer Edition, P1.50
  • 5950X
  • 128 GiB ECC RAM
  • PCIe x16 #1 (CPU): GPU, PCIe 4.0 x8
  • PCIe x16 #2 (CPU): Ethernet adapter, PCIe 3.0 x8
  • PCIe x16 #3 (X570): Thunderbolt 3 adapter, PCIe 3.0 x4

Appendix:

RA: Read-ahead cache

WB: Write-back cache

2 Likes

How are you testing the various configs? One thing that might not be apparent is that with multiple devices involved, it takes multiple read threads working in parallel. A single thread’s reads can’t be (usually aren’t) split across a set of devices. But if you have 3-4 readers reading one file, then the reads will queue properly.
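For example, something like this is a quick-and-dirty way to try multiple parallel readers (a rough sketch only; the file path, thread count and block size are placeholders, and OS file caching will inflate the numbers unless the test file is much larger than RAM):

```python
# Rough sketch: read one large file with several parallel threads and report
# the aggregate throughput. Path/thread count/block size are made-up examples.
import os
import threading
import time

PATH = r"D:\testfile.bin"   # hypothetical large test file on the RAID volume
BLOCK = 1024 * 1024         # 1 MiB reads, similar to CDM's SEQ1M presets
THREADS = 4                 # roughly one reader per RAID member

def reader(offset, length, results, idx):
    done = 0
    with open(PATH, "rb", buffering=0) as f:
        f.seek(offset)
        while done < length:
            chunk = f.read(min(BLOCK, length - done))
            if not chunk:
                break
            done += len(chunk)
    results[idx] = done

size = os.path.getsize(PATH)
per_thread = size // THREADS
results = [0] * THREADS
threads = [threading.Thread(target=reader, args=(i * per_thread, per_thread, results, i))
           for i in range(THREADS)]

start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start
print(f"{sum(results) / elapsed / 1e6:.0f} MB/s aggregate with {THREADS} readers")
```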

3 Likes

That’s not necessary here for simple RAID1/10 volumes when testing sequential speeds; with its standard settings for sequential runs (SEQ1M Q8T1), CrystalDiskMark (CDM) can reach a little over 900 MB/s write on a RAID10 configuration, for example.

When the unknown issue is triggered at its worst (<300 MB/s SEQ1M Q8T1 write speeds), you can see not only in Task Manager (yes, I know) but also in HWiNFO that the RAID array is at 100 % load even then.

Note: I’m obviously not doing any benchmarking while RAID software background tasks are active, and only after the initialization process has completely finished.

I’m currently loading a few single SSDs up in AHCI mode with test files to show their isolated performance numbers when filled to 99 %.

1 Like

I’ve added the CDM screenshots of the AMD RAID array to the initial posting, and here are some from two units of the same model running as individual SSDs in AHCI mode:

1 Like

Cool thread. I’d be interested to see how it performs with a Linux software raid instead.

I appreciate this is not convenient.

1 Like

First up:

I’ve noticed with various consumer-grade SSDs that sustained write performance is garbage, i.e., they have a very small (or no) cache and then performance drops off a cliff.

Secondly…
What is your block/stripe/RAID chunk size? What is your NTFS block size?

If your writes are smaller than your array’s block size (e.g., maybe even at the NTFS block size you’re running on top of the array), you will be doing a read-modify-write for every write… i.e., it may not be bad hardware performance; it may just be working very hard, wasting time on busy work… I’m not saying I have a magic number for you for sizing this - just that there WILL be numbers that work badly or better.

This could also be happening at the SSD level if your RAID block size doesn’t line up perfectly with what the SSD’s onboard controller does.

e.g., if your RAID block somehow involves multiple SSD reads/writes, you’ll tank performance by making the SSD do a read-modify-write for a single write (potentially even after a read-modify-write at the RAID block level above it - multiplying the IOPS your SSD controller needs to service one request more than once!).

The fact that your reads (which never involve read-modify-write) are somewhere in the ballpark leads me to believe you may have some sort of weird “write causes read-modify-write” problem as per the above.
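To make the read-modify-write arithmetic concrete, here’s a toy model (my sketch, not measured behaviour of the AMD driver; it just treats a block as an atomic unit that has to be read back and rewritten whole whenever a write only partially covers it, with 64 kB used because that’s AMD’s default stripe size):

```python
# Toy model of read-modify-write amplification: a block that is only partially
# covered by a host write must be read, patched in memory and written back whole.
# Whether AMD's driver actually behaves like this is an assumption, not a fact.
BLOCK = 64 * 1024  # AMD's default stripe size

def device_ops(offset, length, block=BLOCK):
    """Return (reads, writes) of whole blocks needed to service one host write."""
    first = offset // block
    last = (offset + length - 1) // block
    writes = last - first + 1                       # every touched block gets rewritten
    head_partial = offset % block != 0              # write starts mid-block
    tail_partial = (offset + length) % block != 0   # write ends mid-block
    if writes == 1:
        reads = 1 if (head_partial or tail_partial) else 0
    else:
        reads = int(head_partial) + int(tail_partial)
    return reads, writes

print(device_ops(offset=6 * 1024, length=4 * 1024))     # (1, 1): misaligned 4 kB write -> 64 kB read + 64 kB write
print(device_ops(offset=0, length=1024 * 1024))         # (0, 16): aligned 1 MB write, no extra reads
print(device_ops(offset=512, length=512, block=4096))   # (1, 1): the classic 512 B write on a native 4k sector
```

In the misaligned case the device shuffles 128 kB around internally to land 4 kB of payload; in the aligned case it only writes what it was asked to.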

Now, a properly SSD aware appliance/array/filesystem will (hopefully) take all this stuff into account. An enterprise SSD array may (probably will) even have custom firmware on the SSDs used.

NTFS on Windows on a RAID controller on SSDs? I doubt it will all magically line up without tinkering/tuning on your part. SSDs themselves do all sorts of “weird shit” (well, black-box internal stuff) on the controller, unlike your traditional spinning disks, which are pretty dumb (but even they are doing stuff like SMR today, which really fucks up SMR-unaware RAID implementations - i.e., pretty much all of them).

Clearly your RAID controller is not SSD-aware; as you mention, it presents them all as traditional disks.

It may simply be a case of “this isn’t going to work with SSDs in your setup”. I suspect you may have more success with ZFS (though even then, SSD RAID is still new), as it at least integrates the filesystem and the RAID level a bit better, combines multiple writes into nice big stripes, etc.

NTFS does not.

3 Likes

For an idea of what I’m talking about above, google the 512 byte vs. 4k sector problem with ZFS from a few years ago.

TL;DR: writing 512 bytes to a drive with native 4k sectors means the native 4k sector needs to be read, modified and then written back out as a whole new 4k sector.

Older OSes or filesystems that aren’t 4k-drive-aware would see massive performance penalties - writing data one 512 byte “sector” at a time multiplies the work required many times over vs. writing 4k natively.
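To put rough numbers on that (my arithmetic, not from the original ZFS discussion): filling one native 4k sector with eight separate 512-byte writes costs, in the worst case, eight 4k reads plus eight 4k writes - 64 kB of internal traffic to land 4 kB of data, i.e. roughly 16 times the work of a single aligned 4k write. A decent drive will coalesce some of that, but the direction of the penalty is the same.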

1 Like

For drive firmware updates: https://www.micron.com/support/software-and-drivers
Linux probably would work better for RAID. What about getting an HBA?

1 Like

Thanks for your replies!

Some short reminders:

  • The purpose of this RAID array is to have Windows living on it and to not have its operation interrupted by an isolated drive failure;

  • The only setting the user can change in AMD’s RAID software is the stripe size (64 kB (default), 128 kB or 256 kB - tried them all, barely any change);

  • Windows formats the RAID volume during its installation process with NTFS and a 4 kB cluster size (a quick way to double-check this is sketched after this list);

  • As mentioned in my initial wall of text, every device’s firmware is up-to-date, as well as the drivers;

  • Without going the VM route, the only way to use ZFS here would be to boot over ethernet (the system’s got an Intel XL710 dual 40 GbE adapter), but that in itself has its issues when you want to use it for Windows;

  • All PCIe slots are occupied, so a hardware RAID controller is not an option;

  • I’m not looking for peak performance, I just don’t want to get thrown back to the performance level of early SATA SSDs from 2009;

  • As far as I know, the only hardware issue with SATA SSDs on AMD’s current X570 platform is the bad RND4K Q32T1 performance compared to Intel’s - nothing regarding sequential write speeds;
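Regarding the cluster size mentioned above, here’s a quick way to verify what Windows actually formatted the volume with (a sketch assuming Python is available on the box; C: is the RAID volume here, adjust the drive letter as needed):

```python
# Query a volume's NTFS cluster size and logical sector size via the Win32 API.
import ctypes
from ctypes import wintypes

ROOT = "C:\\"  # the RAID volume in this setup; change if yours differs

sectors_per_cluster = wintypes.DWORD()
bytes_per_sector = wintypes.DWORD()
free_clusters = wintypes.DWORD()
total_clusters = wintypes.DWORD()

ok = ctypes.windll.kernel32.GetDiskFreeSpaceW(
    ROOT,
    ctypes.byref(sectors_per_cluster),
    ctypes.byref(bytes_per_sector),
    ctypes.byref(free_clusters),
    ctypes.byref(total_clusters),
)
if not ok:
    raise ctypes.WinError()

cluster = sectors_per_cluster.value * bytes_per_sector.value
print(f"{ROOT} cluster size: {cluster} bytes, logical sector: {bytes_per_sector.value} bytes")
```

With the default 4 kB clusters and a 64 kB stripe everything divides cleanly, so that combination at least shouldn’t be a problem on its own.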

1 Like

Don’t all X570 motherboards have a built-in RAID controller that can do RAID 0, 1 and 10, which would be set up and maintained in the BIOS?

1 Like

Yes, you’re right, that is what he was testing the RAID10 on :frowning:

(The dedicated hardware controller suggestion meant an add-in PCIe card)

2 Likes

Which is confusing me, because it seems more like he’s using software RAID for everything. Maybe I’m just not following.

This is his opening line:

Are you stacking software RAID on top of a hardware RAID?

1 Like

No…?

I’m just using the motherboard’s UEFI SATA RAID feature - this isn’t really hardware RAID, since it uses the CPU for its calculations - and something like RAID 0, 1 or 10 hasn’t been a noteworthy load for a CPU for… 15 years?

By contrast, a hardware RAID controller has its own little SoC doing the calculations.

1 Like

I was under the impression that the on-board RAID was its own SoC… whatever.

1 Like

Sorry to interrupt, just wondering if something like this could help?

[Edit]Here’s a proper link, instead of the mobile version:
https://www.alibaba.com/product-detail/NGFF-M-2-RAID-Adapter-card_60491160854.html

1 Like

That’s a neat little card. Didn’t Jeff Geerling use one like that?

Alibaba’s advertising is a bit rude. It noticed I wasn’t using the Alibaba app. I didn’t want to open the page for it in Google Play or the Samsung store, so it helpfully opened a tab for the Google Play page instead.

2 Likes

Thanks for the heads-up; yeah, most mobile versions are like that. I seriously hate it, too. 9 times out of 10, the mobile app is a waste of space on the phone, at best.

1 Like

Maybe I am still going through the stages of grief, but I still hope that AMD can be shamed into getting their shit together regarding the drivers.

Since I have actually seen the performance numbers I would be happy with, I doubt that it is an unfixable hardware bottleneck.

I’d also want to actually use the remaining two M.2 slots (one from CPU, one from chipset) for NVMe SSDs for non-OS data use.

But so far, whenever I’ve encountered an issue like this, it has never been fixed by the manufacturer :frowning:

1 Like

Well, are you sure that AMD is to blame, and not the BIOS or motherboard manufacturer crippling functionality due to a bad interface?

Unfortunately, this is a rather niche use case for the market the X570 chipset is aimed at, so I doubt anyone will dig deep into this. Most people just go with hardware RAID on expansion cards these days - if they go for RAID at all; most high-end systems nowadays are quite satisfied with a single 2 TB NVMe drive. I’d rather invest in one of those than try to keep a RAID going, but it’s your choice.

A NAS incremental backup every 4 hours is usually better than any local RAID in either case, especially if the NAS runs ZFS.

2 Likes

  • I asked someone at ASRock about it; they shrugged it off with “We have nothing to do with that, it’s all from AMD”;

  • It doesn’t make a difference whether the RAID array was created via the UEFI or from within Windows via AMD’s RAID management software;

  • A new driver got released today; the only difference is that Windows now recognizes the RAID array as an SSD instead of a mechanical HDD - TRIM still doesn’t work;

  • The seemingly random sequential write performance is unchanged, even with enterprise-class SSDs that actually don’t slow down as they fill up :frowning:

  • The best WRITE results seem to come from completely disabled caches or with only the read-ahead cache enabled;

1 Like