ECC capable/verified motherboards for Ryzen 7000

I’m putting together a new Linux machine for Haskell programming. Is there a resource I could use to find verified functionality for ECC with Ryzen 7000 motherboards? Does anyone have any recommendations?

Exactly zero such motherboards right now. The kernel doesn’t even have complete EDAC support yet. It’s a big maybe on some, but not all, ASRock boards; I think only the 8-layer boards.

Edit: See below for promising support

… EDAC in the kernel isn’t quite there yet, though.

And it should be noted that with the latest BIOS on the Gigabyte ITX board I was using, it simply would not POST with ECC memory; the AGESA version was newer than the one in the Reddit thread.


Hmm?
https://www.reddit.com/r/truenas/comments/10lqofy/ecc_support_for_am5_motherboards/


I contacted ASRock by email asking if they had any ECC support for Ryzen 7000. They just replied; it was short, but here it is:

"Hi Daniel,

Thank you for contacting ASRock USA support.

Currently, all AM5 mainboards does not support ECC.

It depends on AMD whether will support ECC in the future."

If ASRock’s response is accurate, it seems AMD hasn’t permitted ECC on this generation yet.

Hello. I have ECC working on ASUS ProArt X670E-Creator with AMD Ryzen 9 7950X.

But you have to explicitly turn ECC on in the BIOS. If left on ‘Auto’, it will be off.

I use four sticks of Supermicro (Hynix) 32GB 288-Pin DDR5 4800 (PC5-38400) Server Memory (MEM-DR532MD-EU48).

Hope this helps.


Ok, there is a difference between supporting the RAM with ECC and actually having ECC working.


I am currently looking for an Intel or AMD board that will support ECC. I found one by Gigabyte. The board supports ECC, but only unbuffered, and unbuffered means unregistered. So for me: why bother with ECC if I cannot get it registered? I want registered ECC for the sake of system stability.

The only way I found around this - Xeon, Epyc or Threadripper Pro motherboards.

7000-series X299 with un-updated BIOSes can do registered ECC.

Other than that as you’ve found it is only TR Pro or Xeon (in a different socket).

DDR5 Reg ECC is also physically incompatible with unbuffered DDR5 desktop memory.

And desktop memory can be unbuffered ECC.

While you might be on to something for the DDR4 generation, only wanting registered ECC, any arguments I might make about that are moot with DDR5 since they have in-transit ECC as well as optional at-rest ECC.

192GB of unbuffered DDR5 ECC on desktop is perfectly fine IMHO.


Gigabyte USA replied to me that ECC functionality isn’t supported, so the “ECC Support” mentioned in various BIOS updates only means the board runs with ECC memory while the ECC functionality itself is disabled.


In the past they’ve said the same thing where the reality was that EDAC supports and reports/corrects 1-bit errors, even at the WHEA level, but the errors aren’t forwarded to the SMU and/or optional IPMI. That means you only have “OS handles the error” and no option of “watchdog hard-locks the system on 2-bit errors if the OS doesn’t intervene to ack the error”, as you would have with a fully baked ECC implementation.

This, though, is why I say “no board” right now, since I need the Linux bits to verify. FWIW, at least one person reports seeing corrected ECC errors in their logs with bleeding-edge patches, but I haven’t observed that (not for lack of trying).


Part of me sees it that way. Increase the volume of RAM in the system to maybe make up for some kind of shortcoming.

Right now, I use 128GB of ECC on two sticks. I did not realize how important it was to fill up all the RAM slots. My system runs great on two sticks, and I am so curious what would happen with all 8 slots filled with 64GB sticks, for 512GB in total. Honestly, that could be a bit of overkill and a hit to the wallet. Just a little bit.

This is the first time I’ve heard that there is some “stability benefit” to filling up your RAM/using more of it. What’s the reasoning here?

I don’t really understand this type of hardware stuff very well.

I don’t think @CltSvs was referring to stability, and I’m not aware of any particular argument along those lines.

Instead this sounds more about performance: a system designed with an N-channel memory controller may only achieve maximum performance when all channels are equally populated. If not, then some workloads will be slower than intended because some CPU cores will be stuck waiting for RAM I/O.

The details are more nuanced depending on workloads, but that’s the gist of it.
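To put rough numbers on the channel-population point, here is a minimal Python sketch of theoretical peak bandwidth. It assumes a 64-bit (8-byte) data path per channel and ignores ECC bits, timings and real-world efficiency, so treat the results as upper bounds only. The function name and the 8-channel board in the example are hypothetical, not taken from anyone's actual system in this thread.

```python
# Rough theoretical peak memory bandwidth, as a sketch only.
# Assumption: each populated channel moves 8 data bytes per transfer
# (64-bit data path); ECC bits and real-world overheads are ignored.

def peak_bandwidth_gbs(mt_per_s: int, channels: int, bus_bytes: int = 8) -> float:
    """Peak bandwidth in GB/s for `channels` populated channels.

    mt_per_s is the memory speed in megatransfers per second
    (for example 4800 for DDR5-4800).
    """
    # MT/s * bytes per transfer = MB/s per channel; divide by 1000 for GB/s.
    return mt_per_s * bus_bytes * channels / 1000


if __name__ == "__main__":
    # One DDR5-4800 channel: the same figure as the "PC5-38400" module rating.
    print(peak_bandwidth_gbs(4800, 1))  # 38.4

    # Hypothetical 8-channel board with only 2 channels populated vs. all 8.
    for populated in (2, 8):
        print(populated, "channels:", peak_bandwidth_gbs(4800, populated), "GB/s")
```

Whether a given workload notices the difference depends on how memory-bound it is, which is the nuance mentioned above.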


This is fascinating

Querying the memory controller

When I mentioned setting up ECC at work, Robert Mustacchi pointed me to the excellent illumos documentation about AMD’s Unified Memory Controller. I did some reading and learned that essentially, AMD processors expose a bus called the System Management Network (SMN). Among other things, this bus can be used to query and configure the AMD Unified Memory Controller (UMC).

NOTE: The information in the rest of this section is not part of the public AMD Processor Programming Reference, but can be gleaned from the source code for the open-source Linux and illumos kernels.

WARNING: Accessing the SMN directly, and especially sending write commands to it, is dangerous and can severely damage your computer. Do not write to the SMN unless you know what you’re doing.

The idea is that we can ask the UMC the question “is ECC enabled” directly, by sending a read request over the SMN to what is called the UmcCapHi register. The exact addresses involved are a little bit magical, but on illumos with a Ryzen 7000 processor, here’s how you would query the UMC over the SMN bus (channel 0 and channel 1 are the two memory channels on the system, and each channel has one of the 32GB sticks plugged into it.)

# Query the UMC at address 0x50df4, representing channel 0
$ pfexec /usr/lib/usmn -d /devices/pseudo/amdzen@0/usmn@2:usmn.0 0x50df4
0x50df4: 0x40000030

# Query the UMC at address 0x150df4, representing channel 1
$ pfexec /usr/lib/usmn -d /devices/pseudo/amdzen@0/usmn@2:usmn.0 0x150df4
0x150df4: 0x40000030

(pfexec is the illumos equivalent to sudo.)

Also, illumos comes with a really nice way to break up a hex value into bits:

$ mdb -e '0x40000030=j'
                1000000000000000000000000110000
                |                        ||
                |                        |+------ bit 4  mask 0x00000010
                |                        +------- bit 5  mask 0x00000020
                +-------------------------------- bit 30 mask 0x40000000

The bit we’re interested in here is bit 30. If it’s set, then ECC is enabled in the memory controller.
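For readers without illumos and mdb at hand, the same decoding can be sketched in a few lines of Python. This only interprets a value that has already been read; it does not touch the SMN. The helper names are made up for illustration, and the meaning of bit 30 is taken from the quoted post, not from AMD documentation.

```python
# Sketch: decode a raw UmcCapHi value such as the one printed by usmn above.
# Assumption (from the quoted post): bit 30 set means ECC is enabled in the UMC.

ECC_ENABLED_BIT = 30  # bit position quoted in the post


def set_bits(value: int) -> list[int]:
    """Return the positions of all set bits, lowest first."""
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


def ecc_enabled(umc_cap_hi: int) -> bool:
    """True if the ECC-enabled bit is set in the given register value."""
    return bool((umc_cap_hi >> ECC_ENABLED_BIT) & 1)


if __name__ == "__main__":
    raw = 0x40000030  # value read from both channels in the post
    print(set_bits(raw))     # [4, 5, 30]
    print(ecc_enabled(raw))  # True
```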

He then goes on to hack together something in Linux that does the same, and later finds that EDAC already queries the memory controller for bit 30. So if I understand this correctly, EDAC is reporting what the memory controller (UMC) actually says it can do rather than guessing, whereas dmidecode seems to report what the system’s UEFI tells it.
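On Linux, what EDAC has detected can be inspected through sysfs once the driver is loaded. Below is a minimal Python sketch that reads the per-controller corrected and uncorrected error counters from the standard EDAC sysfs directory. The function name is hypothetical, and an empty result only tells you that no EDAC memory controller is registered (driver not loaded, unsupported kernel, or ECC off); it cannot say which of those is the cause.

```python
# Sketch: list EDAC memory controllers and their error counters on Linux.
# Reads /sys/devices/system/edac/mc/mc*/{ce_count,ue_count}.
from pathlib import Path

EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root: Path = EDAC_ROOT) -> dict[str, tuple[int, int]]:
    """Map controller name (e.g. "mc0") to (corrected, uncorrected) counts."""
    counts: dict[str, tuple[int, int]] = {}
    if not root.is_dir():
        return counts  # no EDAC sysfs tree at all
    for mc in sorted(root.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue  # skip entries that are unreadable or malformed
        counts[mc.name] = (ce, ue)
    return counts


if __name__ == "__main__":
    found = read_edac_counts()
    if not found:
        print("no EDAC memory controllers registered")
    for name, (ce, ue) in found.items():
        print(f"{name}: corrected={ce} uncorrected={ue}")
```

A non-zero corrected count after error injection or a long memory test is the kind of evidence people in this thread are looking for; a controller showing up with zero counts only shows that EDAC believes ECC is on.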


Does ECC function on any of the AM5 Gigabyte boards to roughly the same extent as on ASRock’s boards?

Ah! That explains why I could not find any EDAC.

This response is contrary to the BIOS option descriptions on my ASRock Taichi X670E… “If set to Auto, ECC will be enabled.”

But it’s in line with the website, which got changed a few months later to disclaim support for ECC even though it claimed support when the motherboard was released.

Rise topic! Rise from the dead and thrive!!!

I too am curious how ECC support and reporting is panning out.
Particularly when it comes to boards geared for servers from the likes of Supermicro and ASRock Rack.

Live view of how UDIMM ECC testing has gone on AM5:

[animated GIF: tumblr_ppluhl3mDQ1wezi6zo2_540.gifv]


ECC is working fine on an ASUS ProArt X670E CREATOR-WIFI, and even ECC error injection works, which makes testing easier:


Is ECC platform reporting working on the ASUS ProArt X670E? Or does the fact that memtest86 says it has ECC confirm that? I’m unfamiliar with how memtest86 figures that out.