Ryzen Linux ECC might have issues, please respond if you are trying ECC

I am currently working with the Linux kernel EDAC folks to figure out an oddity with ECC memory running in ECC mode on Linux, and whether it happens on all motherboards. Currently ECC only runs in certain configurations; we have proof of this on the ASRock Taichi, K4, and Gaming Pro boards. Note that ECC still works on these boards, just only in certain configurations (2 DIMMs), and all the information reported about the ECC modules ends up being bogus. ECC itself still works, going by the overclocking tests wendell has done. It might be a board issue, it might be a kernel issue. I am trying to see whether other boards return the same bogus information in the dmesg output, with every MC listing a size of 0.

If you are currently running a Ryzen CPU with ECC RAM on Linux >= 4.10 (no matter who your motherboard manufacturer is), can you do the following?

NOTE:
I am mainly trying to figure out whether all boards return garbage for the registers that hold ECC configuration information.

Post:
1)
MB: motherboard brand and model
Bios: BIOS version
Ram: module part number and quantity
Banks: slots where the RAM is installed, for example a2, b2 (slot index starting at one)
Example:
Mb: Asrock Fatal1ty x370 gaming k4
Bios: 1.60
Ram: KVR24E17D8/16MA Qty 2
Banks: a2, b2

2)
dmesg | grep 'edac\|EDAC\|Linux version'
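
If you would rather not type both spellings of EDAC, here is a minimal sketch of the same filter done case-insensitively. The function name `edac_lines` is made up for this post and is not part of any tool; the commented dmidecode calls are only a suggestion for reading the board and BIOS strings, and they normally need root.

```shell
# edac_lines: hypothetical helper name, not part of any package.
# Reads kernel log text on stdin and keeps the kernel version line
# plus every line that mentions EDAC, ignoring case.
edac_lines() {
    grep -i -E 'edac|Linux version'
}

# Usage on a live system (may need root if dmesg access is restricted):
#   dmesg | edac_lines
#
# Board and BIOS strings for part 1) can usually be read with dmidecode:
#   dmidecode -s baseboard-product-name
#   dmidecode -s bios-version
```

The output should match what the grep in 2) prints, so either form is fine for the report.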

You might want to have a quick look at this article (and forum posts) if you haven't already seen it:

http://www.hardwarecanucks.com/forum/hardware-canucks-reviews/75030-ecc-memory-amds-ryzen-deep-dive.html

What issues are you seeing? I plan to try myself with the ASRock x370 Taichi and Crucial CT2K16G4WFD824A ECC RAM.

Make sure you get 2 DIMMs, it won't work with only one right now.
I am working directly with Linux kernel developers and AMD.
As in the example:
dmesg | grep -i edac
The output is garbage.
It basically looks like it is luck that ECC is even being enabled.
All the registers for ECC settings return garbage, so EDAC cannot really tell you anything more than that ECC is enabled.
This is also most likely why it doesn't work with just one DIMM.
Good to know that the ECC part works.

It's the same as in the article by wendell: the MC information is all zero. I am trying to see whether all motherboards do the same thing, to sort out whether it is a motherboard, kernel, or CPU issue. Just make sure you get 2 DIMMs.
Basically, the EDAC module tries to read registers holding ECC configuration parameters, and it currently appears to be running into PCI read errors (I don't know what is causing them), so ECC is enabled but the configuration parameters come back as garbage.

I used two DIMMs for testing: https://www.amazon.com/Kingston-Technology-ValueRAM-KVR24E17D8-16MA/dp/B01FM3GHQU

I was warned that 4 DIMMs might not work properly, though that should be a software fix.


I mailed Biostar support for information on ECC memory. This is their reply:

X370GT5 specification does not support ECC memory,

however, we tested below ECC memory able to get display and run. (Note: Does not support ECC special function).

APACER DDR4 78.B1GM4.4020B 4GB 2133 MHz 1.2V

APACER DDR4 78.B1GM4.AF10B 8GB 2133 MHz 1.2V

HYNIX DDR4 HMA41GU7AFR8N-TF 8GB 2133 MHz 1.2V

Fixed in kernel 4.11.3, I believe:

1)
MB: ASRock x370 Taichi
BIOS: UEFI 2.30
RAM: Crucial CT2K16G4WFD824A 32GB (2 x 16GB ECC, 2400MHz, Dual Rank)
Banks: a2, b2

2)
root@deepthought:~# dmesg | grep 'edac\|EDAC\|Linux version'
[ 0.000000] Linux version 4.11.3 (marc@deepthought) (gcc version 7.1.0 (GCC) ) #1 SMP Fri May 26 11:21:15 BST 2017
[ 8.811227] EDAC MC: Ver: 3.0.0
[ 8.822946] EDAC amd64: Node 0: DRAM ECC enabled.
[ 8.824590] EDAC amd64: F17h detected (node 0).
[ 8.825479] EDAC MC: UMC0 chip selects:
[ 8.825481] EDAC amd64: MC: 0: 0MB 1: 0MB
[ 8.827314] EDAC amd64: MC: 2: 16383MB 3: 16383MB
[ 8.828115] EDAC amd64: MC: 4: 0MB 5: 0MB
[ 8.828889] EDAC amd64: MC: 6: 0MB 7: 0MB
[ 8.829685] EDAC MC: UMC1 chip selects:
[ 8.829686] EDAC amd64: MC: 0: 0MB 1: 0MB
[ 8.830658] EDAC amd64: MC: 2: 16383MB 3: 16383MB
[ 8.831514] EDAC amd64: MC: 4: 0MB 5: 0MB
[ 8.832092] EDAC amd64: MC: 6: 0MB 7: 0MB
[ 8.832749] EDAC amd64: using x8 syndromes.
[ 8.833485] EDAC amd64: MCT channel count: 2
[ 8.834253] EDAC MC0: Giving out device to module amd64_edac controller F17h: DEV 0000:00:18.3 (INTERRUPT)
[ 8.835695] EDAC PCI0: Giving out device to module amd64_edac controller EDAC PCI controller: DEV 0000:00:18.0 (POLLED)
[ 8.836301] AMD64 EDAC driver v3.5.0
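
A quick way to tell this output from the all-zero case reported earlier in the thread is to count the chip selects that report a non-zero size. This is only a sketch: it assumes the chip-select lines look like `EDAC amd64: MC: 2: 16383MB 3: 16383MB` as above, and the function name is made up for illustration.

```shell
# count_populated_chip_selects: hypothetical helper, not part of any tool.
# Reads dmesg text on stdin and prints how many chip selects on the
# "EDAC amd64: MC:" lines report a size greater than 0MB.
# Assumption: sizes appear as whitespace-separated fields like "16383MB".
count_populated_chip_selects() {
    awk '
        /EDAC amd64: MC:/ {
            for (i = 1; i <= NF; i++)
                # awk converts "16383MB" to 16383 and "0MB" to 0
                if ($i ~ /^[0-9]+MB$/ && $i + 0 > 0)
                    n++
        }
        END { print n + 0 }
    '
}

# Usage on a live system:
#   dmesg | count_populated_chip_selects
```

A result of 0 on a machine with ECC DIMMs installed would match the bogus "all MC size 0" symptom; the output above has four populated chip selects, two on each UMC.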

MB: ASRock X470 Taichi
Bios: 3.62 (technically not officially released, but necessary to address boot loop issues with similar RAM)
Ram: Samsung M391A2K43BB1-CRC Qty 2
Banks: a2, b2

ubuntu@ubuntu:~$ dmesg | grep 'edac\|EDAC\|Linux version'
[ 0.000000] Linux version 5.3.0-18-generic (buildd@lcy01-amd64-027) (gcc version 9.2.1 20190909 (Ubuntu 9.2.1-8ubuntu1)) #19-Ubuntu SMP Tue Oct 8 20:14:06 UTC 2019 (Ubuntu 5.3.0-18.19-generic 5.3.1)
[ 0.197969] EDAC MC: Ver: 3.0.0

Based on the dmesg output alone, it looks like ECC support flat-out does not work on my system. However, this flies in the face of what I see in Windows. Basically the wmic, HWiNFO64 and AIDA64 results for my system are the same as these

The previous post mentions issues being resolved in 4.11.3. Do the fixes extend to 5-series kernels? Is there anything else that needs to be enabled under Linux for ECC memory, or should it Just Work™?
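
For reference, one more check besides dmesg, as a rough sketch: the kernel exposes registered EDAC memory controllers under `/sys/devices/system/edac/mc/` as `mc0`, `mc1`, and so on, each with `ce_count` and `ue_count` files. The helper names below are made up for illustration, and the optional path argument exists only so the functions can be tried against a scratch directory.

```shell
# count_edac_controllers: hypothetical helper, not part of any package.
# Prints the number of mcN directories under the EDAC sysfs tree.
# 0 suggests that no EDAC driver registered a memory controller.
count_edac_controllers() {
    root=${1:-/sys/devices/system/edac/mc}
    n=0
    for d in "$root"/mc[0-9]*; do
        if [ -d "$d" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# print_edac_counts: hypothetical helper.
# Prints corrected (ce) and uncorrected (ue) error counts per controller.
print_edac_counts() {
    root=${1:-/sys/devices/system/edac/mc}
    for d in "$root"/mc[0-9]*; do
        [ -d "$d" ] || continue
        echo "$(basename "$d"): ce=$(cat "$d/ce_count") ue=$(cat "$d/ue_count")"
    done
}

# Usage on a live system:
#   count_edac_controllers
#   print_edac_counts
```

With only `EDAC MC: Ver: 3.0.0` in dmesg, as above, I would expect the controller count to be 0, but that is an assumption rather than something I have confirmed.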

ECC on Linux is generally fine with AM4.
Are you on a CPU or APU?

There’s a Ryzen 3700X CPU in the system, so that shouldn’t be the issue.

Edit: turned out to be the kernel
