I am currently working with the EDAC Linux kernel folks to figure out an oddity with ECC memory running in ECC mode on Linux, and I'm trying to find out whether it happens on all motherboards. Currently ECC will only run in certain configurations; we have proof of this happening on the ASRock Taichi, K4, and Gaming Pro boards. Note that ECC still works on these boards (confirmed by the testing Wendell has done with overclocking), just only in certain configurations (2 DIMMs), and all the information reported about the ECC modules ends up being bogus. It might be a board issue, or it might be a kernel issue. I'm trying to see whether other boards return the same bogus information, which shows up in the dmesg output as every MC listing a size of 0.
If you are currently running a Ryzen CPU with ECC RAM on Linux >= 4.10 (no matter what your motherboard manufacturer is), could you do the following?
NOTE: Mainly trying to figure out whether all boards are returning garbage for the registers that hold ECC configuration information.
Post:
1) MB: motherboard brand and model
Bios: BIOS version
Ram: module part number and qty
Banks: slots the RAM is installed in, e.g. a2, b2 (slot indices start at one)

Example:
MB: ASRock Fatal1ty X370 Gaming K4
Bios: 1.60
Ram: KVR24E17D8/16MA Qty 2
Banks: a2, b2
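If it helps, the MB and Bios lines can be filled in automatically. Below is a rough Python sketch that assumes the standard Linux DMI sysfs files (/sys/class/dmi/id/board_vendor, board_name and bios_version), which are normally readable without root. The RAM part number and slot names are not in those files, so they stay as placeholders to fill in by hand (or from dmidecode). The function names are made up for this example.

```python
from pathlib import Path

# Standard Linux DMI sysfs directory; these three fields are world-readable.
DMI_DIR = Path("/sys/class/dmi/id")


def read_field(base: Path, name: str) -> str:
    """Return a stripped DMI field, or 'unknown' if it is missing/unreadable."""
    try:
        return (base / name).read_text().strip()
    except OSError:
        return "unknown"


def report_header(base: Path = DMI_DIR) -> str:
    """Build the report template lines; MB and Bios come from sysfs DMI data."""
    vendor = read_field(base, "board_vendor")
    board = read_field(base, "board_name")
    bios = read_field(base, "bios_version")
    lines = [
        f"MB: {vendor} {board}",
        f"Bios: {bios}",
        # Module part numbers and slot names are not exposed in this directory;
        # fill them in by hand or from `sudo dmidecode --type memory`.
        "Ram: <module part number> Qty <n>",
        "Banks: <slots, e.g. a2, b2>",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(report_header())
```

Run it as a normal user and paste the output, then fix up the Ram and Banks lines.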
Make sure you use 2 DIMMs; it won't work with only one right now. I am working directly with Linux kernel developers and AMD. As with the example board, the output of dmesg | grep -i edac is garbage. It basically looks like luck that ECC is even being enabled: all the registers for the ECC settings return garbage, so EDAC cannot really tell you anything beyond the fact that it is enabled. This is also most likely why it doesn't work with just one DIMM. Good to know that the ECC part works.
It's the same as in Wendell's article: the MC information is all zero. I'm trying to see whether all motherboards do the same thing, to sort out whether it is a motherboard, kernel, or CPU issue. Just make sure you use 2 DIMMs. Basically, the EDAC module tries to read registers related to the ECC configuration parameters, and it currently appears to be running into PCI read errors (I don't know yet what is causing them), so ECC is enabled but the configuration parameters end up as garbage.
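To cross-check the "MC size 0" symptom without grepping dmesg: the EDAC core also exposes each registered memory controller under /sys/devices/system/edac/mc/mcN/, including a size_mb file. The Python sketch below reads those files and flags the case where controllers are registered but every one reports 0 MB. It assumes the bogus register data also shows up as a zero size_mb, which I haven't verified against the dmesg listing. The helper names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Standard EDAC sysfs location; each registered memory controller is mcN.
EDAC_MC_DIR = Path("/sys/devices/system/edac/mc")


def mc_sizes(base: Path = EDAC_MC_DIR) -> dict[str, int]:
    """Map each memory controller (mc0, mc1, ...) to its reported size_mb."""
    sizes: dict[str, int] = {}
    if not base.is_dir():
        return sizes  # no EDAC driver has registered a memory controller
    for mc in sorted(base.glob("mc[0-9]*")):
        size_file = mc / "size_mb"
        if size_file.is_file():
            sizes[mc.name] = int(size_file.read_text().strip())
    return sizes


def looks_bogus(sizes: dict[str, int]) -> bool:
    """True if controllers are registered but every one reports 0 MB."""
    return bool(sizes) and all(mb == 0 for mb in sizes.values())


if __name__ == "__main__":
    sizes = mc_sizes()
    for name, mb in sizes.items():
        print(f"{name}: {mb} MB")
    if not sizes:
        print("no EDAC memory controllers registered")
    elif looks_bogus(sizes):
        print("all memory controllers report 0 MB")
```

An empty result just means no EDAC driver has registered a memory controller, which is a different situation from registered controllers that all report zero.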
MB: ASRock X470 Taichi
Bios: 3.62 (technically not officially released, but necessary to address boot loop issues with similar RAM)
Ram: Samsung M391A2K43BB1-CRC Qty 2
Banks: a2, b2
ubuntu@ubuntu:~$ dmesg | grep -E 'edac|EDAC|Linux version'
[ 0.000000] Linux version 5.3.0-18-generic (buildd@lcy01-amd64-027) (gcc version 9.2.1 20190909 (Ubuntu 9.2.1-8ubuntu1)) #19-Ubuntu SMP Tue Oct 8 20:14:06 UTC 2019 (Ubuntu 5.3.0-18.19-generic 5.3.1)
[ 0.197969] EDAC MC: Ver: 3.0.0
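Side note on the command above: in grep's default basic regex syntax a bare | is a literal character, so 'edac|EDAC|Linux version' only acts as an alternation with grep -E (or GNU grep's \| extension). Here is the same filter as a small Python sketch. The filter_lines name is made up, and reading dmesg may need root if kernel.dmesg_restrict is set.

```python
from __future__ import annotations

import re
import subprocess

# Same alternation as: dmesg | grep -E 'edac|EDAC|Linux version'
# (Python's re treats "|" as alternation, like grep's extended syntax).
PATTERN = re.compile(r"edac|EDAC|Linux version")


def filter_lines(text: str) -> list[str]:
    """Keep only the log lines that match one of the alternatives."""
    return [line for line in text.splitlines() if PATTERN.search(line)]


if __name__ == "__main__":
    # Raises CalledProcessError if dmesg exits non-zero (e.g. not permitted).
    log = subprocess.run(["dmesg"], capture_output=True, text=True, check=True).stdout
    print("\n".join(filter_lines(log)))
```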
Based on the dmesg output alone, it looks like ECC support flat out does not work on my system. However, this flies in the face of what I see in Windows: the wmic, HWiNFO64 and AIDA64 results for my system are the same as these
The previous post mentions issues being resolved in 4.11.3. Do those fixes extend to 5.x kernels? Is there anything else that needs to be enabled for ECC memory under Linux, or should it Just Work™?