Separate post since this is about a different topic: ECC reporting - it seems like a lot of people are interested in this.
My board seems to be somewhat faulty and I can get ECC errors on demand. If I remember correctly, someone on the FreeNAS forum reported something similar.
When using a specific DIMM slot with auto timings/frequencies, I get this sort of error in dmesg:
[ 313.623092] [Hardware Error]: Corrected error, no action required.
[ 313.629297] [Hardware Error]: CPU:0 (17:8:2) MC15_STATUS[-|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0x9c2040000000011b
[ 313.639760] [Hardware Error]: Error Addr: 0x00000000005e8b00
[ 313.645453] [Hardware Error]: IPID: 0x0000009600050f00, Syndrome: 0x00000a400a400103
[ 313.653215] [Hardware Error]: Unified Memory Controller Ext. Error Code: 0, DRAM ECC error.
[ 313.661594] EDAC MC0: 1 CE on mc#0csrow#3channel#0 (csrow:3 channel:0 page:0xbd1 offset:0x700 grain:64 syndrome:0xa40)
[ 313.672301] [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
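For anyone wondering how the bracketed flags on the MC15_STATUS line relate to the raw hex value, here is a minimal Python sketch. The bit positions are my assumption, based on the usual x86 MCA status layout plus AMD's SMCA extensions, and the names `STATUS_BITS` and `decode_status` are made up for illustration. Treat it as a rough cross-check, not a reference decoder.

```python
# Assumed bit positions (generic x86 MCA status register plus AMD SMCA bits).
# Verify against your CPU's documentation before relying on them.
STATUS_BITS = {
    "Val": 63,    # register contains a valid error
    "Over": 62,   # an earlier error was overwritten
    "UC": 61,     # uncorrected error
    "En": 60,     # error reporting was enabled
    "MiscV": 59,  # MISC register is valid
    "AddrV": 58,  # ADDR register is valid
    "PCC": 57,    # processor context corrupt
    "SyndV": 53,  # syndrome register is valid
    "CECC": 46,   # error corrected by ECC
    "UECC": 45,   # error detected by ECC but not corrected
}


def decode_status(status):
    """Return the names of the flags set in an MCA status value."""
    return [name for name, bit in STATUS_BITS.items() if (status >> bit) & 1]


# Status value copied from the dmesg output above.
print(decode_status(0x9C2040000000011B))
```

With the value from my log, the flags that come out match the ones the kernel printed (MiscV, AddrV, SyndV, CECC), and UC is not set, which is consistent with a corrected error.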
memtest86:
PassMark MemTest86 V8.3 Free AMD Ryzen 5 2600 Six-Core
Clk/Temp : 3428 MHz / 43C | Pass 25% ########
L1 Cache : 96K 84.43 GB/s| Test 36% ###########
L2 Cache : 512K 64.73 GB/s| Test 3 [Moving inversions, ones & zeroes]
L3 Cache :16384K 25.29 GB/s| Address : 0x100000000 - 0x40F380000
Memory : 15.9G 10.35 GB/s|
RAM Info : PC4-21300 DDR4 2666MHz / Kingston 9965745-002.A00G
CPU: 0123456789AB | CPUs Found: 12
State: /WWWWWWWWWWW | CPUs Started: 12 CPUs Active: 12
---------------/-------------------------------------------------------------
Time: 0:00:|Test complete, press any key to display summary|rrors: 0
-----------------------------------------------/
[ECC Errors detected] Test: 0 Addr: E7BD0640
[ECC Errors detected] Test: 1 Addr: 1135F40
[ECC Errors detected] Test: 1 Addr: 11657C0
[ECC Errors detected] Test: 1 Addr: ED41EE00
[ECC Errors detected] Test: 1 Addr: 11F2A40
[ECC Errors detected] Test: 2 Addr: ED41EA40
[ECC Errors detected] Test: 2 Addr: 129AD40
[ECC Errors detected] Test: 2 Addr: 1301800
[ECC Errors detected] Test: 3 Addr: 150D280
There is nothing in IPMI Event Logs.
Changing the default frequency to something different in the OC menu (lower or higher) fixes the errors, so I guess this is some memory training issue on my board.
- I tested 4 sticks (2 of which are on the QVL) - the same errors
- Swapped just the mobo - no more errors
What is interesting is that I see no errors when I disable ECC in the BIOS (checked using memtest86 and memtester).
I sent the board to the seller for RMA and got it back with a note saying that no issues were found. I got ghosted on any further questions about their testing.
I wrote to ASRock support and got a new BIOS to try and a few things to check, but when I asked about the possibility of getting a replacement on the grounds of those ECC errors, I got the same response that was pasted here a few times:
- AM4 support ECC function
- AM4 does not support ECC error reporting function
So I guess the regular ECC reports that I would expect to see in the IPMI logs on any Intel server board indeed do not work here. But it’s not all doom and gloom - at least single-bit errors are reported in the syslog and can be acted upon.
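As a rough idea of what "acting upon" the syslog entries could look like, here is a small Python sketch that tallies corrected errors per memory controller, csrow and channel. The regex is modelled only on the single "EDAC MC0: 1 CE on ..." line from my dmesg output, so other kernel versions or EDAC drivers may format it differently. The names `EDAC_CE` and `count_corrected_errors` are hypothetical.

```python
import re
from collections import Counter

# Pattern based on the one EDAC line shown in the dmesg output above.
# This is an assumption about the format, not a documented interface.
EDAC_CE = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<count>\d+) CE .*?"
    r"\(csrow:(?P<csrow>\d+) channel:(?P<channel>\d+)"
)


def count_corrected_errors(log_text):
    """Sum corrected-error counts per (mc, csrow, channel) found in log text."""
    totals = Counter()
    for line in log_text.splitlines():
        match = EDAC_CE.search(line)
        if match:
            key = (
                int(match["mc"]),
                int(match["csrow"]),
                int(match["channel"]),
            )
            totals[key] += int(match["count"])
    return totals


# Sample line copied from the dmesg output above.
sample = (
    "[ 313.661594] EDAC MC0: 1 CE on mc#0csrow#3channel#0 "
    "(csrow:3 channel:0 page:0xbd1 offset:0x700 grain:64 syndrome:0xa40)"
)
for (mc, csrow, channel), total in count_corrected_errors(sample).items():
    print(f"mc{mc} csrow {csrow} channel {channel}: {total} corrected error(s)")
```

You could feed it saved dmesg or syslog text from a cron job and alert when a count starts climbing. That should at least point to the affected slot even though nothing reaches the IPMI event log.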