On the ASUS ProArt X870E-Creator WiFi memory support (QVL) page, various ECC modules are listed as supported up to 64GB total, but only at 4800 MT/s. My modules in fact can’t run at their full rated speed (5600 MT/s), so I’ve had to leave them at 5400 MT/s, along with enabling Memory Context Restore so that there isn’t a long memory training on every boot.
5400 was as high as I could go, and I needed to be on the latest BIOS version, as without it I couldn’t even boot into the BIOS. Now that 5400 is set, there are no issues with booting or boot times (basically instant).
Running dmidecode -t memory returns:
Memory Device
Array Handle: 0x0013
Error Information Handle: 0x0017
Total Width: 72 bits
Data Width: 64 bits
Size: 48 GB
Form Factor: DIMM
...
The 72-bit Total Width is the line of note: the extra 8 bits over the 64-bit Data Width are for ECC.
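As a quick sanity check of that width comparison, here is a minimal Python sketch that parses dmidecode text and flags memory devices whose Total Width exceeds their Data Width. The function name ecc_width_report and the SAMPLE string (copied from the output above) are only for illustration. Note that a wider total width only shows the modules carry ECC bits, not that error correction is actually active.

```python
import re

# Matches lines such as "Total Width: 72 bits"; "Unknown" widths (empty slots) are skipped.
WIDTH_RE = re.compile(r"^\s*(Total|Data) Width:\s*(\d+) bits\s*$")

# Sample taken from the dmidecode output quoted in this post.
SAMPLE = """Memory Device
Array Handle: 0x0013
Error Information Handle: 0x0017
Total Width: 72 bits
Data Width: 64 bits
Size: 48 GB
Form Factor: DIMM
"""


def ecc_width_report(dmidecode_text):
    """Return one (total_bits, data_bits, has_ecc_bits) tuple per populated device.

    Assumes "Total Width" appears before "Data Width" within each Memory Device
    record, which is how dmidecode prints them.
    """
    report = []
    total = None
    for line in dmidecode_text.splitlines():
        match = WIDTH_RE.match(line)
        if not match:
            continue
        kind, bits = match.group(1), int(match.group(2))
        if kind == "Total":
            total = bits
        elif total is not None:
            # Extra bits beyond the data width are the ECC check bits.
            report.append((total, bits, total > bits))
            total = None
    return report


for total_bits, data_bits, has_ecc_bits in ecc_width_report(SAMPLE):
    print(f"total={total_bits} data={data_bits} ecc_bits_present={has_ecc_bits}")
```

To use it on a real system, the text passed in would be the captured output of dmidecode -t memory rather than SAMPLE.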
There is also dmesg | grep -i EDAC, which outputs:
Oct 26 20:51:13 localhost kernel: EDAC MC: Ver: 3.0.0
Oct 26 20:51:18 localhost kernel: EDAC MC0: Giving out device to module amd64_edac controller F1Ah_M40h: DEV 0000:00:18.3 (INTERRUPT)
Oct 26 20:51:18 localhost kernel: EDAC amd64: F1Ah_M40h detected (node 0).
Oct 26 20:51:18 localhost kernel: EDAC MC: UMC0 chip selects:
Oct 26 20:51:18 localhost kernel: EDAC amd64: MC: 0: 0MB 1: 0MB
Oct 26 20:51:18 localhost kernel: EDAC amd64: MC: 2: 16384MB 3: 8192MB
Oct 26 20:51:18 localhost kernel: EDAC MC: UMC1 chip selects:
Oct 26 20:51:18 localhost kernel: EDAC amd64: MC: 0: 0MB 1: 0MB
Oct 26 20:51:18 localhost kernel: EDAC amd64: MC: 2: 16384MB 3: 8192MB
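Once the amd64_edac driver has registered a controller like MC0 above, the kernel normally exposes corrected and uncorrected error counters under /sys/devices/system/edac/mc/. Below is a minimal Python sketch for reading them. read_edac_counts is a hypothetical helper name, and it assumes the standard ce_count and ue_count files are present on this platform. Counters sitting at zero only mean no errors have been reported so far, not that reporting has been proven to work.

```python
from pathlib import Path


def read_edac_counts(edac_root="/sys/devices/system/edac/mc"):
    """Return {"mc0": {"ce_count": int, "ue_count": int}, ...} for each controller.

    ce_count is the corrected-error total and ue_count the uncorrected-error
    total reported by the EDAC subsystem. Missing files are simply left out.
    """
    counts = {}
    for mc_dir in sorted(Path(edac_root).glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):
            counter_file = mc_dir / name
            if counter_file.is_file():
                entry[name] = int(counter_file.read_text().strip())
        counts[mc_dir.name] = entry
    return counts


if __name__ == "__main__":
    for controller, entry in read_edac_counts().items():
        print(controller, entry)
```

The edac_root parameter is there so the function can be pointed at a copy of the directory tree for testing instead of the live sysfs path.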
There is also sudo lshw -class memory, which reports:
*-memory
description: System Memory
physical id: 13
slot: System board or motherboard
size: 96GiB
capabilities: ecc
configuration: errordetection=multi-bit-ecc
Lastly, PassMark MemTest86 indicates that ECC is working.
That being said, I have not yet found a way to inject ECC errors to actually test the functionality.
I attempted to use PassMark MemTest86’s error injection feature, but unfortunately it does not work, so it was a waste of 75 dollars. I’m not implying it is PassMark’s fault: in their forums they state that injection is not enabled on most CPUs except some engineering samples, so I have no idea why it’s a mass-market feature of the software, but…
I also experimented with a bevy of BIOS settings, such as:
- ECC (of course)
- Log Transparent Errors
- Setting Disable Memory Error Injection to False (which I really thought would allow the PassMark MemTest86 feature to work, but it did not)
- Advanced Error Reporting (which I’m pretty confident is unrelated, but hey, I was trying anything I could think of)
- Enabling MCA thresholding, set to 1 (which actually isn’t super related, but I figure I would likely prefer interrupts over polling)
- Freeze DF Module Queues On Error (which I believe should at least stop errors from being propagated into data)
There is more I think I might do at some point, such as reading through the error injection methods discussed by some Linux kernel developers, but I’m pretty burned out on looking into this for now. It is a combination of so much information being buried in footnotes and a major selling point of the hardware being this difficult to actually verify.
I’ve seen suggestions about hardware testing techniques, but the only ones that look safe are for DDR4 or earlier, I imagine because DDR5 with UDIMMs and even CUDIMMs is already hitting signal integrity limits.
Sorry or you’re welcome for the long response; whichever is applicable.
Hopefully this helps some other poor soul avoid going down this rabbit hole in the future.