Follow-up to this: I think I’ve managed to prove to my satisfaction that ECC is working and that corrected errors get reported via WHEA. I found some guy on Reddit who figured out a simple way to get correctable errors: set the frequency as high as you can while still getting into Windows, then lower VDDIO and VDDQ significantly, and eventually you’ll start getting WHEA warnings about correctable memory errors. And indeed, with ECC enabled and VDDIO/VDDQ set to a fairly ridiculous 0.9V, I get the expected WHEA events basically as soon as I get into Windows and start ZenTimings and the Event Viewer.
The system is very unstable like this, and I can’t recommend it for the faint of heart, but as crude as it is, it does seem to prove the ECC reporting works. I tried rebooting with these settings a few times and got the exact same result every time. Just to further drive the point home I disabled ECC in UEFI and tried the same thing yet again, but as expected I got no WHEA events then. I messed around a bit trying to provoke one but with no luck, and when I launched an actual memory stress test all that got me was a BSOD for my trouble, so I’m pretty confident in my finding that ECC does correct and report errors.
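For anyone who wants to check for these events without clicking through Event Viewer, here is a rough Python sketch that shells out to Windows’ built-in `wevtutil`. It assumes the events land in the System log under the `Microsoft-Windows-WHEA-Logger` provider. The function names are made up for illustration, and it deliberately does not filter on specific event IDs, since I’m not certain which ones each error type uses.

```python
import subprocess

# Assumption: WHEA events are logged under this provider in the System log.
WHEA_PROVIDER = "Microsoft-Windows-WHEA-Logger"

def build_whea_query(max_events=20):
    """Build a wevtutil command line listing the newest WHEA-Logger events."""
    xpath = f"*[System[Provider[@Name='{WHEA_PROVIDER}']]]"
    return [
        "wevtutil", "qe", "System",
        f"/q:{xpath}",         # XPath filter on the provider name
        f"/c:{max_events}",    # cap the number of events returned
        "/rd:true",            # newest events first
        "/f:text",             # plain-text output instead of XML
    ]

def recent_whea_events(max_events=20):
    """Run the query (Windows only) and return wevtutil's text output."""
    result = subprocess.run(
        build_whea_query(max_events),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `print(recent_whea_events())` right after booting with the sabotaged voltages should show the same warnings as Event Viewer, and an empty result with ECC disabled would match what I saw.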
In other news, DDR5-6000 CL30 with modestly tweaked subtimings (just basic Hynix stuff) and not-sabotaged voltages seems to be perfectly stable though.
Also did 10 hours of Prime95 large FFTs, no errors there either.
I know memory OC isn’t really what this thread is for, but there you go, that’s an anecdote about a thing you can do with ECC sticks on AM5. I’ll definitely keep messing with this, but it’s mostly for my own interest.
Just a small update about my ASUS ProArt Creator X870E: I updated to the latest BIOS, redid all my settings from scratch (as referenced in a recent L1T video about the board), and was able to boot my 96 gigs of ECC UDIMM memory in ECC mode at its native 5600 MT/s.
Memory training time was significantly reduced as well, even compared to running at 5400 before this update. However, it still took about 20 seconds to a minute, which some might see as unreasonably long. With Memory Context Restore enabled, this problem of course largely goes away after the first boot.
That being said, at this time I do not believe they officially support, via the QVL, any ECC UDIMMs at a native 5600 for 2 sticks.
It’s a big upgrade though, as I no longer fear having the BIOS reset to stock (previously I’d have to remove a stick to get it to boot, otherwise it would train forever). So in essence, just a report that it now works close to as expected at 5600.
Strange and interesting, because I had trouble getting the board to even boot at its native 5600 previously. With this new BIOS, maybe I’ll try to trigger errors again.
Can also confirm the new ProArt X870E BIOS improved my 4 stick timing. I’m able to significantly boost the speed to 4800 and pass memtest without raising VDD.
I’m using 4x Nemix sticks. You can reference the Jan 20 post above. Originally I got it working and passing memtest with 1.20V @ 4200. However, a week later the system hung after getting warm doing GPU AI stuff, so I ratcheted it back down to a safe 3600. Now I’m going for 4800 again to see if it’s really stable. No issues with memtest and Prime95 so far.
Here are the timings for the Nemix DDR5-5600 kit with the ProArtX670 board’s AEMP timings applied.
Screenshot from corefreqd utility which is the closest thing to Zentimings on Linux I’ve found.
I’m thinking about boosting to 1.2V. Currently just stepping back the speed @1.1V every time the system freezes. Now down to 4000 but the interval between freezes gets longer now so it’ll take a while for me to get started on 1.2V.
On the surface the new BIOS seems to train better but it’s not stable. Raising to 1.2V doesn’t help much if at all. Every few days it’ll freeze and I dial down a bit until I’m basically back at 3600.
Now I can (mostly) answer for the X870E Taichi Lite. 9600X CPU, two Kingston KSM48E40BS8KI-16HA sticks, Debian 12 with tweaked 6.12.12 kernel.
Got a C5 error in either dual-channel config, booted fine in either single-channel config. Upgraded BIOS from 3.15 to 3.20, now dual-channel works, but only in A2/B2 (otherwise C5 again). Both are running at 4800.
It does seem to detect ECC. I don’t feel like buying Memtest86 Pro to inject and test, so I’ll leave it there for now. Tempting tho.
sudo edac-util -v
mc0: 0 Uncorrected Errors with no DIMM info
mc0: 0 Corrected Errors with no DIMM info
mc0: csrow0: 0 Uncorrected Errors
mc0: csrow0: mc#0csrow#0channel#0: 0 Corrected Errors
mc0: csrow2: 0 Uncorrected Errors
mc0: csrow2: mc#0csrow#2channel#0: 0 Corrected Errors
edac-util: No errors to report.
sudo dmidecode --type memory | grep "Error Correction\|Width\|Speed"
Error Correction Type: Multi-bit ECC
Total Width: Unknown
Data Width: Unknown
Total Width: 72 bits
Data Width: 64 bits
Speed: 4800 MT/s
Configured Memory Speed: 4800 MT/s
Total Width: Unknown
Data Width: Unknown
Total Width: 72 bits
Data Width: 64 bits
Speed: 4800 MT/s
Configured Memory Speed: 4800 MT/s
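The part of that dmidecode output that points at ECC, besides the "Error Correction Type" line, is the total width being larger than the data width (72 vs 64 bits here). A small Python sketch of that check, working on the text that `dmidecode --type memory` prints. The function names and the sample string are just for illustration, and empty slots that report `Unknown` are skipped:

```python
def dimm_widths(text):
    """Return (total_bits, data_bits) for each populated DIMM entry."""
    dimms, total = [], None
    for line in text.splitlines():
        key, _, value = line.strip().partition(":")
        value = value.strip()
        if key == "Total Width":
            # Empty slots report "Unknown"; remember only numeric widths.
            total = int(value.split()[0]) if value.endswith("bits") else None
        elif key == "Data Width" and total is not None and value.endswith("bits"):
            dimms.append((total, int(value.split()[0])))
    return dimms

def looks_like_ecc(text):
    """True if at least one DIMM is present and all have extra (check) bits."""
    widths = dimm_widths(text)
    return bool(widths) and all(total > data for total, data in widths)

# Sample trimmed from the output above.
SAMPLE = """\
Error Correction Type: Multi-bit ECC
Total Width: Unknown
Data Width: Unknown
Total Width: 72 bits
Data Width: 64 bits
Speed: 4800 MT/s
"""
```

This only tells you what the firmware reports to the OS. It does not prove that errors actually get corrected and logged, which is why the error-provoking tests are still worth doing.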
You could try overclocking the memory, undervolting the memory, or lowering the primary timings to generate detected ECC errors. Just takes an hour of playing in the BIOS to see if you can cause detected errors.
Something that helps with this approach, which I agree with, is getting the shittiest/oldest/slowest standard ECC modules. Overclocking those to generate memory errors should be easy, and you can then be fairly sure that the motherboard’s memory traces or the CPU’s memory controller aren’t the source of the errors.
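On Linux, one way to see whether the tweaks suggested above actually produce detected errors is to compare the EDAC counters the kernel exposes in sysfs before and after a stress run. A minimal Python sketch, assuming the EDAC driver for the platform is loaded and the counters live under `/sys/devices/system/edac/mc` (the helper names are hypothetical):

```python
from pathlib import Path

# Assumption: standard EDAC sysfs location; needs the EDAC driver loaded.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")

def read_edac_counts(root=EDAC_ROOT):
    """Return {controller: (corrected, uncorrected)} from EDAC sysfs."""
    counts = {}
    for mc in sorted(Path(root).glob("mc[0-9]*")):
        ce = int((mc / "ce_count").read_text())  # corrected errors
        ue = int((mc / "ue_count").read_text())  # uncorrected errors
        counts[mc.name] = (ce, ue)
    return counts

def new_errors(before, after):
    """Return controllers whose counters increased, with the deltas."""
    deltas = {}
    for name, (ce, ue) in after.items():
        old_ce, old_ue = before.get(name, (0, 0))
        if ce > old_ce or ue > old_ue:
            deltas[name] = (ce - old_ce, ue - old_ue)
    return deltas
```

Take one snapshot with `read_edac_counts()` before starting the stress test and another afterwards, then pass both to `new_errors()`. A rising corrected count with no uncorrected errors is the result you are hoping for.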