X870E ECC support

Follow-up to this: I think I’ve managed to prove to my satisfaction that ECC is working and that corrected errors get reported via WHEA. I found some guy on Reddit who figured out a simple way to get correctable errors: set the memory frequency as high as you can while still getting into Windows, then lower VDDIO and VDDQ significantly, and eventually you’ll start getting WHEA warnings about correctable memory errors. And indeed, with ECC enabled and VDDIO/VDDQ set to a fairly ridiculous 0.9V, I get the expected WHEA events basically as soon as I get into Windows and start ZenTimings and the Event Viewer:

The system is very unstable like this, and I can’t recommend it for the faint of heart, but as crude as it is, it does seem to prove the ECC reporting works. I tried rebooting with these settings a few times and got the exact same result every time. Just to further drive the point home I disabled ECC in UEFI and tried the same thing yet again, but as expected I got no WHEA events then. I messed around a bit trying to provoke one but with no luck, and when I launched an actual memory stress test all that got me was a BSOD for my trouble, so I’m pretty confident in my finding that ECC does correct and report errors.

In other news, DDR5-6000 CL30 with modestly tweaked subtimings (just basic Hynix stuff) and not-sabotaged voltages seems to be perfectly stable though:

Also did 10 hours of Prime95 large FFTs, no errors there either.

I know memory OC isn’t really what this thread is for, but there you go, that’s an anecdote about a thing you can do with ECC sticks on AM5. I’ll definitely keep messing with this, but it’s mostly for my own interest.

I wonder if we at some point will see ECC CUDIMMs. Seems like these two technologies were made for each other.

Just a small update about my ASUS ProArt X870E-Creator: I updated to the latest BIOS, redid all my settings from scratch (as referenced in a recent L1T video about the board), and was able to boot my 96 GB of ECC UDIMM memory in ECC mode at its native 5600 MT/s.

Memory training time was significantly reduced as well, even compared to 5400 before this update. However, it still took about 20 seconds to a minute, which some might see as unreasonably long. With Memory Context Restore enabled, this problem of course largely goes away after the first boot.

That being said, I do not believe the QVL currently lists any ECC UDIMMs as officially supported at a native 5600 with 2 sticks.

It’s a big upgrade though, as I no longer fear having the BIOS reset to stock (previously I’d have to remove a stick for it to boot rather than train forever). So in essence, just a report that it now works close to as expected at 5600.

Strange and interesting, because I previously had trouble getting the board to even boot at the native 5600, but with this new BIOS maybe I’ll try to trigger errors again.

Can also confirm the new ProArt X870E BIOS improved my 4 stick timing. I’m able to significantly boost the speed to 4800 and pass memtest without raising VDD.

What RAM do you have and what did you change, just the speed itself?

I’m using 4x Nemix sticks. You can reference the Jan 20 post above. Originally I got it working and passing memtest with 1.20V @ 4200. However, a week later the system hung after getting warm doing GPU AI stuff, so I ratcheted it back down to a safe 3600. Now I’m going for 4800 again to see if it’s really stable. No issue with memtest and Prime95 so far.

Here are the timings for the Nemix DDR5-5600 kit with the ProArt X670 board’s AEMP timings applied.
Screenshot from the corefreqd utility, which is the closest thing to ZenTimings on Linux I’ve found.

ProArt X870E does not pick up on the AEMP profiles in the Nemix sticks so it’s just using the SPD as follows when I select the target speed:

Nemix 5600 SPD

Test results so far. I didn’t test all the speeds though since a single memtest takes like 8hrs.

Clock (MHz)  Speed (MT/s)  Voltage (V)  Memtest   Throughput  Latency  Prime95  SPD
1800         3600          1.1          -         44098       92.983   -        30-29-29-58
1900         3800          1.1          -         45675       90.758   -        32-31-31-31
2000         4000          1.1          -         47162       87.738   -        32-32-32-64
2100         4200          1.1          -         47040       92.506   -        36-34-34-68
2200         4400          1.1          -         48042       90.758   -        36-36-36-71
2300         4600          1.1          -         51213       89.327   -        40-37-37-74
2400         4800          1.1          Pass      50314       87.102   Pass     40-39-39-77
2500         5000          1.1          -         51692       85.83    -        40-40-40-80
2600         5200          1.1          Pass      62091       85.035   Fail     42-42-42-84
2700         5400          1.1          Pass      49841       87.896   -        46-44-44-87
2800         5600          1.1          Fail ECC  52180       86.625   -        46-45-45-90

(- = not tested)
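
For anyone comparing the SPD column across speeds, here is a rough sketch of how those numbers relate. It assumes the first value in each SPD entry is the CAS latency (tCL) in clock cycles, which is the usual convention but isn't stated in the post, and it uses the standard DDR relationship of two transfers per memory clock. The function names are made up for illustration. This is the theoretical CAS latency only, not the measured Latency column from the benchmark.

```python
# Illustrative arithmetic only: standard DDR relationships, nothing measured here.

def data_rate_mts(clock_mhz: float) -> float:
    """DDR transfers data twice per memory clock, so MT/s = 2 x MHz."""
    return 2 * clock_mhz


def cas_latency_ns(cl: int, rate_mts: float) -> float:
    """Absolute CAS latency in ns: CL cycles at a clock of rate/2 MHz."""
    return cl * 2000 / rate_mts


# (clock in MHz, assumed tCL) taken from three rows of the results table
rows = [(1800, 30), (2400, 40), (2800, 46)]

for clock, cl in rows:
    rate = data_rate_mts(clock)
    print(f"{rate:.0f} MT/s CL{cl}: {cas_latency_ns(cl, rate):.2f} ns")
```

Under that assumption, those three rows work out to roughly 16.4 to 16.7 ns, so the SPD timings loosen in cycles mostly to keep the absolute latency about the same as the speed goes up.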

Just experienced a system hang @4800 even though it passed both tests. Going to dial it back a bit.

Can you try pushing the memory voltage up to 1.25V? That is what works on my Nemix RAM via the board AEMP profiles.

I’m thinking about boosting to 1.2V. Currently just stepping back the speed @1.1V every time the system freezes. Now down to 4000 but the interval between freezes gets longer now so it’ll take a while for me to get started on 1.2V.

Are there any “consumer” ECC DDR5 modules with heatspreaders, or are they all the standard “server-green, barebones” versions?

I haven’t seen any. There are ‘overclocking’ RDIMMs with heatspreaders for HEDT platforms though.

IMO airflow > heatspreader though. Many modules have pretty crappy heatspreaders anyway.

On the surface the new BIOS seems to train better but it’s not stable. Raising to 1.2V doesn’t help much if at all. Every few days it’ll freeze and I dial down a bit until I’m basically back at 3600.

Just an update to this: the ProArt X870E does have ECC support listed and has some ECC DIMMs in its QVL.

Also, in the ASUS BIOS you need to enable ECC support since it’s disabled by default (“Auto” means disabled LOL).

Now I can (mostly) answer for the X870E Taichi Lite. 9600X CPU, two Kingston KSM48E40BS8KI-16HA sticks, Debian 12 with tweaked 6.12.12 kernel.

Got a C5 error in either dual-channel config, booted fine in either single-channel config. Upgraded BIOS from 3.15 to 3.20, now dual-channel works, but only in A2/B2 (otherwise C5 again). Both are running at 4800.

It does seem to detect ECC. I don’t feel like buying Memtest86 Pro to inject and test, so I’ll leave it there for now. Tempting tho.

sudo edac-util -v

mc0: 0 Uncorrected Errors with no DIMM info
mc0: 0 Corrected Errors with no DIMM info
mc0: csrow0: 0 Uncorrected Errors
mc0: csrow0: mc#0csrow#0channel#0: 0 Corrected Errors
mc0: csrow2: 0 Uncorrected Errors
mc0: csrow2: mc#0csrow#2channel#0: 0 Corrected Errors
edac-util: No errors to report.
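
If you want to poll the same counters from a script while deliberately destabilizing the memory, a minimal sketch could look like this. As far as I know, the per-controller totals are exposed by the kernel's EDAC sysfs interface as `ce_count` and `ue_count` under `/sys/devices/system/edac/mc/mcN/`. The function name and the returned dict layout are just assumptions for illustration, and the `root` parameter only exists so it can be pointed at a test directory.

```python
from pathlib import Path

# Standard location of the Linux EDAC memory-controller counters.
EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root: Path = EDAC_MC_ROOT) -> dict[str, dict[str, int]]:
    """Return {"mc0": {"ce_count": n, "ue_count": m}, ...} per controller."""
    counts: dict[str, dict[str, int]] = {}
    if not root.is_dir():
        return counts  # EDAC driver not loaded, or not a Linux system
    for mc in sorted(root.glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):
            counter = mc / name
            if counter.is_file():
                entry[name] = int(counter.read_text().strip())
        counts[mc.name] = entry
    return counts


if __name__ == "__main__":
    found = read_edac_counts()
    if not found:
        print("No EDAC memory controllers found")
    for mc, c in found.items():
        print(f"{mc}: corrected={c.get('ce_count')} uncorrected={c.get('ue_count')}")
```

Running it before and after a stress test and comparing the corrected count is one way to see whether errors are actually being reported.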

sudo dmidecode --type memory | grep "Error Correction\|Width\|Speed"

        Error Correction Type: Multi-bit ECC
        Total Width: Unknown
        Data Width: Unknown
        Total Width: 72 bits
        Data Width: 64 bits
        Speed: 4800 MT/s
        Configured Memory Speed: 4800 MT/s
        Total Width: Unknown
        Data Width: Unknown
        Total Width: 72 bits
        Data Width: 64 bits
        Speed: 4800 MT/s
        Configured Memory Speed: 4800 MT/s
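
The interesting part of that output is the 72-bit total width against the 64-bit data width. A small sketch of checking that difference automatically is below. The helper name is hypothetical, and it assumes every populated slot prints a numeric "Total Width" followed by a numeric "Data Width" like the sample does. It only shows that the firmware reports ECC-width modules, not that correction is actually active.

```python
import re

# Sample text copied from the dmidecode output in the post: two populated
# slots, plus two empty slots that report "Unknown".
SAMPLE = """\
        Error Correction Type: Multi-bit ECC
        Total Width: Unknown
        Data Width: Unknown
        Total Width: 72 bits
        Data Width: 64 bits
        Total Width: Unknown
        Data Width: Unknown
        Total Width: 72 bits
        Data Width: 64 bits
"""


def extra_bits_per_dimm(text: str) -> list[int]:
    """Total width minus data width for each slot reporting numeric widths.

    Hypothetical helper; empty slots ("Unknown") are skipped because the
    patterns only match numeric values.
    """
    totals = re.findall(r"Total Width: (\d+) bits", text)
    datas = re.findall(r"Data Width: (\d+) bits", text)
    return [int(t) - int(d) for t, d in zip(totals, datas)]


print(extra_bits_per_dimm(SAMPLE))  # prints [8, 8]
```

A result of 0 for a slot would suggest a non-ECC module (or ECC not being reported) in that slot.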

You could try overclocking the memory, undervolting the memory, or lowering the primary timings to generate detected ECC errors. Just takes an hour of playing in the BIOS to see if you can cause detected errors.

A good aid for this methodology, which I agree with, is to get the shittiest/oldest/slowest standard ECC modules. Overclocking those to generate memory errors should be easy, and you can then be sure that the motherboard’s memory traces or the CPU’s memory controller aren’t the source of the errors.

Hello guys,

Sharing my findings for my ASUS ProArt B650 and two 96 GB kits, which are similar to the results of @NDRE28, in case they are useful for others.

Kits tested:

  • Nemix RAM Samsung - M324R6GA3BB0-CWM

  • Kingston KSM56E46BD8KM-48HM

They boot in slots B1 + B2 single-channel @ 3600 MHz. No other config works. Returning them since I can’t pass on dual channel…

I’m kind of sad to go back to non-ECC RAM, especially for ZFS workloads, but it seems ECC RAM can’t boot correctly on this motherboard…
