Does memtest86+ report correctable ECC errors?

I upgraded my workstation from 256 GB of DDR4 ECC RAM to 512 GB, and the machine started to behave wonky.

It always boots successfully after a config change, so whenever RAM training occurs it boots fine. But if I later just ‘hot boot’ it, it almost always power cycles once during startup and then boots fine after the power cycle. I have managed to reboot it a few times without issues, but that’s quite rare.

I tested the machine and didn’t notice anything out of the ordinary. The sticks are running at 3200 MHz like they should, so idk what’s going on. I decided to test my real use cases, and while they work fine functionally, I noticed quite a bunch of ECC errors in dmesg. All of them were corrected, but well… kinda sus. Like 20 errors in 1 hour. I had 0 ECC errors with the 256 GB config.

So I decided to run memtest86+, but it just passed with 0 errors after hammering the RAM for a few hours. Idk what’s going on.

I tried mixing sticks in various configs:
4x64 + 4x32, another 4x64 + 4x32, and just 4x64. Everything works fine until I load up all 8 slots with 8x64 (in terms of booting; I did not test for ECC errors in all possible configs).

So my question is: should memtest86+ in its default configuration report ECC corrections when they happen? If it returns 0 errors, does that mean the RAM itself is fine and something else is wonky? Is it possible that ECC errors are simply more likely to occur with such high memory density?

Platform is ASRock WRX80 Creator

It should. I was getting ECC corrected errors as shown here.

I don’t get it :T

Try memtest86 (non-plus). They differ in how well they support different hardware.

Edit: Or even better, see if you can find explicit mention of support for ECC error reporting for your specific platform on memtest86 and memtest86+, respectively.


Who handles ECC corrections in KVM? I mean, if I have VMs, are ECC errors supposed to show up in the guest or on the host? I’m using hugepages.

I’ve run memtest86 with ECC enabled and it returned 0 errors:

However, when I’m using the machine normally it still sometimes catches an ECC error :C

So I noticed that those errors are fairly peculiar. They occur almost exactly every 5 minutes, and only ONE each time.

[Mon Feb 24 21:17:04 2025] EDAC MC0: 1 CE on mc#0csrow#1channel#1 (csrow:1 channel:1 page:0x62a3b27 offset:0x7c0 grain:64 syndrome:0x2222)
[Mon Feb 24 21:22:18 2025] EDAC MC0: 1 CE on mc#0csrow#1channel#1 (csrow:1 channel:1 page:0x628839b offset:0xb40 grain:64 syndrome:0x2332)
[Mon Feb 24 21:27:32 2025] EDAC MC0: 1 CE on mc#0csrow#1channel#1 (csrow:1 channel:1 page:0x7c42109 offset:0xd80 grain:64 syndrome:0x2222)
[Mon Feb 24 21:32:46 2025] EDAC MC0: 1 CE on mc#0csrow#1channel#1 (csrow:1 channel:1 page:0x7c48dd7 offset:0x700 grain:64 syndrome:0x2112)
[Mon Feb 24 21:38:00 2025] EDAC MC0: 1 CE on mc#0csrow#1channel#1 (csrow:1 channel:1 page:0x7c42eca offset:0x260 grain:64 syndrome:0x2310)
[Mon Feb 24 21:43:14 2025] EDAC MC0: 1 CE on mc#0csrow#1channel#1 (csrow:1 channel:1 page:0x7c42eca offset:0x240 grain:64 syndrome:0x2310)
[Mon Feb 24 21:48:28 2025] EDAC MC0: 1 CE on mc#0csrow#1channel#1 (csrow:1 channel:1 page:0x7c42eca offset:0x260 grain:64 syndrome:0x2210)
[Mon Feb 24 21:53:29 2025] EDAC MC0: 1 CE on mc#0csrow#1channel#1 (csrow:1 channel:1 page:0x63bce5d offset:0x200 grain:64 syndrome:0x333)
[Mon Feb 24 21:58:29 2025] EDAC MC0: 1 CE on mc#0csrow#1channel#1 (csrow:1 channel:1 page:0x7c42eca offset:0xc00 grain:64 syndrome:0x2232)
[Mon Feb 24 22:03:43 2025] EDAC MC0: 1 CE on mc#0csrow#1channel#1 (csrow:1 channel:1 page:0x72019b0 offset:0xc40 grain:64 syndrome:0x3322)
[Mon Feb 24 22:08:57 2025] EDAC MC0: 1 CE on mc#0csrow#1channel#1 (csrow:1 channel:1 page:0x72019ac offset:0x580 grain:64 syndrome:0x2021)
[Mon Feb 24 22:14:11 2025] EDAC MC0: 1 CE on mc#0csrow#1channel#1 (csrow:1 channel:1 page:0x74d82d2 offset:0xbc0 grain:64 syndrome:0x2130)
[Mon Feb 24 22:19:25 2025] EDAC MC0: 1 CE on mc#0csrow#1channel#1 (csrow:1 channel:1 page:0x74d82d2 offset:0x580 grain:64 syndrome:0x3233)
[Mon Feb 24 22:24:39 2025] EDAC MC0: 1 CE on mc#0csrow#1channel#1 (csrow:1 channel:1 page:0x77511f9 offset:0x640 grain:64 syndrome:0x2232)
[Mon Feb 24 22:29:53 2025] EDAC MC0: 1 CE on mc#0csrow#1channel#1 (csrow:1 channel:1 page:0x7c42eca offset:0x0 grain:64 syndrome:0x1100)
[Mon Feb 24 22:35:07 2025] EDAC MC0: 1 CE on mc#0csrow#1channel#1 (csrow:1 channel:1 page:0x72019ac offset:0x580 grain:64 syndrome:0x2222)
[Mon Feb 24 22:40:21 2025] EDAC MC0: 1 CE on mc#0csrow#1channel#1 (csrow:1 channel:1 page:0x7c42109 offset:0x500 grain:64 syndrome:0x1100)
[Mon Feb 24 22:45:35 2025] EDAC MC0: 1 CE on mc#0csrow#1channel#1 (csrow:1 channel:1 page:0x7c48e96 offset:0x380 grain:64 syndrome:0x2222)
[Mon Feb 24 22:50:36 2025] EDAC MC0: 1 CE on mc#0csrow#1channel#1 (csrow:1 channel:1 page:0x7c42109 offset:0x900 grain:64 syndrome:0x2200)
[Mon Feb 24 22:55:36 2025] EDAC MC0: 1 CE on mc#0csrow#1channel#1 (csrow:1 channel:1 page:0x7c42109 offset:0x180 grain:64 syndrome:0x223)
[Mon Feb 24 23:00:50 2025] EDAC MC0: 1 CE on mc#0csrow#1channel#1 (csrow:1 channel:1 page:0x72019ac offset:0x580 grain:64 syndrome:0x2222)
[Mon Feb 24 23:06:04 2025] EDAC MC0: 1 CE on mc#0csrow#1channel#1 (csrow:1 channel:1 page:0x7c42eca offset:0x260 grain:64 syndrome:0x2232)
[Mon Feb 24 23:11:18 2025] EDAC MC0: 1 CE on mc#0csrow#1channel#1 (csrow:1 channel:1 page:0x7c42eca offset:0x0 grain:64 syndrome:0x1102)
[Mon Feb 24 23:16:32 2025] EDAC MC0: 1 CE on mc#0csrow#1channel#1 (csrow:1 channel:1 page:0x7c42109 offset:0xd80 grain:64 syndrome:0x2222)
[Mon Feb 24 23:21:46 2025] EDAC MC0: 1 CE on mc#0csrow#1channel#1 (csrow:1 channel:1 page:0x654f861 offset:0xb40 grain:64 syndrome:0x10)
[Mon Feb 24 23:27:00 2025] EDAC MC0: 1 CE on mc#0csrow#1channel#1 (csrow:1 channel:1 page:0x654f861 offset:0x740 grain:64 syndrome:0x2120)
[Mon Feb 24 23:32:14 2025] EDAC MC0: 1 CE on mc#0csrow#1channel#1 (csrow:1 channel:1 page:0x654f861 offset:0xb80 grain:64 syndrome:0x2002)
[Mon Feb 24 23:37:28 2025] EDAC MC0: 1 CE on mc#0csrow#1channel#1 (csrow:1 channel:1 page:0x7c42109 offset:0x780 grain:64 syndrome:0x221)
[Mon Feb 24 23:42:42 2025] EDAC MC0: 1 CE on mc#0csrow#1channel#1 (csrow:1 channel:1 page:0x7c42109 offset:0x180 grain:64 syndrome:0x2222)
[Mon Feb 24 23:47:56 2025] EDAC MC0: 1 CE on mc#0csrow#1channel#1 (csrow:1 channel:1 page:0x7c42109 offset:0x180 grain:64 syndrome:0x2222)
[Mon Feb 24 23:52:57 2025] EDAC MC0: 1 CE on mc#0csrow#1channel#1 (csrow:1 channel:1 page:0x7c42109 offset:0x180 grain:64 syndrome:0x2220)
[Mon Feb 24 23:57:57 2025] EDAC MC0: 1 CE on mc#0csrow#1channel#1 (csrow:1 channel:1 page:0x654f861 offset:0x900 grain:64 syndrome:0x2)
[Tue Feb 25 00:03:11 2025] EDAC MC0: 1 CE on mc#0csrow#1channel#1 (csrow:1 channel:1 page:0x72019ac offset:0x580 grain:64 syndrome:0x2222)
[Tue Feb 25 00:08:25 2025] EDAC MC0: 1 CE on mc#0csrow#1channel#1 (csrow:1 channel:1 page:0x72019ac offset:0x580 grain:64 syndrome:0x2222)
[Tue Feb 25 00:13:39 2025] EDAC MC0: 1 CE on mc#0csrow#1channel#1 (csrow:1 channel:1 page:0x72019bc offset:0x180 grain:64 syndrome:0x132)
[Tue Feb 25 00:18:53 2025] EDAC MC0: 1 CE on mc#0csrow#1channel#1 (csrow:1 channel:1 page:0x7c42eca offset:0x260 grain:64 syndrome:0x2232)

I wonder if it’s possible to narrow down the stick from those logs?
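One way to approach the question above is to tally the corrected errors by location. The following is only a rough sketch: it assumes the dmesg lines have exactly the shape shown in the log above, and the names `CE_RE` and `summarize_ce` are made up for illustration. It tells you which csrow/channel and which pages keep showing up, not which physical slot that is.

```python
import re
from collections import Counter

# Assumption: lines look like the "EDAC MC0: 1 CE on ..." entries pasted above.
CE_RE = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<count>\d+) CE on .*?"
    r"\(csrow:(?P<csrow>\d+) channel:(?P<channel>\d+) "
    r"page:(?P<page>0x[0-9a-fA-F]+) offset:(?P<offset>0x[0-9a-fA-F]+)"
)

def summarize_ce(lines):
    """Count corrected errors per (mc, csrow, channel) and per page."""
    by_location = Counter()
    by_page = Counter()
    for line in lines:
        m = CE_RE.search(line)
        if not m:
            continue  # not an EDAC corrected-error line
        n = int(m["count"])
        by_location[(int(m["mc"]), int(m["csrow"]), int(m["channel"]))] += n
        by_page[m["page"]] += n
    return by_location, by_page

# Two lines copied from the log above, just to show the call.
SAMPLE = [
    "[Mon Feb 24 21:38:00 2025] EDAC MC0: 1 CE on mc#0csrow#1channel#1 "
    "(csrow:1 channel:1 page:0x7c42eca offset:0x260 grain:64 syndrome:0x2310)",
    "[Mon Feb 24 21:43:14 2025] EDAC MC0: 1 CE on mc#0csrow#1channel#1 "
    "(csrow:1 channel:1 page:0x7c42eca offset:0x240 grain:64 syndrome:0x2310)",
]
locations, pages = summarize_ce(SAMPLE)
print(locations.most_common())
print(pages.most_common(5))
```

If every entry lands on the same (mc, csrow, channel) tuple, as it does in the log above, that points at a single rank rather than at the memory controller as a whole.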

Here’s full dmesg:

I’m not an expert but those page numbers look quite similar.

Upon further investigation I came to the conclusion that it seems to be one faulty DIMM. But I don’t know which DIMM maps to csrow:1, channel:1.

My educated guess is that it’s either A1 or B1, but I asked ASRock support if they could help me figure out the mapping:

I have no experience with ASRock, but check your BMC/IPMI; you may find the information there.
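Besides the BMC, the Linux EDAC sysfs tree may be worth a look. The sketch below assumes the usual layout under `/sys/devices/system/edac/mc`, where each memory controller has `dimm*` or `rank*` entries (which one depends on the driver) with `dimm_label`, `dimm_location` and `dimm_ce_count` files. The function name `read_edac_dimms` is made up here. Note that the label often just repeats the csrow/channel naming unless something has filled in real slot names, so this may not settle the physical mapping on its own.

```python
from pathlib import Path

# Assumption: standard EDAC sysfs location; adjust if your kernel differs.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")

def _read(path):
    """Return the stripped file content, or None if it cannot be read."""
    try:
        return path.read_text().strip()
    except OSError:
        return None

def read_edac_dimms(root=EDAC_ROOT):
    """Collect label, location and corrected-error count per DIMM/rank entry."""
    rows = []
    root = Path(root)
    for mc in sorted(root.glob("mc[0-9]*")):
        # Entries are named dimmN or rankN depending on the EDAC driver.
        entries = sorted(mc.glob("dimm[0-9]*")) + sorted(mc.glob("rank[0-9]*"))
        for entry in entries:
            rows.append({
                "mc": mc.name,
                "entry": entry.name,
                "label": _read(entry / "dimm_label"),
                "location": _read(entry / "dimm_location"),
                "ce_count": int(_read(entry / "dimm_ce_count") or 0),
            })
    return rows

if __name__ == "__main__":
    for row in read_edac_dimms():
        print(row)
```

The entry whose `ce_count` keeps climbing is the one to chase; matching it to a silkscreen slot name still needs the board manual, the BMC, or a swap test.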

I believe the UEFI might have settings for RAM scrubbing. See if it’s set to scrub the RAM every 5 minutes. That would at least explain the periodicity of the error.

(I’ve never messed with this myself but I do remember seeing this setting in a couple of UEFIs on computers with ECC RAM. I don’t know if 5 minutes is a reasonable/typical value here; maybe it’s a red herring, but worth checking out maybe.)
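To check the periodicity idea against the log, one can compute the gaps between consecutive corrected-error lines. This is a minimal sketch that assumes human-readable timestamps in the `[Mon Feb 24 21:17:04 2025]` form seen in the log above and English day/month names; `ce_intervals` is a made-up helper name. On the pasted log the gaps come out between 300 and 314 seconds, which fits a fixed timer of roughly 5 minutes, whether that is a scrub interval or a polling interval.

```python
import re
from datetime import datetime

# Assumption: timestamps look like "[Mon Feb 24 21:17:04 2025]" at line start.
TS_RE = re.compile(
    r"^\[(?P<ts>[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \d{4})\]"
)

def ce_intervals(lines):
    """Return the gaps in seconds between consecutive corrected-error lines."""
    stamps = []
    for line in lines:
        m = TS_RE.match(line.lstrip())
        if m and " CE on " in line:
            # %a/%b parsing relies on the default (C/English) locale.
            stamps.append(datetime.strptime(m["ts"], "%a %b %d %H:%M:%S %Y"))
    return [int((b - a).total_seconds()) for a, b in zip(stamps, stamps[1:])]

# Three consecutive lines from the log above (tails shortened for brevity).
SAMPLE = [
    "[Mon Feb 24 21:48:28 2025] EDAC MC0: 1 CE on mc#0csrow#1channel#1",
    "[Mon Feb 24 21:53:29 2025] EDAC MC0: 1 CE on mc#0csrow#1channel#1",
    "[Mon Feb 24 21:58:29 2025] EDAC MC0: 1 CE on mc#0csrow#1channel#1",
]
print(ce_intervals(SAMPLE))
```

Keep in mind that these wall-clock timestamps are reconstructed from the kernel's uptime counter, so a few seconds of drift is not surprising.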


I found multiple reports where people also had ECC errors exactly every 5 minutes, so it does seem to be the interval for periodic scrubs. I have already ordered a replacement stick. Still waiting for a response from ASRock though. Worst case I’ll have to test the sticks one by one, though that’s tricky since the errors only occur under higher RAM load . _ .


My guess would be that C1 is 1, since it’s the first slot the manual says to populate.
