Memory errors on ASUS motherboard, even though part is on compatible list?

Vordreller · January 4, 2024, 8:28pm

I’m seeing the following output of dmesg:

[ 2911.127017] EDAC MC0: 1 UE ie31200 UE on unknown memory (csrow:2 channel:1 page:0x0 offset:0x0 grain:1)
[ 2914.198994] EDAC MC0: 1 UE ie31200 UE on unknown memory (csrow:2 channel:0 page:0x0 offset:0x0 grain:1)
[ 2914.198998] EDAC MC0: 1 UE ie31200 UE on unknown memory (csrow:2 channel:1 page:0x0 offset:0x0 grain:1)
[ 2915.222971] EDAC MC0: 1 UE ie31200 UE on unknown memory (csrow:3 channel:0 page:0x0 offset:0x0 grain:1)
[ 2915.222975] EDAC MC0: 1 UE ie31200 UE on unknown memory (csrow:3 channel:1 page:0x0 offset:0x0 grain:1)
[ 2918.294915] EDAC MC0: 1 UE ie31200 UE on unknown memory (csrow:2 channel:0 page:0x0 offset:0x0 grain:1)
[ 2918.294919] EDAC MC0: 1 UE ie31200 UE on unknown memory (csrow:2 channel:1 page:0x0 offset:0x0 grain:1)
[ 2920.342872] EDAC MC0: 1 UE ie31200 UE on unknown memory (csrow:2 channel:0 page:0x0 offset:0x0 grain:1)

The system seems to be working fine, though googling it tells me this is related to ECC memory not working as it should? Is that right?
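
To see whether the errors cluster on one csrow/channel pair, the dmesg lines above can be tallied. This is a minimal sketch: the regular expression is inferred only from the lines quoted in this post, and `tally_edac` and `SAMPLE` are hypothetical names used for illustration, not part of any EDAC tooling.

```python
import re
from collections import Counter

# Assumption: the field layout is taken from the dmesg lines quoted in this
# thread; other EDAC drivers or kernel versions may format messages differently.
EDAC_RE = re.compile(
    r"EDAC (?P<mc>MC\d+): (?P<count>\d+) (?P<kind>UE|CE) .*?"
    r"\(csrow:(?P<csrow>\d+) channel:(?P<channel>\d+)"
)


def tally_edac(lines):
    """Sum reported error counts per (controller, kind, csrow, channel)."""
    counts = Counter()
    for line in lines:
        match = EDAC_RE.search(line)
        if match:
            key = (
                match["mc"],
                match["kind"],
                int(match["csrow"]),
                int(match["channel"]),
            )
            counts[key] += int(match["count"])
    return counts


# A few of the lines from the post, used as sample input (hypothetical name).
SAMPLE = [
    "[ 2911.127017] EDAC MC0: 1 UE ie31200 UE on unknown memory (csrow:2 channel:1 page:0x0 offset:0x0 grain:1)",
    "[ 2914.198994] EDAC MC0: 1 UE ie31200 UE on unknown memory (csrow:2 channel:0 page:0x0 offset:0x0 grain:1)",
    "[ 2915.222971] EDAC MC0: 1 UE ie31200 UE on unknown memory (csrow:3 channel:0 page:0x0 offset:0x0 grain:1)",
]

for key, total in sorted(tally_edac(SAMPLE).items()):
    print(key, total)
```

To use it on a live system, the output of `dmesg` could be saved to a file and its lines passed to `tally_edac` instead of `SAMPLE`.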

The motherboard: WS C246 PRO

Uses the latest BIOS: WS C246 PRO｜Motherboards｜ASUS Global

Version 2201

They link a list of supported RAM: WS C246 PRO｜Motherboards｜ASUS Global

Which is here: https://dlcdnets.asus.com/pub/ASUS/mb/socket1151/WS-C246-PRO/QVL/WS_C246_PRO_DIMM_AVL.pdf?model=WS%20C246%20PRO

The relevant output of lshw:

     *-memory
          description: System Memory
          physical id: 3f
          slot: System board or motherboard
          size: 32GiB
          capabilities: ecc
          configuration: errordetection=ecc
        *-bank:0
             description: [empty]
             physical id: 0
             slot: ChannelA-DIMM1
        *-bank:1
             description: DIMM DDR4 Synchronous 2666 MHz (0.4 ns)
             product: M391A2K43BB1-CTD
             vendor: Samsung
             physical id: 1
             serial: 201B63F9
             slot: ChannelA-DIMM2
             size: 16GiB
             width: 64 bits
             clock: 2666MHz (0.4ns)
        *-bank:2
             description: [empty]
             physical id: 2
             slot: ChannelB-DIMM1
        *-bank:3
             description: DIMM DDR4 Synchronous 2666 MHz (0.4 ns)
             product: M391A2K43BB1-CTD
             vendor: Samsung
             physical id: 3
             serial: 201B16EC
             slot: ChannelB-DIMM2
             size: 16GiB
             width: 64 bits
             clock: 2666MHz (0.4ns)

Which is literally the 2nd part on the list of compatible parts.

So, I’m confused as to why the message from the logs reads:

unknown memory

Anyone have an idea what’s going on here? Running Ubuntu 20.04 on this machine.

anon7678104 · January 5, 2024, 9:19pm

There's more than a chance that, between launch and you buying the RAM, parts were swapped out by the manufacturer,
with the end result that the only thing that matches the spec on the QV list is the brand on the packaging.

Not kidding, mate. I bought 2 sets of DDR4 3000 CAS 15 gmk memory… the second set, I later found out, had “mostly” the same timings,
but a different bitrate, and were dual-sided DIMMs.
The first set was single-sided. So the second set was actually physically different from the RAM originally listed on the QV list for my board.
So I know from experience that the QV list is only relevant for release parts.
6 months later it's a guesstimate as to whether the listed parts will even be the listed parts.

So the only thing I can recommend is to try tweaking the timings to see if you can get it stable and error-free.

SonWon · January 6, 2024, 11:36am

If I could I would replace the memory. I have never had problems with G.skill memory.

diizzy · January 6, 2024, 11:59am

Apart from irrelevant comments, you have some kind of memory failure. It may be a faulty memory stick, several of them, and/or other hardware.
https://supportportal.juniper.net/s/article/Host-log-message-kernel-EDAC-MC0-1-CE-ie31200-CE-on-unknown-memory?language=en_US

Vordreller · January 7, 2024, 12:39am

I ran into that thread as well…

I ran Memtest86+ 6.20, but it gave me a complete pass on all tests.

And as for timings… hell, that seems like a lot of work. And if parts on the RAM were swapped out after it was originally added to the list of usable items… that just means having to get another set of RAM and hoping I don’t run into this again.

I mean, the system seems stable. It’s not a production level server, it’s a home server that I mostly use for digital archiving, picture archiving, and gathering a bunch of prometheus data from other systems.

So, I’m kinda on the idea of just not doing anything about it.

I might just look into getting other ram, I did want this system to use ECC RAM, for the heck of it.

EDIT: memtest 7.00 came out with a few improvements, including ECC related stuff: Release v7.00 · memtest86plus/memtest86plus · GitHub

Still passes.

Vordreller · March 15, 2024, 10:07pm

So, today I decided to upgrade from Ubuntu 20.04 to 22.04 on this machine.

And since then, I haven’t seen the error.

Hope I’m not declaring victory too soon, but yeah, the error is gone now…

Could have been a kernel 5.4 versus 5.15 thing, maybe…

EDIT: Nope it’s back. But not right after the upgrade, for some reason.
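
Since the error comes and goes, one way to track it over time is to read the EDAC counters that the kernel exposes under sysfs rather than scanning dmesg. A minimal sketch, assuming the EDAC driver is loaded and the usual `/sys/devices/system/edac/mc/mcN/ce_count` and `ue_count` files are present; `read_edac_counts` is a hypothetical helper name.

```python
from pathlib import Path


def read_edac_counts(base="/sys/devices/system/edac/mc"):
    """Return {controller: {"ce_count": int|None, "ue_count": int|None}}.

    Assumption: counters live in <base>/mcN/ce_count and <base>/mcN/ue_count.
    A missing or unreadable file yields None for that counter.
    """
    result = {}
    root = Path(base)
    if not root.is_dir():
        return result  # EDAC not available (or driver not loaded)
    for mc_dir in sorted(root.glob("mc[0-9]*")):
        counts = {}
        for name in ("ce_count", "ue_count"):
            try:
                counts[name] = int((mc_dir / name).read_text().strip())
            except (OSError, ValueError):
                counts[name] = None
        result[mc_dir.name] = counts
    return result


if __name__ == "__main__":
    for controller, counts in read_edac_counts().items():
        print(controller, counts)
```

Running this periodically (for example from cron) and logging the result would show whether the uncorrected count keeps climbing or only jumps occasionally.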