Interested in IOMMU & ECC on the Asus ROG Zenith Extreme TR board

I finally collected my Threadripper workstation.

  • 1950X
  • Aorus X399, BIOS @ F2
  • 8x16GB UDIMM ECC KVR24E17D8/16

I'd like to confirm that all DIMMs are recognised and working. Multi-bit ECC is also recognised.

Nice! Is this the Kingston RAM you’re using? http://amzn.to/2CFRbmD

Mine doesn’t have the last two letters in its serial, but I guess they are identical.

Could you explain how you went about verifying ECC?

Try this: dmidecode -t memory
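
For anyone who wants to pull just the ECC-relevant bits out of that output, here is a rough sketch. It assumes the usual dmidecode field names ("Error Correction Type", "Total Width", "Data Width"); `ecc_summary` is a made-up helper name, not part of dmidecode. An ECC DIMM normally reports a total width larger than its data width (for example 72 vs 64 bits).

```shell
# Summarise ECC-related fields from `dmidecode -t memory` output read on stdin.
# ecc_summary is a hypothetical helper name used only for this sketch.
ecc_summary() {
    awk -F': *' '
        # Reported once per physical memory array, e.g. "Multi-bit ECC" or "None".
        /Error Correction Type:/ { print "array ECC type: " $2 }
        # Per memory device; empty slots report "Unknown", which becomes 0 here.
        /^[[:space:]]*Total Width:/ { total = $2 + 0 }
        /^[[:space:]]*Data Width:/ {
            data = $2 + 0
            if (data > 0) {
                n++
                if (total > data) ecc++
            }
        }
        END { printf "populated DIMMs: %d, with extra ECC bits: %d\n", n, ecc }
    '
}

# Usage on real hardware (needs root):
#   dmidecode -t memory | ecc_summary
```

Keep in mind this only shows what the firmware reports, not that error correction is actually active, which is why the EDAC check is still worth doing.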

credit:

I have 64 GB ECC (4x16) running on the Designare X399 board.

To check ECC is actually detected and working in Linux, check your “dmesg” output for EDAC (Error Detection and Correction):

root@deepthought:~# dmesg | grep -i edac
[ 0.294097] EDAC MC: Ver: 3.0.0
[ 18.008325] EDAC amd64: Node 0: DRAM ECC enabled.
[ 18.008326] EDAC amd64: F17h detected (node 0).
[ 18.008365] EDAC MC: UMC0 chip selects:
[ 18.008366] EDAC amd64: MC: 0: 0MB 1: 0MB
[ 18.008367] EDAC amd64: MC: 2: 16383MB 3: 16383MB
[ 18.008368] EDAC amd64: MC: 4: 0MB 5: 0MB
[ 18.008369] EDAC amd64: MC: 6: 0MB 7: 0MB
[ 18.008372] EDAC MC: UMC1 chip selects:
[ 18.008372] EDAC amd64: MC: 0: 0MB 1: 0MB
[ 18.008373] EDAC amd64: MC: 2: 16383MB 3: 16383MB
[ 18.008374] EDAC amd64: MC: 4: 0MB 5: 0MB
[ 18.008375] EDAC amd64: MC: 6: 0MB 7: 0MB
[ 18.008376] EDAC amd64: using x8 syndromes.
[ 18.008376] EDAC amd64: MCT channel count: 2
[ 18.008509] EDAC MC0: Giving out device to module amd64_edac controller F17h: DEV 0000:00:18.3 (INTERRUPT)
[ 18.008518] EDAC PCI0: Giving out device to module amd64_edac controller EDAC PCI controller: DEV 0000:00:18.0 (POLLED)
[ 18.008519] AMD64 EDAC driver v3.5.0

This is from a Ryzen system, but should be similar on TR.

If there’s no mention of EDAC, check that the EDAC kernel module (amd64_edac_mod) is loaded, and if not, load it with:

modprobe amd64_edac_mod
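
A small sketch of the "check, then load" step. It assumes the module shows up in /proc/modules as amd64_edac_mod; as far as I know some newer kernels list it as amd64_edac instead, so the pattern accepts both. `edac_module_loaded` is a hypothetical helper name.

```shell
# Return 0 if an amd64 EDAC module is listed in the given modules file
# (defaults to /proc/modules). Hypothetical helper for illustration.
edac_module_loaded() {
    grep -Eq '^amd64_edac(_mod)? ' "${1:-/proc/modules}"
}

# Usage (modprobe needs root):
#   edac_module_loaded || modprobe amd64_edac_mod
```

If the driver is compiled into the kernel rather than built as a module, it will not appear in /proc/modules at all, so the dmesg check above remains the more reliable test.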

I tried this once on a system with an Intel Xeon and a C232 chipset and it didn’t work. In fact, I had already tried all the other methods mentioned in that post, and none of them showed ECC working. What did work in the end was PassMark’s commercial fork of memtest86, which reported ECC as working, as I expected, since I met all the necessary criteria.

I might test the Linux kernel messages at some point when I have some spare time on that machine.

Thanks for the tip! I didn’t know this was a module and not built-in, but I will keep that in mind. :smiley:

Btw: what Ryzen motherboard are you using?
Oh and is that one or two 16GB modules? It’s not quite clear to me from the log.

It may depend on which kernel version you’re running. I think it used to be built-in but can now be a module. In 4.14 it can definitely be a module.

It’s an ASRock x370 Taichi with 32GB of ECC RAM (2 x 16GB sticks - Crucial kit CT2K16G4WFD824A).

Here’s my EDAC output on a Designare X399/1950X with 4x16 GB ECC. I assume nodes 0 and 1 are for the two memory controllers, but am not certain.

dmesg | grep -i edac
[ 0.203002] EDAC MC: Ver: 3.0.0
[ 6.867080] EDAC amd64: Node 0: DRAM ECC enabled.
[ 6.867083] EDAC amd64: F17h detected (node 0).
[ 6.867127] EDAC MC: UMC0 chip selects:
[ 6.867128] EDAC amd64: MC: 0: 0MB 1: 0MB
[ 6.867129] EDAC amd64: MC: 2: 16383MB 3: 16383MB
[ 6.867130] EDAC amd64: MC: 4: 0MB 5: 0MB
[ 6.867131] EDAC amd64: MC: 6: 0MB 7: 0MB
[ 6.867133] EDAC MC: UMC1 chip selects:
[ 6.867134] EDAC amd64: MC: 0: 0MB 1: 0MB
[ 6.867134] EDAC amd64: MC: 2: 16383MB 3: 16383MB
[ 6.867135] EDAC amd64: MC: 4: 0MB 5: 0MB
[ 6.867135] EDAC amd64: MC: 6: 0MB 7: 0MB
[ 6.867135] EDAC amd64: using x8 syndromes.
[ 6.867136] EDAC amd64: MCT channel count: 2
[ 6.877192] EDAC MC0: Giving out device to module amd64_edac controller F17h: DEV 0000:00:18.3 (INTERRUPT)
[ 6.877202] EDAC amd64: Node 1: DRAM ECC enabled.
[ 6.877203] EDAC amd64: F17h detected (node 1).
[ 6.877245] EDAC MC: UMC0 chip selects:
[ 6.877246] EDAC amd64: MC: 0: 0MB 1: 0MB
[ 6.877247] EDAC amd64: MC: 2: 16383MB 3: 16383MB
[ 6.877247] EDAC amd64: MC: 4: 0MB 5: 0MB
[ 6.877248] EDAC amd64: MC: 6: 0MB 7: 0MB
[ 6.877250] EDAC MC: UMC1 chip selects:
[ 6.877251] EDAC amd64: MC: 0: 0MB 1: 0MB
[ 6.877251] EDAC amd64: MC: 2: 16383MB 3: 16383MB
[ 6.877252] EDAC amd64: MC: 4: 0MB 5: 0MB
[ 6.877252] EDAC amd64: MC: 6: 0MB 7: 0MB
[ 6.877252] EDAC amd64: using x8 syndromes.
[ 6.877253] EDAC amd64: MCT channel count: 2
[ 6.879698] EDAC MC1: Giving out device to module amd64_edac controller F17h: DEV 0000:00:19.3 (INTERRUPT)
[ 6.879894] EDAC PCI0: Giving out device to module amd64_edac controller EDAC PCI controller: DEV 0000:00:18.0 (POLLED)
[ 6.879895] AMD64 EDAC driver v3.5.0
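
If you only want to know which nodes report ECC as enabled, a filter along these lines picks the node numbers out of output like the above. It assumes the "Node N: DRAM ECC enabled." wording shown in these logs, which may differ between kernel versions; `edac_nodes_enabled` is a hypothetical helper name.

```shell
# Print the node numbers for which the amd64 EDAC driver logged
# "DRAM ECC enabled". Reads dmesg-style text on stdin.
# edac_nodes_enabled is a hypothetical helper name.
edac_nodes_enabled() {
    sed -n 's/.*EDAC amd64: Node \([0-9][0-9]*\): DRAM ECC enabled.*/\1/p'
}

# Usage:
#   dmesg | edac_nodes_enabled
```

For the log above this would list nodes 0 and 1, one per line.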

Wow awesome, thanks for the edac tip…

My current FreeNAS vdev is at 84% and only 80% utilization is recommended - so I would most likely end up ordering another TR chip since I have a Zenith Extreme (Spare) in a box. I will also grab some ECC ram for the jobbie.
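
A quick way to keep an eye on that 80% guideline from a shell, sketched under the assumption that `zpool list -H -o name,capacity` prints one pool per line as name and percentage; `pools_over_threshold` is a hypothetical helper name.

```shell
# Read "name<TAB>NN%" lines on stdin and report pools above the given
# percentage (default 80). Hypothetical helper for illustration.
pools_over_threshold() {
    limit="${1:-80}"
    awk -v limit="$limit" '{
        cap = $2
        sub(/%/, "", cap)          # strip the trailing percent sign
        if (cap + 0 > limit) print $1 " is at " cap "%"
    }'
}

# Usage:
#   zpool list -H -o name,capacity | pools_over_threshold 80
```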

Need to set up a FreeNAS replication box, and “backup” off my primary before I can move the primary into a better enclosure and/or expand existing storage.

CC @SgtAwesomesauce

Progress? I sense progress!

Wanted to chime in here since this is a high-ranking Google result for ASUS Zenith ECC RAM.

I have an ASUS Zenith and am using ECC with success. ASUS has zero ECC DIMMs on their qualified vendor list. Crucial’s website (not Samsung B-die, I know :frowning:) has a compatibility tool. Took a chance and used that, and my 32 GB of UDIMM 2666 RAM works :slight_smile:

Thanks for the update. Glad to hear a success story!

@gnusense nice! Congrats, what’s the Crucial RAM part number BTW?

Part # is CT10739055. Not an ECC expert, but I see DPC error messages in dmesg, so I’m hoping ECC is functional.
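
One thing that might help here: when the EDAC driver is loaded, the kernel exposes per-controller error counters in sysfs. The sketch below assumes the usual layout under /sys/devices/system/edac/mc (one mcN directory per controller with ce_count and ue_count files); `edac_error_counts` is a hypothetical helper name.

```shell
# Print corrected/uncorrected error counters for each EDAC memory controller.
# The base directory defaults to the standard sysfs location.
# edac_error_counts is a hypothetical helper name.
edac_error_counts() {
    base="${1:-/sys/devices/system/edac/mc}"
    for mc in "$base"/mc*; do
        # Skip if the glob did not match or the counter is unreadable.
        [ -r "$mc/ce_count" ] || continue
        printf '%s: corrected=%s uncorrected=%s\n' \
            "$(basename "$mc")" "$(cat "$mc/ce_count")" "$(cat "$mc/ue_count")"
    done
}

# Usage:
#   edac_error_counts
```

A corrected count of zero just means no errors have been seen since boot; it does not by itself say ECC is inactive.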

Will try to become more ECC educated after my semester ends :slight_smile:

This tool came up on a search for ASUS Zenith compatible ECC. Pretty neat tool. You give it your motherboard and then can choose memory features like speed, ECC or not, etc. The tool is from Crucial, so it only covers Crucial/Micron memory. Wish more manufacturers took the time to do such testing and had similar tools.

I will add, however, that I did run into some memory-related POST issues at first. For whatever reason, simply changing which DIMM slots my sticks were in fixed it.

I will also add that the DIMM.2 card that comes with the Zenith is touchy for me. I get memory POST errors when the card is seated and locked in correctly. I discovered that the only way to use it is to actually not push it in all the way, weirdly enough. Works fine now :man_shrugging:

For troubleshooting memory errors on this specific board, I recommend removing the DIMM.2 card.

Overall, DIMM.2 is a cool idea and makes M.2 drives easier to swap for sure. It needs work though. If I could do it all over again, I would probably listen to Phoronix and get a Gigabyte X399, or better yet just shell out a little more for a Supermicro/EPYC build.

I actually regret picking up the Gigabyte board, there hasn’t been a new UEFI since the initial release; they basically stopped supporting their flagship Aorus X399 Gaming board. It’s crazy!!

ASUS has released new BIOS versions, but their support is still mostly a runaround: lots of emails about moving the issues up, no follow-ups. Sorry about Gigabyte’s lack of new BIOS though :expressionless:, it’s shameful how even the flagships of consumer hardware are lacking decent support.

Same here with my Z97 board. I don’t mess with the UEFI that much but when I do, the board just annoys me. Not just the UI but also the logic behind the scenes.
I was always hoping for an update in the two years I had it, but the board is still on pretty much the same version it originally shipped with when it came out.
My next board will definitely be from Asus, from what I can tell so far.

Leaving a tip for troubleshooting ECC / weirdly spec’d normal memory in TR4 here.

After a recent BIOS update, my UEFI settings were reset, and my computer no longer booted due to memory errors.

The fix was to boot with only two sticks (or however many you can boot with), raise the DRAM voltage for both channels, then install the other two sticks and reboot.

I bought a kit of 4 Crucial ECC sticks; two defaulted to 1.220 V and the other two to 1.190 V (even though all were listed as 1.20 V on the packaging). The 1.19 V channel default caused the issue. I went up to 1.28 V on both, because I used to overclock it and play with the timings, and that voltage was fine.

Also, if you have a Zenith, remove the DIMM.2 card when troubleshooting any memory errors.
