Interested in IOMMU & ECC on the Asus ROG Zenith Extreme TR board

For someone who is as paranoid about a flipped bit as me, ECC is necessary in all systems. :wink:

But strictly technically speaking, you are not required to have working ECC memory for IOMMU.

Thing is, @comfreak, I share your sentiments on flipped bits, as I plan to set up my TR system as my core dev box.

In the event I were to run some VMs for ‘server duty’, ECC would be handy. Heck, one task I have is to set up a ZFS replication box. Rather than buying the hardware for yet another FreeNAS build, I could potentially pass an HBA controller through to a VM via IOMMU; if I don’t go with ECC RAM, well… I’m not so sure I’d be able to trust my data there.

Then again, I would value the performance boost from faster RAM. Guess I’ll go non-ECC now and bite the bullet on setting up the replication box.

I’ll most likely replicate the components I’ve used in my primary FreeNAS build.

If you do go that route, please let me/us know. I am planning on trying that out too at one point. :wink:

I did it with Xen for years. It was fine, and I had no problem importing the pool into its current home on bare-metal hardware, a Xeon E3.

Do you think the same could be possible using something like Proxmox?

As long as the ZFS version/feature flags aren’t going from newer to older, I don’t see why not.
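To make the newer-to-older point concrete, here is a minimal Python sketch. It assumes you have already collected the feature flags active on the pool and the flags the destination system supports (for example from `zpool get all <pool>` and `zpool upgrade -v`). The function name and the sample flag sets are made up for illustration, and the check ignores that some features are read-only compatible.

```python
def missing_features(pool_active, target_supported):
    """Return the pool features that the target system does not know about."""
    return sorted(set(pool_active) - set(target_supported))


# Hypothetical sample data, not taken from any real pool.
pool_active = {"async_destroy", "lz4_compress", "large_blocks"}
older_target = {"async_destroy", "lz4_compress"}
newer_target = pool_active | {"sha512"}

# A non-empty list means the target is "older" than the pool in the sense above.
print(missing_features(pool_active, older_target))
# An empty list means nothing the pool uses is unknown to the target.
print(missing_features(pool_active, newer_target))
```

If the first list is non-empty, the move is in the risky newer-to-older direction; an empty list is the case where I wouldn’t expect trouble.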

Despite Wendell’s video, I contacted MSI, asked about ECC support, and received:
“Regarding your concern,we are sorry,do you mean the motherboard X399 GAMING carbon ac?we are sorry,this motherboard does not support ECC Function.”

I can confirm it does work; it may just not be validated the way it would be on, e.g., a workstation board. If you want validation, you may want to wait for someone to make a workstation board; otherwise it works fine.

The spec page says ECC UDIMMs are supported. ECC RDIMMs are definitely not supported, but ECC UDIMMs are. Single-bit errors are reported. Dual-bit errors should theoretically lock the system, but there may be a kernel option you need to set to enable that.

Maybe it works with a number of ECC modules, and they want to avoid the headache of supporting all the others by simply calling it “non-functioning”. Maybe…
Edit: the website no longer claims ECC support either.

Most of the time this “not validated” stuff means that the behavior on a two-bit error is undefined. You will get kernel messages or Windows event log entries for both single- and dual-bit errors. Ideally the machine hangs on a two-bit error.
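On Linux, the kernel messages mentioned above are backed by per-controller counters that the EDAC subsystem exposes in sysfs. A minimal sketch for reading them, assuming an EDAC driver is loaded and the usual `/sys/devices/system/edac/mc/mcN/ce_count` and `ue_count` files exist (the function name is just for illustration):

```python
from pathlib import Path


def read_edac_counts(base="/sys/devices/system/edac/mc"):
    """Return {controller: (corrected, uncorrected)} from EDAC sysfs counters."""
    root = Path(base)
    counts = {}
    if not root.is_dir():
        return counts  # no EDAC driver loaded, or not a Linux system
    for mc in sorted(root.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text())  # corrected (e.g. single-bit) errors
            ue = int((mc / "ue_count").read_text())  # uncorrected (e.g. multi-bit) errors
        except (OSError, ValueError):
            continue  # skip controllers without readable counters
        counts[mc.name] = (ce, ue)
    return counts


if __name__ == "__main__":
    for name, (ce, ue) in read_edac_counts().items():
        print(f"{name}: corrected={ce} uncorrected={ue}")
```

A non-zero corrected count with the system still running is what working single-bit correction looks like; what happens on an uncorrected error is exactly the part that is undefined on non-validated boards.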

I finally collected my Threadripper workstation.

  • 1950X
  • Aorus X399, BIOS F2
  • 8x16GB ECC UDIMM, KVR24E17D8/16

I’d like to confirm that all DIMMs are recognised and working. Multi-bit ECC is recognised as well.

Nice! Is this the Kingston RAM you’re using? http://amzn.to/2CFRbmD

Mine doesn’t have the last two letters in its serial, but I guess they are identical.

Could you explain how you went about verifying ECC?

Try this: dmidecode -t memory
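If you’d rather not eyeball the output, here is a rough Python sketch that pulls out the two things worth looking at: the “Error Correction Type” of the memory array, and whether each DIMM’s total width is larger than its data width (72 vs. 64 bits on ECC modules). The sample text is a hand-written excerpt for illustration, not output from any board in this thread, and note that dmidecode only reports what the firmware claims.

```python
import re


def parse_dmidecode_memory(text):
    """Extract ECC hints from `dmidecode -t memory` output."""
    ecc_types = [t.strip() for t in re.findall(r"Error Correction Type:\s*(.+)", text)]
    # Populated slots list "Total Width" directly followed by "Data Width".
    widths = re.findall(
        r"Total Width:\s*(\d+) bits\s+Data Width:\s*(\d+) bits", text
    )
    ecc_dimms = sum(1 for total, data in widths if int(total) > int(data))
    return {
        "correction_types": ecc_types,
        "dimms": len(widths),
        "dimms_with_ecc_width": ecc_dimms,
    }


# Hypothetical excerpt, trimmed to the relevant fields.
SAMPLE = """\
Physical Memory Array
\tError Correction Type: Multi-bit ECC
Memory Device
\tTotal Width: 72 bits
\tData Width: 64 bits
\tSize: 16384 MB
"""

print(parse_dmidecode_memory(SAMPLE))
```

To run it on a real machine, feed it the text from `dmidecode -t memory` (which needs root), for example via `subprocess.run(["dmidecode", "-t", "memory"], capture_output=True, text=True).stdout`.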

credit:

I have 64 GB ECC (4x16) running on the designare x399 board.

To check ECC is actually detected and working in Linux, check your “dmesg” output for EDAC (Error Detection and Correction):

root@deepthought:~# dmesg | grep -i edac
[ 0.294097] EDAC MC: Ver: 3.0.0
[ 18.008325] EDAC amd64: Node 0: DRAM ECC enabled.
[ 18.008326] EDAC amd64: F17h detected (node 0).
[ 18.008365] EDAC MC: UMC0 chip selects:
[ 18.008366] EDAC amd64: MC: 0: 0MB 1: 0MB
[ 18.008367] EDAC amd64: MC: 2: 16383MB 3: 16383MB
[ 18.008368] EDAC amd64: MC: 4: 0MB 5: 0MB
[ 18.008369] EDAC amd64: MC: 6: 0MB 7: 0MB
[ 18.008372] EDAC MC: UMC1 chip selects:
[ 18.008372] EDAC amd64: MC: 0: 0MB 1: 0MB
[ 18.008373] EDAC amd64: MC: 2: 16383MB 3: 16383MB
[ 18.008374] EDAC amd64: MC: 4: 0MB 5: 0MB
[ 18.008375] EDAC amd64: MC: 6: 0MB 7: 0MB
[ 18.008376] EDAC amd64: using x8 syndromes.
[ 18.008376] EDAC amd64: MCT channel count: 2
[ 18.008509] EDAC MC0: Giving out device to module amd64_edac controller F17h: DEV 0000:00:18.3 (INTERRUPT)
[ 18.008518] EDAC PCI0: Giving out device to module amd64_edac controller EDAC PCI controller: DEV 0000:00:18.0 (POLLED)
[ 18.008519] AMD64 EDAC driver v3.5.0

This is from a Ryzen system, but should be similar on TR.
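For scripting the same check, a small Python sketch that looks for the “DRAM ECC enabled” line per node. It assumes the message format shown in the log above, which may differ between kernel versions; the function name is made up.

```python
import re


def ecc_enabled_nodes(dmesg_text):
    """Return the sorted node numbers that amd64 EDAC reports as ECC-enabled."""
    pattern = re.compile(r"EDAC amd64: Node (\d+): DRAM ECC enabled")
    return sorted({int(m.group(1)) for m in pattern.finditer(dmesg_text)})


# First lines of the log quoted above.
sample = """\
[ 0.294097] EDAC MC: Ver: 3.0.0
[ 18.008325] EDAC amd64: Node 0: DRAM ECC enabled.
[ 18.008326] EDAC amd64: F17h detected (node 0).
"""

print(ecc_enabled_nodes(sample))  # [0]
```

An empty list means either ECC is not active or the EDAC driver never logged anything, which is where the module check comes in.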

If there’s no mention of EDAC, check the EDAC kernel module (amd64_edac_mod) is loaded, and if not load it with:

modprobe amd64_edac_mod
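A quick way to check whether the module is already loaded is to look for it in `/proc/modules`. A minimal sketch, with two caveats: the module name is the one given above and may differ on other kernel versions, and a driver compiled into the kernel will not show up in `/proc/modules` at all. The sample line is made up for illustration.

```python
def module_loaded(name, proc_modules_text):
    """True if `name` is listed as a loaded module in /proc/modules content."""
    return any(
        line.split(" ", 1)[0] == name
        for line in proc_modules_text.splitlines()
        if line
    )


# Hypothetical /proc/modules line; sizes and addresses are placeholders.
sample = "amd64_edac_mod 36864 0 - Live 0x0000000000000000\n"

print(module_loaded("amd64_edac_mod", sample))
```

On a real system you would pass in `open("/proc/modules").read()` instead of the sample string.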

I tried this once on a system with an Intel Xeon and a C232 chipset, and it didn’t work. In fact, I had already tried all the other methods mentioned in that post, and none of them showed ECC working. What did work in the end was PassMark’s commercial fork of memtest86, which indicated ECC as working, as I expected, since I met all the necessary criteria.

I might test the Linux kernel messages at one point when I have some spare time on that machine.

Thanks for the tip! I didn’t know this was a module and not built-in, but I will keep that in mind. :smiley:

Btw: what Ryzen motherboard are you using?
Oh and is that one or two 16GB modules? It’s not quite clear to me from the log.

It may depend on which kernel version you’re running. I think it used to be built-in but can now be a module. In 4.14 it can definitely be a module.

It’s an ASRock x370 Taichi with 32GB of ECC RAM (2 x 16GB sticks - Crucial kit CT2K16G4WFD824A).

Here’s my EDAC output on a Designare X399/1950X with 4x16 GB ECC. I assume nodes 0 and 1 are for the two memory controllers, but am not certain.

dmesg | grep -i edac
[ 0.203002] EDAC MC: Ver: 3.0.0
[ 6.867080] EDAC amd64: Node 0: DRAM ECC enabled.
[ 6.867083] EDAC amd64: F17h detected (node 0).
[ 6.867127] EDAC MC: UMC0 chip selects:
[ 6.867128] EDAC amd64: MC: 0: 0MB 1: 0MB
[ 6.867129] EDAC amd64: MC: 2: 16383MB 3: 16383MB
[ 6.867130] EDAC amd64: MC: 4: 0MB 5: 0MB
[ 6.867131] EDAC amd64: MC: 6: 0MB 7: 0MB
[ 6.867133] EDAC MC: UMC1 chip selects:
[ 6.867134] EDAC amd64: MC: 0: 0MB 1: 0MB
[ 6.867134] EDAC amd64: MC: 2: 16383MB 3: 16383MB
[ 6.867135] EDAC amd64: MC: 4: 0MB 5: 0MB
[ 6.867135] EDAC amd64: MC: 6: 0MB 7: 0MB
[ 6.867135] EDAC amd64: using x8 syndromes.
[ 6.867136] EDAC amd64: MCT channel count: 2
[ 6.877192] EDAC MC0: Giving out device to module amd64_edac controller F17h: DEV 0000:00:18.3 (INTERRUPT)
[ 6.877202] EDAC amd64: Node 1: DRAM ECC enabled.
[ 6.877203] EDAC amd64: F17h detected (node 1).
[ 6.877245] EDAC MC: UMC0 chip selects:
[ 6.877246] EDAC amd64: MC: 0: 0MB 1: 0MB
[ 6.877247] EDAC amd64: MC: 2: 16383MB 3: 16383MB
[ 6.877247] EDAC amd64: MC: 4: 0MB 5: 0MB
[ 6.877248] EDAC amd64: MC: 6: 0MB 7: 0MB
[ 6.877250] EDAC MC: UMC1 chip selects:
[ 6.877251] EDAC amd64: MC: 0: 0MB 1: 0MB
[ 6.877251] EDAC amd64: MC: 2: 16383MB 3: 16383MB
[ 6.877252] EDAC amd64: MC: 4: 0MB 5: 0MB
[ 6.877252] EDAC amd64: MC: 6: 0MB 7: 0MB
[ 6.877252] EDAC amd64: using x8 syndromes.
[ 6.877253] EDAC amd64: MCT channel count: 2
[ 6.879698] EDAC MC1: Giving out device to module amd64_edac controller F17h: DEV 0000:00:19.3 (INTERRUPT)
[ 6.879894] EDAC PCI0: Giving out device to module amd64_edac controller EDAC PCI controller: DEV 0000:00:18.0 (POLLED)
[ 6.879895] AMD64 EDAC driver v3.5.0
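One thing the log does show is that two EDAC memory controllers (MC0 and MC1) were registered, each on its own PCI device. A short Python sketch that extracts that mapping, assuming the “Giving out device” message format seen above (the function name is made up, and this alone does not settle what a “node” corresponds to in hardware):

```python
import re


def edac_controllers(dmesg_text):
    """Map EDAC memory controller index -> (PCI device, error-reporting mode)."""
    pattern = re.compile(
        r"EDAC MC(\d+): Giving out device to module \S+ controller \S+: "
        r"DEV (\S+) \((\w+)\)"
    )
    return {int(i): (dev, mode) for i, dev, mode in pattern.findall(dmesg_text)}


# The two registration lines from the log above.
sample = """\
[ 6.877192] EDAC MC0: Giving out device to module amd64_edac controller F17h: DEV 0000:00:18.3 (INTERRUPT)
[ 6.879698] EDAC MC1: Giving out device to module amd64_edac controller F17h: DEV 0000:00:19.3 (INTERRUPT)
"""

for index, (device, mode) in sorted(edac_controllers(sample).items()):
    print(f"MC{index}: {device} ({mode})")
```

The `EDAC PCI0` line is deliberately not matched, since it refers to the PCI controller rather than a memory controller.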

Wow, awesome, thanks for the EDAC tip…

My current FreeNAS vdev is at 84%, and only 80% utilization is recommended, so I will most likely end up ordering another TR chip, since I have a spare Zenith Extreme in a box. I will also grab some ECC RAM for the jobbie.
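For a sense of scale on the 84% vs. 80% numbers, a back-of-the-envelope Python sketch. It treats the pool as a single capacity figure and assumes the amount of stored data stays constant, so real ZFS layouts will differ in the details.

```python
def extra_capacity_fraction(current_util, target_util):
    """Fraction of current capacity to add so utilization drops to the target."""
    return current_util / target_util - 1.0


def data_to_free_fraction(current_util, target_util):
    """Fraction of current capacity to free up to reach the target instead."""
    return current_util - target_util


print(f"add capacity: {extra_capacity_fraction(0.84, 0.80):.1%}")
print(f"or free up:   {data_to_free_fraction(0.84, 0.80):.1%}")
```

In other words, getting from 84% back to 80% takes roughly 5% more capacity, or moving about 4% of the pool’s size elsewhere.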

Need to set up a FreeNAS replication box and “back up” off my primary before I can move the primary into a better enclosure and/or expand existing storage.

CC @SgtAwesomesauce