Badly need a stable Linux system with ecc?

My car is worth $500 when the gas tank is full. ‘Affordable’ is a moving target.

I’m using Ubuntu Linux 20.04 through 22.10. In general you can run (as root):
dmidecode --type 16
Output should show:
Error Correction Type: Multi-bit ECC
And:
dmidecode --type 17
Output should show:
Total Width: 72 bits
Data Width: 64 bits
Total Width is 8 bits more than Data Width; the extra 8 bits are the ECC bits.
Together these show whether the system reports ECC as being in use.
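If you want to script that check, here is a minimal Python sketch. It only parses text that looks like the dmidecode output quoted above; the function name `ecc_summary` and the sample strings are made up for illustration, and on a real machine you would feed it the output of the two dmidecode commands instead.

```python
import re

# Sample text built from the values quoted in this thread (illustrative only).
SAMPLE_TYPE16 = """Physical Memory Array
    Error Correction Type: Multi-bit ECC
"""

SAMPLE_TYPE17 = """Memory Device
    Total Width: 72 bits
    Data Width: 64 bits

Memory Device
    Total Width: 72 bits
    Data Width: 64 bits
"""


def ecc_summary(type16_text, type17_text):
    """Summarise what SMBIOS reports about ECC (hypothetical helper)."""
    # Type 16: array-level "Error Correction Type" field.
    match = re.search(r"Error Correction Type:\s*(.+)", type16_text)
    correction_type = match.group(1).strip() if match else None

    # Type 17: one block per memory device, separated by blank lines.
    extra_bits = []
    for block in re.split(r"\n\s*\n", type17_text):
        total = re.search(r"Total Width:\s*(\d+) bits", block)
        data = re.search(r"Data Width:\s*(\d+) bits", block)
        if total and data:  # empty slots report "Unknown" and are skipped
            extra_bits.append(int(total.group(1)) - int(data.group(1)))

    advertised = correction_type is not None and "ECC" in correction_type
    wide = bool(extra_bits) and all(bits > 0 for bits in extra_bits)
    return {
        "correction_type": correction_type,
        "extra_bits_per_device": extra_bits,
        "looks_like_ecc": advertised and wide,
    }


print(ecc_summary(SAMPLE_TYPE16, SAMPLE_TYPE17))
```

Keep in mind this only reflects what the firmware reports through SMBIOS. It says nothing about whether errors are actually being logged.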

To see actual detected errors and corrections, run:
edac-util -v
This prints the corrected and uncorrected error counts.
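The same counters can also be read straight from the kernel's EDAC sysfs tree. A rough Python sketch, assuming the usual layout under /sys/devices/system/edac/mc with `ce_count` and `ue_count` files per memory controller; the function name `read_edac_counts` is hypothetical:

```python
from pathlib import Path

# Standard EDAC location on Linux; only present when an EDAC driver is loaded.
EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root=EDAC_MC_ROOT):
    """Return {controller: {"ce": int, "ue": int}} (hypothetical helper).

    "ce" is the corrected error count, "ue" the uncorrected error count.
    """
    root = Path(root)
    counts = {}
    if not root.is_dir():
        return counts  # no EDAC tree at all
    for mc in sorted(root.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue  # skip controllers whose counters cannot be read
        counts[mc.name] = {"ce": ce, "ue": ue}
    return counts


counts = read_edac_counts()
if not counts:
    print("No EDAC memory controllers found (driver not loaded?)")
for name, c in counts.items():
    print(f"{name}: corrected={c['ce']} uncorrected={c['ue']}")
```

An empty result means the kernel has no EDAC memory controller registered, so nothing would be counted even if the RAM itself is ECC.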

Additional info:
How to Check ECC RAM Functionality
edac-util, Man page
Hardware Canucks - ECC Memory & AMD’s Ryzen – A Deep Dive

I really like the “cheap build” with DDR3. I have another ageing DDR2 NAS that eventually needs replacing with an ECC machine, and I can use this build for that. That is amazing, as it was going to be my next question once I had some hands-on time with ECC. I am still learning from you guys, thanks again.

But the second build is right where I want to be: within budget, and new hardware for peace of mind. Hopefully it goes on working for well over 5 years lol.
Prices for the Supermicro H11SSL “kits” are very different on eBay than on Alibaba, so I need to study that a bit more, or even find a Supermicro distributor in the UK.

May I also ask two more things, please?
I have never used EPYC. Is it like AM4, where some CPUs don’t work on some motherboard/BIOS combinations, or are they all plug and play on their respective sockets? Basically, any recommendations for the H11SSL-I?

Also, what RAM should I go for in terms of reliability, endurance, and availability in shops lol?

Thank you so much for your time.

So excited to do this for myself as soon as I get the hardware in hand.
I’m also going to run Ubuntu. It’s such a well-supported distro, with tons of documentation backed by communities much like this one, which makes it easy for us non-IT pros to learn things lol.
I’m so grateful to have found this place too.
Thanks so much again.

May I ask if you have tried injecting errors into your RAM to really see it work?

I haven’t tried that myself. They do it in the Hardware Canucks link, though. On consumer mainboards one can overclock the RAM and start to see errors that way.

Also, once there is more time I plan to investigate ECC memory scrubbing. Currently I’m running stock without it.

Also note that if you go with consumer-grade hardware on the AMD side, APUs do not support ECC. Well, the PRO versions do, but they are harder to get.

So, why do you say AM4 doesn’t fully support ECC?

# dmidecode 3.3
Getting SMBIOS data from sysfs.
SMBIOS 3.3.0 present.

Handle 0x0018, DMI type 16, 23 bytes
Physical Memory Array
    Location: System Board Or Motherboard
    Use: System Memory
    Error Correction Type: Multi-bit ECC
    Maximum Capacity: 128 GB
    Error Information Handle: 0x0017
    Number Of Devices: 4

Doesn’t Linux state here that it sees multi-bit ECC memory?

Good point. I would rather have the consumer platform, as it’s quieter, more portable, and easier to repair in the future, and maybe there are even more units out there in people’s hands.

Yes, I do appreciate the links you attached. I will certainly study them in much more detail when I get the chance, thank you so much.

I would love to get experience testing RAM and getting feedback from it. I probably would not overclock it, but your point about being able to push it until it produces errors is great for gamers. I would love to see DDR5 ECC lol.

I forgot to ask, what ram was recommended for your motherboard?

Unfortunately, EPYC has the ‘must pay attention to board and BIOS revision’ issues just like AM4. An H11SSL V2 will work with any 7xx2 CPU out of the box, and a 7252 is a good bang-for-the-buck CPU.

On EPYC and Opteron I have had sticks of RAM go bad and report errors that were logged, so I was able to track down and replace the offending stick.

The issue with ECC on AM4 is the reporting part, and on some mainboards it even varies with the BIOS revision. The PC and OS will list ECC as fully functional, but no logs will ever be generated because the BIOS does not always have logging capabilities, so you can run with errors being generated and never know about it. That is not always the case, though; it is board and BIOS revision specific.

For me it is not worth the ongoing headache of wondering whether a BIOS revision will kill ECC reporting.
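To make the “advertised vs. actually reporting” distinction concrete, here is a small Python sketch of how one might classify a machine. Everything in it is an assumption for illustration: the function name `ecc_reporting_status`, the status labels, and the idea that the inputs come from dmidecode (does SMBIOS advertise ECC?) and from per-controller EDAC error counters.

```python
def ecc_reporting_status(smbios_says_ecc, edac_counts):
    """Classify ECC reporting state (hypothetical helper and labels).

    smbios_says_ecc: True if dmidecode lists an ECC error correction type.
    edac_counts: {controller: {"ce": int, "ue": int}}; empty when the kernel
                 has no EDAC memory controller registered.
    """
    if not smbios_says_ecc:
        return "no-ecc-advertised"
    if not edac_counts:
        # Firmware claims ECC, but nothing in the OS would count errors.
        return "advertised-but-no-edac"
    total = sum(c["ce"] + c["ue"] for c in edac_counts.values())
    if total == 0:
        # Counters exist, but a zero count alone does not prove that
        # errors would actually be reported on this board and BIOS.
        return "edac-present-unproven"
    return "edac-present-errors-logged"


# The early-AM4 situation described above: ECC listed, nothing ever logged.
print(ecc_reporting_status(True, {}))
# A board where at least one corrected error has shown up in the counters.
print(ecc_reporting_status(True, {"mc0": {"ce": 2, "ue": 0}}))
```

The third state is the awkward one: until an error has actually been logged (for example through deliberate overclocking or error injection), zero counts cannot tell a healthy system apart from one that silently never reports.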

Asus’s recommendation is Kingston KSM26ED8/16ME. The Asus site has memory qualification lists for each mainboard and CPU, and that memory is listed there (with an ECC marking).

I plan to do some testing with overclocking to see if I can see errors and corrections, but that will be somewhere in the future. It’s funny how much stuff there is to do, and the list just keeps getting bigger :)

On a side note, there is Threadripper Pro (on the AMD side), which supports ECC. It’s more workstation than server, in case you do not want to go full EPYC. I can’t really comment on it, as I haven’t used it or even researched it much. Just to say that there are options.

Do you have more information on this? Which brands or boards in particular? Any links or sources to share?

The internet consensus is:
ASRock and Gigabyte = GOOD
ASUS = MEH
MSI = don’t bother.

A lot of other manufacturers also seem to be pretty random and rather undocumented. Users will report entire new menu sections showing up in the BIOS with ECC RAM installed, with no mention of them at all in any actual documentation.

I personally was at the forefront of ECC on AM3 platforms and actually helped make some board lists for that platform. I have owned and done a handful of ECC-on-AM4 builds, and I just stuck to the ‘ASRock or nothing’ mentality. (Unless you test it yourself or see someone who has tested that board, though, I do not think I would call any board a ‘for sure’.)

If memory serves, it was an early MSI AM4 board that reported in Linux that ECC was present but would never generate any logging info, no matter what was tried on it. A BIOS can say a system is ECC capable without ECC being on. We think that is what was going on.

Also, I believe @wendell did a piece about a Gigabyte board that silently corrected errors while not reporting any logging info. I can’t say whether that is an issue specific to Gigabyte or to a certain chipset, but I have seen some AM4 boards that did show reporting info.

I’m still looking for the Supermicro motherboard. The rev 2 is labelled H11SSL-NC, I think, but all the ones available are H11SSL-I, and even then mostly second hand or not from UK stores, meaning a pain-in-the-arse warranty.

So with that in mind, what would be the recommended ASRock/Gigabyte motherboard (AMD or Intel) with ECC reporting in Linux and the most proven track record, in your opinion?

This article explains the reason why you really want to use ECC memory, on an engineering level:

DRAM’s Damning Defects—and How They Cripple Computers An investigation into dynamic random-access memory chip failure reveals surprising hardware vulnerabilities

You need to be an IEEE member to read it… but there are other articles out there. ;)

No, the difference between an NC and an I board is separate from the difference between REV1 and REV2 boards. The NC board has an onboard 8-port SAS controller; the I board does not.

Both are available in REV1 and REV2 flavors.

As far as ASRock goes, ‘the one that is in stock and/or on sale’ is the one to get. Seriously, tech purchases across the globe are not what they were 5 years ago. You probably need to stop looking for a specific item and start seeing what you can get that you can make do with.

You have been very helpful and I thank you for your knowledge.
Will add to my filters for sure.

Isn’t RAM way more durable than storage in terms of how many reads and writes it gives you in its lifetime?

Hence why we use it for Steam caching and whatnot.

So I would be looking to underclock it and use it more to make my SSD last much longer, only writing data out if there are power issues somewhere down the line or something sets off an alarm lol

All Threadripper CPUs support ECC. Threadripper PRO can also support registered ECC memory. (What the CPU can support and what the motherboard supports may differ.)

So what would be a good current-day motherboard that gives me ECC reporting, or whatever I can buy in the shops brand new today lol?

Everything new other than the CPU. I don’t care about the CPU; any good performer will do (as gross as it is, I could even accept Intel into my home). Server, enthusiast, or gaming motherboards or RAM, I just want my RAM reporting back to me. I appreciate any help, thanks again.