Badly need a stable Linux system with ECC?

Hi, I want to build a PC where I can store and edit priceless photos and videos. I need ECC because it’s going to be an always-on machine and also a backup point. I run Linux, so AMD preferred?

If anyone could suggest a full build with a high-performance AM4 DDR4 ECC system, I would be grateful. I would also love the cheapest build with new hardware in mind.

Thanks in advance.

This is a counter-intuitive project. Even an ASRock Rack AM4 board with ECC non-registered RAM is limited in how ECC functions.

You say high performance, so will you be adding VMs, or is this JUST a NAS?

What will your backup plan be? If it is important data, an offsite backup is recommended.

How many drives do you plan to use?

Does it need to be in an ATX case or smaller? Or rack mount?


I don’t know how suitable it would be for your use case, but I have been running an ASUS TUF Gaming X570-Plus with Kingston Server Premier ECC RAM (KSM26ED8/16HD). I haven’t had any issues, and everything seems good on the ECC side (diagnostics-wise I haven’t seen any actual corrections or errors, but they should be rare). I would choose the CPU as you see fit, the bigger the better. I have been running a 3700, which is seldom maxed out in my use, but then again I don’t do many CPU-heavy tasks.

PS. The RAM I am using is not the one recommended by ASUS, as the recommended version is hard to find.


So it will run VMs, but 2 at most, and basic things (Pi-hole, a remote Linux desktop VM, etc.). Its most critical purpose would be to add to an ageing backup system, as the backups I already have are on very old, failing hardware. The only ECC feature I’m interested in is the parity data chip, which I guess checks the data while it is handling it? I mean, ZFS handles any bits that drifted on the drives, right? Sorry, that’s the bit I’m lost on: what works best in terms of drives and ease of use. Is this impossible for around £1-3k or less?

I will not need more than two or three NVMe drives of 2-4 TB each, and maybe four SATA drives of 16-20 TB each at most.

Sorry for leaving these out, thanks again for your time.

One might say that not finding any errors on your ECC memory means it was not worth it, but the fact that your memory tells you whether it’s finding errors or not just blows my mind, and it’s a feature I would love to play with.

How do you check for ECC errors on your particular system, and what OS do you use?

And thank you for your amazing recommendation. Your CPU spec is good enough for me too, and the motherboard seems to be available where I am, which is a bonus these days lol

So that is exactly the issue with AM4 ECC: it has no ability to report data to the OS. In THEORY you can use IPMI on an ASRock Rack server board to view this data, but I have heard no actual verification reports of whether this works.

I have 2 recommendations, and you can pick based on budget.

Option 1. Uber low-cost tank build.
An Opteron H8SGL Supermicro board with a 6380 or 6366HE CPU. This uses real ECC registered DDR3 RAM, can be viewed by OS utilities, has lots of cores, makes for a good VM host, and has just enough PCIe to get some HBAs and a 10Gb NIC. Under $500. This stuff is old, but I am still running this exact setup as a NAS at one location and it is bulletproof. Its uptime is measured in YEARS.

Option 2. All it takes is money.
EPYC stuff, used or new if you want. Same hardware benefits as the Opteron rig: you get working ECC DDR4 RAM, more and faster PCIe, and newer CPU instruction sets, but it is more expensive and has equal or higher power draw. An H11SSL and an EPYC 7252 are about as low-cost as you will get, and you will easily eat $1,500 trying to complete this build.


I think Zedicus had a lot of good points.

I am a real fan of AMD

But Intel has a lot more older kit that can mean complete systems, with extra backups for the same price as new hardware.

Obviously new would be more energy efficient and future-proof.

But stability and continuity/durability are the goal, so a rotating backup structure would be more important in my mind.

ECC on a ZFS system would mean the first write to the drives is much less likely to be corrupt; then, once the data is on the drives, the system can check for corruption reliably.
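A toy Python model of that point (not how ZFS is actually implemented): SHA-256 stands in for a ZFS block checksum, and the hypothetical `flip_bit` helper simulates a single-bit error. A flip on the drive after the write is caught when the block is read back, but a flip in non-ECC RAM before the checksum is computed yields bad data with a matching checksum, which is why ECC and ZFS complement each other rather than replace each other.

```python
import hashlib


def checksum(block: bytes) -> str:
    # Stand-in for a filesystem block checksum (assumption: SHA-256 here).
    return hashlib.sha256(block).hexdigest()


def flip_bit(block: bytes, bit: int) -> bytes:
    # Simulate a single-bit error at the given bit position.
    data = bytearray(block)
    data[bit // 8] ^= 1 << (bit % 8)
    return bytes(data)


def write_then_read(block: bytes, flip_in_ram=False, flip_on_disk=False) -> bool:
    """Return True if corruption is detected when the block is read back."""
    if flip_in_ram:
        # Error in memory BEFORE the checksum is computed (no ECC to stop it).
        block = flip_bit(block, 3)
    stored_sum = checksum(block)  # checksum is stored alongside the data
    stored_block = block
    if flip_on_disk:
        # Error on the drive AFTER the write (bit rot).
        stored_block = flip_bit(stored_block, 3)
    return checksum(stored_block) != stored_sum


photo = b"priceless photo bytes"
print(write_then_read(photo, flip_on_disk=True))  # True: the checksum catches it
print(write_then_read(photo, flip_in_ram=True))   # False: bad data looks valid
```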

I am not sure about the prices of machines, but I would suggest a system with extra drive capacity so you don’t have all your eggs in one super-good basket, because all systems eventually fail.
Just my $0.02
And used EPYCs might be really affordable where OP is?


My car is worth $500 when the gas tank is full. ‘Affordable’ is a moving target.


I’m using Ubuntu Linux 20.04 - 22.10. In general you can use:
dmidecode --type 16
The output should show:
Error Correction Type: Multi-bit ECC
And:
dmidecode --type 17
The output should show:
Total Width: 72 bits
Data Width: 64 bits
where Total Width is 8 bits more than Data Width (the extra 8 are the ECC bits).
These show whether the system has ECC in use.
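A minimal Python sketch of the width check, assuming each DIMM slot's block in `dmidecode --type 17` output starts with a "Memory Device" line, as in typical output. The function name `extra_width_bits` and the trimmed sample text are hypothetical; empty slots report "Unknown" widths and are skipped.

```python
import re

# Hypothetical, trimmed dmidecode --type 17 output: one populated ECC slot
# and one empty slot.
SAMPLE_TYPE17 = """Memory Device
    Total Width: 72 bits
    Data Width: 64 bits
Memory Device
    Total Width: Unknown
    Data Width: Unknown
"""


def extra_width_bits(dmidecode_type17: str) -> list:
    """Return Total Width minus Data Width for each populated slot.

    8 extra bits (72 vs 64) indicates ECC DIMMs.
    """
    results = []
    for section in dmidecode_type17.split("Memory Device")[1:]:
        total = re.search(r"Total Width:\s*(\d+) bits", section)
        data = re.search(r"Data Width:\s*(\d+) bits", section)
        if total and data:  # skip slots reporting "Unknown"
            results.append(int(total.group(1)) - int(data.group(1)))
    return results


print(extra_width_bits(SAMPLE_TYPE17))  # [8]
```

On a real system the text would come from running `dmidecode --type 17` as root.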

To see actual detected errors and corrections, run:
edac-util -v
which shows the diagnostics.
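edac-util reads the kernel's EDAC counters, which are also exposed in sysfs as `ce_count` (corrected) and `ue_count` (uncorrected) under `/sys/devices/system/edac/mc/mcN/`. A minimal Python sketch that reads them directly; the function name is hypothetical, and the root path is a parameter so it can be pointed elsewhere for testing. An empty result means no EDAC memory controller is registered, so the kernel has nothing to report errors through.

```python
from pathlib import Path


def edac_error_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: (corrected, uncorrected)} from EDAC sysfs counters."""
    counts = {}
    base = Path(root)
    if not base.is_dir():
        return counts  # EDAC not present at all
    # Each registered memory controller appears as mc0, mc1, ...
    for mc in sorted(base.glob("mc[0-9]*")):
        ce = int((mc / "ce_count").read_text().strip())
        ue = int((mc / "ue_count").read_text().strip())
        counts[mc.name] = (ce, ue)
    return counts
```

Calling `edac_error_counts()` with no argument reads the live system; the counters are world-readable, so root is not needed for this part.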

Additional info:
How to Check ECC RAM Functionality
edac-util, Man page
Hardware Canucks - ECC Memory & AMD’s Ryzen – A Deep Dive


I really like the “cheap build” with DDR3. I do have another ageing DDR2 NAS that eventually needs replacing with ECC, which I can use this for. Amazing, as that was going to be another question I was going to ask once I had some hands-on time with ECC, as I am still learning from you guys. Thanks again.

But the second build is right where I want to be: within budget, and new hardware for peace of mind. Hopefully it goes on working for well over 5 years lol.
The prices are very different on eBay than on Alibaba for the Supermicro H11SSL “kits”, so I need to study that a bit more, or even find a Supermicro distributor in the UK.

May I also ask two more things, please?
I have never used EPYC, so is it like AM4, where some CPUs don’t work on some motherboards/BIOS versions, or are they all plug and play on their respective sockets? Any recommendations for the H11SSL-i, basically?

Also, what RAM should I go for in terms of reliability, endurance, and availability in shops lol?

Thank you so much for your time.

So excited to do this for myself as soon as I get the hardware in hand.
I’m also going to run Ubuntu. It’s such a well-supported distro and has tons of documentation supported by communities much like this one, which makes it easy for us non-IT pros to learn things lol.
I’m so grateful to have found this place too.
Thanks so much again.

May I ask if you have tried injecting errors into your RAM to really see it work?


I haven’t tried that myself. In the Hardware Canucks link they do, though. On consumer mainboards one can overclock the RAM and start to see errors that way.

Also, once there is more time, I plan to investigate ECC memory scrubbing. Currently I’m running stock without it.

Also note that if you choose consumer-grade hardware on the AMD side, APUs do not support ECC. Well, the PRO versions do, but they are harder to get.

So, why do you say AM4 doesn’t fully support ECC?

dmidecode 3.3

Getting SMBIOS data from sysfs.
SMBIOS 3.3.0 present.

Handle 0x0018, DMI type 16, 23 bytes
Physical Memory Array
Location: System Board Or Motherboard
Use: System Memory
Error Correction Type: Multi-bit ECC
Maximum Capacity: 128 GB
Error Information Handle: 0x0017
Number Of Devices: 4

Isn’t Linux stating here that it sees multi-bit ECC memory?
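A small Python sketch that pulls the "Error Correction Type" field out of the output posted above (the function name is hypothetical). Note that dmidecode reads the SMBIOS tables supplied by the BIOS, so this value is what the firmware claims about the memory array, not a live report from the memory controller.

```python
import re

# The dmidecode --type 16 output posted above, trimmed to the relevant block.
SAMPLE_TYPE16 = """Handle 0x0018, DMI type 16, 23 bytes
Physical Memory Array
    Location: System Board Or Motherboard
    Use: System Memory
    Error Correction Type: Multi-bit ECC
    Maximum Capacity: 128 GB
    Error Information Handle: 0x0017
    Number Of Devices: 4
"""


def error_correction_type(dmidecode_type16: str):
    """Return the 'Error Correction Type' value, or None if absent."""
    match = re.search(r"Error Correction Type:\s*(.+)", dmidecode_type16)
    return match.group(1).strip() if match else None


print(error_correction_type(SAMPLE_TYPE16))  # Multi-bit ECC
```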

Good point. I would rather have the consumer platform, as it’s quieter, more portable, and maybe easier to repair in the future, with more units out there in people’s hands.

Yes, I do appreciate the links you attached, and I will certainly study them in much more detail when I get the chance. Thank you so much.

I would love to get the experience of testing RAM and getting feedback. I probably would not overclock it, but your point about being able to push it until it produces errors is great for gamers. I would love to see DDR5 ECC lol.

I forgot to ask: what RAM was recommended for your motherboard?

EPYC has the ‘must pay attention to board and BIOS revision’ issues just like AM4, unfortunately. An H11SSL V2 will work with any 7xx2 CPU out of the box, and a 7252 is a good bang-for-the-buck CPU.

On EPYC and Opteron I have had sticks of RAM go bad and report errors that were logged, and I was able to track down and replace the offending RAM stick.
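A minimal Python sketch of tracking down the offending stick from the logged counts. It assumes a kernel/driver that exposes per-DIMM counters as `mcN/dimmM/dimm_ce_count` and `dimm_label` under the EDAC sysfs tree; older drivers expose csrow-based counters instead, so treat the layout as an assumption. The function name is hypothetical, and an empty label falls back to the sysfs path.

```python
from pathlib import Path


def worst_dimm(root="/sys/devices/system/edac/mc"):
    """Return (label, corrected_count) for the DIMM with the most corrected
    errors, or None if no DIMM has logged any.

    Assumes the per-DIMM layout mcN/dimmM/{dimm_label,dimm_ce_count}.
    """
    base = Path(root)
    if not base.is_dir():
        return None
    best = None
    for dimm in base.glob("mc[0-9]*/dimm[0-9]*"):
        count_file = dimm / "dimm_ce_count"
        if not count_file.is_file():
            continue
        label_file = dimm / "dimm_label"
        label = label_file.read_text().strip() if label_file.is_file() else ""
        # Fall back to the sysfs location if the BIOS provided no label.
        label = label or f"{dimm.parent.name}/{dimm.name}"
        count = int(count_file.read_text().strip())
        if count > 0 and (best is None or count > best[1]):
            best = (label, count)
    return best
```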

The issue with ECC on AM4 is the REPORTING part, and it varies even by BIOS REVISION on some mainboards. The PC and OS will list ECC as fully functional, but no logs will ever be generated because the BIOS does not always have logging capabilities, so you can run with errors being generated and never know about it. It is not always the case, though; it is board- and BIOS-revision-specific.
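A Python sketch of that distinction, combining the firmware's claim (the "Error Correction Type" string from `dmidecode --type 16`) with whether any EDAC memory controller is registered in sysfs. The function name and status strings are hypothetical. A registered controller is necessary but not proof that errors will actually be logged; only a real logged error (for example from the overclocking tests mentioned in this thread) confirms the path end to end.

```python
from pathlib import Path


def ecc_reporting_status(error_correction_type, edac_root="/sys/devices/system/edac/mc"):
    """Combine the firmware's ECC claim with whether EDAC can report errors.

    error_correction_type: value from `dmidecode --type 16`,
    e.g. "Multi-bit ECC" or "None".
    """
    claims_ecc = bool(error_correction_type) and "ECC" in error_correction_type
    if not claims_ecc:
        return "firmware reports no ECC"
    base = Path(edac_root)
    # Each registered EDAC memory controller appears as mc0, mc1, ...
    has_controller = base.is_dir() and any(base.glob("mc[0-9]*"))
    if not has_controller:
        # ECC is listed, but the kernel has nothing that would log errors.
        return "firmware reports ECC, but no EDAC controller: errors may go unlogged"
    return "firmware reports ECC and EDAC reporting is active"
```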

For me it is not worth the ongoing headache of wondering whether a BIOS revision will kill ECC reporting or not.

ASUS’s recommendation is Kingston KSM26ED8/16ME. There are memory qualification lists on the ASUS site for each mainboard and CPU, where that memory is listed (with ECC marking).

I plan to do some testing with overclocking to see if I can see errors and corrections, but it will be somewhere in the future. It’s funny how much stuff there is to do, and the list just keeps getting bigger :)

On a side note, there is Threadripper Pro (on the AMD side), which supports ECC. It’s more workstation than server, in case you do not want to go full EPYC. But I can’t really comment on it, as I haven’t used or even researched it much. Just saying that there are options.

Do you have more information on this? Which brands/boards in particular? Any links or sources to share?

The internet consensus is:
ASRock and Gigabyte = GOOD
ASUS = MEH
MSI = don’t bother.

A lot of other manufacturers also seem to be pretty random and largely undocumented. Users report entire new menu sections showing up in the BIOS with ECC RAM installed, with no mention of it at all in any actual documentation.

I personally was at the forefront of ECC on AM3 platforms and actually helped make some board lists for that platform. I have owned and done a handful of ECC-on-AM4 builds, and I just stuck to an ASRock-or-nothing mentality. (Unless you test it yourself or see someone who has tested that board, I do not think I would call it a ‘for sure’, though.)

If memory serves, it was an early MSI AM4 board that reported in Linux that ECC was present but would never generate any logging info, no matter what was tried on it. The BIOS can say a system is ECC CAPABLE without it being ON. We think that is what was going on.

Also, I believe @wendell did a piece about a Gigabyte board that silently corrected errors without reporting any logging info. I can’t say whether it is a specific issue with Gigabyte or a certain chipset, but I have seen some AM4 boards that did show reporting info.
