
ZFS with Zen2: overclock ECC RAM or don't use ECC?


Dear Community!

My primary question is the following:

What will cause me less pain when doing ZFS + dedup with Linux (Ubuntu/Debian) on Zen 2:

overclocking ECC RAM, or using non-ECC RAM?

The RAM I am considering is either
2x32GB Samsung DDR4-2666, CL19-19-19, ECC or
2x32GB G.Skill RipJaws V DDR4-3600, CL18-22-22-42, (non-ECC)

I know that ECC is commonly recommended for ZFS, though opinions diverge.
At the same time, I do not want a speed decrease on the workstation from running below 1:1 IF speeds (especially with later CPU upgrades in mind).

I personally do not have enough experience to decide on that question, hence I ask for the wisdom of the community.


Additional info/background:
I am doing my PhD in physics on MRI raw-data reconstruction. Due to COVID I have to improvise to get things done, since I cannot use the university resources and am low on funding. So my home Ryzen 3700X workstation has to handle both data storage and part of the really demanding calculations. (I know separate systems would be optimal, but that is not possible at the moment.)

Before COVID I stored the pseudonymized MRI (k-space raw) data I acquired redundantly on external HDDs (per the study protocol, ethics rules, etc.).
After this intermediate step I would normally move to the workstation in the department, but that is not possible at the moment (and will not be for quite some time :mask:). However, I am allowed to work on the (non-image) raw data at home.
The data is highly redundant due to headers, so a high deduplication ratio is to be expected.

But first things first: I have to consolidate the huge dataset. I organized 8x4TB WD REDs from other projects and want to set up a RAIDZ2 with deduplication.
Currently I use 32GB of non-ECC RAM (2x16GB DDR4-3600), but that is too small even without the RAIDZ2 once I start my scientific calculations and run a KVM simultaneously.

As mentioned above, I do not want to lose performance by running below 1:1 IF speeds, so my options are overclocking the ECC RAM or using non-ECC memory.
At the same time, data safety is very important to me, as a lot of time and effort goes into this work and my PhD depends on it (if reviewers want to see my results).

To address the elephant in the room: I plan to use L2ARC.

Here I would have a second question:
I plan to use a NVMe SSD for this. What SSD would you recommend for it? (I have two PCIe 4.0 lanes at an M.2 via the chipset)
Would the Intel Optane Memory H10 (32GB Optane + 512GB NAND, M.2) make sense here as SLOG (on the Optane portion) plus L2ARC, or would you recommend otherwise?

My setup: Asus WS 570 + Ryzen 3700X + Nvidia 1060 FE + AMD RX 570 (for KVM passthrough) + HP HBA + 8x4TB WD RED (CMR :wink:)


Thank you very much for reading through this long post and giving me advice :grinning:

I wish you all health and a productive home office (if possible)

Overclocking ECC RAM entirely defeats the purpose of ECC RAM.

2 Likes

I disagree. The ECC continues to work fine, and as speed increases it will alert you to memory errors.

Yes it becomes less reliable but at that point you back down on the speed or timings.

I think the big problem is that ECC is generally manufactured using the chips that didn’t pass the binning process for higher speeds.

2 Likes

Oh, in my opinion always use ECC when you can. My latest Ryzen build took a 10% speed hit going from 3,600 down to 2,666 ECC. But in return it completely stopped having weird random crashes in GPU DMA operations.

2 Likes

One more comment. You will never get a 2,666 ECC kit to run at 3,600. I really don’t think that will happen. So you will have to accept some slowdown if you go that way.

1 Like

I disagree here also. Overclocking ECC RAM doesn't turn off ECC.

I have 2666 RAM overclocked to 2933. I tried for 3133, but it reported errors through ECC. So I backed it down, and it has been rock solid for almost a year. Not a lot of gain, but it is nice to have the little bit of extra speed and still have ECC.

2 Likes

The point is to make your RAM more reliable. Overclocking your RAM, which can produce errors that are not at first visible, therefore defeats the purpose of ECC: making your system more reliable.

ECC is not a free pass against all errors; it has limits, and it is best not to push those limits.

3 Likes

When it comes to reliability, there is no better RAM to overclock than ECC.

With normal RAM you can NEVER be sure you aren't getting errors.
With ECC, you just check your logs for corrected errors. I've been overclocking my 2400 ECC RAM to 2933 with decent timings for years now. The limits CAN be pushed, because at stock it's well under those limits, and it furthermore has the bonus of telling you when to pull back.

ECC is not a sacred cow.
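
On Linux you can check those counters through the kernel's EDAC interface. A minimal sketch, assuming an EDAC driver (e.g. amd64_edac on Ryzen) is loaded; exact sysfs paths can vary by kernel:

```python
#!/usr/bin/env python3
"""Print corrected/uncorrected ECC error counts from the Linux EDAC
sysfs interface (assumes an EDAC driver such as amd64_edac is loaded)."""
import glob
import pathlib

controllers = sorted(glob.glob("/sys/devices/system/edac/mc/mc[0-9]*"))
if not controllers:
    print("No EDAC memory controllers found - is the EDAC driver loaded?")

for mc in controllers:
    path = pathlib.Path(mc)
    ce = (path / "ce_count").read_text().strip()  # corrected (1-bit) errors
    ue = (path / "ue_count").read_text().strip()  # uncorrected errors
    print(f"{path.name}: corrected={ce} uncorrected={ue}")
```

A rising corrected-error count is the signal to back the overclock off before it ever becomes data corruption.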

2 Likes

I'm not arguing whether or not it's the best RAM for overclocking; I'm telling you it's not a good idea for mission-critical data.

3 Likes

@GigaBusterEXE But isn’t it more dangerous to use ZFS (or any filesystem) without ECC?
I would be interested in the rationale behind your argument.
What would be your recommendation then?
Thank you

Just use the ECC at stock speeds.
Slow data is better than wrong data.

2 Likes

That would mean a much lower 1:1 IF clock and trash my performance even more … hardly an option when you're running a project at home that should really be on a small cluster :mask:
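
(For the numbers: DDR4 transfers data twice per clock cycle, so DDR4-2666 means a 1333 MHz memory clock and thus a 1333 MHz FCLK at 1:1, while DDR4-3600 gives 1800 MHz, about a 35% difference in fabric clock.)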

Thank you all for your swift and helpful replies :smile:

I agree with @zlynx & @MtKingsnake & @Log and do not see why overclocking would defeat ECC.

Maybe it's harder to achieve a high stable overclock (due to a more complicated resistor network in unregistered DIMMs), but I do not see any fundamental obstacle with ECC.
One could even argue (as did @Log while I am writing this :wink:) that with ECC you at least see the errors if they are occurring.

With this first part settled (thanks again for sharing your experience), two open questions remain for me:

First, about the figure of merit for ZFS in particular: would you recommend overclocked ECC over stock non-ECC for ZFS?

The second: Has anyone overclocked this particular kind of 32GB ECC stick from Samsung (M391A4G43MB1-CTD), maybe in conjunction with the ASUS WS 570 and a Ryzen 3700X?
Any big caveats there?

Thank you very much for your help! :slightly_smiling_face:

No. ZFS has no real-world issues with not using ECC. The danger is exactly the same as with any other filesystem.

Where that myth comes from is the "scrub of death", a theoretical scenario that essentially requires hardware deliberately corrupting data in a special way to kill a pool. It's not a realistic concern.
https://jrs-s.net/2015/02/03/will-zfs-and-non-ecc-ram-kill-your-data/

2 Likes

@Log Yes, I know that article :slightly_smiling_face:

From your experience: Is (overclocked) ECC then still worth the hassle for my mixed storage / workstation scenario?

The biggest downside of overclocking RAM is the time spent overclocking RAM, but you aren't obsessively chasing how high you can go like I was.

You could very likely set your frequency to 2800 or 2933 and the timings to 16 or 18 and call it a day. That'll get you the bulk of the performance and likely be well within the limits of the RAM. Spend a day or two running benchmarks and stress tests and then you're good. (What you really want to check for is 2-bit errors, which result in a crash.)
With a low number of sticks and the IMC of newer Ryzens, I've seen people hit 3200 or more comfortably on ECC. There is a lot of wiggle room.

I have a TR 1950X, and I have 8 sticks of ECC I’ve been running for almost 3 years now. Once I dialed my overclock in, I’ve been verifiably (no corrected error logs) rock solid. I personally will never use non-ECC ram again, because I could never be sure, and I have data I want to protect. As such I recommend it, even though it’s not binned like “gaming ram” and doesn’t have that tip top performance.

Even if you were leaving it at stock, I'd go for it. Easily.

1 Like

My biggest concern is your use of deduplication. This is the number one thing everyone who uses ZFS recommends against. You may have a use case, but be careful. It's been a long time since I've immersed myself in the issues, so I can't really tell you much, but I would recommend heading over to https://www.reddit.com/r/zfs/ and doing a sanity check on your dataset size, estimated deduplication percentage, and the RAM size needed to deal with all that.

When using dedup, if the RAM you need exceeds the RAM you have, you can lock yourself out of the pool until you add more.

Couple that with wanting to use L2ARC, and already having issues with just 32GB of RAM, and that's not looking good.
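
For a rough sanity check on the RAM side, here is a back-of-the-envelope sketch. The numbers are assumptions: the often-quoted ~320 bytes of dedup table (DDT) per unique block is only a rule of thumb, and the real figure depends on recordsize and how well the data actually dedups:

```python
#!/usr/bin/env python3
"""Back-of-the-envelope ZFS dedup table (DDT) RAM estimate.

All numbers are assumptions for the 8x4TB RAIDZ2 described above:
roughly 21 TB of stored data, the 128 KiB default recordsize as the
average block size, and ~320 bytes of DDT per unique block."""

data_bytes = 21e12      # ~21 TB of data on the pool (assumption)
avg_block = 128 * 1024  # 128 KiB default recordsize (assumption)
ddt_entry = 320         # bytes of DDT per unique block (rule of thumb)

unique_blocks = data_bytes / avg_block
ddt_gib = unique_blocks * ddt_entry / 2**30

print(f"unique blocks: {unique_blocks:,.0f}")
print(f"estimated DDT: {ddt_gib:.0f} GiB of RAM")
```

With those assumptions the DDT alone comes out near 48 GiB, which dwarfs the 32GB currently installed. `zdb -S poolname` can simulate dedup on existing data for a less hand-wavy estimate.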

I plan to use a NVMe SSD for this. What SSD would you recommend for it? (I have two PCIe 4.0 lanes at an M.2 via the chipset)

https://www.reddit.com/r/NewMaxx/comments/bsz9it/ssd_guides_resources/ is a good resource on flash storage, though it depends on what you are using it for. I have no recommendations as far as that goes.

Would the Intel Optane Memory H10 with Solid State Storage 32GB + 512GB, M.2 make sense for this as SLOG (on Optane) and L2ARC or would you recommend otherwise?

SLOG is only useful when dealing with sync writes. A good way to test if it will be useful is to set sync=disabled (lying that the write has been safely committed). If your workload speeds up, then a SLOG will help. A mirror is recommended: if the SLOG goes bad (but isn't yet detected by ZFS) and your power goes out, then you've just lost data. Very, very rare, but possible.
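
To get a feel for how sync-heavy your workload is in the first place, a crude microbenchmark sketch; the path and sizes are placeholders, not anything from this thread:

```python
#!/usr/bin/env python3
"""Crude sync-write microbenchmark: times small writes that are each
flushed with fsync(), the pattern a SLOG accelerates. The path and
record size below are placeholders - point it at a dataset on the pool."""
import os
import time

PATH = "/tank/scratch/syncbench.tmp"  # hypothetical dataset path
RECORD = b"\0" * 4096                 # 4 KiB records
COUNT = 1000

fd = os.open(PATH, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
start = time.perf_counter()
for _ in range(COUNT):
    os.write(fd, RECORD)
    os.fsync(fd)                      # force a sync write (hits the ZIL)
elapsed = time.perf_counter() - start
os.close(fd)
os.unlink(PATH)

print(f"{COUNT} fsync'd 4 KiB writes in {elapsed:.2f}s "
      f"({COUNT / elapsed:.0f} IOPS)")
```

If the same loop gets dramatically faster after `zfs set sync=disabled` on that dataset, sync writes are the bottleneck and a SLOG should help; if not, it won't.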

As for L2ARC, whether that will help is more tricky and I couldn’t tell you. There’s no simple either/or for L2ARC, and even looking at the cache hit/miss can be deceiving.

I would agree with this.

The performance of your system might suffer a couple of percent compared to overclocked memory, but if it is critical data with a lot for the CPU to calculate, a rock-solid system should be the priority at all costs.

1 Like

Dumb question… why do you think RAM speed is the bottleneck for this image processing / data analysis? Given the file sizes and the (presumably array-based) calculations you are doing, isn't the GPU the main source of calculation delay, or IOPS on the disk?

I'd suggest ECC DIMMs at a slower speed are the better choice for your workloads; focus your investment elsewhere.

Edit: adding…

As you increase the amount of memory in a system, you typically have to back off speeds anyway for stability. Usually this means not using XMP profiles when you go above four sticks.

Another reason not to overpay for overclockable ram.