Overclockable DDR3 ECC?

I am looking at getting some upgrades for my PC soon. I have determined I need to use the 10-core processor that I have, so I have resolved to upgrade to an X79 board at some point.

In tandem with that, my workload needs ECC. However, I want ECC that is faster than 1600. Is that possible? Or is there overclockable ECC? I need 1866 or 2133 with ECC.

Overclocking and ECC, do they really fit into the same sentence?

ECC is about being 100% error-free and stable, and overclocking seems to be the exact opposite.

1 Like

Yes, they do. In fact, having watched how overclocked ram behaves (thanks to ECC, logging, and spending way too much time playing around), I’ll never overclock anything other than ECC ram.

I could find a setting that could pass all tests for days, only to see a cluster of errors show up every month or two. So I’d back off a bit, and have overclocked ram that was provably stable. Plus, even if there is an error, regardless of the reason, I don’t have to worry about corruption or crashing.

There’s nothing sacred about ECC ram or other “server grade” product stratification bullshit. Instead, it’s a fucking tragedy that it isn’t the baseline for ram.

2 Likes

Wonderful, can you teach me this dark art?

If I could keep my 28GB of ram and not spend a dime, that might be the coolest thing to ever happen to me in my life, not even kidding

I’m only familiar with DDR4 overclocking, and even that has changed a bit since the early days, when I was doing it.

However, there’s probably decent carryover on the meat of what you’ll want.
First, don’t actually expect too much real-life performance increase. Think low single digits.

Basically how I’d roughly start approaching it is:

  • Increase the frequency, followed by the memory voltage, until you can’t go higher or there’s no point in trying. More voltage = more noise, so eventually too much will only hurt stability and make the ram hot (also hurting stability).
  • Then tighten the primary timings, one by one.
  • Then maybe bother with the secondary timings. I wouldn’t burn time beyond that.

Keep a careful log of all your changes. Then you don’t have to test every single change, only every few; if it ends up unstable, you can just backtrack. Watching the ECC logs is nice, because you can get an idea of how unstable it may or may not be in real time.
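If it helps, here’s what I mean by a log, as a minimal sketch (every field name and value here is made up for illustration; record whatever knobs your BIOS actually exposes):

```python
# Minimal change log for a RAM overclocking session. All fields and
# values are hypothetical examples, not recommended settings.
attempts = [
    {"freq": 1866, "voltage": 1.50, "tCL": 11, "stable": True},
    {"freq": 2133, "voltage": 1.50, "tCL": 11, "stable": False},  # backtracked
    {"freq": 2133, "voltage": 1.55, "tCL": 12, "stable": True},
    {"freq": 2133, "voltage": 1.55, "tCL": 11, "stable": None},   # testing now
]

# When something falls over, walk back to the last known-good entry.
last_stable = [a for a in attempts if a["stable"]][-1]
print("fall back to:", last_stable)
```

A text file or spreadsheet works just as well; the point is being able to step backwards to the last config that never logged an error.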

There are likely guides out there that are better, and that point to which timings are good to mess with and how, but this is the gist of it.

Actually, before all that, does your platform even support both ECC ram and overclocking? It’s a fairly rare thing.

For ECC to work, the motherboard has to have special traces in place (or something like that).

For DDR4, many consumer Ryzen boards unofficially supported ECC. On the server side, I only know that some recent ASUS stuff allows overclocking, and even then there’s no ability to adjust voltage.

I don’t know for DDR3, but I imagine such things are very rare.

Yeah, it’s an Asus P9X79 LE

I overclocked some 3200 RDIMMs on a 5975WX. Anything past 3200 was stable but brought a performance decrease: the ram was being flooded with errors and transparently correcting them, which led to the performance loss.

I would stick to 1866 modules and get as many ranks as possible to improve performance.

1 Like

I have tested 4x4 8GB and 16GB dimms; they do not work.

ATM all my dimms are 2x8 I think.

Okay, well hold up dude. That’s not exactly true. I know that this is going to get into the weeds, but for ECC on unbuffered memory that’s totally true, and I’m not going to argue with you there.

But when you start to get into registered, buffered memory, not only does the potential for problems get introduced, but a whole slew of timing complexity comes with it.

ECC on consumer boards is unbuffered, unregistered, so in a way it’s a joke in comparison, but I’ll shut up. It’s a convo for the lounge.

1 Like

2 dual rank sticks in each channel?

Actually, that’s kind of helpful information. I assume I will have to time before, during, and after the ECC controller?

Do you want some resources on understanding the different kinds of error correcting memory?

1 Like

All slots are filled; the closest to the CPU has 2GB dimms though. I looked up some shit, it was in Chinese so it only half translated correctly, but from what I could tell, the closer to the socket, the smaller the dimms should be. As it happens, the closest slots are a pair.

That would be cool. I know about buffered and unbuffered, but if there’s a lot of different shit and I’m just dense because I only have access to a certain tier of shit, I’d absolutely love to learn.

If you have better things to do though go do them lol

So I’ll get into it a little bit, but if I write you something, you’re going to need the attention span to sit down and read it all, okay?

And up there I should clarify: I spoke about ECC and kind of lumped it in with those other two things, and they’re not exactly related. They’re often included together, but they’re not actually related as technologies, and I’ll get into that. But I’ve got to get to my desk to do so, because there’s no way in fucking hell I’m typing this on my mobile :joy:

I am now requiring you to, as you typed that relatively fast

/s

It’s not speed. It’s a matter of autocorrect and all of the other problems when I start writing things out, especially when they’re long and about topics that are not always included in the dictionary.

#DocumentThePlanet

In order to correct more errors, you need more check bits. The standard scheme on ECC DIMMs stores 64 bits of data in 72 bits; the extra 8 bits (12.5% extra being used on the chips for just error correction) are enough to correct any single flipped bit in the word and detect any two. Correcting two flipped bits instead of one would take substantially more check bits again. You see where this is going: it is already wasteful enough, and there are diminishing returns past single-bit correction.
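If you want the arithmetic behind that, here’s a quick sketch using the standard Hamming bound (for k data bits you need r check bits with 2^r ≥ k + r + 1, plus one extra parity bit for the SECDED variant ECC DIMMs actually use):

```python
# Minimum Hamming check bits r for k data bits: 2**r >= k + r + 1.
# SECDED (what ECC DIMMs use) adds one more overall parity bit on top.
def hamming_check_bits(k: int) -> int:
    r = 1
    while 2 ** r < k + r + 1:
        r += 1
    return r

for k in (8, 64, 128):
    secded = hamming_check_bits(k) + 1
    print(f"{k} data bits -> {secded} check bits, {secded / k:.1%} overhead")
# 8 -> 5 (62.5%), 64 -> 8 (12.5%, the 72-bit DIMM word), 128 -> 9 (7.0%)
```

Note how the overhead shrinks as the protected word gets wider, which is why the correction runs over the whole 64-bit word rather than per byte.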

It works pretty simply. The data is stored as binary code: every bit is a 0 or a 1. If I read either, I am trusting that I read the right bit of memory. I will never know if a zero got flipped to a one by some cosmic ray or a bad chip. You would never know either, so how do you solve this problem?

In the deep past, engineers tried to solve that with parity. Parity added a ninth bit per 8 bits stored: the byte was checked for how many ones it contained, and the ninth bit was set to make that count even. (That’s the even parity type, but let us not get too deep into even and odd parity.) If you read a byte and the count was wrong, then you knew something was wrong. Unfortunately, what is our limitation here? If you were thinking “how do I know which bit is wrong?”, you would be correct.
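As a toy sketch of that in code (even parity over one byte, nothing real-world about the values):

```python
# Toy even parity: the ninth bit makes the total count of 1s even.
# A single flip is detectable, but its position stays unknown.
def parity_bit(byte: int) -> int:
    return bin(byte).count("1") % 2   # 1 if the count of ones is odd

stored = 0b10110100                   # four ones -> parity bit is 0
check = parity_bit(stored)

corrupted = stored ^ (1 << 3)         # a cosmic ray flips bit 3
print(parity_bit(corrupted) != check) # True: we know *something* flipped...
# ...but not which of the nine bits, so there is nothing to correct.
# Worse: flip two bits and the parity matches again, hiding the damage.
```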

Parity is for farmers, so let’s introduce ECC and how it corrects this issue. ECC uses additional check bits and a more complex algorithm that can discover not only that a single bit has flipped, but which bit it was, and therefore what the original value must have been. These algorithms can vary and are incredibly timing-sensitive. In space systems like what I work with, we are fairly used to triple modular redundancy; however, the most common scheme is known as Hamming code. We are going to stray away from the deep theory; I wish I had a decent demonstration, but a full, realistic one is too hard for this conversation and everyone’s brains.
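A toy version gets the flavor across, though: Hamming(7,4), which protects 4 data bits with 3 check bits. Real ECC DIMMs scale the same idea up to 64 data bits plus 8 check bits (with double-error detection on top), so take this purely as an illustration of how the syndrome points at the flipped bit:

```python
# Toy Hamming(7,4): 4 data bits, 3 check bits, positions numbered 1..7.
def encode(d1, d2, d3, d4):
    p1 = d1 ^ d2 ^ d4                   # covers positions 1,3,5,7
    p2 = d1 ^ d3 ^ d4                   # covers positions 2,3,6,7
    p3 = d2 ^ d3 ^ d4                   # covers positions 4,5,6,7
    return [p1, p2, d1, p3, d2, d3, d4]

def correct(word):
    w = [None] + word                   # 1-indexed to keep the math readable
    s1 = w[1] ^ w[3] ^ w[5] ^ w[7]
    s2 = w[2] ^ w[3] ^ w[6] ^ w[7]
    s3 = w[4] ^ w[5] ^ w[6] ^ w[7]
    syndrome = s1 + 2 * s2 + 4 * s3     # 0 = clean, else the flipped position
    if syndrome:
        w[syndrome] ^= 1                # flip it back
    return [w[3], w[5], w[6], w[7]], syndrome

word = encode(1, 0, 1, 1)
word[4] ^= 1                            # corrupt position 5 (a data bit)
data, pos = correct(word)
print(data, "corrected at position", pos)  # [1, 0, 1, 1] at position 5
```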

In sane CPUs, the memory controller is on the CPU die, and most desktop CPUs then talk directly to the DIMM sockets holding the RAM. It works; no extra logic is needed. This is cheap to build, and the speed is high because there’s nothing between the memory controller and the RAM. It is good enough for a plebeian-tier consumer board. Consumers want their browsers to work; having to redo a calculation because of lousy memory is not catastrophic.

However, there is a limitation: a memory controller can only drive a limited electrical load at high speeds. That limits how many memory slots can be added to a mainboard, and thus how many DIMMs may be installed. This is part of the reason you don’t see as much memory on crappy consumer-tier boards. On server motherboards, this changes completely, because you often want to utilize far more memory than a consumer system. So a “register” buffer is introduced on the memory: reads from the chips on the DIMM first get copied to this buffer, which is why you hear “buffered.” This is much easier on the memory controller than handling all of that load itself; instead, it only has to reference the ultra-fast, ultra-low-load buffer. There is, however, a delay: the register sits between the chips and the memory controller and typically costs about one extra clock cycle per access, making the memory slower in manners foreign to a consumer system. This can also be undesirable in the enterprise space; thus, it is only used/needed on boards with many memory banks, and registered ECC memory should only be used where it is absolutely required. It also creates an extreme sensitivity to clock changes, since the register and the memory chips both have to stay in step. You have introduced more elements that have to work together.
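To put a rough number on that register delay, here’s back-of-the-envelope arithmetic (the CL13 and one-extra-clock figures are assumptions for illustration, not specs for any particular DIMM):

```python
# Rough cost of the RDIMM register at DDR3-1866. CL13 is an assumed
# CAS latency; the register typically adds about one clock per access.
clock_mhz = 1866 / 2                  # DDR: 1866 MT/s -> 933 MHz clock
cycle_ns = 1000 / clock_mhz           # ~1.07 ns per clock
udimm_ns = 13 * cycle_ns              # unbuffered: CAS latency alone
rdimm_ns = (13 + 1) * cycle_ns        # registered: one extra clock
print(f"UDIMM ~{udimm_ns:.1f} ns, RDIMM ~{rdimm_ns:.1f} ns, "
      f"about +{(rdimm_ns / udimm_ns - 1) * 100:.0f}% first-word latency")
```

Small per access, but it is there on every access, which is the “slower in foreign manners” part.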

In a directly connected case, unbuffered vs. buffered/registered RAM isn’t a question of one being better or worse than the other; it’s a side grade and a matter of trade-offs. Overclocking is not taken into account here, and when you introduce it, registered memory will fail most of the time. These technologies trade against each other in how many memory slots you can have: registered RAM allows more RAM at the cost of some speed and money (and overclocking). In most cases where you need as much memory as possible, that extra memory more than compensates for the RAM running slightly slower.

This is where the misconception comes from:

What overclocks well:
  • Unbuffered ECC and non-ECC DRAM

What doesn’t overclock well:
  • Registered/buffered ECC memory
  • Registered/buffered non-ECC memory

So when I got at @Log up there: he wasn’t wrong, he just didn’t color in the whole story.

Resources if you want to understand ECC in space systems

Resources if you want to understand it on terrestrial systems

1 Like