In one recent video Wendell remarks that non-ECC RAM has some error-correcting features. What is he referring to?
ECC RAM can detect most errors and silently repair 1-bit errors, or maybe one-symbol errors.
But what capabilities does non-ECC RAM have? I'm referring to the RAM we're all buying, not specialty RAM like RDRAM or FB-DRAM.
Hopefully NONE.
Current DRAM is stable enough that memory errors are very rare. If we had any kind of error correction tech, it would be slowing our PCs down. I, as a gamer, would not like that at all.
I guess that, if a memory fault were to happen, there's a 99% chance that the OS kernel would see the mistake and simply 'try again' or do a memory flush of some sort. You can imagine that actions like that are not very welcome on servers, since many people and systems are then affected by the performance hiccup (not just you, simply playing a game and getting a small and very rare frame drop).
Well, suppose you were playing Ultra Gun Crunch 5, minding your own business, and then suddenly an ionizing event flummoxes the game causing it to give each enemy biphase carbide armor, two special between-turns attacks, and a BFG?
What would you think then?
Saying that error correction slows down computation is just false. ECC RAM modules are built with extra ICs attached to them that specifically take over the task of fixing incorrect ones and zeros that occur while the RAM ICs are doing their job. The fact that most ECC DDR3 is 1333 MHz or 1600 MHz simply comes down to cost of manufacture. A single DDR3 1333 MHz 32GB ECC DIMM will range from $650 to $1,200.
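For anyone wondering what "repairs 1-bit errors" actually looks like, here is a toy Python sketch of a single-error-correcting, double-error-detecting (SECDED) Hamming code on a 4-bit value. This is only an illustration: real ECC DIMMs protect 64 data bits with 8 check bits, the logic runs in hardware, and the `encode` and `decode` names are made up for this example.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit SECDED codeword (a list of bits)."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    code = [0] * 8  # index 0: overall parity, indexes 1..7: Hamming(7,4)
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]  # covers positions with bit 0 set
    code[2] = code[3] ^ code[6] ^ code[7]  # covers positions with bit 1 set
    code[4] = code[5] ^ code[6] ^ code[7]  # covers positions with bit 2 set
    code[0] = sum(code[1:]) % 2            # makes the total number of 1s even
    return code


def decode(code):
    """Return (nibble, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos      # XOR of the positions of all set bits
    odd = sum(code) % 2          # 1 if an odd number of bits flipped
    if syndrome == 0 and not odd:
        status = "ok"
    elif odd:
        code[syndrome] ^= 1      # single flip: the syndrome is its position
        status = "corrected"
    else:
        return None, "uncorrectable"  # two flips: detected, not repairable
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status


word = encode(0b1011)
word[5] ^= 1                 # simulate one flipped bit
print(decode(word))          # (11, 'corrected')
word[2] ^= 1                 # a second flipped bit in the same word
print(decode(word))          # (None, 'uncorrectable')
```

The point of the extra overall parity bit is the second case: without it, a double flip would be "corrected" into the wrong value instead of being flagged.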
The correction is done by the EDAC module, which is in the CPU. The RAM just has an extra part that sends a signal to it.
In a gaming or productivity PC you don't really need ECC RAM anyways. It is mostly for server/supercompute stuff.
The EDAC module is a Linux kernel module; it's software.
I did some more research and there doesn't seem to be any hardware capability to detect errors in non-ECC DRAM modules. In software, there are many ways to periodically scan memory and check for corruption, whether the memory is currently used to store useful data or not. But this is a lower level of assurance that memory values are correct.
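As a rough idea of what such a software scan could look like, here is a minimal Python sketch that records a CRC32 per block and later re-checks the blocks. The block size and the `take_checksums` and `scrub` names are assumptions for illustration. Note that this can only detect corruption in data that has not legitimately changed since the checksums were taken, and it cannot repair anything.

```python
import zlib

BLOCK = 4096  # assumed block size in bytes


def take_checksums(buf):
    """Record one CRC32 per block of the buffer."""
    return [zlib.crc32(buf[i:i + BLOCK]) for i in range(0, len(buf), BLOCK)]


def scrub(buf, checksums):
    """Return the indexes of blocks whose CRC32 no longer matches."""
    bad = []
    for n, expected in enumerate(checksums):
        block = buf[n * BLOCK:(n + 1) * BLOCK]
        if zlib.crc32(block) != expected:
            bad.append(n)
    return bad


data = bytearray(b"\x55" * (4 * BLOCK))
sums = take_checksums(data)
data[2 * BLOCK + 17] ^= 0x04   # simulate one flipped bit in block 2
print(scrub(data, sums))       # [2]
```

Compared with ECC, the flip is only noticed the next time the scan runs, so anything that read the bad value in between has already used it.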
If you had a way not only to detect the error, but also to trace, for any location, all the data that might be affected by the bad value, that would be something. You would go and repair the original value, then recompute the bad data. Even if the corruption happened in a code segment, you would still know that the damage is limited to the data produced by processes that contain that code segment.
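To make the idea a bit more concrete, here is a toy Python sketch of that kind of tracing. Everything in it is hypothetical: it assumes each derived value records which inputs it was computed from, and that `formulas` is listed in dependency order. I don't know of any real system that tracks ordinary RAM contents this way.

```python
from collections import deque

# Hypothetical toy model: inputs plus derived values that remember their sources.
values = {"a": 2, "b": 3}
formulas = {
    "c": (("a", "b"), lambda a, b: a + b),
    "d": (("c",), lambda c: c * 10),
    "e": (("b",), lambda b: b - 1),
}


def affected_by(name):
    """Return every derived value that depends on `name`, directly or not."""
    hit, queue = set(), deque([name])
    while queue:
        cur = queue.popleft()
        for out, (inputs, _) in formulas.items():
            if cur in inputs and out not in hit:
                hit.add(out)
                queue.append(out)
    return hit


def recompute(names):
    """Recompute the given derived values in dependency order."""
    for out, (inputs, fn) in formulas.items():
        if out in names:
            values[out] = fn(*(values[i] for i in inputs))


values["a"] = 66                 # simulate a corrupted input
recompute(set(formulas))         # c and d are now computed from the bad value
values["a"] = 2                  # repair the original value
recompute(affected_by("a"))      # recompute only c and d; e is left alone
print(values)
```

The bookkeeping is the expensive part: every stored value needs a record of where it came from, which is a lot more overhead than a few check bits per word.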