Memory Layout on ECC RAM

Hey all,

I recently bought some no-name ECC RAM from eBay, and it has something quite interesting. It has Hynix dies which seem to hit 2666 (and even above) fairly well, but the layout has me scratching my head. I’ve heard online that ECC DIMMs have an odd number of chips, with the extra one for parity. In my case, my 16GB sticks are dual rank, with 5 chips on one side and 4 on the other.

My understanding of ranks is that the CPU can access either rank at any time, to allow for interleaved/parallelized memory access. How is ECC even happening if only one side has an odd number of chips? Does this mean only half of my RAM’s capacity is actually ECC?

Windows reports my memory as multi-bit detectable, and Linux says similarly. However, I have had some issues trying to produce errors on this RAM to validate the ECC functionality.

Anyone familiar enough with ECC ram to advise?

Thanks in advance!

Could you post pictures of this? It does seem unusual to me.

I have some of that RAM. It is running in an X399 ASUS Zenith Extreme (non-Alpha). On the last BIOS revision it was flaky and had POST issues, but once it got past POST it was fine. On the newest BIOS, it runs great. I have it at 2733 MHz 15-16-16-42 1T so far. It has passed 4 full passes of the pro version of memtest86.

With the newest BIOS, and the better posting and stability, I have pushed the RAM overclock to the point, with CL 14, that memtest86 reported ECC errors, and they looked different from what I have seen before on non-ECC memory. But that was also with the non-pro version of memtest86. I bought the pro version specifically to test my new ECC system. Probably a waste of money, but now I can fully test my system, I guess.

I have that exact ram! I am currently using memtest86 pro to force ECC errors, but it isn’t reporting any errors being detected…

Looks like there may be an issue with my BIOS supporting ECC injection? What mobo do you have?

Edit: re-read the post. I have an ASRock X399 Taichi. I can’t find any settings in my BIOS relating to injection. Did you have to do any fiddling in the BIOS for this?

@IALinux

@MtKingsnake's picture matches my RAM. What’s going on here?

The only thing I can think is that the center chip on the side with 5 chips is handling ECC for the other 4 chips on both sides. This looks like it is unbuffered ECC?

Yep, here is the listing:

Windows validation:
C:\Users\user> wmic memphysical get memoryerrorcorrection
MemoryErrorCorrection
6 (this is multi-bit detection)

-also-

C:\Users\user> wmic memorychip get datawidth, totalwidth
DataWidth  TotalWidth
64         128
64         128
64         128
64         128

I find it odd that the TotalWidth is 128 rather than 72. Linux reports this as multi-bit detecting as well.
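
For anyone reading those numbers, here is a rough Python sketch of how they could be decoded. The value table follows the documented `MemoryErrorCorrection` enumeration of the WMI `Win32_PhysicalMemoryArray` class; the helper names `describe_ecc` and `extra_bits` are made up for illustration. As far as I know the width fields are filled in from the firmware's SMBIOS tables, so a TotalWidth of 128 may say more about how the board reports the DIMM than about the DIMM itself.

```python
# Meanings of Win32_PhysicalMemoryArray.MemoryErrorCorrection values.
MEMORY_ERROR_CORRECTION = {
    0: "Reserved",
    1: "Other",
    2: "Unknown",
    3: "None",
    4: "Parity",
    5: "Single-bit ECC",
    6: "Multi-bit ECC",
    7: "CRC",
}


def describe_ecc(code: int) -> str:
    """Translate the numeric wmic value into its documented name."""
    return MEMORY_ERROR_CORRECTION.get(code, f"Unrecognized ({code})")


def extra_bits(data_width: int, total_width: int) -> int:
    """Bits reported beyond the data width (hypothetical helper).

    A conventional ECC DIMM would report 72 total for 64 data, so 8 extra.
    """
    return total_width - data_width


print(describe_ecc(6))
print("expected for a 72-bit DIMM:", extra_bits(64, 72))
print("reported on this system:", extra_bits(64, 128))
```

The 64 extra bits reported here do not match the usual 8, which is why the 128 looks odd, but the `Multi-bit ECC` value is still what an ECC-enabled system should show.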

The chips are 16Gbit as opposed to 8Gbit (from what I recall).

As for fiddling in the BIOS to be able to force ECC errors, I did not. Just the “normal” RAM timing tinkering.

The listing describes it as a UDIMM, which is additional confirmation. The U is for Unbuffered, as opposed to RDIMM for Registered. I guess it makes sense. If there were 9 chips on each side, with 8 for data and one for ECC, then one ninth of the chips would be used for ECC. Here one ninth are still used for ECC, but since there are only 9 chips in total, the ECC chip has to sit on one side or the other. It looks legit to me, but it’s an unusual layout based on my experience.
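
For what it's worth, the one-in-nine figure falls out of the textbook SECDED (single error correct, double error detect) arithmetic. A rough Python sketch, assuming a plain extended Hamming code over a 64-bit data word and x8 chips; real memory controllers may use a different code, and the function names are made up for illustration:

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits for an extended Hamming (SECDED) code.

    A Hamming code needs the smallest r with 2**r >= data_bits + r + 1.
    One more overall parity bit adds double-error detection.
    """
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1


def chips_per_rank(data_bits: int = 64, chip_width: int = 8) -> int:
    """Chips needed to carry data plus check bits (assumes equal-width chips)."""
    total_bits = data_bits + secded_check_bits(data_bits)
    return -(-total_bits // chip_width)  # ceiling division


check = secded_check_bits(64)
print("check bits:", check)
print("total bus width:", 64 + check)
print("x8 chips per rank:", chips_per_rank(64, 8))
print("fraction used for ECC:", check / (64 + check))
```

With 64 data bits this gives 8 check bits, a 72-bit bus, and 9 chips of 8 bits each, so one chip in nine carries the check bits no matter which side of the board it is soldered to.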

@MtKingsnake Here is an interesting post I found: https://www.passmark.com/forum/memtest86/42035-ecc-injection-on-x370-taichi

I have had it running at 2866 on auto timings, and it boots on 3200 on auto timings, but freezes right away. I just want to overclock it aggressively to force ECC errors, but I am a noob at overclocking ram.

I should be able to see errors running at 3200 on memtest eventually, right?

At some point the system will either hang so that it can’t run memtest86, or it will start showing errors. It’s been my experience that it can take 30 minutes to an hour for memtest to start showing errors as the system heats up. Usually if it runs well for an hour it’ll stay that way, but it doesn’t hurt to let memtest86 run overnight just to be sure.

Yeah, I have run probably 3 full runs without errors so far. Not really sure what else I can do to cause my system to heat up other than turning off fans or using a heat gun or something stupid like that.

I really really want to verify this memory can detect and fix errors. Maybe I can try using @MtKingsnake's timings.
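
Besides memtest86, one way to watch for corrected errors from a running Linux install is the kernel's EDAC counters in sysfs. A minimal sketch, assuming the EDAC driver for the platform is loaded (`amd64_edac` on this kind of system, as far as I know) and that the controllers show up under `/sys/devices/system/edac/mc`; the function name `read_edac_counts` is made up for illustration:

```python
from pathlib import Path


def read_edac_counts(base: str = "/sys/devices/system/edac/mc") -> dict:
    """Return corrected (ce) and uncorrected (ue) error counts per controller."""
    counts = {}
    for mc in sorted(Path(base).glob("mc[0-9]*")):
        entry = {}
        for name in ("ce_count", "ue_count"):
            counter = mc / name
            if counter.is_file():
                entry[name] = int(counter.read_text().strip())
        counts[mc.name] = entry
    return counts


if __name__ == "__main__":
    found = read_edac_counts()
    if not found:
        print("No EDAC memory controllers found (driver not loaded or ECC off?)")
    for controller, entry in found.items():
        print(controller, entry)
```

If `ce_count` goes up while the system stays running during an unstable overclock, that would suggest errors are being corrected rather than just detected.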

His settings are probably worth a try. You definitely don’t want to use the auto setting to try to force an error. I’d expect that will back timings off to try and prevent any errors. I wouldn’t heat anything up intentionally. You just want to get things up to normal operating temperature.

I got errors at 2733 MHz, 14-18-18-18-42 1T.

But I am stable at 15-16-16-16-42 1T. So I assume the CL 14 did the trick.

I crashed the system at 2800 MHz, so I didn’t see error reports. So it appears you need to be stable enough to not crash, but unstable enough to generate errors.

Gotcha. Did memtest report your ram’s SPD information correctly? No matter how I overclock the ram, it always says 2666MHz with default timings.

It never does report the changed SPD for me. But I do see the latency in nanoseconds go down with tighter timings and the bandwidth in gigabits per second go up.

I think the SPD reported is just the values saved to the chip as one of the standard profiles, so it doesn’t matter that much.
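
The latency change is easy to estimate by hand. A small sketch, assuming the speeds quoted in this thread are DDR data rates in MT/s (so the actual clock is half that); `cas_latency_ns` is a made-up helper name:

```python
def cas_latency_ns(cl: int, data_rate_mts: float) -> float:
    """First-word CAS latency in nanoseconds.

    The memory clock runs at half the data rate, so one cycle lasts
    2000 / data_rate ns and CL cycles take cl * 2000 / data_rate ns.
    """
    return cl * 2000.0 / data_rate_mts


for cl, rate in [(15, 2733), (14, 2733)]:
    print(f"CL{cl} at {rate}: {cas_latency_ns(cl, rate):.2f} ns")
```

Going from CL 15 to CL 14 at the same speed shaves a bit under a nanosecond, which fits with memtest86 showing lower latency even though the SPD line does not change.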

Gotcha. I just got memtest to crash running at 3200, and also at 2733 with your timings! Does that mean a multi-bit error was detected?

Possibly? From what I understand, there is a small amount of RAM that doesn’t get tested: the space that memtest86 itself occupies. If the error happens there, I think that could crash it. But I am far from an expert. Try changing to 2T while keeping CL 14? Also try running again at the same timings; maybe the error happens in a different place and does not crash memtest86.

Also, I am running 2x 16GB sticks. Are you running 4 sticks? If so, you might have to back off on the speed and timings a little.

Yeah, I am running 4 sticks. How far should I back off? CL 15? 16?

edit: Also, when you were seeing errors, were you seeing them printed on the bottom as ECC errors? Or freezes?

Honestly I think it depends on the system. When tuning beyond manufacturer specs, I have found that different components, even of the same make and model, can vary, such as “golden sample” CPUs.

Plus you don’t know whether the RAM is crashing your memtest86 or whether the IMC on the CPU couldn’t handle the OC. Just take it one step at a time. Keep CL 14 and do 2T. Then go to CL 15. Then 16. If you find that the system is rock solid at a certain timing, go tighter on another.