Ryzen: Finding & Running 2666+ ECC. Or Build our own ECC? | Level One Techs

Want fast Ryzen ECC? Or for Threadripper 2? It's doable now. But we don't want to build our own fast ECC from Samsung B-Die 3200!


This is a companion discussion topic for the original entry at https://level1techs.com/video/ryzen-finding-running-2666-ecc-or-build-our-own-ecc

To be honest, I am not looking to buy memory until DDR5 (I invested in a G.Skill 3200 CL14 32 GB kit), but if I were, I would definitely consider ECC and would be willing to buy it if the price were not much higher.

Btw I love this video, and I can’t wait for the follow-up :slight_smile:

Any chance you could make a tutorial on how you’re testing and verifying ECC is working on Ryzen boards?

I saw you helped out the guys at HardwareCanucks, but that was over a year ago, and they didn’t go into much detail about the exact commands.

Some numbers I have for common RDIMM and UDIMM B (& Hynix A) Die chips:

Samsung B Die (20nm)

K4A8G085WB-BCPB  - Most UDIMM 3200C14-14-14-34 (JEDEC DDR4-2133 (15-15-15))
K4A8G085WB-BCTD  - Most RDIMM 2666C19-19-19-43 (JEDEC DDR4-2666 (19-19-19))

Micron B Die (20nm)

Used by Crucial Memory Sticks

D9TNT (MT40A2G4WE-075E:B)

Hynix A Die (21nm)

H5AN8G4NAFR-VJC
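To compare the two Samsung bins listed above in absolute terms, the CAS latency in cycles can be converted to nanoseconds. This is a minimal sketch, assuming only the usual DDR relationship that the clock runs at half the transfer rate; the function name `cas_latency_ns` is made up for illustration.

```python
def cas_latency_ns(data_rate_mts: float, cas_cycles: int) -> float:
    """Convert a CAS latency given in clock cycles to nanoseconds.

    DDR memory transfers twice per clock, so one clock period
    is 2000 / (data rate in MT/s) nanoseconds.
    """
    clock_period_ns = 2000.0 / data_rate_mts
    return cas_cycles * clock_period_ns


if __name__ == "__main__":
    # Bins taken from the list above: (data rate in MT/s, CAS latency).
    bins = {
        "UDIMM 3200C14": (3200, 14),
        "RDIMM 2666C19": (2666, 19),
    }
    for name, (rate, cl) in bins.items():
        print(f"{name}: {cas_latency_ns(rate, cl):.2f} ns")
```

By this measure the 3200C14 bin is quicker in absolute time as well as in transfer rate, since both the cycle count and the clock period are lower.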

Hi,

Samsung M391A1G43DB0-CPB at 3200 MT/s and 16-18-18-18-38 on an MSI X399 SLI Plus with a Threadripper 1900X

Memory Device
Array Handle: 0x0028
Error Information Handle: 0x003F
Total Width: 128 bits
Data Width: 64 bits
Size: 8192 MB
Form Factor: DIMM
Set: None
Locator: DIMM 1
Bank Locator: P0 CHANNEL D
Type: DDR4
Type Detail: Synchronous Unbuffered (Unregistered)
Speed: 3200 MT/s
Manufacturer: Samsung
Serial Number: 7429C99D
Asset Tag: Not Specified
Part Number: M391A1G43DB0-CPB
Rank: 2
Configured Clock Speed: 1600 MT/s
Minimum Voltage: 1.2 V
Maximum Voltage: 1.2 V
Configured Voltage: 1.2 V

[root@localhost user]# memtester 2048 1
memtester version 4.3.0 (64-bit)
Copyright © 2001-2012 Charles Cazabon.
Licensed under the GNU General Public License version 2 (only).

pagesize is 4096
pagesizemask is 0xfffffffffffff000
want 2048MB (2147483648 bytes)
got 2048MB (2147483648 bytes), trying mlock …locked.
Loop 1/1:
Stuck Address : ok
Random Value : ok
Compare XOR : ok
Compare SUB : ok
Compare MUL : ok
Compare DIV : ok
Compare OR : ok
Compare AND : ok
Sequential Increment: ok
Solid Bits : ok
Block Sequential : ok
Checkerboard : ok
Bit Spread : ok
Bit Flip : ok
Walking Ones : ok
Walking Zeroes : ok
8-bit Writes : ok
16-bit Writes : ok

Done.
[root@localhost user]# edac-util -v
mc0: 0 Uncorrected Errors with no DIMM info
mc0: 0 Corrected Errors with no DIMM info
mc0: csrow2: 0 Uncorrected Errors
mc0: csrow2: mc#0csrow#2channel#0: 0 Corrected Errors
mc0: csrow2: mc#0csrow#2channel#1: 0 Corrected Errors
mc0: csrow3: 0 Uncorrected Errors
mc0: csrow3: mc#0csrow#3channel#0: 0 Corrected Errors
mc0: csrow3: mc#0csrow#3channel#1: 0 Corrected Errors
mc1: 0 Uncorrected Errors with no DIMM info
mc1: 0 Corrected Errors with no DIMM info
mc1: csrow2: 0 Uncorrected Errors
mc1: csrow2: mc#1csrow#2channel#0: 0 Corrected Errors
mc1: csrow2: mc#1csrow#2channel#1: 0 Corrected Errors
mc1: csrow3: 0 Uncorrected Errors
mc1: csrow3: mc#1csrow#3channel#0: 0 Corrected Errors
mc1: csrow3: mc#1csrow#3channel#1: 0 Corrected Errors

Regards
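For anyone who wants to check this kind of `edac-util -v` output automatically, here is a rough sketch that totals the counts per memory controller. It assumes the line format shown in the output above and simply adds up every line it recognizes; the regex and the function name `summarize_edac` are illustrative, not part of edac-utils.

```python
import re

# Matches lines such as
#   "mc0: csrow2: mc#0csrow#2channel#0: 0 Corrected Errors"
#   "mc0: 0 Uncorrected Errors with no DIMM info"
LINE_RE = re.compile(r"^(mc\d+):.*?(\d+) (Corrected|Uncorrected) Errors")


def summarize_edac(text: str) -> dict:
    """Sum corrected/uncorrected counts per memory controller.

    Assumption: each recognized line is an independent counter,
    so adding them up gives the controller total.
    """
    totals = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip prompts, blank lines, anything unrecognized
        mc, count, kind = match.group(1), int(match.group(2)), match.group(3)
        per_mc = totals.setdefault(mc, {"Corrected": 0, "Uncorrected": 0})
        per_mc[kind] += count
    return totals


if __name__ == "__main__":
    sample = (
        "mc0: 0 Uncorrected Errors with no DIMM info\n"
        "mc0: 0 Corrected Errors with no DIMM info\n"
        "mc0: csrow2: 0 Uncorrected Errors\n"
        "mc0: csrow2: mc#0csrow#2channel#0: 0 Corrected Errors\n"
    )
    print(summarize_edac(sample))
```

All zeros after a test run only means no errors were logged; it does not by itself show that error reporting works on a given board.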


That’s using K4A4G085WD-BCPB or -BCRC chips.
So it’s D-Die.

Also:

https://www.anandtech.com/show/12179/samsung-starts-mass-production-of-8-gb-ddr43600-ics

Edit:
Am I confused here? :thinking:

Is Samsung recycling old part numbers? Or is that a different AnandTech article, and that is a 2015 datasheet?

I now have more questions…

Yes, the module he is using is built on older Samsung D-die, from when DDR4 first came out. It is a dual-rank module built from 4Gb ICs, each organized as 512M x8.

As for the AnandTech article, it is most likely using stock images that are not representative. If you check the Samsung site, they only list up to 3200 as the fastest available.
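As a sanity check on the dual-rank, 4Gb x8 description, the arithmetic can be worked through in a few lines. This is a sketch that assumes the standard DDR4 layout of a 64-bit data bus plus 8 ECC bits per rank; `module_layout` is a hypothetical helper name.

```python
def module_layout(ranks: int, ic_density_gbit: int, ic_width_bits: int,
                  ecc: bool = True) -> dict:
    """Work out IC count and usable capacity for a DDR4 module.

    Assumes a 64-bit data bus per rank, plus 8 extra bits when ECC is fitted.
    """
    data_ics_per_rank = 64 // ic_width_bits
    ecc_ics_per_rank = (8 // ic_width_bits) if ecc else 0
    # Only the data ICs contribute to the capacity the OS sees.
    usable_gbit = ranks * data_ics_per_rank * ic_density_gbit
    return {
        "ics_total": ranks * (data_ics_per_rank + ecc_ics_per_rank),
        "usable_gbyte": usable_gbit / 8,
    }


if __name__ == "__main__":
    # Dual rank, 4Gb ICs, x8 organization, ECC fitted.
    print(module_layout(ranks=2, ic_density_gbit=4, ic_width_bits=8))
```

With those inputs the result is 18 ICs and 8 GB usable, which lines up with the 8192 MB that dmidecode reported for the module earlier in the thread.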


One of the biggest problems you might face trying to make this comes from the way a UDIMM routes clock, address and command in a daisy-chain topology (aka fly-by). It has its advantages but also its limitations.

If you look at an RDIMM module, it has a registered clock driver which breaks out from the middle to either end.
DDR4 RDIMM

On a UDIMM, the signal needs to go through all of the ICs from first to last.
DDR3/DDR4 UDIMM



Now that’s what I was suspecting.

But then it’s actually quite nice that even 2015 D-die memory can achieve 3200MT/s ECC at reasonable timings.

It also means that perhaps Samsung’s new 10nm memory might do amazing things for higher JEDEC-rated ECC speeds.

D-die hit 3466 non-ECC not too long after DDR4 launched; it could do those speeds easily.

Edit:
Looks like they may have released JEDEC 3200, but the timings are pretty bad.

Yes, I bought the RAM in early 2016 for a different system and took it as a temporary solution for the AMD system because of the current prices.
The timings are not even optimized; this was just the first attempt.

I mentioned this in a previous post, but I’ve been overclocking 8 sticks of Super Talent F24EA8GS ECC 8GB 2400 RAM (B-die) for some time now in an ASRock X399. I can’t quite reach 3200, but I figure this is pretty darn good. The trick I found was that ProcODT had to be 48; anything else was quickly unstable. That value is normally considered too low, but in practice you have to test each and every setting brute-force style to see what is actually best for your system. A more knowledgeable person could probably optimize further, but I’m perfectly content.

Here are my settings and some benchmarks:


The spec sheet can be found here: http://www.supertalent.com/datasheets/SuperTalent%20Datasheet-DDR4-ECCUDIMM-F24EA8GS.pdf

It was these two cool dudes who originally made me aware of what was possible.
https://www.reddit.com/user/Limited_opsec
https://www.reddit.com/user/ComputingDisorder

I got my sticks off eBay during one of their 20% off sales, so the final cost per stick was somewhere around $80-90, during the height of the RAM price fixing bullshit. These typically run about $100.

Edit: The 16GB versions (F24EB16GS I think) are also apparently B-die and overclock to a similar level, though I have no personal experience with them.

Also, thanks for doing this Wendell. ECC is something that everyone deserves to take for granted.


This is just a guess but… rowhammer and see if any corrected errors turn up in the logs?
If the board isn’t supporting ECC properly then the errors will be silently corrected, but not reported.

Thanks, didn’t know that about the motherboard.

Wendell said that in the video :stuck_out_tongue_winking_eye:

If it’s got an S at the end (F24EA8GS), it’s Samsung B-die 20nm chips.
They use H and M for Hynix and Micron respectively.
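The suffix rule above is easy to express as a lookup. This sketch encodes only the S/H/M mapping as described in this post; it is not taken from Super Talent documentation, and `ic_vendor` is a made-up helper name.

```python
# Mapping as described in the post above (not from a vendor datasheet).
VENDOR_BY_SUFFIX = {"S": "Samsung", "H": "Hynix", "M": "Micron"}


def ic_vendor(part_number: str) -> str:
    """Guess the DRAM IC vendor from the last letter of the part number."""
    suffix = part_number.strip().upper()[-1:]
    return VENDOR_BY_SUFFIX.get(suffix, "unknown")


if __name__ == "__main__":
    for part in ("F24EA8GS", "F24EA8GH"):
        print(part, "->", ic_vendor(part))
```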

I have a pair of the Hynix DIMMs (F24EA8GH) and they’re truly garbage in comparison. They barely do rated specs and no more.

But your Samsung variants. Holy %Z%$ batman I gotta get me some of those! :smiley:

The result is particularly impressive for 8 DIMMs.

Memtest86+ supports ECC and will tell you if you have ECC errors. Rowhammer won’t do much good if the modules aren’t susceptible to rowhammer.

The chips aren’t immune to rowhammer, it’s just that the ECC circuitry corrects it, that’s the whole point. The memory ICs are the same as used on regular RAM modules. The important thing is whether it’s being reported to the OS (which is a UEFI/BIOS thing) and what the OS does with it.

Memtest is a pretty basic test. Its tests aren’t meant to produce errors; they are meant for checking existing manufacturing defects or other physical issues with the RAM.

And that’s exactly where rowhammer comes in. It’s meant to produce errors on perfectly fine RAM modules.

To put it another way: Unless you have defective RAM (non-ECC or ECC), memtest shouldn’t report any errors, whereas rowhammer should with the same modules.
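One way to see whether induced errors actually reach the OS is to snapshot the kernel's EDAC counters before and after a stress run and compare them. The sketch below assumes the Linux EDAC sysfs layout (`/sys/devices/system/edac/mc/mcN/ce_count` and `ue_count`) and that the EDAC driver for the platform is loaded; the function names are illustrative.

```python
from pathlib import Path

# Assumed location of the EDAC counters on a Linux system.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_counts(root=EDAC_ROOT) -> dict:
    """Return {controller: (corrected, uncorrected)} for each mcN directory."""
    counts = {}
    for mc_dir in sorted(Path(root).glob("mc[0-9]*")):
        ce = int((mc_dir / "ce_count").read_text().strip())
        ue = int((mc_dir / "ue_count").read_text().strip())
        counts[mc_dir.name] = (ce, ue)
    return counts


def diff_counts(before: dict, after: dict) -> dict:
    """Return how much each counter grew between two snapshots."""
    return {
        mc: (after[mc][0] - before[mc][0], after[mc][1] - before[mc][1])
        for mc in after
        if mc in before
    }
```

Usage would be: call `read_counts()` once, run the stress test of choice, call it again, and pass both snapshots to `diff_counts()`. A rise in the corrected count with no crash suggests ECC is both correcting and reporting. No change is inconclusive, since it can mean either that no bits flipped or that the board is not reporting them.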

I didn’t vote in the strawpoll but I agree the market simply isn’t there for what Wendell wants.

That is some amazing info.

I am going to have to trawl the internets to try to find 4 sticks of the F24EB16GS in the UK for my upcoming system (assuming the 16-core version of Threadripper 2 is real), and Nvidia finally get off their ass and get that 1180Ti out.