Ryzen 5700X ECC reporting

Good find. The lingering question I have is whether ECC error detection would work if you drop in a regular chiplet Zen 3 processor with PFEH turned off.

I understand that’s easily another half a day of testing (at least), so if it’s too much trouble then it’s ok if you don’t test anything. No pressure.

1 Like

That’s great to hear, I appreciate the update… and all the pain and tears you went through to find this out.

1 Like

@Zedicus
Yes, that is also true. Sad, but true. :slight_smile:

@Mach3.2
Do you mean the 5700X? For the 5700X, there is no PFEH option in the BIOS (neither in 4.40, nor in 5.03 BETA). PFEH only shows up for the 5750G. The funny thing is that in earlier BIOS versions (4.40), it did not show up no matter what, not even for the 5750G.

1 Like

Yup, chiplet Zen 3 as in anything from 5600X to 5950X.

Regarding the PFEH option, that’s good to know. Sounds like ECC error reporting likely won’t work on the 5700X then.

Likely true, but it only applies to that specific version of MB and BIOS. That is the point I keep trying to get across with all this stuff: if a MB or BIOS version only enabled it on a specific set of CPUs, you would need to manually test them all in order to create the list, as even the SUPPORT branch for ASRock, or whatever brand, will not have an exact list of what the engineer was drinking … errr, thinking that day.

1 Like

Happy to hear I didn’t spout nonsense!

I bet a BIOS mod, or something like the runtime patcher I use to reveal additional hidden settings in my laptop BIOS, could force such an option to show up or be set, regardless of the CPU check.

Whether doing so works or is stable is another matter.

1 Like

Found this on Asus’ webpage. I assume it’s for reporting too, not just “it boots”. So it depends on the motherboard manufacturer, then.

1 Like

I have never seen an error correction logged on my system (although I really wanted it to happen, to verify ECC functionality). A friend has a rig at home and reports the same situation.
My setup is an AMD 4650G PRO + ASRock B550M + 32 GB of ECC UDIMMs running ZFS. The ARC takes most of my RAM, and yet no event has been reported:

root in ~ at r1 …
➜ edac-util -rfull
mc0:noinfo:all:UE:0
mc0:noinfo:all:CE:0
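
For anyone who wants to keep an eye on these counters from a script, here is a minimal Python sketch that totals the CE (corrected) and UE (uncorrected) counts from output shaped like the lines above. The function name `parse_edac_report` is made up for illustration, and the assumption that every report line has five colon-separated fields ending in the error type and its count is inferred only from the two lines shown here.

```python
from collections import Counter


def parse_edac_report(text):
    """Total the CE/UE counts found in `edac-util -rfull` style lines.

    Assumption: each report line has five colon-separated fields and the
    last two are the error type (CE or UE) and its count.
    """
    totals = Counter()
    for line in text.splitlines():
        parts = line.strip().split(":")
        # Skip prompts, blank lines and anything not shaped like a report line.
        if len(parts) != 5 or parts[3] not in ("CE", "UE"):
            continue
        try:
            totals[parts[3]] += int(parts[4])
        except ValueError:
            continue  # count field was not a number
    return {"CE": totals["CE"], "UE": totals["UE"]}


sample = """mc0:noinfo:all:UE:0
mc0:noinfo:all:CE:0"""
print(parse_edac_report(sample))  # {'CE': 0, 'UE': 0}
```

A non-zero CE total would be the sign that a corrected error was actually reported; all zeros, as here, cannot distinguish "no errors happened" from "reporting does not work".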

1 Like

@Log Wow I did not know that was possible!

@MattiP: Yes, I actually attached this to one of my e-mails to ASRock Support to back up my claim that ECC reporting should work on the X570 Taichi too, since it is said to work on ASUS X570 motherboards.

@nlgtuankiet: That is weird, I think that should also work (the 4650G PRO is almost the same as the 4750G PRO and I have seen some reports about the B550 Taichi reporting errors a few years ago at least). Maybe the system is just too stable to report anything (in my case I think my cooler did a decent job as it took a pretty long time to get my first ECC error reported).

Well, I did not think I would re-ignite this thread yet again, but today I ran into something highly displeasing after installing edac-utils on TrueNAS (it took me this long because I fixed every aspect of my setup that was lacking, swapped the SSDs, re-installed TrueNAS, etc.):
Running edac-util -v on the recent version of TrueNAS results in an “edac-util: Error: No memory controller data found.” error.
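
A quick way to cross-check that message without edac-utils is to look at the kernel's EDAC sysfs tree directly. The sketch below is only an illustration: it assumes the usual Linux layout of `/sys/devices/system/edac/mc/mcN/ce_count` and `ue_count`, and the helper name `read_edac_counts` is hypothetical. If no `mcN` directory exists, the EDAC driver has not registered a memory controller, which would be consistent with the error above.

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {"mc0": {"ce_count": n, "ue_count": n}, ...} from EDAC sysfs.

    Assumption: the standard Linux EDAC layout, one mcN directory per
    memory controller. An empty dict means no controller was registered.
    """
    base = Path(root)
    result = {}
    if not base.is_dir():
        return result
    for mc in sorted(base.glob("mc[0-9]*")):
        counts = {}
        for name in ("ce_count", "ue_count"):
            try:
                counts[name] = int((mc / name).read_text().strip())
            except (OSError, ValueError):
                counts[name] = None  # counter missing or unreadable
        result[mc.name] = counts
    return result


if __name__ == "__main__":
    found = read_edac_counts()
    if not found:
        print("No memory controllers registered under EDAC sysfs.")
    for name, counts in found.items():
        print(name, counts)
```

The `root` parameter exists only so the function can be pointed at a copy of the tree for testing.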

After looking around, I found this 1.5-year-old post, “Fix for broken ECC reporting on current Linux kernels with recent Cezanne Ryzen Pro APUs” (thank you for the heads-up @Log), but it seems TrueNAS is running an older version of the kernel. I will be looking into whether I can apply this patch, because without it, ECC reporting seems to be broken. However, I have never in my life applied kernel patches, so this may take a while. :slight_smile:

EDIT:
After (a lot of) looking around, I came to the conclusion that it would be safer to just install a NIGHTLY version of TrueNAS Cobia, which has the 6.1 kernel, as recompiling the kernel itself will likely lead to ACLs and other things breaking. My only question regarding the nightlies is: can I go back to the STABLE version after having installed the nightlies? (e.g. after Cobia Beta or STABLE releases)

EDIT no.2:
According to the official roadmap, the BETA version for Cobia will get released in August, so I think I will just wait for that.

2 Likes

Is it possible for you to share that beta BIOS somehow? Thanks!

X570TC5.03.zip (14.7 MB)
@ak47
I think that it is possible (check the attachment). Honestly, the only reason why I did not share it immediately was that I was waiting for ASRock to at least reply (in any way possible) and see what the results are (e.g. they release a new, official BIOS, etc.).

Well, as usual, they did not reply at all to any of my e-mails after June 5th (yes, that is more than 2 months now). Since they did not mention in their e-mail that this was an internal-only BIOS version or that sharing it further would be prohibited, I think that it is safe to share it.

Of course, since this is just a Beta version, I cannot guarantee that it will not have any bugs, etc., so just bear in mind that this is a non-official release. :slight_smile:

7 Likes

Thanks from me, too!

1 Like

Thanks a lot, this whole post has been very interesting to follow!

1 Like

@aBav.Normie-Pleb @ak47

You are welcome. :slight_smile: I hope that you will find it useful.

Although not directly related, I will also post an update after upgrading my TrueNAS server to Cobia just to finalize this post as a whole.

1 Like

By any chance, do you know if the 5950X works with ECC (and also which model is the most compatible with it)? Thanks!

EDIT: I found this, which seems similar to yours? (ldlc[DOT]com/fiche/PB00547920.html)

I’ve personally verified ECC working properly on ASUS motherboards with a 5950X, and yes, for AM4 you need unbuffered ECC UDIMMs. Buffered or registered memory doesn’t work at all on AM4.

2 Likes

To really wrap this whole post up, here are the results after I upgraded my system to COBIA (BETA).

I can now say that if someone is looking to use this setup (5750G PRO) on TrueNAS, then it seems to be working on the Cobia release.

I installed edac-utils, and on the newer kernel it no longer gives any errors. For some reason, though, the output of the command seems to have shrunk: it now only says “edac-util: No errors to report.”. :slight_smile:

2 Likes

Hi,
I made an account to clarify this.
It should work as there is no IGP on this processor.

In addition, there appear to be some issues either with EDAC or with the vendors. I can log errors on several of the boards I have tested, and they all look different when invoking dmesg, but they do not show up in edac-utils or rasdaemon. The former will not log errors, and the latter will not work properly with kernel modules disabled/built-in.
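
Since the wording differs from board to board, one rough way to catch these messages is to filter the kernel log by keyword instead of relying on a fixed format. This is a minimal Python sketch under that assumption: the keyword list is a guess and will need adjusting per board and kernel, and `find_memory_error_lines` is a name made up for illustration. The only real log text used is the Biostar message quoted in the results below; the second sample line is a placeholder.

```python
import re

# Assumption: these keywords cover the messages of interest. The exact
# wording varies per board and kernel, so adjust the list as needed.
PATTERN = re.compile(r"\b(EDAC|MCE|machine check|hardware error)\b", re.IGNORECASE)


def find_memory_error_lines(log_text):
    """Return the lines of a kernel log that mention EDAC/MCE keywords."""
    return [line for line in log_text.splitlines() if PATTERN.search(line)]


sample = "\n".join([
    "MCE event detected, corrected single-bit error",  # quoted in this post
    "some unrelated kernel message",                   # placeholder line
])
for line in find_memory_error_lines(sample):
    print(line)
```

On a real system the input could come from `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`, which may require root depending on the distribution's settings.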

Here are my results:

OS:
Gentoo 6.5.7 unstable (There is a regression in 6.1.xx with EDAC I believe)

CPUs tested: 1600 and/or 5800X3D
RAM: Kingston 16GB DDR4 3200MT/s ECC Unbuffered
Boards:

Biostar X470GTN:
PFEH is switched off automatically when ECC is selected
1-bit error detection confirmed
No confirmation of 2-bit error detection
Nonstandard error log: “MCE event detected, corrected single-bit error”. No other information is provided or available
Will not log bit errors to memtest

Gigabyte B550I AORUS PRO AX:
Don’t recall if there is a PFEH option (I think not; it is not in the manual)
1-bit MCE logged in dmesg
No confirmation of 2-bit error detection
Will not log bit errors to memtest

ASRock B550 ITX/ac:
No PFEH option; there is an “Advanced Error Reporting” option, but that is not relevant from what I can see
Verbose 1-bit logging to system log
Unknown 2-bit detection log
Will not log bit errors to memtest

Fatal1ty X370 Gaming-ITX/ac:
PFEH option
Verbose 1-bit logging to system log
Verbose 2-bit detection log
Memtest unknown (board died while testing, RIP)
PFEH and ECC buggy after BIOS 5.90 on 5800X3D

I don’t have these boards anymore, except a similar Biostar, and I am planning on grabbing an ROG board to test ECC. I will need to move across the country first though, so maybe in a week or two.

3 Likes