ASRock Rack X470D4U2-2T

Unless I am horribly mistaken (very possible), a lot of those are single-bit errors, which should have been corrected and reported as ECC corrected errors if ECC was working.

Edit: Self-BS’d, math is hard…

But at 1.152 V memory voltage the system is allowed to misbehave :wink:


FYI: this is still at the highest possible frequency and the tightest timings that work at 1.2 V. If I push any of them just a single step further, it doesn’t boot at all…
So yes, it did surprise me how far I could still drop the voltage before reaching this instability…

This guy here claims the opposite:

I just completed the first minor tests at these settings:

  • memtest86 gives errors (as shown in the screenshot above).
  • memtester on Linux hangs pretty quickly (didn’t spot any error message, but I didn’t check the logs yet)
  • prime95 on Windows crashes Windows after some time

No corrected errors were logged in the BMC logs, or in edac-utils or in Windows Event Viewer


Hi, hoping someone can shed some light on this. This is my first server build, having built many PCs. I have the X470D4U2-2T board with a Ryzen 3600X, however I can’t get the system to POST. I can log in via IPMI, KVM says no video signal (even with nothing plugged into the VGA port) and I get a POST 80h: ee error when I look at the POST Snoop. I’ve tried 2 different ECC DDR 2666 sticks, separately and together in the A1/A2 slots. I have disconnected any HDD so it’s literally just the processor and RAM plugged in. Am I being a bit thick here? Anyone have any ideas what might be happening here? PSU is a Seasonic M12 620W Bronze unit. If anyone could help it would be really appreciated as I’ve run out of ideas now!!

I’ve just seen that in the Sys Info > Power Source it says Power Supply Error - guessing this would be the problem?! What’s the issue with ASRock and Seasonic PSUs? I don’t have another PSU to hand - is a different PSU likely to solve this from people’s previous experiences?

Which BIOS version?

Hey nx2l - I have tried to update the BIOS to 3.2, however it doesn’t actually come up with the BIOS number in the webportal, it’s just blank there. However I think it also shipped with 3.2 anyway as it was built in October and has a X470D4U22T P3.20 sticker on the board.

I have the same CPU, similar RAM, same mobo. Just a different PSU (which it’s also pretty sensitive to, apparently). Here’s what I did to make it work:

If that doesn’t work for you, then I think you have a problem somewhere (like your PSU perhaps?)


Hey Mastakilla, thanks for the response! I was actually following your steps to get as far as I did. It’s a pity that the board is so sensitive to the PSU. I literally can’t even get into the BIOS at the moment (no signal when plugged into VGA and ‘No Signal’ through KVM). I have a PSU coming on Wednesday which I’m not too hopeful about. I’ve also not had any beeps, and the LEDs for the POST codes on the board have never come on, so I think it’s either a duff board or the PSU! Will update when I get any further!

  • Make sure you have at least BIOS 3.20
  • Inside the case: Only install Mobo, CPU (+ cooler), RAM and PSU.
  • Outside the case: Unplug everything except power and IPMI network
  • Reset CMOS
  • Boot using the IPMI

If you still get no signal, contact Asrock Rack and perhaps try a different PSU.

Hey Mastakilla! I’ve just tried as you suggested but I’m still getting no signal. I think it might be the PSU, as it does say PSU error in the sys info. I’ll wait till Wednesday and try the different PSU and will go from there!

After a lot of testing I’ve now finally mastered RAM overclocking in a way that lets me vary between stable and unstable settings. The trick is lowering the voltage to get instability, as tightening the timings or increasing the frequency too much will mostly cause it to stop booting instead of becoming unstable.

After figuring this out I’ve done a lot of testing. I’ve tested from hardly bootable to slightly unstable using MemTest86, memtester (on Fedora Rawhide with kernel 5.4.0.0.rc3 and 5.4.0.2) and prime95/aida64_bench/Ryzen_Master_test (on a fully updated Windows 10 Pro, first with amd_software_1.09.27.1033.zip and later with amd_chipset_software_1.11.22.454.zip chipset drivers).

To give you an idea of the testing I’ve done, here is an Excel I’ve created to keep track of things:

In the meantime I’ve had millions of memory errors (in total) in very varied conditions. It seems almost impossible to me that there was not a single single-bit or two-bit error among all these millions of errors.

But… Unfortunately I couldn’t find any report of a corrected or logged memory error in either the IPMI Event Log, the Linux edac-util or the Windows Event Viewer (even though all of these report ECC to be active and correctly configured - see my posts above).

Now I know that doesn’t mean that no memory error-corrections have happened, but that is only half of what ECC functionality is. Reporting / logging these memory error-corrections is at least as important as the actual correcting itself (how else can you know your RAM is dying or unstable? That’s like having a RAID5 which doesn’t notify you that one of your disks is dead :stuck_out_tongue: ).
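To make the difference between "correcting" and "reporting" concrete, here is a toy SECDED (single-error-correct, double-error-detect) code in Python. It is only a sketch under loud assumptions: it protects 4 data bits with 4 check bits, whereas ECC UDIMMs carry 8 extra bits per 64 data bits, and the names `encode` and `decode` are made up for this illustration, not taken from any real memory controller. The point is that the decoder always knows whether it had to fix something, so a working implementation has a status it could log.

```python
from typing import List, Optional, Tuple

def encode(nibble: int) -> List[int]:
    """Encode 4 data bits into an 8-bit extended Hamming codeword (toy SECDED)."""
    code = [0] * 8
    # Data bits live at positions 3, 5, 6, 7 (positions that are not powers of two).
    code[3] = (nibble >> 0) & 1
    code[5] = (nibble >> 1) & 1
    code[6] = (nibble >> 2) & 1
    code[7] = (nibble >> 3) & 1
    # Hamming check bits at positions 1, 2, 4.
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    # Position 0 is an overall parity bit, which enables double-error detection.
    code[0] = sum(code[1:]) % 2
    return code

def decode(received: List[int]) -> Tuple[Optional[int], str]:
    """Return (data, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(received)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos          # XOR of the positions of all set bits
    overall_odd = sum(code) % 2 == 1
    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd:
        code[syndrome] ^= 1          # single-bit error: syndrome is its position
        status = "corrected"         # this is the event that should be logged
    else:
        return None, "uncorrectable" # two bits flipped: detected, not fixable
    data = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return data, status

word = encode(0b1011)
one_flip = list(word)
one_flip[5] ^= 1
print(decode(one_flip))              # (11, 'corrected')
two_flips = list(one_flip)
two_flips[6] ^= 1
print(decode(two_flips))             # (None, 'uncorrectable')
```

In this toy, silently returning the data and dropping the status would still "work", which is exactly the half-working situation described above.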

So it seems to me that ECC is not working on this motherboard with a Ryzen 3000 CPU (I don’t have the older Ryzen CPUs for testing).

I’ve reported this to Asrock Rack and they’ve sent me the following response:

Dear Mastakilla,

Due to X470 belongs to desktop series
It’s not like server MB has native support of ECC report.
We are checking with RD and AMD if X470 can support ECC report.
We will reply to you ASAP

Best regards,
Kevin
Asrock Rack Incorporation

I’ve replied to this with:

Hi Kevin,

Thanks a lot for looking into this! That is greatly appreciated…

I understand that the X470 is indeed a desktop chipset. Also all AM4 CPUs don’t have officially validated ECC support by AMD (although AMD confirmed that it wasn’t disabled).
So you could argue that non-validated half-working (not reporting / logging) ECC support is acceptable. And I also agree with that, for consumer brands like Asrock, Asus, MSI, etc.
However, if a brand like Asrock Rack or SuperMicro creates a X470 motherboard with “Supports 4x DDR4 ECC and non-ECC UDIMM, max. 128 GB” in the specifications and if the IPMI Event Log contains sensors for “DRAM ECC Error A1/A2/B1/B2”, then people (like myself) will assume that it is actually working and validated. In that case, I don’t think that it is acceptable for it not to work 100%, as people buying these brands, actually are expecting it to fully work. I don’t think that is a reputation or name you are looking for, as a brand called “Asrock Rack” :slight_smile:

Please let me know if there is anything else I can do to assist.

Kind regards,

Mastakilla

The response from Asrock Rack seems to admit that it currently does not fully support ECC, however, it could also just mean that Kevin is not sure about it… So I’m hoping for a decent response from their R&D.

It would be nice if someone could try some testing with a Ryzen 1000 or Ryzen 2000, to see if ECC works with those CPUs…


Awesome work. I have a 2600X in my desktop. What would be the quickest route to validation?

@Mastakilla

Just wanted to express my gratitude for your work! I would have had to do similar testing but currently I’m in “various-stuff-has-to-be-done-within-two-weeks” mode with little to no free time for further going down to X470D4U rabbit hole.

Awesome work. Just a quick reminder: From your findings, we cannot know if there is ECC functionality at all.

What your work shows is that when instability up to the point of non-correctable errors is reached, no such errors are reported. Thus, we can infer that corrected errors are not shown either. Therefore, you are surely correct in saying that ECC reporting does not work.

However, lacking reporting, we also cannot be sure that potentially-correctable errors have ever been corrected. Possibly, ECC RAM does “nothing at all” ™ on this board/chipset/CPU family, so maybe the question is not if ECC is “not working 100%”, but: “does it work at all”?

In order to verify basic ECC functionality, one could try to use rowhammer to show instability - if ECC is at least basically functional, a rowhammer attack should show no effect with ECC, but indeed should cause errors on non-ECC memory.

I cannot try, because I am still waiting for my board (and I did not buy ECC RAM in the first place).

I wondered why no other company apart from Asrock Rack offers a server board for Zen2, because with ECC support even on desktop CPUs and cheap prices for 6-16 core CPUs this platform seems like the ideal candidate for SOHO servers. Probably, you have now found an explanation…

For those who want to test ECC themselves:

If you’re still in testing-phase then the best would be to do a multiboot of some OS, like

All using UEFI.

On Linux I’ve installed memtester and edac-utils packages (yum install …). On Windows Prime95, AMD Ryzen Master, Aida64.

I used mainly AMD Ryzen Master on Windows, for changing the frequency and timings. And the BIOS for changing the voltage (not sure if that’s working on Ryzen Master). See my posts from earlier on how to change the voltage in the BIOS.

Then just play with it and be prepared to do MANY CMOS resets :stuck_out_tongue:

First I increased the frequency till it stopped booting (1533 MHz).
Then I tightened the main timings till it stopped booting.
Finally I lowered the voltage till I got errors in memtest86. Those settings I then used for testing on Linux and Windows as well…
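For readers comparing the 1533 MHz figure with the "DDR 2666" rating of the sticks mentioned earlier in the thread: a small sketch of the arithmetic, assuming the 1533 MHz is the memory clock (as tools like Ryzen Master typically show it) rather than the transfer rate. The function names are hypothetical, and the bandwidth figure is the theoretical peak for one 64-bit channel, not a measured value.

```python
def ddr_transfer_rate(memory_clock_mhz: float) -> float:
    """DDR transfers data on both clock edges, so MT/s is twice the clock in MHz."""
    return 2 * memory_clock_mhz

def peak_bandwidth_gb_s(memory_clock_mhz: float, bus_width_bits: int = 64) -> float:
    """Theoretical peak bandwidth of one channel in GB/s (ECC bits not counted)."""
    transfers_per_second = ddr_transfer_rate(memory_clock_mhz) * 1e6
    return transfers_per_second * (bus_width_bits / 8) / 1e9

print(ddr_transfer_rate(1533))    # 3066, i.e. roughly "DDR4-3066"
print(ddr_transfer_rate(1333))    # 2666, the rated speed of DDR4-2666 sticks
print(round(peak_bandwidth_gb_s(1533), 3))
```

Under that assumption, the point where it stopped booting sits a few steps above the rated speed of DDR4-2666 modules.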

In my case frequency and timing changes hardly caused memory errors (when going too low it just stopped booting). So after a while, for many changes of the frequency and timings I didn’t even bother testing with MemTest86 anymore. If it booted, I just tried lowering it more…
When lowering the voltage I changed the value in steps of 4 or 8 in the beginning (it’s quite fine-grained), but I did test each change with MemTest86 (not always a full run).

In your case this might be totally different though… So just try a bit :wink:

When you get errors in MemTest86, then you can run

  • Windows:
    Prime95 (usually that crashed / rebooted my Windows after a while).
    See some post of mine above for which event you should look for in Event Viewer.
  • Linux:
    swapoff -a (to disable the swap)
    memtester 30g (to stresstest the memory, leaving 2GB for the OS)
    edac-util -v (in another window, to check for logged / reported errors)
  • And I also regularly check the IPMI Event Log for “DRAM ECC Error A1/A2/B1/B2” (errors from MemTest86 can only show up here)
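As a complement to the Linux steps above, here is a minimal sketch that reads the corrected / uncorrected error counters the kernel's EDAC subsystem exposes in sysfs. It assumes an EDAC driver for your platform is loaded and that the counters live under `/sys/devices/system/edac/mc/mc*/ce_count` and `ue_count`; the function name `read_edac_counts` is made up for this example.

```python
from pathlib import Path
from typing import Dict, Optional

def read_edac_counts(root: str = "/sys/devices/system/edac/mc") -> Dict[str, Dict[str, Optional[int]]]:
    """Return {controller: {'ce_count': n, 'ue_count': n}} from EDAC sysfs files."""
    base = Path(root)
    counts: Dict[str, Dict[str, Optional[int]]] = {}
    if not base.is_dir():
        return counts                      # no EDAC directory: driver not loaded?
    for mc in sorted(base.glob("mc[0-9]*")):
        entry: Dict[str, Optional[int]] = {}
        for name in ("ce_count", "ue_count"):
            try:
                entry[name] = int((mc / name).read_text().strip())
            except (OSError, ValueError):
                entry[name] = None         # file missing or unreadable
        counts[mc.name] = entry
    return counts

if __name__ == "__main__":
    counts = read_edac_counts()
    if not counts:
        print("No EDAC memory controllers found (is the EDAC driver loaded?)")
    for mc, c in counts.items():
        print(f"{mc}: corrected={c['ce_count']} uncorrected={c['ue_count']}")
```

Run it before and after a memtester session: if memtester or MemTest86 shows errors while both counters stay at 0, that matches the missing reporting described in this thread.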

These sites were pretty useful for understanding the timings:
https://www.overclock.net/forum/18051-memory/381699-ram-timings-explained.html
https://www.techpowerup.com/review/amd-ryzen-memory-tweaking-overclocking-guide/2.html

For those wondering if they should “downgrade” from the test version L3.31 to the final 3.30 that got released yesterday, I’ve received the following answer from Asrock Rack:

Dear Mastakilla,

If everything is good so far then we won’t suggest to update BIOS.

Beside the newer version contain more bug fix.

Best regards,
Kevin Hsiueh
Asrock Rack Incorporation

I did upgrade my IPMI from 1.60 to 1.70.

Same, I also stayed on 3.31 but upgraded BMC.