ASRock Rack has created the first AM4 socket server boards, X470D4U, X470D4U2-2T

Did you try disconnecting all power, holding the power button for 5-10 seconds, then reseating the CPU and putting it all back together?

Umm, I think those are SO-DIMMs. Did you mean M391A4G43MB1-CTDQ, or are you using some of those weird SO-DIMM-to-DIMM risers/converters?

Also: are you using the correct memory slots? For 2 sticks that would be A1 and B1.

Also try a lower frequency, for example 2400 MT/s (DDR4-2400) for a single stick in slot A1 (set the frequency to 1200 MHz in the OC menu).
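
The 2400 and 1200 figures are the same setting expressed two ways: DDR memory moves data twice per clock cycle, so the advertised rate is double the clock you enter in the menu. A minimal sketch of that conversion (the function name is made up for illustration):

```python
def ddr_clock_mhz(transfer_rate_mts: int) -> float:
    """Return the memory clock in MHz for a DDR transfer rate in MT/s."""
    # DDR = double data rate: two transfers per clock cycle.
    return transfer_rate_mts / 2


# Print the clock to enter for a few common DDR4 speed grades.
for rate in (2133, 2400, 2666, 3200):
    print(f"DDR4-{rate}: enter {ddr_clock_mhz(rate):g} MHz")
```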

Hmm, @RoLee has a similar-sounding issue with a 3900X.
It could be a power issue, as @nx2l already suggested, but if it is a problem with high-power chips on this board then there probably isn’t much I can suggest.
At the same time, just booting shouldn’t draw much power. -.-

Maybe try re-seating the CPU?

About the drive: which M.2 slot? The 970 Pro is a PCIe drive, so maybe make sure you are using the M2_1 slot. M2_2 is supposed to automatically detect whether the M.2 drive is SATA or PCIe and switch accordingly. Maybe that is flaky.

That’s exactly what it was doing. It’s no longer an issue after repairing the BMC, was just trying to help @beryl.

what version did you flash to?

Can you access the ipmi now?

Sorry, I copy/pasted the wrong P/N.

I’m using M391A4G43MB1-CTDQ, straight off the X470D4U QVL list.

They work fine and pass memtest86 in another board.

I don’t think that’s it. If I can get the board to boot, it seems to continue on just fine for a while.

I’ve tried all combinations of no drive at all, either slot, and even a separate M.2 to PCIe adapter card. No difference.

I made some small progress this morning: I was able to boot into the BIOS and disable wait for BMC. Now it boots every single time and I can get into the ESXi installer, no problem.

However, after the initial ESXi loading phase, the board just reboots. Same spot every time.

Also, my two-digit LED diagnostic display has an LED that won’t illuminate. I wonder if this board has some solder connectivity issues.

Whatever is the latest as of today.

Yes, that resolved all issues.


I just tried the ESXi 7 installation ISO and it did not reboot after loading; I got to the point of choosing the installation drive.
(https://my.vmware.com/web/vmware/evalcenter?p=free-esxi7)
That’s with a Ryzen 2600, 2 sticks of KSM26ED8/16ME and a Samsung 960 EVO or Intel 760p, no expansion cards, default BIOS settings. I just disabled all VMedia in IPMI.

So I guess it’s rather unlikely that the ESXi ISO has issues with board-specific hardware.

Like at all? Nothing even when you manage to POST?
Didn’t you just disable the LEDs? The switch is in the BIOS>Advanced>Chipset Configuration>Onboard Debug Port LED

AFAIK it should be enabled by default (set to Auto). If it was disabled in the BIOS without you knowing, I would strongly suggest resetting the BIOS to default settings.

Thanks for the hint. I tried but it had no effect.

I also disabled all C-state settings in my BIOS when I was experiencing similar issues.

The LED display mostly works.

It just has one LED that doesn’t light up, so 6 looks like h, 3 looks like a backward f and so on.
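
Those two symptoms are enough to guess which segment is dead. A rough sketch, assuming the usual a-g segment naming and assuming the board draws 6 without the top bar (both are my assumptions, not something from ASRock Rack documentation): it removes each segment in turn and checks which removal turns 6 into an h and 3 into a mirrored F.

```python
# Segment names follow the common convention:
#   a = top, b = upper right, c = lower right, d = bottom,
#   e = lower left, f = upper left, g = middle.
DIGITS = {
    "3": set("abcdg"),
    "6": set("cdefg"),  # assumption: 6 is drawn without the top bar
}

# Shapes described in the post, expressed as lit segments.
LOOKS_LIKE = {
    "6": set("cefg"),  # lowercase h
    "3": set("abcg"),  # F mirrored left-to-right
}


def dead_segment_candidates():
    """Return every segment whose loss explains all observed shapes."""
    candidates = []
    for seg in "abcdefg":
        if all(DIGITS[d] - {seg} == LOOKS_LIKE[d] for d in DIGITS):
            candidates.append(seg)
    return candidates


print(dead_segment_candidates())
```

Under those assumptions the only match is d, the bottom bar, so any code that uses the bottom bar on that digit will be misdrawn.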

Anyway, I finally recovered the BMC. For anyone else with the same problem, the trick is to download socflash from the ASRock Rack website (I can’t include links yet, look for the “How to update the BMC FW locally in DOS?” section) and copy it to a FreeDOS bootable USB drive along with the BMC firmware. Follow the instructions at the link.

I then reset the BIOS by removing the battery and jumping the BIOS reset jumper near the corner of the board just to be safe.

So far, everything appears to be working. ESXi 7 finally installed cleanly. I’ll run stress tests for the next 24 hours to confirm that everything is okay.

I run ESXi 7 with no issues.

It seems I have finally found a working configuration. These are the BIOS settings I changed. Now I have to check one by one which ones are really needed.

  • Advanced
    • CPU Configuration
      • PSS Support -> Disabled (Performance State, Cool&Quiet)
      • CPB Mode -> Disabled (Core Performance Boost)
      • C6 Mode -> Disabled
  • AMD CBS
    • CPU Common Options
      • Platform First Error Handling -> Disabled
      • Core Performance Boost -> Disabled
      • Power Supply Idle Control -> Typical Current Idle
      • ACPI _CST C1 Declaration -> Disabled
    • NBIO Common Options
      • IOMMU -> Enabled
      • ACS Enable -> Enable
      • PCIE ARI Support -> Enable
      • Enable AER Cap -> Enable
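
One way to organise the "check one by one" step is to revert a single setting per test run while leaving the others changed. A minimal sketch: the setting names and values are copied from the list above, the defaults are deliberately left out because the post does not give them, and `one_at_a_time_plan` is a hypothetical helper, not a vendor tool.

```python
# Settings as listed in the post: "menu path" -> value that was set.
CHANGED = {
    "Advanced/CPU Configuration/PSS Support": "Disabled",
    "Advanced/CPU Configuration/CPB Mode": "Disabled",
    "Advanced/CPU Configuration/C6 Mode": "Disabled",
    "AMD CBS/CPU Common Options/Platform First Error Handling": "Disabled",
    "AMD CBS/CPU Common Options/Core Performance Boost": "Disabled",
    "AMD CBS/CPU Common Options/Power Supply Idle Control": "Typical Current Idle",
    "AMD CBS/CPU Common Options/ACPI _CST C1 Declaration": "Disabled",
    "AMD CBS/NBIO Common Options/IOMMU": "Enabled",
    "AMD CBS/NBIO Common Options/ACS Enable": "Enable",
    "AMD CBS/NBIO Common Options/PCIE ARI Support": "Enable",
    "AMD CBS/NBIO Common Options/Enable AER Cap": "Enable",
}


def one_at_a_time_plan(changed):
    """Yield (setting_to_revert, settings_left_changed) for each test run."""
    for name in changed:
        keep = {k: v for k, v in changed.items() if k != name}
        yield name, keep


for run, (revert, keep) in enumerate(one_at_a_time_plan(CHANGED), start=1):
    print(f"run {run}: revert '{revert}' to default, keep {len(keep)} others")
```

If the problem comes back on a given run, the setting reverted in that run is one that is actually needed.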

Thanks again for the help!

@RoLee What PSU/PSUs did you use when you were experiencing issues?

Maybe we are seeing something similar to the 2013 Intel Haswell situation here:

At that time many PSUs started appearing with ‘Haswell-ready’ stickers.

Or it’s just what @Mastic_Warrior mentioned in this old thread: Ryzen C-State related problems -- what is the root cause?

@Tenrag Hmm, interesting.

Currently it is running with a "Seasonic Prime Ultra Platinum 550W ATX 2.4 (SSR-550PD2)"
/seasonic.com/prime-ultra-platinum

The other model I tested it with was an "Enermax Modu 82+ (EMD425AWT)"
/www.enermax.com/home.php?fn=eng/product_a1_1_1&lv0=1&lv1=54&no=7

Drop a support email to Seasonic and ASRock Rack, they know about PSU compatibility issues with this motherboard (just search the first half of this thread), but I seem to be black-listed and never received a resolution regarding the issues I had reported.

In the meantime I’m using SFX-L SilverStone Titanium PSUs, they don’t trigger anything (so far).

With BMC version 1.90 it seems to have become much better but I still don’t trust the X470D4Us with my Seasonic PSUs :confused:

I’m happy to report that, after disabling “Platform First Error Handling (PFEH)” in the BIOS, (corrected) single-bit errors are properly reported to the OS, also when overclocking / undervolting! So I’m now getting the same results as Diversity with his memory pin shorting method…

The reason I was failing to detect this earlier was:

  • Memtest86 v8.2 reported “unknown” for ECC support. Memtest86 v8.3 reported “enabled” for ECC support. So I assumed, if it was working, Memtest86 v8.3 should be able to detect them. However, a couple of days ago I figured out that Memtest86 v8.4 beta had Zen2 ECC support in its changelog. So after testing I figured out that Memtest86 v8.3 does NOT support Zen2 ECC, but Memtest86 v8.4 beta DOES support Zen2 ECC.
  • I only discovered the BIOS option “Platform First Error Handling (PFEH)” very recently. During all my previous testing, except for the very last couple short tests, it was set to the default “enabled”. I probably did too little testing with Linux / Windows after disabling it.

So in short:

  1. (corrected) single-bit memory errors -> motherboard (BIOS) -> OS ==> works 100%
  2. (corrected) single-bit memory errors -> motherboard (BIOS) -> OS -> IPMI ==> not sure if the OS properly forwards the error to the IPMI
  3. (corrected) single-bit memory errors -> motherboard (BIOS) -> IPMI ==> 100% broken
  4. (corrected) single-bit memory errors -> motherboard (BIOS) -> IPMI -> OS ==> 100% broken
  5. (uncorrected) multi-bit memory errors -> * ==> I’m not sure if it is broken (or perhaps not even possible on Zen2) or if we just haven’t been able to trigger them yet. I’ve run Memtest86 v8.4 with unstable memory for many hours now. In doing so, I’ve triggered about 3000 “ECC Correctable Errors” (=single-bit) and about 100 CPU errors, but 0 “ECC Uncorrectable Errors” (=multi-bit). Using the shorting method, we haven’t produced any “ECC Uncorrectable Errors” (=multi-bit) yet either. We are currently in contact with the people who wrote the paper (see link above for details) that explained the shorting method, to see how to trigger multi-bit errors reliably.
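
For anyone wondering why single-bit errors get "corrected" while multi-bit errors can at best be detected: a toy SECDED (single-error-correct, double-error-detect) code shows the distinction. This is an extended Hamming(8,4) code over 4 data bits, picked only because it is short. It is not the code the Zen2 memory controller actually uses, and the function names are illustrative.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit SECDED codeword (list of bits)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8  # index 0 = overall parity, 1..7 = Hamming positions
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]  # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]  # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]  # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) % 2            # makes total parity even
    return code


def decode(code):
    """Return (nibble, status); nibble is None when uncorrectable."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid word
    parity_odd = sum(code) % 2 == 1
    if syndrome == 0 and not parity_odd:
        status = "ok"
    elif parity_odd:
        # Odd number of flips: assume one, located at `syndrome`
        # (0 means the overall parity bit itself was hit).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Parity looks fine but the syndrome is non-zero: two bits flipped.
        return None, "uncorrectable"
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status


word = encode(0b1011)
single = list(word)
single[5] ^= 1   # one flipped bit
double = list(single)
double[2] ^= 1   # a second flipped bit
print(decode(word), decode(single), decode(double))
```

The same split shows up in the counters: one flipped bit is fixed silently and only counted, two flipped bits can be flagged but not repaired.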

So if I understand it correctly

  1. = ok
  2. I think we can only validate this once 3) is fixed
  3. Is actually a bug and should be fixed by Asrock Rack (with help of AMD and perhaps the IPMI-chip manufacturer Aspeed)
  4. Can only work / be fixed once 3) gets fixed
  5. Suggestions are welcome. Perhaps AMD can confirm if Zen2 properly supports this? But not like AMD TW claimed that “reporting is not supported”, which we now clearly proved to be false
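
Path 1 can also be watched from Linux without Memtest86 by reading the kernel's EDAC counters. A sketch, assuming the EDAC driver for the platform is loaded and exposes the standard `ce_count` / `ue_count` files under `/sys/devices/system/edac/mc`; `read_edac_counts` is a made-up helper name, and whether the counters move on this board depends on the PFEH setting discussed above.

```python
from pathlib import Path

EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root=EDAC_ROOT):
    """Return {controller: {"corrected": n, "uncorrected": n}} from sysfs."""
    root = Path(root)
    counts = {}
    if not root.is_dir():
        return counts  # EDAC not available on this machine
    for mc in sorted(root.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text())
            ue = int((mc / "ue_count").read_text())
        except (OSError, ValueError):
            continue  # skip controllers with missing or unreadable counters
        counts[mc.name] = {"corrected": ce, "uncorrected": ue}
    return counts


print(read_edac_counts() or "no EDAC memory controllers found")
```

Running it before and after a memtester session shows whether corrected errors reached the OS.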

I’ve sent this information to Asrock Rack + AMD. Asrock Rack, on the same day, confirmed that they, together with AMD, had come to the exact same conclusion (using error injection in Linux) and that they asked AMD for assistance to report these errors in the IPMI as well. So hopefully we’ll someday get this important feature on this motherboard!!

In the meantime, Diversity (especially) and I are still trying to trigger (uncorrected) multi-bit errors as well. We’re in contact with the interesting folks of ECCploit for this, who have very profound knowledge of this matter… (Check out Lucian’s talk at OffensiveCon19 https://www.youtube.com/watch?v=R2aPo_wwmZw).

Finally some real progress on this matter! Thanks to Diversity for getting my hopes up again, cause I almost gave up on this…

PFEH setting in BIOS
[image]
Screenshot taken after 1m33sec with memory overclocked / undervolted

Screenshot after almost 2h after ending the run with memory overclocked / undervolted
[image]
And screenshot in Linux with memory overclocked / undervolted (during memtester run in the background):

3rd working method

This is how Asrock Rack and AMD tested it

The BIOS

Everything default except

  • “Platform First Error Handling” was changed from the default “Enabled” to “Disabled”
    [image]
  • “Disable Memory Error Injection”, strangely enough, was (accidentally) set to the default “True”. I think Asrock Rack fell for their own double-negation confusion [image] I haven’t tried it yet with it set to “False”. I also haven’t retried Memtest86 error injection with these settings…
    [image]

The OS

I used a fresh install of “Fedora-Server-dvd-x86_64-32-1.6.iso” for this. I might have selected a few additional package groups during the install, not sure if it will make a difference to the below instructions.

[root@localhost mce-inject-master]# cat /etc/fedora-release

Fedora release 32 (Thirty Two)

[root@localhost ~]# uname -r

5.6.8-300.fc32.x86_64

Installing / configuring additional packages / tools

edac-utils

[root@localhost ~]# yum install edac-utils

Fedora 32 openh264 (From Cisco) - x86_64                                                4.8 kB/s | 5.1 kB     00:01

Fedora Modular 32 - x86_64                                                              2.2 MB/s | 4.9 MB     00:02

Fedora Modular 32 - x86_64 - Updates                                                    881 kB/s | 1.4 MB     00:01

Fedora 32 - x86_64 - Updates                                                            4.1 MB/s | 7.8 MB     00:01

Fedora 32 - x86_64                                                                      4.3 MB/s |  70 MB     00:16

Dependencies resolved.

========================================================================================================================

 Package                      Architecture             Version                           Repository                Size

========================================================================================================================

Installing:

edac-utils                   x86_64                   0.16-22.fc32                      fedora                    49 k

Installing dependencies:

sysfsutils                   x86_64                   2.1.0-28.fc32                     fedora                    44 k

Transaction Summary

========================================================================================================================

Install  2 Packages

Total download size: 93 k

Installed size: 238 k

Is this ok [y/N]: y

Downloading Packages:

(1/2): edac-utils-0.16-22.fc32.x86_64.rpm                                               406 kB/s |  49 kB     00:00

(2/2): sysfsutils-2.1.0-28.fc32.x86_64.rpm                                              337 kB/s |  44 kB     00:00

------------------------------------------------------------------------------------------------------------------------

Total                                                                                   115 kB/s |  93 kB     00:00

warning: /var/cache/dnf/fedora-558931b5e76b51a7/packages/edac-utils-0.16-22.fc32.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID 12c944d0: NOKEY

Fedora 32 - x86_64                                                                      1.6 MB/s | 1.6 kB     00:00

Importing GPG key 0x12C944D0:

Userid     : "Fedora (32) <[email protected]>"

Fingerprint: 97A1 AE57 C3A2 372C CA3A 4ABA 6C13 026D 12C9 44D0

From       : /etc/pki/rpm-gpg/RPM-GPG-KEY-fedora-32-x86_64

Is this ok [y/N]: y

Key imported successfully

Running transaction check

Transaction check succeeded.

Running transaction test

Transaction test succeeded.

Running transaction

  Preparing        :                                                                                                1/1

  Installing       : sysfsutils-2.1.0-28.fc32.x86_64                                                                1/2

  Installing       : edac-utils-0.16-22.fc32.x86_64                                                                 2/2

  Running scriptlet: edac-utils-0.16-22.fc32.x86_64                                                                 2/2

  Verifying        : edac-utils-0.16-22.fc32.x86_64                                                                 1/2

  Verifying        : sysfsutils-2.1.0-28.fc32.x86_64                                                                2/2

Installed:

  edac-utils-0.16-22.fc32.x86_64                             sysfsutils-2.1.0-28.fc32.x86_64

Complete!

Bison

[root@localhost mce-inject-master]# yum install bison

Last metadata expiration check: 0:06:26 ago on Fri 08 May 2020 12:45:14 AM CEST.

Dependencies resolved.

=============================================================================================================================================================================================================================================

Package                                               Architecture                                           Version                                                           Repository                                              Size

=============================================================================================================================================================================================================================================

Installing:

bison                                                 x86_64                                                 3.5-2.fc32                                                        fedora                                                 818 k

Installing dependencies:

m4                                                    x86_64                                                 1.4.18-12.fc32                                                    fedora                                                 218 k

Transaction Summary

=============================================================================================================================================================================================================================================

Install  2 Packages

Total download size: 1.0 M

Installed size: 3.0 M

Is this ok [y/N]: y

Downloading Packages:

(1/2): m4-1.4.18-12.fc32.x86_64.rpm                                                                                                                                                                          946 kB/s | 218 kB     00:00

(2/2): bison-3.5-2.fc32.x86_64.rpm                                                                                                                                                                           2.0 MB/s | 818 kB     00:00

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Total                                                                                                                                                                                                        1.2 MB/s | 1.0 MB     00:00

Running transaction check

Transaction check succeeded.

Running transaction test

Transaction test succeeded.

Running transaction

  Preparing        :                                                                                                                                                                                                                     1/1

  Installing       : m4-1.4.18-12.fc32.x86_64                                                                                                                                                                                            1/2

  Installing       : bison-3.5-2.fc32.x86_64                                                                                                                                                                                             2/2

  Running scriptlet: bison-3.5-2.fc32.x86_64                                                                                                                                                                                             2/2

  Verifying        : bison-3.5-2.fc32.x86_64                                                                                                                                                                                             1/2

  Verifying        : m4-1.4.18-12.fc32.x86_64                                                                                                                                                                                            2/2

Installed:

  bison-3.5-2.fc32.x86_64                                                                                              m4-1.4.18-12.fc32.x86_64

Complete!

Flex

[root@localhost mce-inject-master]# yum install flex

Last metadata expiration check: 0:06:41 ago on Fri 08 May 2020 12:45:14 AM CEST.

Dependencies resolved.

=============================================================================================================================================================================================================================================

Package                                               Architecture                                            Version                                                         Repository                                               Size

=============================================================================================================================================================================================================================================

Installing:

flex                                                  x86_64                                                  2.6.4-4.fc32                                                    fedora                                                  318 k

Transaction Summary

=============================================================================================================================================================================================================================================

Install  1 Package

Total download size: 318 k

Installed size: 927 k

Is this ok [y/N]: y

Downloading Packages:

flex-2.6.4-4.fc32.x86_64.rpm                                                                                                                                                                                 1.5 MB/s | 318 kB     00:00

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Total                                                                                                                                                                                                        482 kB/s | 318 kB     00:00

Running transaction check

Transaction check succeeded.

Running transaction test

Transaction test succeeded.

Running transaction

  Preparing        :                                                                                                                                                                                                                     1/1

  Installing       : flex-2.6.4-4.fc32.x86_64                                                                                                                                                                                            1/1

  Running scriptlet: flex-2.6.4-4.fc32.x86_64                                                                                                                                                                                            1/1

  Verifying        : flex-2.6.4-4.fc32.x86_64                                                                                                                                                                                            1/1

Installed:

  flex-2.6.4-4.fc32.x86_64

Complete!

Rasdaemon

[root@localhost mce-inject-master]# yum install rasdaemon

Last metadata expiration check: 0:02:51 ago on Fri 08 May 2020 12:52:01 AM CEST.

Dependencies resolved.

=============================================================================================================================================================================================================================================

Package                                                       Architecture                                       Version                                                           Repository                                          Size

=============================================================================================================================================================================================================================================

Installing:

rasdaemon                                                     x86_64                                             0.6.4-1.fc32                                                      fedora                                             117 k

Installing dependencies:

perl-DBD-SQLite                                               x86_64                                             1.64-4.fc32                                                       fedora                                             196 k

perl-DBI                                                      x86_64                                             1.643-2.fc32                                                      fedora                                             707 k

perl-Math-BigInt                                              noarch                                             1:1.9998.18-2.fc32                                                fedora                                             190 k

perl-Math-Complex                                             noarch                                             1.59-452.fc32                                                     fedora                                              56 k

Transaction Summary

=============================================================================================================================================================================================================================================

Install  5 Packages

Total download size: 1.2 M

Installed size: 3.5 M

Is this ok [y/N]: y

Downloading Packages:

(1/5): perl-Math-BigInt-1.9998.18-2.fc32.noarch.rpm                                                                                                                                                          497 kB/s | 190 kB     00:00

(2/5): perl-DBD-SQLite-1.64-4.fc32.x86_64.rpm                                                                                                                                                                498 kB/s | 196 kB     00:00

(3/5): perl-Math-Complex-1.59-452.fc32.noarch.rpm                                                                                                                                                            1.3 MB/s |  56 kB     00:00

(4/5): rasdaemon-0.6.4-1.fc32.x86_64.rpm                                                                                                                                                                     1.1 MB/s | 117 kB     00:00

(5/5): perl-DBI-1.643-2.fc32.x86_64.rpm                                                                                                                                                                      1.1 MB/s | 707 kB     00:00

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Total                                                                                                                                                                                                        1.1 MB/s | 1.2 MB     00:01

Running transaction check

Transaction check succeeded.

Running transaction test

Transaction test succeeded.

Running transaction

  Preparing        :                                                                                                                                                                                                                     1/1

  Installing       : perl-Math-Complex-1.59-452.fc32.noarch                                                                                                                                                                              1/5

  Installing       : perl-Math-BigInt-1:1.9998.18-2.fc32.noarch                                                                                                                                                                          2/5

  Installing       : perl-DBI-1.643-2.fc32.x86_64                                                                                                                                                                                        3/5

  Installing       : perl-DBD-SQLite-1.64-4.fc32.x86_64                                                                                                                                                                                  4/5

  Installing       : rasdaemon-0.6.4-1.fc32.x86_64                                                                                                                                                                                       5/5

  Running scriptlet: rasdaemon-0.6.4-1.fc32.x86_64                                                                                                                                                                                       5/5

  Verifying        : perl-DBD-SQLite-1.64-4.fc32.x86_64                                                                                                                                                                                  1/5

  Verifying        : perl-DBI-1.643-2.fc32.x86_64                                                                                                                                                                                        2/5

  Verifying        : perl-Math-BigInt-1:1.9998.18-2.fc32.noarch                                                                                                                                                                          3/5

  Verifying        : perl-Math-Complex-1.59-452.fc32.noarch                                                                                                                                                                              4/5

  Verifying        : rasdaemon-0.6.4-1.fc32.x86_64                                                                                                                                                                                       5/5

Installed:

  perl-DBD-SQLite-1.64-4.fc32.x86_64             perl-DBI-1.643-2.fc32.x86_64             perl-Math-BigInt-1:1.9998.18-2.fc32.noarch             perl-Math-Complex-1.59-452.fc32.noarch             rasdaemon-0.6.4-1.fc32.x86_64

Complete!

[root@localhost machinecheck0]# rasdaemon -e

rasdaemon: ras:mc_event event enabled

rasdaemon: ras:aer_event event enabled

rasdaemon: mce:mce_record event enabled

rasdaemon: Can't write to set_event

rasdaemon: devlink:devlink_health_report event enabled

rasdaemon: block:block_rq_complete event enabled
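
In the output above every event reports "enabled" except for one "Can't write to set_event" line. A small sketch that separates the two, using only the line formats shown in this transcript (other rasdaemon versions may print something different, and `summarize` is just an illustrative name):

```python
import re

# Matches lines such as "rasdaemon: ras:mc_event event enabled".
ENABLED = re.compile(r"^rasdaemon: (\S+) event enabled$")

SAMPLE = """\
rasdaemon: ras:mc_event event enabled
rasdaemon: ras:aer_event event enabled
rasdaemon: mce:mce_record event enabled
rasdaemon: Can't write to set_event
rasdaemon: devlink:devlink_health_report event enabled
rasdaemon: block:block_rq_complete event enabled
"""


def summarize(output):
    """Split `rasdaemon -e` output into enabled events and other messages."""
    enabled, problems = [], []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        match = ENABLED.match(line)
        if match:
            enabled.append(match.group(1))
        elif line.startswith("rasdaemon:"):
            problems.append(line)
    return enabled, problems


enabled, problems = summarize(SAMPLE)
print("enabled:", ", ".join(enabled))
print("check:", problems)
```

The memory-related events (`ras:mc_event` and `mce:mce_record`) are in the enabled list, which is what matters for the ECC test.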

[root@localhost machinecheck0]# systemctl start rasdaemon

[root@localhost machinecheck0]# systemctl enable rasdaemon

Created symlink /etc/systemd/system/multi-user.target.wants/rasdaemon.service → /usr/lib/systemd/system/rasdaemon.service.

[root@localhost machinecheck0]# systemctl status rasdaemon.service

● rasdaemon.service - RAS daemon to log the RAS events

     Loaded: loaded (/usr/lib/systemd/system/rasdaemon.service; enabled; vendor preset: disabled)

     Active: active (running) since Fri 2020-05-08 00:57:46 CEST; 23s ago

   Main PID: 33914 (rasdaemon)

      Tasks: 1 (limit: 38389)

     Memory: 7.1M

        CPU: 10ms

     CGroup: /system.slice/rasdaemon.service

             └─33914 /usr/sbin/rasdaemon -f -r

May 08 00:57:46 localhost.localdomain rasdaemon[33914]: rasdaemon: diskerror_eventstore: 0x564510eb9918

May 08 00:57:46 localhost.localdomain rasdaemon[33914]: rasdaemon: register inserted at db

May 08 00:57:46 localhost.localdomain rasdaemon[33914]: overriding event (1360) ras:mc_event with new print handler

May 08 00:57:46 localhost.localdomain rasdaemon[33914]: overriding event (1357) ras:aer_event with new print handler

May 08 00:57:46 localhost.localdomain rasdaemon[33914]: overriding event (114) mce:mce_record with new print handler

May 08 00:57:46 localhost.localdomain rasdaemon[33914]: overriding event (1441) net:net_dev_xmit_timeout with new print handler

May 08 00:57:46 localhost.localdomain rasdaemon[33914]: overriding event (1449) devlink:devlink_health_report with new print handler

May 08 00:57:46 localhost.localdomain rasdaemon[33914]: overriding event (1154) block:block_rq_complete with new print handler

May 08 00:57:46 localhost.localdomain rasdaemon[33914]: Calling ras_mc_event_opendb()

May 08 00:57:46 localhost.localdomain rasdaemon[33914]:            <...>-36    [005]     0.000095: block_rq_complete:    2020-05-08 00:57:45 +0200

Development Tools (for make)

[root@localhost mce-inject-master]# yum groupinstall "Development Tools"

Last metadata expiration check: 0:07:50 ago on Fri 08 May 2020 12:52:01 AM CEST.

Dependencies resolved.

=============================================================================================================================================================================================================================================

Package                                                                 Architecture                              Version                                                                  Repository                                  Size

=============================================================================================================================================================================================================================================

Installing group/module packages:

diffstat                                                                x86_64                                    1.63-2.fc32                                                              fedora                                      43 k

...

xorg-x11-server-utils                                                   x86_64                                    7.7-34.fc32                                                              fedora                                     188 k

Installing weak dependencies:

kernel-devel                                                            x86_64                                    5.6.8-300.fc32                                                           updates                                     13 M

Installing Groups:

Development Tools


Transaction Summary

=============================================================================================================================================================================================================================================

Install  79 Packages


Total download size: 124 M

Installed size: 448 M

Is this ok [y/N]: y

Downloading Packages:

(1/79): git-2.26.2-1.fc32.x86_64.rpm                                                                                                                                                                         787 kB/s | 126 kB     00:00

...

(79/79): xorg-x11-fonts-ISO8859-1-100dpi-7.5-24.fc32.noarch.rpm                                                                                                                                              2.6 MB/s | 1.0 MB     00:00

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Total                                                                                                                                                                                                        6.9 MB/s | 124 MB     00:18

Running transaction check

Transaction check succeeded.

Running transaction test

Transaction test succeeded.

Running transaction

  Preparing        :                                                                                                                                                                                                                     1/1

  Installing       : urw-base35-fonts-common-20170801-14.fc32.noarch                                                                                                                                                                    1/79

...

  Running scriptlet: diffstat-1.63-2.fc32.x86_64                                                                                                                                                                                       79/79

  Verifying        : cpp-10.0.1-0.14.fc32.x86_64                                                                                                                                                                                        1/79

...

  Verifying        : xorg-x11-server-utils-7.7-34.fc32.x86_64                                                                                                                                                                          79/79


Installed:

  adobe-mappings-cmap-20171205-7.fc32.noarch                    adobe-mappings-cmap-deprecated-20171205-7.fc32.noarch  adobe-mappings-pdf-20180407-5.fc32.noarch                   binutils-2.34-2.fc32.x86_64

  binutils-gold-2.34-2.fc32.x86_64                              boost-filesystem-1.69.0-15.fc32.x86_64                 boost-system-1.69.0-15.fc32.x86_64                          boost-thread-1.69.0-15.fc32.x86_64

  cpp-10.0.1-0.14.fc32.x86_64                                   diffstat-1.63-2.fc32.x86_64                            doxygen-1:1.8.17-2.fc32.x86_64                              dyninst-10.1.0-5.fc32.x86_64

  gcc-10.0.1-0.14.fc32.x86_64                                   gd-2.3.0-1.fc32.x86_64                                 git-2.26.2-1.fc32.x86_64                                    git-core-2.26.2-1.fc32.x86_64

  git-core-doc-2.26.2-1.fc32.noarch                             glibc-devel-2.31-2.fc32.x86_64                         glibc-headers-2.31-2.fc32.x86_64                            google-droid-sans-fonts-20200215-3.fc32.noarch

  graphviz-2.42.4-1.fc32.x86_64                                 gtk2-2.24.32-7.fc32.x86_64                             gts-0.7.6-37.20121130.fc32.x86_64                           guile22-2.2.6-4.fc32.x86_64

  isl-0.16.1-10.fc32.x86_64                                     jbig2dec-libs-0.17-4.fc32.x86_64                       kernel-devel-5.6.8-300.fc32.x86_64                          kernel-headers-5.6.7-300.fc32.x86_64

  lasi-1.1.3-2.fc32.x86_64                                      libXaw-1.0.13-14.fc32.x86_64                           libXmu-1.1.3-3.fc32.x86_64                                  libXpm-3.5.13-2.fc32.x86_64

  libXt-1.2.0-1.fc32.x86_64                                     libfontenc-1.1.3-12.fc32.x86_64                        libgs-9.52-1.fc32.x86_64                                    libidn-1.35-7.fc32.x86_64

  libijs-0.35-11.fc32.x86_64                                    libimagequant-2.12.6-2.fc32.x86_64                     libmcpp-2.7.2-25.fc32.x86_64                                libmpc-1.1.0-8.fc32.x86_64

  libpaper-1.1.24-26.fc32.x86_64                                libraqm-0.7.0-5.fc32.x86_64                            librsvg2-2.48.4-1.fc32.x86_64                               libserf-1.3.9-15.fc32.x86_64

  libwebp-1.1.0-2.fc32.x86_64                                   libxcrypt-devel-4.4.16-3.fc32.x86_64                   make-1:4.2.1-16.fc32.x86_64                                 mcpp-2.7.2-25.fc32.x86_64

  netpbm-10.90.00-1.fc32.x86_64                                 openjpeg2-2.3.1-6.fc32.x86_64                          patch-2.7.6-12.fc32.x86_64                                  patchutils-0.3.4-15.fc32.x86_64

  perl-Error-1:0.17029-1.fc32.noarch                            perl-Git-2.26.2-1.fc32.noarch                          perl-TermReadKey-2.38-6.fc32.x86_64                         subversion-1.12.2-7.fc32.x86_64

  subversion-libs-1.12.2-7.fc32.x86_64                          systemtap-4.3-0.20200211git91ffb97ad335.fc32.x86_64    systemtap-client-4.3-0.20200211git91ffb97ad335.fc32.x86_64  systemtap-devel-4.3-0.20200211git91ffb97ad335.fc32.x86_64

  systemtap-runtime-4.3-0.20200211git91ffb97ad335.fc32.x86_64   tbb-2020.2-1.fc32.x86_64                               urw-base35-bookman-fonts-20170801-14.fc32.noarch            urw-base35-c059-fonts-20170801-14.fc32.noarch

  urw-base35-d050000l-fonts-20170801-14.fc32.noarch             urw-base35-fonts-20170801-14.fc32.noarch               urw-base35-fonts-common-20170801-14.fc32.noarch             urw-base35-gothic-fonts-20170801-14.fc32.noarch

  urw-base35-nimbus-mono-ps-fonts-20170801-14.fc32.noarch       urw-base35-nimbus-roman-fonts-20170801-14.fc32.noarch  urw-base35-nimbus-sans-fonts-20170801-14.fc32.noarch        urw-base35-p052-fonts-20170801-14.fc32.noarch

  urw-base35-standard-symbols-ps-fonts-20170801-14.fc32.noarch  urw-base35-z003-fonts-20170801-14.fc32.noarch          utf8proc-2.4.0-3.fc32.x86_64                                xapian-core-libs-1.4.14-1.fc32.x86_64

  xorg-x11-font-utils-1:7.5-44.fc32.x86_64                      xorg-x11-fonts-ISO8859-1-100dpi-7.5-24.fc32.noarch     xorg-x11-server-utils-7.7-34.fc32.x86_64


Complete!

mce-inject

[root@localhost ~]# wget https://github.com/andikleen/mce-inject/archive/master.zip

--2020-05-08 00:49:09--  https://github.com/andikleen/mce-inject/archive/master.zip

Resolving github.com (github.com)... 140.82.118.3

Connecting to github.com (github.com)|140.82.118.3|:443... connected.

HTTP request sent, awaiting response... 302 Found

Location: https://codeload.github.com/andikleen/mce-inject/zip/master [following]

--2020-05-08 00:49:09--  https://codeload.github.com/andikleen/mce-inject/zip/master

Resolving codeload.github.com (codeload.github.com)... 140.82.114.9

Connecting to codeload.github.com (codeload.github.com)|140.82.114.9|:443... connected.

HTTP request sent, awaiting response... 200 OK

Length: unspecified [application/zip]

Saving to: ‘master.zip’


master.zip                        [ <=>                                              ]  13.21K  --.-KB/s    in 0.09s


2020-05-08 00:49:10 (139 KB/s) - ‘master.zip’ saved [13530]


[root@localhost ~]# unzip master.zip

Archive:  master.zip

4cbe46321b4a81365ff3aafafe63967264dbfec5

   creating: mce-inject-master/

  inflating: mce-inject-master/Makefile

  inflating: mce-inject-master/README

  inflating: mce-inject-master/inject.h

  inflating: mce-inject-master/mce-inject.8

  inflating: mce-inject-master/mce-inject.c

  inflating: mce-inject-master/mce.h

  inflating: mce-inject-master/mce.lex

  inflating: mce-inject-master/mce.y

  inflating: mce-inject-master/parser.h

   creating: mce-inject-master/test/

  inflating: mce-inject-master/test/corrected

  inflating: mce-inject-master/test/fatal

  inflating: mce-inject-master/test/uncorrected

  inflating: mce-inject-master/util.c

  inflating: mce-inject-master/util.h

[root@localhost ~]# cd mce-inject-master/

[root@localhost mce-inject-master]# ls -la

total 48

drwxr-xr-x. 3 root root  189 Jan 19  2013 .

drwxr-xr-x. 3 root root   49 May  8 00:49 ..

-rw-r--r--. 1 root root  193 Jan 19  2013 inject.h

-rw-r--r--. 1 root root  904 Jan 19  2013 Makefile

-rw-r--r--. 1 root root 3863 Jan 19  2013 mce.h

-rw-r--r--. 1 root root 3793 Jan 19  2013 mce-inject.8

-rw-r--r--. 1 root root 6506 Jan 19  2013 mce-inject.c

-rw-r--r--. 1 root root 3487 Jan 19  2013 mce.lex

-rw-r--r--. 1 root root 3822 Jan 19  2013 mce.y

-rw-r--r--. 1 root root  385 Jan 19  2013 parser.h

-rw-r--r--. 1 root root 1460 Jan 19  2013 README

drwxr-xr-x. 2 root root   55 Jan 19  2013 test

-rw-r--r--. 1 root root  364 Jan 19  2013 util.c

-rw-r--r--. 1 root root  290 Jan 19  2013 util.h


[root@localhost mce-inject-master]# make

bison -d mce.y

flex mce.lex

cc -MM -DDEPS_RUN -I. mce-inject.c util.c mce.tab.c lex.yy.c > .depend.X && \

        mv .depend.X .depend

cc -Os -g -Wall   -c -o mce-inject.o mce-inject.c

cc -Os -g -Wall   -c -o mce.tab.o mce.tab.c

cc -Os -g -Wall   -c -o lex.yy.o lex.yy.c

cc -Os -g -Wall   -c -o util.o util.c

cc -pthread  mce-inject.o mce.tab.o lex.yy.o util.o   -o mce-inject

[root@localhost mce-inject-master]# ls -la

total 400

drwxr-xr-x. 3 root root  4096 May  8 01:01 .

drwxr-xr-x. 3 root root    49 May  8 00:49 ..

-rw-r--r--. 1 root root    45 May  8 00:54 correct

-rw-r--r--. 1 root root   185 May  8 01:01 .depend

-rw-r--r--. 1 root root   193 Jan 19  2013 inject.h

-rw-r--r--. 1 root root 47534 May  8 01:01 lex.yy.c

-rw-r--r--. 1 root root 73320 May  8 01:01 lex.yy.o

-rw-r--r--. 1 root root   904 Jan 19  2013 Makefile

-rw-r--r--. 1 root root  3863 Jan 19  2013 mce.h

-rwxr-xr-x. 1 root root 84584 May  8 01:01 mce-inject

-rw-r--r--. 1 root root  3793 Jan 19  2013 mce-inject.8

-rw-r--r--. 1 root root  6506 Jan 19  2013 mce-inject.c

-rw-r--r--. 1 root root 38960 May  8 01:01 mce-inject.o

-rw-r--r--. 1 root root  3487 Jan 19  2013 mce.lex

-rw-r--r--. 1 root root 56619 May  8 01:01 mce.tab.c

-rw-r--r--. 1 root root  2922 May  8 01:01 mce.tab.h

-rw-r--r--. 1 root root 25552 May  8 01:01 mce.tab.o

-rw-r--r--. 1 root root  3822 Jan 19  2013 mce.y

-rw-r--r--. 1 root root   385 Jan 19  2013 parser.h

-rw-r--r--. 1 root root  1460 Jan 19  2013 README

drwxr-xr-x. 2 root root    55 Jan 19  2013 test

-rw-r--r--. 1 root root   364 Jan 19  2013 util.c

-rw-r--r--. 1 root root   290 Jan 19  2013 util.h

-rw-r--r--. 1 root root  8128 May  8 01:01 util.o

[root@localhost mce-inject-master]# modprobe mce_inject

[root@localhost mce-inject-master]# vi correct

[root@localhost mce-inject-master]# cat correct

CPU 1 BANK 2

STATUS corrected

RIP 0x12341234

Prevent the machine from crashing

[root@localhost mce-inject-master]# cd /sys/devices/system/machinecheck/machinecheck0

[root@localhost machinecheck0]# cat tolerant

1

[root@localhost machinecheck0]# vi tolerant

[root@localhost machinecheck0]# cat tolerant

3

[root@localhost machinecheck0]# 
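For anyone who would rather script this than edit the file in vi, here is a minimal Python sketch of the same change. It assumes the `tolerant` attribute exists at the path used above (it does on this 5.6 kernel, other kernels may not expose it) and that it runs as root. The level descriptions are paraphrased from the kernel's x86_64 machinecheck documentation, and `set_tolerant` is just a hypothetical helper name.

```python
from pathlib import Path

# Meaning of each level, paraphrased from the kernel's x86_64 machinecheck docs.
TOLERANT_LEVELS = {
    0: "always panic on uncorrected errors, log corrected errors",
    1: "panic or SIGBUS on uncorrected errors, log corrected errors (default)",
    2: "SIGBUS or log uncorrected errors, log corrected errors",
    3: "never panic or SIGBUS, log all errors (testing only)",
}

# Same sysfs path as in the session above.
TOLERANT_PATH = Path("/sys/devices/system/machinecheck/machinecheck0/tolerant")


def set_tolerant(level, path=TOLERANT_PATH):
    """Write a tolerance level to the given file and return the value read back."""
    if level not in TOLERANT_LEVELS:
        raise ValueError(f"tolerant must be one of {sorted(TOLERANT_LEVELS)}")
    path = Path(path)
    path.write_text(f"{level}\n")
    return int(path.read_text().strip())
```

Calling `set_tolerant(3)` as root should match the vi edit above. Level 3 is meant for testing only, so it is probably worth setting it back to 1 afterwards.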

Check edac status

[root@localhost ~]# ls /sys/devices/system/edac/mc

mc0  power  subsystem  uevent

[root@localhost ~]# find /lib/modules/$(uname -r) -name '*edac*'

/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac

/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/amd64_edac_mod.ko.xz

/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/e752x_edac.ko.xz

/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/edac_mce_amd.ko.xz

/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/i10nm_edac.ko.xz

/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/i3000_edac.ko.xz

/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/i3200_edac.ko.xz

/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/i5000_edac.ko.xz

/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/i5100_edac.ko.xz

/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/i5400_edac.ko.xz

/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/i7300_edac.ko.xz

/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/i7core_edac.ko.xz

/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/i82975x_edac.ko.xz

/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/ie31200_edac.ko.xz

/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/pnd2_edac.ko.xz

/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/sb_edac.ko.xz

/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/skx_edac.ko.xz

/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/x38_edac.ko.xz

[root@localhost ~]# edac-util -rfull

mc0:csrow2:mc#0csrow#2channel#0:CE:0

mc0:csrow2:mc#0csrow#2channel#1:CE:0

mc0:csrow3:mc#0csrow#3channel#0:CE:0

mc0:csrow3:mc#0csrow#3channel#1:CE:0

mc0:noinfo:all:UE:0

mc0:noinfo:all:CE:0
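To compare the counters before and after an injection without eyeballing them, the output above can be parsed. This is only a sketch: it assumes every line has the five colon-separated fields seen in this output (controller, row, label, CE or UE, count), and `EdacCounter`, `parse_edac_report` and `nonzero_counters` are made-up names.

```python
from collections import namedtuple

# One line of `edac-util -rfull` output, split into its fields.
EdacCounter = namedtuple("EdacCounter", "controller row label kind value")

# Output copied from the session above.
SAMPLE = """\
mc0:csrow2:mc#0csrow#2channel#0:CE:0
mc0:csrow2:mc#0csrow#2channel#1:CE:0
mc0:csrow3:mc#0csrow#3channel#0:CE:0
mc0:csrow3:mc#0csrow#3channel#1:CE:0
mc0:noinfo:all:UE:0
mc0:noinfo:all:CE:0
"""


def parse_edac_report(text):
    """Parse `edac-util -rfull` style lines into EdacCounter records."""
    records = []
    for line in text.splitlines():
        fields = line.strip().split(":")
        # Expected shape: controller:row:label:CE|UE:count
        if len(fields) != 5 or fields[3] not in ("CE", "UE"):
            continue  # skip blank lines and anything unrecognised
        controller, row, label, kind, value = fields
        if not value.isdigit():
            continue
        records.append(EdacCounter(controller, row, label, kind, int(value)))
    return records


def nonzero_counters(records):
    """Return only the counters that have recorded at least one error."""
    return [r for r in records if r.value > 0]
```

With the sample above, `nonzero_counters(parse_edac_report(SAMPLE))` comes back empty, which is what you would expect on a machine that has not logged any memory errors yet.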

Inject the error and observe the result

[root@localhost mce-inject-master]# modprobe mce_inject

[root@localhost mce-inject-master]# ./mce-inject correct

[root@localhost mce-inject-master]#

Message from syslogd@localhost at May  8 01:02:10 ...

kernel:[Hardware Error]: Corrected error, no action required.


Message from syslogd@localhost at May  8 01:02:10 ...

kernel:[Hardware Error]: CPU:1 (17:71:0) MC2_STATUS[-|CE|-|-|-|-|-|-|-|-]: 0x9000000000000000


Message from syslogd@localhost at May  8 01:02:10 ...

kernel:[Hardware Error]: IPID: 0x0000000000000000


Message from syslogd@localhost at May  8 01:02:10 ...

kernel:[Hardware Error]: L2 Cache Ext. Error Code: 0, L2M Tag Multiple-Way-Hit error.


Message from syslogd@localhost at May  8 01:02:10 ...

kernel:[Hardware Error]: cache level: RESV, tx: INSN


[root@localhost mce-inject-master]# ras-mc-ctl --summary

No Memory errors.


No PCIe AER errors.


No Extlog errors.


No devlink errors.

Disk errors summary:

        0:0 has 1 errors

MCE records summary:

        1 Corrected error, no action required. errors

[root@localhost mce-inject-master]#
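As a side note on the `MC2_STATUS` value in the kernel messages: the top bits of an x86 machine check status register are architectural flags, so the raw `0x9000000000000000` can be decoded by hand. A small sketch follows. It only covers those seven flag bits and ignores the lower, model-specific error code fields, and the helper names are made up for illustration.

```python
# Architectural flag bits in the upper part of an x86 MCi_STATUS register.
STATUS_FLAGS = {
    63: "VAL",    # register holds a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MCi_MISC holds valid data
    58: "ADDRV",  # MCi_ADDR holds valid data
    57: "PCC",    # processor context corrupt
}


def decode_status(status):
    """Return the names of the flag bits set in a 64-bit status value."""
    return [
        name
        for bit, name in sorted(STATUS_FLAGS.items(), reverse=True)
        if (status >> bit) & 1
    ]


def is_corrected(status):
    """A valid record with the UC bit clear describes a corrected error."""
    flags = decode_status(status)
    return "VAL" in flags and "UC" not in flags


print(decode_status(0x9000000000000000))  # ['VAL', 'EN']
print(is_corrected(0x9000000000000000))   # True
```

So the injected record is valid, has reporting enabled, and has UC clear, which is consistent with the "Corrected error, no action required." line.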

Thanks, that was a good read.
I was also digging a little into how ECC errors should be propagated, and there are a few things worth noting:

  • ECC errors are reported through Machine Check Exceptions (MCEs). An MCE is essentially just an event raised when the CPU populates the MCA registers. From what I gathered, the kernel MCA handler periodically polls those registers for changes and reports any errors it finds (or panics, for example). I also read that MCEs are essentially interrupts similar to NMIs, so I am not sure how that fits with the polling strategy.

  • AGESA decides whether ECC should be enabled on a specific CPU, and depending on that the BIOS can allow the OS/kernel to register an MCA handler.
    (old AGESA code: https://github.com/coreboot/coreboot/tree/master/src/vendorcode/amd/agesa)
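On the polling question: the kernel's x86_64 machinecheck documentation describes a `check_interval` sysfs attribute as how often it polls for corrected machine check errors, in seconds, with a default of 5 minutes, and notes that the value is printed in hexadecimal. A minimal sketch for reading it, assuming the attribute sits next to `tolerant` under `machinecheck0` (the function name is hypothetical):

```python
from pathlib import Path

# Assumed location, in the same directory as the `tolerant` attribute.
CHECK_INTERVAL_PATH = Path(
    "/sys/devices/system/machinecheck/machinecheck0/check_interval"
)


def read_check_interval(path=CHECK_INTERVAL_PATH):
    """Return the corrected-error polling interval in seconds."""
    raw = Path(path).read_text().strip()
    # The kernel docs note that this attribute is printed in hexadecimal.
    return int(raw, 16)
```

If the file contains `12c`, this returns 300, i.e. the documented 5 minute default.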


Now the most interesting part:

  • AFAIK the BMC and the host OS do not talk to each other when it comes to MCEs. I contacted ASPEED support and they informed me that on AMD CPUs MCEs are normally reported to the BMC chip through APML, which is an I2C bus.
    According to the APML spec, the CPU exposes the same set of MCA registers to the BMC as it does to the host OS. So when it comes to MCA errors, the host OS and IPMI detect, log and report them independently.
    AMD docs: https://developer.amd.com/resources/developer-guides-manuals/
    APML spec: https://developer.amd.com/wordpress/media/2012/10/41918.pdf
    It could be that APML is simply disabled for AM4 CPUs. I can’t find definitive info, but most of the APML marketing seems to point to EPYC exclusivity. At the same time it’s only 2 CPU pins (SIC and SID), and even AM2 had them:
    https://en.wikichip.org/wiki/amd/packages/socket_am2
    If this is true and we are simply missing the APML link, then ECC errors should not be the only thing missing from the IPMI logs: critical temperature events are also reported over this bus. Has anyone seen overheat events in the IPMI log?
    It is also possible that the X470D4U simply did not wire those 2 pins.

  • A reminder about mce-inject: according to the documentation it is only for testing the MCA handlers in the kernel, not for testing the whole platform. Be sure not to rely on it for simulating real ECC errors. The mce-inject code seems to suggest it uses the EDAC driver’s injection points. You can find more info here: https://www.kernel.org/doc/html/latest/admin-guide/ras.html#edac-error-detection-and-correction
    I believe ECC injection in the BIOS is a separate feature.
    Edit: According to the BKDG for the previous architecture (http://support.amd.com/TechDocs/50742_15h_Models_60h-6Fh_BKDG.pdf), the EDAC driver can support error injection using dedicated CPU registers. I don’t know whether the logs differ from those produced by injecting errors purely at the kernel driver level.

  • The host OS can communicate with IPMI (/dev/ipmi), so it should be possible, for example, to bypass the APML requirement by having the MCA handler talk to IPMI directly (with an “ipmitool event”-like mechanism). I do not believe anything like this is being done currently (at least not in the MCA handler kernel code). So I think what you wrote about OS forwarding is simply not supposed to happen right now.



12 boards available at the time of writing this

Edit: Price seems to be going up, it started at ~247 Euro
Edit2: another 12 boards at itboost.de
Edit3: Aaand it’s gone

Edit4: The stock seems more stable now: several retailers list it, and those that sold out a few days ago seem to be getting more units.