Did you try disconnecting all power,
holding the power button for 5-10 seconds,
then reseating the CPU and putting it all back together?
Umm, I think those are SO-DIMMs. Did you mean M391A4G43MB1-CTDQ, or are you using one of those weird SO-DIMM-to-DIMM risers/converters?
Also: are you using the correct memory slots? For 2 sticks that would be A1 and B1.
+ Try a lower frequency, for example 2400 MHz with a single stick in the A1 slot (set the frequency to 1200 MHz in the OC menu).
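The two numbers in that tip describe the same setting. DDR memory transfers data on both clock edges, so a 2400 MT/s ("2400 MHz") kit runs at a 1200 MHz clock, which is presumably why the OC menu wants half the advertised figure. A tiny sketch of the conversion; `ddr_clock_mhz` is a hypothetical helper name:

```shell
#!/bin/sh
# DDR transfers data on both clock edges, so the real memory clock is
# half the data rate: DDR4-2400 (2400 MT/s) -> 1200 MHz.
# ddr_clock_mhz is a hypothetical helper name.
ddr_clock_mhz() {
    rate_mts=$1
    echo $(( rate_mts / 2 ))
}

ddr_clock_mhz 2400   # prints 1200
```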
Hmm, @RoLee has a similar-sounding issue with a 3900X.
It could be a power issue, as @nx2l already suggested, but if the problem is high-power chips in combination with this board, then there probably isn't much I can suggest.
At the same time, just booting shouldn't draw much power. -.-
Maybe try re-seating the CPU?
About the drive: which M.2 slot? The 970 PRO is a PCIe drive, so make sure you are using the M2_1 slot. M2_2 is supposed to automatically detect whether a SATA or PCIe M.2 drive is installed and switch accordingly. Maybe that detection is flaky.
That's exactly what it was doing. It's no longer an issue after repairing the BMC; I was just trying to help @beryl.
What version did you flash to?
Can you access the ipmi now?
Sorry, I copy/pasted the wrong P/N.
I'm using M391A4G43MB1-CTDQ, straight off the X470D4U QVL.
They work fine and pass memtest86 in another board.
I don't think that's it. If I can get the board to boot, it seems to continue on just fine for a while.
I've tried all combinations of no drive at all, either slot, and even a separate M.2-to-PCIe adapter card. No difference.
I made some small progress this morning: I was able to boot into the BIOS and disable "Wait for BMC". Now it boots every single time and I can get into the ESXi installer, no problem.
However, after the initial ESXi loading phase, the board just reboots. Same spot every time.
Also, my two-digit LED diagnostic display has an LED that won't illuminate. I wonder if this board has some solder connectivity issues.
Whatever is the latest as of today.
Yes, that resolved all issues.
I just tried the ESXi 7 installation ISO and it did not reboot after loading; I got as far as choosing the installation drive.
(https://my.vmware.com/web/vmware/evalcenter?p=free-esxi7)
That's with a Ryzen 2600, 2 sticks of KSM26ED8/16ME and a Samsung 960 EVO or Intel 760p, no expansion cards and default BIOS settings. I only disabled all VMedia in the IPMI.
So I guess it's rather unlikely that the ESXi ISO has issues with board-specific hardware.
Like at all? Nothing even when you manage to POST?
Didn't you just disable the LEDs? The switch is in BIOS > Advanced > Chipset Configuration > Onboard Debug Port LED.
AFAIK it should be enabled by default (set to Auto). If it was disabled in the BIOS without your knowing, I would strongly suggest resetting the BIOS to default settings.
Thanks for the hint. I tried but it had no effect.
I also disabled all C-state settings in my BIOS when I was experiencing similar issues.
The LED display mostly works.
It just has one LED that doesn't light up, so 6 looks like h, 3 looks like a backward f and so on.
Anyway, I finally recovered the BMC. For anyone else with the same problem, the trick is to download socflash from the ASRock Rack website (I can't include links yet; look for the "How to update the BMC FW locally in DOS?" section) and copy it to a FreeDOS bootable USB drive along with the BMC firmware. Then follow the instructions in that section.
Just to be safe, I then reset the BIOS by removing the battery and shorting the BIOS reset jumper near the corner of the board.
So far, everything appears to be working. ESXi 7 finally installed cleanly. I'll run stress tests for the next 24 hours to confirm that everything is okay.
I run ESXi 7 with no issues.
It seems I have finally found a working configuration. These are the BIOS settings I changed. Now I have to check, one by one, which ones are really needed.
Thanks again for the help!
@RoLee What PSU/PSUs did you use when you were experiencing issues?
Maybe we are seeing something similar to the 2013 Intel Haswell situation here:
At that time, many PSUs started appearing with "Haswell-ready" stickers.
Or it's just what @Mastic_Warrior mentioned in this old thread: Ryzen C-State related problems -- what is the root cause?
@Tenrag Hmm, interesting.
Currently it is running with a "Seasonic Prime Ultra Platinum 550W ATX 2.4 (SSR-550PD2)"
/seasonic.com/prime-ultra-platinum
The other model I tested it with was an "Enermax Modu 82+ (EMD425AWT)"
/www.enermax.com/home.php?fn=eng/product_a1_1_1&lv0=1&lv1=54&no=7
Drop a support email to Seasonic and ASRock Rack; they know about PSU compatibility issues with this motherboard (just search the first half of this thread). I, however, seem to be blacklisted and never received a resolution for the issues I reported.
In the meantime I'm using SFX-L SilverStone Titanium PSUs; they don't trigger anything (so far).
With BMC version 1.90 it seems to have become much better, but I still don't trust the X470D4Us with my Seasonic PSUs.
I'm happy to report that, after disabling "Platform First Error Handling (PFEH)" in the BIOS, (corrected) single-bit errors are properly reported to the OS, even when overclocking/undervolting! So I'm now getting the same results as Diversity with his memory-pin-shorting method…
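To check from Linux that corrected errors are actually reaching the OS, the EDAC core exposes per-memory-controller counters in sysfs (`ce_count` for corrected, `ue_count` for uncorrected errors). A minimal sketch, assuming the `amd64_edac` driver is loaded so that `/sys/devices/system/edac/mc/mc*` exists; `edac_totals` is a hypothetical helper name, and its optional argument only exists so it can be pointed at a test directory:

```shell
#!/bin/sh
# Sum the corrected (ce_count) and uncorrected (ue_count) error counters
# that the Linux EDAC core exposes for each memory controller in sysfs.
# The optional first argument overrides the sysfs path (useful for testing).
# edac_totals is a hypothetical helper name.
edac_totals() {
    root=${1:-/sys/devices/system/edac/mc}
    ce=0
    ue=0
    for mc in "$root"/mc[0-9]*; do
        [ -d "$mc" ] || continue   # no controllers: the glob stays literal
        if [ -r "$mc/ce_count" ]; then
            ce=$(( ce + $(cat "$mc/ce_count") ))
        fi
        if [ -r "$mc/ue_count" ]; then
            ue=$(( ue + $(cat "$mc/ue_count") ))
        fi
    done
    echo "CE=$ce UE=$ue"
}

# Usage: run before and after a stress test and compare the numbers.
#   edac_totals
```

Running it before and after a stress run at the overclocked/undervolted settings should show whether the CE count climbs.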
The reason I was failing to detect this earlier was:
So in short:
So if I understand it correctly
I've sent this information to ASRock Rack and AMD. ASRock Rack confirmed on the same day that they, together with AMD, had come to the exact same conclusion (using error injection in Linux), and that they had asked AMD for assistance in reporting these errors in the IPMI as well. So hopefully we'll someday get this important feature on this motherboard!!
In the meantime, Diversity and I (especially Diversity) are still trying to trigger (uncorrected) multi-bit errors as well. We're in contact with the interesting folks behind ECCploit, who have very profound knowledge of this matter… (Check out Lucian's talk at OffensiveCon19: https://www.youtube.com/watch?v=R2aPo_wwmZw).
Finally some real progress on this matter! Thanks to Diversity for getting my hopes up again, because I had almost given up on this…
PFEH setting in BIOS
Screenshot taken after 1m33sec with memory overclocked / undervolted
3rd working method
This is how Asrock Rack and AMD tested it
The BIOS
Everything default except
The OS
I used a fresh install of "Fedora-Server-dvd-x86_64-32-1.6.iso" for this. I might have selected a few additional package groups during the install; I'm not sure whether that makes a difference to the instructions below.
[root@localhost mce-inject-master]# cat /etc/fedora-release
Fedora release 32 (Thirty Two)
[root@localhost ~]# uname -r
5.6.8-300.fc32.x86_64
Installing / configuring additional packages / tools
edac-utils
[root@localhost ~]# yum install edac-utils
Fedora 32 openh264 (From Cisco) - x86_64 4.8 kB/s | 5.1 kB 00:01
Fedora Modular 32 - x86_64 2.2 MB/s | 4.9 MB 00:02
Fedora Modular 32 - x86_64 - Updates 881 kB/s | 1.4 MB 00:01
Fedora 32 - x86_64 - Updates 4.1 MB/s | 7.8 MB 00:01
Fedora 32 - x86_64 4.3 MB/s | 70 MB 00:16
Dependencies resolved.
========================================================================================================================
Package Architecture Version Repository Size
========================================================================================================================
Installing:
edac-utils x86_64 0.16-22.fc32 fedora 49 k
Installing dependencies:
sysfsutils x86_64 2.1.0-28.fc32 fedora 44 k
Transaction Summary
========================================================================================================================
Install 2 Packages
Total download size: 93 k
Installed size: 238 k
Is this ok [y/N]: y
Downloading Packages:
(1/2): edac-utils-0.16-22.fc32.x86_64.rpm 406 kB/s | 49 kB 00:00
(2/2): sysfsutils-2.1.0-28.fc32.x86_64.rpm 337 kB/s | 44 kB 00:00
------------------------------------------------------------------------------------------------------------------------
Total 115 kB/s | 93 kB 00:00
warning: /var/cache/dnf/fedora-558931b5e76b51a7/packages/edac-utils-0.16-22.fc32.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID 12c944d0: NOKEY
Fedora 32 - x86_64 1.6 MB/s | 1.6 kB 00:00
Importing GPG key 0x12C944D0:
Userid : "Fedora (32) <[email protected]>"
Fingerprint: 97A1 AE57 C3A2 372C CA3A 4ABA 6C13 026D 12C9 44D0
From : /etc/pki/rpm-gpg/RPM-GPG-KEY-fedora-32-x86_64
Is this ok [y/N]: y
Key imported successfully
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
Preparing : 1/1
Installing : sysfsutils-2.1.0-28.fc32.x86_64 1/2
Installing : edac-utils-0.16-22.fc32.x86_64 2/2
Running scriptlet: edac-utils-0.16-22.fc32.x86_64 2/2
Verifying : edac-utils-0.16-22.fc32.x86_64 1/2
Verifying : sysfsutils-2.1.0-28.fc32.x86_64 2/2
Installed:
edac-utils-0.16-22.fc32.x86_64 sysfsutils-2.1.0-28.fc32.x86_64
Complete!
Bison
[root@localhost mce-inject-master]# yum install bison
Last metadata expiration check: 0:06:26 ago on Fri 08 May 2020 12:45:14 AM CEST.
Dependencies resolved.
=============================================================================================================================================================================================================================================
Package Architecture Version Repository Size
=============================================================================================================================================================================================================================================
Installing:
bison x86_64 3.5-2.fc32 fedora 818 k
Installing dependencies:
m4 x86_64 1.4.18-12.fc32 fedora 218 k
Transaction Summary
=============================================================================================================================================================================================================================================
Install 2 Packages
Total download size: 1.0 M
Installed size: 3.0 M
Is this ok [y/N]: y
Downloading Packages:
(1/2): m4-1.4.18-12.fc32.x86_64.rpm 946 kB/s | 218 kB 00:00
(2/2): bison-3.5-2.fc32.x86_64.rpm 2.0 MB/s | 818 kB 00:00
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total 1.2 MB/s | 1.0 MB 00:00
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
Preparing : 1/1
Installing : m4-1.4.18-12.fc32.x86_64 1/2
Installing : bison-3.5-2.fc32.x86_64 2/2
Running scriptlet: bison-3.5-2.fc32.x86_64 2/2
Verifying : bison-3.5-2.fc32.x86_64 1/2
Verifying : m4-1.4.18-12.fc32.x86_64 2/2
Installed:
bison-3.5-2.fc32.x86_64 m4-1.4.18-12.fc32.x86_64
Complete!
Flex
[root@localhost mce-inject-master]# yum install flex
Last metadata expiration check: 0:06:41 ago on Fri 08 May 2020 12:45:14 AM CEST.
Dependencies resolved.
=============================================================================================================================================================================================================================================
Package Architecture Version Repository Size
=============================================================================================================================================================================================================================================
Installing:
flex x86_64 2.6.4-4.fc32 fedora 318 k
Transaction Summary
=============================================================================================================================================================================================================================================
Install 1 Package
Total download size: 318 k
Installed size: 927 k
Is this ok [y/N]: y
Downloading Packages:
flex-2.6.4-4.fc32.x86_64.rpm 1.5 MB/s | 318 kB 00:00
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total 482 kB/s | 318 kB 00:00
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
Preparing : 1/1
Installing : flex-2.6.4-4.fc32.x86_64 1/1
Running scriptlet: flex-2.6.4-4.fc32.x86_64 1/1
Verifying : flex-2.6.4-4.fc32.x86_64 1/1
Installed:
flex-2.6.4-4.fc32.x86_64
Complete!
Rasdaemon
[root@localhost mce-inject-master]# yum install rasdaemon
Last metadata expiration check: 0:02:51 ago on Fri 08 May 2020 12:52:01 AM CEST.
Dependencies resolved.
=============================================================================================================================================================================================================================================
Package Architecture Version Repository Size
=============================================================================================================================================================================================================================================
Installing:
rasdaemon x86_64 0.6.4-1.fc32 fedora 117 k
Installing dependencies:
perl-DBD-SQLite x86_64 1.64-4.fc32 fedora 196 k
perl-DBI x86_64 1.643-2.fc32 fedora 707 k
perl-Math-BigInt noarch 1:1.9998.18-2.fc32 fedora 190 k
perl-Math-Complex noarch 1.59-452.fc32 fedora 56 k
Transaction Summary
=============================================================================================================================================================================================================================================
Install 5 Packages
Total download size: 1.2 M
Installed size: 3.5 M
Is this ok [y/N]: y
Downloading Packages:
(1/5): perl-Math-BigInt-1.9998.18-2.fc32.noarch.rpm 497 kB/s | 190 kB 00:00
(2/5): perl-DBD-SQLite-1.64-4.fc32.x86_64.rpm 498 kB/s | 196 kB 00:00
(3/5): perl-Math-Complex-1.59-452.fc32.noarch.rpm 1.3 MB/s | 56 kB 00:00
(4/5): rasdaemon-0.6.4-1.fc32.x86_64.rpm 1.1 MB/s | 117 kB 00:00
(5/5): perl-DBI-1.643-2.fc32.x86_64.rpm 1.1 MB/s | 707 kB 00:00
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total 1.1 MB/s | 1.2 MB 00:01
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
Preparing : 1/1
Installing : perl-Math-Complex-1.59-452.fc32.noarch 1/5
Installing : perl-Math-BigInt-1:1.9998.18-2.fc32.noarch 2/5
Installing : perl-DBI-1.643-2.fc32.x86_64 3/5
Installing : perl-DBD-SQLite-1.64-4.fc32.x86_64 4/5
Installing : rasdaemon-0.6.4-1.fc32.x86_64 5/5
Running scriptlet: rasdaemon-0.6.4-1.fc32.x86_64 5/5
Verifying : perl-DBD-SQLite-1.64-4.fc32.x86_64 1/5
Verifying : perl-DBI-1.643-2.fc32.x86_64 2/5
Verifying : perl-Math-BigInt-1:1.9998.18-2.fc32.noarch 3/5
Verifying : perl-Math-Complex-1.59-452.fc32.noarch 4/5
Verifying : rasdaemon-0.6.4-1.fc32.x86_64 5/5
Installed:
perl-DBD-SQLite-1.64-4.fc32.x86_64 perl-DBI-1.643-2.fc32.x86_64 perl-Math-BigInt-1:1.9998.18-2.fc32.noarch perl-Math-Complex-1.59-452.fc32.noarch rasdaemon-0.6.4-1.fc32.x86_64
Complete!
[root@localhost machinecheck0]# rasdaemon -e
rasdaemon: ras:mc_event event enabled
rasdaemon: ras:aer_event event enabled
rasdaemon: mce:mce_record event enabled
rasdaemon: Can't write to set_event
rasdaemon: devlink:devlink_health_report event enabled
rasdaemon: block:block_rq_complete event enabled
[root@localhost machinecheck0]# systemctl start rasdaemon
[root@localhost machinecheck0]# systemctl enable rasdaemon
Created symlink /etc/systemd/system/multi-user.target.wants/rasdaemon.service → /usr/lib/systemd/system/rasdaemon.service.
[root@localhost machinecheck0]# systemctl status rasdaemon.service
● rasdaemon.service - RAS daemon to log the RAS events
Loaded: loaded (/usr/lib/systemd/system/rasdaemon.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2020-05-08 00:57:46 CEST; 23s ago
Main PID: 33914 (rasdaemon)
Tasks: 1 (limit: 38389)
Memory: 7.1M
CPU: 10ms
CGroup: /system.slice/rasdaemon.service
└─33914 /usr/sbin/rasdaemon -f -r
May 08 00:57:46 localhost.localdomain rasdaemon[33914]: rasdaemon: diskerror_eventstore: 0x564510eb9918
May 08 00:57:46 localhost.localdomain rasdaemon[33914]: rasdaemon: register inserted at db
May 08 00:57:46 localhost.localdomain rasdaemon[33914]: overriding event (1360) ras:mc_event with new print handler
May 08 00:57:46 localhost.localdomain rasdaemon[33914]: overriding event (1357) ras:aer_event with new print handler
May 08 00:57:46 localhost.localdomain rasdaemon[33914]: overriding event (114) mce:mce_record with new print handler
May 08 00:57:46 localhost.localdomain rasdaemon[33914]: overriding event (1441) net:net_dev_xmit_timeout with new print handler
May 08 00:57:46 localhost.localdomain rasdaemon[33914]: overriding event (1449) devlink:devlink_health_report with new print handler
May 08 00:57:46 localhost.localdomain rasdaemon[33914]: overriding event (1154) block:block_rq_complete with new print handler
May 08 00:57:46 localhost.localdomain rasdaemon[33914]: Calling ras_mc_event_opendb()
May 08 00:57:46 localhost.localdomain rasdaemon[33914]: <...>-36 [005] 0.000095: block_rq_complete: 2020-05-08 00:57:45 +0200
Development Tools (for make)
[root@localhost mce-inject-master]# yum groupinstall "Development Tools"
Last metadata expiration check: 0:07:50 ago on Fri 08 May 2020 12:52:01 AM CEST.
Dependencies resolved.
=============================================================================================================================================================================================================================================
Package Architecture Version Repository Size
=============================================================================================================================================================================================================================================
Installing group/module packages:
diffstat x86_64 1.63-2.fc32 fedora 43 k
...
xorg-x11-server-utils x86_64 7.7-34.fc32 fedora 188 k
Installing weak dependencies:
kernel-devel x86_64 5.6.8-300.fc32 updates 13 M
Installing Groups:
Development Tools
Transaction Summary
=============================================================================================================================================================================================================================================
Install 79 Packages
Total download size: 124 M
Installed size: 448 M
Is this ok [y/N]: y
Downloading Packages:
(1/79): git-2.26.2-1.fc32.x86_64.rpm 787 kB/s | 126 kB 00:00
...
(79/79): xorg-x11-fonts-ISO8859-1-100dpi-7.5-24.fc32.noarch.rpm 2.6 MB/s | 1.0 MB 00:00
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total 6.9 MB/s | 124 MB 00:18
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
Preparing : 1/1
Installing : urw-base35-fonts-common-20170801-14.fc32.noarch 1/79
...
Running scriptlet: diffstat-1.63-2.fc32.x86_64 79/79
Verifying : cpp-10.0.1-0.14.fc32.x86_64 1/79
...
Verifying : xorg-x11-server-utils-7.7-34.fc32.x86_64 79/79
Installed:
adobe-mappings-cmap-20171205-7.fc32.noarch adobe-mappings-cmap-deprecated-20171205-7.fc32.noarch adobe-mappings-pdf-20180407-5.fc32.noarch binutils-2.34-2.fc32.x86_64
binutils-gold-2.34-2.fc32.x86_64 boost-filesystem-1.69.0-15.fc32.x86_64 boost-system-1.69.0-15.fc32.x86_64 boost-thread-1.69.0-15.fc32.x86_64
cpp-10.0.1-0.14.fc32.x86_64 diffstat-1.63-2.fc32.x86_64 doxygen-1:1.8.17-2.fc32.x86_64 dyninst-10.1.0-5.fc32.x86_64
gcc-10.0.1-0.14.fc32.x86_64 gd-2.3.0-1.fc32.x86_64 git-2.26.2-1.fc32.x86_64 git-core-2.26.2-1.fc32.x86_64
git-core-doc-2.26.2-1.fc32.noarch glibc-devel-2.31-2.fc32.x86_64 glibc-headers-2.31-2.fc32.x86_64 google-droid-sans-fonts-20200215-3.fc32.noarch
graphviz-2.42.4-1.fc32.x86_64 gtk2-2.24.32-7.fc32.x86_64 gts-0.7.6-37.20121130.fc32.x86_64 guile22-2.2.6-4.fc32.x86_64
isl-0.16.1-10.fc32.x86_64 jbig2dec-libs-0.17-4.fc32.x86_64 kernel-devel-5.6.8-300.fc32.x86_64 kernel-headers-5.6.7-300.fc32.x86_64
lasi-1.1.3-2.fc32.x86_64 libXaw-1.0.13-14.fc32.x86_64 libXmu-1.1.3-3.fc32.x86_64 libXpm-3.5.13-2.fc32.x86_64
libXt-1.2.0-1.fc32.x86_64 libfontenc-1.1.3-12.fc32.x86_64 libgs-9.52-1.fc32.x86_64 libidn-1.35-7.fc32.x86_64
libijs-0.35-11.fc32.x86_64 libimagequant-2.12.6-2.fc32.x86_64 libmcpp-2.7.2-25.fc32.x86_64 libmpc-1.1.0-8.fc32.x86_64
libpaper-1.1.24-26.fc32.x86_64 libraqm-0.7.0-5.fc32.x86_64 librsvg2-2.48.4-1.fc32.x86_64 libserf-1.3.9-15.fc32.x86_64
libwebp-1.1.0-2.fc32.x86_64 libxcrypt-devel-4.4.16-3.fc32.x86_64 make-1:4.2.1-16.fc32.x86_64 mcpp-2.7.2-25.fc32.x86_64
netpbm-10.90.00-1.fc32.x86_64 openjpeg2-2.3.1-6.fc32.x86_64 patch-2.7.6-12.fc32.x86_64 patchutils-0.3.4-15.fc32.x86_64
perl-Error-1:0.17029-1.fc32.noarch perl-Git-2.26.2-1.fc32.noarch perl-TermReadKey-2.38-6.fc32.x86_64 subversion-1.12.2-7.fc32.x86_64
subversion-libs-1.12.2-7.fc32.x86_64 systemtap-4.3-0.20200211git91ffb97ad335.fc32.x86_64 systemtap-client-4.3-0.20200211git91ffb97ad335.fc32.x86_64 systemtap-devel-4.3-0.20200211git91ffb97ad335.fc32.x86_64
systemtap-runtime-4.3-0.20200211git91ffb97ad335.fc32.x86_64 tbb-2020.2-1.fc32.x86_64 urw-base35-bookman-fonts-20170801-14.fc32.noarch urw-base35-c059-fonts-20170801-14.fc32.noarch
urw-base35-d050000l-fonts-20170801-14.fc32.noarch urw-base35-fonts-20170801-14.fc32.noarch urw-base35-fonts-common-20170801-14.fc32.noarch urw-base35-gothic-fonts-20170801-14.fc32.noarch
urw-base35-nimbus-mono-ps-fonts-20170801-14.fc32.noarch urw-base35-nimbus-roman-fonts-20170801-14.fc32.noarch urw-base35-nimbus-sans-fonts-20170801-14.fc32.noarch urw-base35-p052-fonts-20170801-14.fc32.noarch
urw-base35-standard-symbols-ps-fonts-20170801-14.fc32.noarch urw-base35-z003-fonts-20170801-14.fc32.noarch utf8proc-2.4.0-3.fc32.x86_64 xapian-core-libs-1.4.14-1.fc32.x86_64
xorg-x11-font-utils-1:7.5-44.fc32.x86_64 xorg-x11-fonts-ISO8859-1-100dpi-7.5-24.fc32.noarch xorg-x11-server-utils-7.7-34.fc32.x86_64
Complete!
mce-inject
[root@localhost ~]# wget https://github.com/andikleen/mce-inject/archive/master.zip
--2020-05-08 00:49:09-- https://github.com/andikleen/mce-inject/archive/master.zip
Resolving github.com (github.com)... 140.82.118.3
Connecting to github.com (github.com)|140.82.118.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://codeload.github.com/andikleen/mce-inject/zip/master [following]
--2020-05-08 00:49:09-- https://codeload.github.com/andikleen/mce-inject/zip/master
Resolving codeload.github.com (codeload.github.com)... 140.82.114.9
Connecting to codeload.github.com (codeload.github.com)|140.82.114.9|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/zip]
Saving to: ‘master.zip’
master.zip [ <=> ] 13.21K --.-KB/s in 0.09s
2020-05-08 00:49:10 (139 KB/s) - ‘master.zip’ saved [13530]
[root@localhost ~]# unzip master.zip
Archive: master.zip
4cbe46321b4a81365ff3aafafe63967264dbfec5
creating: mce-inject-master/
inflating: mce-inject-master/Makefile
inflating: mce-inject-master/README
inflating: mce-inject-master/inject.h
inflating: mce-inject-master/mce-inject.8
inflating: mce-inject-master/mce-inject.c
inflating: mce-inject-master/mce.h
inflating: mce-inject-master/mce.lex
inflating: mce-inject-master/mce.y
inflating: mce-inject-master/parser.h
creating: mce-inject-master/test/
inflating: mce-inject-master/test/corrected
inflating: mce-inject-master/test/fatal
inflating: mce-inject-master/test/uncorrected
inflating: mce-inject-master/util.c
inflating: mce-inject-master/util.h
[root@localhost ~]# cd mce-inject-master/
[root@localhost mce-inject-master]# ls -la
total 48
drwxr-xr-x. 3 root root 189 Jan 19 2013 .
drwxr-xr-x. 3 root root 49 May 8 00:49 ..
-rw-r--r--. 1 root root 193 Jan 19 2013 inject.h
-rw-r--r--. 1 root root 904 Jan 19 2013 Makefile
-rw-r--r--. 1 root root 3863 Jan 19 2013 mce.h
-rw-r--r--. 1 root root 3793 Jan 19 2013 mce-inject.8
-rw-r--r--. 1 root root 6506 Jan 19 2013 mce-inject.c
-rw-r--r--. 1 root root 3487 Jan 19 2013 mce.lex
-rw-r--r--. 1 root root 3822 Jan 19 2013 mce.y
-rw-r--r--. 1 root root 385 Jan 19 2013 parser.h
-rw-r--r--. 1 root root 1460 Jan 19 2013 README
drwxr-xr-x. 2 root root 55 Jan 19 2013 test
-rw-r--r--. 1 root root 364 Jan 19 2013 util.c
-rw-r--r--. 1 root root 290 Jan 19 2013 util.h
[root@localhost mce-inject-master]# make
bison -d mce.y
flex mce.lex
cc -MM -DDEPS_RUN -I. mce-inject.c util.c mce.tab.c lex.yy.c > .depend.X && \
mv .depend.X .depend
cc -Os -g -Wall -c -o mce-inject.o mce-inject.c
cc -Os -g -Wall -c -o mce.tab.o mce.tab.c
cc -Os -g -Wall -c -o lex.yy.o lex.yy.c
cc -Os -g -Wall -c -o util.o util.c
cc -pthread mce-inject.o mce.tab.o lex.yy.o util.o -o mce-inject
[root@localhost mce-inject-master]# ls -la
total 400
drwxr-xr-x. 3 root root 4096 May 8 01:01 .
drwxr-xr-x. 3 root root 49 May 8 00:49 ..
-rw-r--r--. 1 root root 45 May 8 00:54 correct
-rw-r--r--. 1 root root 185 May 8 01:01 .depend
-rw-r--r--. 1 root root 193 Jan 19 2013 inject.h
-rw-r--r--. 1 root root 47534 May 8 01:01 lex.yy.c
-rw-r--r--. 1 root root 73320 May 8 01:01 lex.yy.o
-rw-r--r--. 1 root root 904 Jan 19 2013 Makefile
-rw-r--r--. 1 root root 3863 Jan 19 2013 mce.h
-rwxr-xr-x. 1 root root 84584 May 8 01:01 mce-inject
-rw-r--r--. 1 root root 3793 Jan 19 2013 mce-inject.8
-rw-r--r--. 1 root root 6506 Jan 19 2013 mce-inject.c
-rw-r--r--. 1 root root 38960 May 8 01:01 mce-inject.o
-rw-r--r--. 1 root root 3487 Jan 19 2013 mce.lex
-rw-r--r--. 1 root root 56619 May 8 01:01 mce.tab.c
-rw-r--r--. 1 root root 2922 May 8 01:01 mce.tab.h
-rw-r--r--. 1 root root 25552 May 8 01:01 mce.tab.o
-rw-r--r--. 1 root root 3822 Jan 19 2013 mce.y
-rw-r--r--. 1 root root 385 Jan 19 2013 parser.h
-rw-r--r--. 1 root root 1460 Jan 19 2013 README
drwxr-xr-x. 2 root root 55 Jan 19 2013 test
-rw-r--r--. 1 root root 364 Jan 19 2013 util.c
-rw-r--r--. 1 root root 290 Jan 19 2013 util.h
-rw-r--r--. 1 root root 8128 May 8 01:01 util.o
[root@localhost mce-inject-master]# modprobe mce_inject
[root@localhost mce-inject-master]# vi correct
[root@localhost mce-inject-master]# cat correct
CPU 1 BANK 2
STATUS corrected
RIP 0x12341234
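The same `correct` input file can be created without an editor, which makes the step easy to repeat. A minimal sketch using a heredoc; `write_corrected_mce` is a hypothetical helper name, and the three lines are exactly the ones shown above (a corrected error on CPU 1, MCA bank 2):

```shell
#!/bin/sh
# Create the mce-inject input file without an editor.
# Contents match the "correct" file above: a corrected error on CPU 1, bank 2.
# write_corrected_mce is a hypothetical helper name; the argument is the
# output path (defaults to ./correct).
write_corrected_mce() {
    out=${1:-correct}
    cat > "$out" <<'EOF'
CPU 1 BANK 2
STATUS corrected
RIP 0x12341234
EOF
}

# Usage (as root, inside mce-inject-master):
#   write_corrected_mce correct
#   modprobe mce_inject
#   ./mce-inject correct
```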
Prevent the machine from crashing
[root@localhost mce-inject-master]# cd /sys/devices/system/machinecheck/machinecheck0
[root@localhost machinecheck0]# cat tolerant
1
[root@localhost machinecheck0]# vi tolerant
[root@localhost machinecheck0]# cat tolerant
3
[root@localhost machinecheck0]#
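Likewise, the `tolerant` value can be set with a plain redirect instead of `vi`, keeping the old value so it can be restored after testing. The level meanings in the comments follow the kernel's x86 machinecheck sysfs documentation; `set_mce_tolerant` is a hypothetical helper, and the file may not exist at all on newer kernels, which dropped this knob:

```shell
#!/bin/sh
# Set the x86 MCE "tolerant" level without an editor and print the previous
# value so it can be restored after testing. Levels:
#   0 always panic on uncorrected errors
#   1 (default) panic or SIGBUS on uncorrected errors
#   2 SIGBUS or log uncorrected errors
#   3 never panic or SIGBUS, log all errors (for testing only)
# Corrected errors are logged at every level.
# The optional second argument overrides the sysfs path (useful for testing).
set_mce_tolerant() {
    level=$1
    file=${2:-/sys/devices/system/machinecheck/machinecheck0/tolerant}
    if [ ! -w "$file" ]; then
        echo "cannot write $file" >&2
        return 1
    fi
    old=$(cat "$file")
    echo "$level" > "$file"
    echo "$old"
}

# Usage (as root):
#   old=$(set_mce_tolerant 3)
#   ./mce-inject correct
#   set_mce_tolerant "$old" >/dev/null
```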
Check EDAC status
[root@localhost ~]# ls /sys/devices/system/edac/mc
mc0 power subsystem uevent
[root@localhost ~]# find /lib/modules/$(uname -r) -name '*edac*'
/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac
/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/amd64_edac_mod.ko.xz
/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/e752x_edac.ko.xz
/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/edac_mce_amd.ko.xz
/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/i10nm_edac.ko.xz
/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/i3000_edac.ko.xz
/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/i3200_edac.ko.xz
/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/i5000_edac.ko.xz
/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/i5100_edac.ko.xz
/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/i5400_edac.ko.xz
/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/i7300_edac.ko.xz
/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/i7core_edac.ko.xz
/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/i82975x_edac.ko.xz
/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/ie31200_edac.ko.xz
/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/pnd2_edac.ko.xz
/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/sb_edac.ko.xz
/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/skx_edac.ko.xz
/lib/modules/5.6.8-300.fc32.x86_64/kernel/drivers/edac/x38_edac.ko.xz
[root@localhost ~]# edac-util -rfull
mc0:csrow2:mc#0csrow#2channel#0:CE:0
mc0:csrow2:mc#0csrow#2channel#1:CE:0
mc0:csrow3:mc#0csrow#3channel#0:CE:0
mc0:csrow3:mc#0csrow#3channel#1:CE:0
mc0:noinfo:all:UE:0
mc0:noinfo:all:CE:0
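The `edac-util -rfull` output above is colon-separated, with the error type (CE/UE) in the fourth field and the count in the fifth, so totals are easy to pull out for before/after comparisons. A small sketch assuming every row follows that format; `edac_rfull_totals` is a hypothetical helper name:

```shell
#!/bin/sh
# Summarise `edac-util -rfull` output. Each line has the form
#   mcN:<row>:<label>:<CE|UE>:<count>
# e.g. mc0:csrow2:mc#0csrow#2channel#0:CE:0, so field 4 is the error
# type and field 5 the count. Reads stdin and prints totals per type.
# edac_rfull_totals is a hypothetical helper name.
edac_rfull_totals() {
    awk -F: 'NF >= 5 { total[$4] += $5 }
             END { printf "CE=%d UE=%d\n", total["CE"], total["UE"] }'
}

# Usage:
#   edac-util -rfull | edac_rfull_totals
```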
Inject the error and observe the result
[root@localhost mce-inject-master]# modprobe mce_inject
[root@localhost mce-inject-master]# ./mce-inject correct
[root@localhost mce-inject-master]#
Message from syslogd@localhost at May 8 01:02:10 ...
kernel:[Hardware Error]: Corrected error, no action required.
Message from syslogd@localhost at May 8 01:02:10 ...
kernel:[Hardware Error]: CPU:1 (17:71:0) MC2_STATUS[-|CE|-|-|-|-|-|-|-|-]: 0x9000000000000000
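The `MC2_STATUS` value in that log can be decoded by hand. The top bits of an x86 MCi_STATUS register are architectural (63 VAL, 62 OVER, 61 UC, 60 EN, 59 MISCV, 58 ADDRV, 57 PCC), so 0x9000000000000000 has only VAL and EN set; with UC clear it is a corrected error, which matches the kernel's `CE` flag. A rough sketch that looks only at those top bits and ignores the lower, vendor-specific fields such as AMD's Deferred bit; `decode_mci_status` is a hypothetical helper:

```shell
#!/bin/sh
# Decode the architectural top bits of an x86 MCi_STATUS value, e.g. the
# MC2_STATUS printed by the kernel. Only bits 63..57 are examined:
#   63 VAL, 62 OVER, 61 UC, 60 EN, 59 MISCV, 58 ADDRV, 57 PCC
# Lower, vendor-specific fields (such as AMD's Deferred bit) are ignored.
# It works on the first byte of the hex string, so no 64-bit shell
# arithmetic is needed. decode_mci_status is a hypothetical helper name.
decode_mci_status() {
    hex=${1#0x}
    hex=${hex#0X}
    while [ ${#hex} -lt 16 ]; do   # left-pad to 16 hex digits
        hex="0$hex"
    done
    rest=${hex#??}
    top=${hex%"$rest"}             # first two hex digits = bits 63..56
    byte=$(( 0x$top ))

    flags=""
    [ $(( byte & 128 )) -ne 0 ] && flags="$flags VAL"
    [ $(( byte & 64 )) -ne 0 ] && flags="$flags OVER"
    [ $(( byte & 32 )) -ne 0 ] && flags="$flags UC"
    [ $(( byte & 16 )) -ne 0 ] && flags="$flags EN"
    [ $(( byte & 8 )) -ne 0 ] && flags="$flags MISCV"
    [ $(( byte & 4 )) -ne 0 ] && flags="$flags ADDRV"
    [ $(( byte & 2 )) -ne 0 ] && flags="$flags PCC"

    if [ $(( byte & 128 )) -eq 0 ]; then
        kind="no valid error"
    elif [ $(( byte & 32 )) -ne 0 ]; then
        kind="uncorrected"
    else
        kind="corrected"
    fi

    flags=${flags# }
    [ -n "$flags" ] && flags="$flags "
    echo "${flags}($kind)"
}

decode_mci_status 0x9000000000000000   # prints: VAL EN (corrected)
```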
Message from syslogd@localhost at May 8 01:02:10 ...
kernel:[Hardware Error]: IPID: 0x0000000000000000
Message from syslogd@localhost at May 8 01:02:10 ...
kernel:[Hardware Error]: L2 Cache Ext. Error Code: 0, L2M Tag Multiple-Way-Hit error.
Message from syslogd@localhost at May 8 01:02:10 ...
kernel:[Hardware Error]: cache level: RESV, tx: INSN
[root@localhost mce-inject-master]# ras-mc-ctl --summary
No Memory errors.
No PCIe AER errors.
No Extlog errors.
No devlink errors.
Disk errors summary:
0:0 has 1 errors
MCE records summary:
1 Corrected error, no action required. errors
[root@localhost mce-inject-master]#
Thanks, that was a good read.
I was also digging a little into how ECC errors should be propagated, and there are a few things:
ECC errors are reported through Machine Check Exceptions (MCEs). Those exceptions are essentially just an event raised when the CPU populates the MCA registers. From what I gathered, the kernel MCA handler periodically polls those registers for changes and reports any errors (or panics, for example). I also read that MCEs are essentially interrupts similar to NMIs, so I am not sure how that fits with the polling strategy.
AGESA decides whether ECC should be enabled on a specific CPU, and depending on that, the BIOS can allow the OS/kernel to register an MCA handler.
(old agesa code: https://github.com/coreboot/coreboot/tree/master/src/vendorcode/amd/agesa)
A reminder about mce-inject: according to its documentation, it is only meant for testing the kernel's MCA handlers, not the whole platform, so don't rely on it to simulate real ECC errors. The mce-inject code seems to suggest it uses EDAC driver injection points. You can find more info here: https://www.kernel.org/doc/html/latest/admin-guide/ras.html#edac-error-detection-and-correction
I believe ECC inject in the BIOS is a separate feature.
Edit: According to the BKDG for the previous architecture (http://support.amd.com/TechDocs/50742_15h_Models_60h-6Fh_BKDG.pdf), the EDAC driver can support error injection using dedicated CPU registers. I don't know whether the logs differ compared to injecting errors purely at the kernel-driver level.
The host OS can communicate with the BMC over IPMI (/dev/ipmi), so it should be possible, for example, to bypass the APML requirement in the MCA handler and talk to IPMI directly (with an "ipmitool event"-like mechanism). I don't believe anything like this is being done currently (at least not in the MCA handler kernel code). So I think what you wrote about OS forwarding is simply not supposed to happen right now:
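To illustrate the "ipmitool event"-like idea: a userspace script could watch the EDAC corrected-error counter and push an entry into the BMC's event log when it grows. This is a hypothetical sketch, not something the kernel or ASRock Rack does. It assumes ipmitool can reach the BMC through the local IPMI driver, and `ipmitool event 3` only sends ipmitool's predefined "Memory: Correctable ECC" test event, not the actual DIMM or address:

```shell
#!/bin/sh
# Hypothetical "OS forwarding" sketch: when the corrected-error count grows,
# add ipmitool's predefined "Memory: Correctable ECC" test event (event 3)
# to the BMC's System Event Log. It carries no DIMM/address details.
# IPMITOOL can be overridden (e.g. IPMITOOL=echo) for a dry run.
# forward_new_ce is a hypothetical helper name.
forward_new_ce() {
    prev=$1   # CE count seen last time
    now=$2    # current CE count, e.g. from .../edac/mc/mc0/ce_count
    if [ "$now" -gt "$prev" ]; then
        ${IPMITOOL:-ipmitool} event 3
    fi
}

# Usage sketch (as root, polling every 60 s):
#   f=/sys/devices/system/edac/mc/mc0/ce_count
#   last=$(cat "$f")
#   while sleep 60; do
#       cur=$(cat "$f")
#       forward_new_ce "$last" "$cur"
#       last=$cur
#   done
```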
12 boards available at the time of writing this
Edit: The price seems to be going up; it started at ~247 EUR.
Edit2: Another 12 boards at itboost.de
Edit3: Aaand it's gone.
Edit4: The stock seems more stable now: several retailers list it, and those that sold out a few days ago seem to be getting more units.