Interpreting mcelog errors

I wanted to install Ubuntu 18.04 on my laptop (Dell Inspiron 7567) to try a System76 PPA that works around some issues with Optimus. However, the live OS locked up seconds after loading the desktop. I tried again and noticed some MCE (machine check exception) errors. That seemed odd, since Debian Stretch worked fine on the same laptop.

However, when I ran dmesg under Debian, I found a note that an MCE had been logged. When I looked in /var/log/mcelog, I saw the following:

mcelog: failed to prefill DIMM database from DMI data
Hardware event. This is not a software error.
MCE 0
CPU 0 BANK 6
MISC 7880010086 ADDR fef1ffc0
TIME 1536001871 Mon Sep 3 14:11:11 2018
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee0000000040110a MCGSTATUS 0
MCGCAP c0a APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 158
Hardware event. This is not a software error.
MCE 1
CPU 0 BANK 7
MISC 47880010086 ADDR fef1ce40
TIME 1536001871 Mon Sep 3 14:11:11 2018
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee0000000040110a MCGSTATUS 0
MCGCAP c0a APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 158
Hardware event. This is not a software error.
MCE 2
CPU 0 BANK 8
MISC 7880010086 ADDR fef1ff40
TIME 1536001871 Mon Sep 3 14:11:11 2018
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee0000000040110a MCGSTATUS 0
MCGCAP c0a APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 158
Hardware event. This is not a software error.
MCE 3
CPU 0 BANK 9
MISC 3880010086 ADDR fef1ff00
TIME 1536001871 Mon Sep 3 14:11:11 2018
MCG status:
MCi status:
Error overflow
Uncorrected error
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: corrected filtering (some unreported errors in same region)
Generic CACHE Level-2 Generic Error
STATUS ee0000000040110a MCGSTATUS 0
MCGCAP c0a APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 158

I’m not sure how to interpret this. I see some L2 cache errors, something about a corrupt processor context, and a note about the DIMM database. Windows seems happy and Debian otherwise works, but the Ubuntu live OS locks up. How serious is this actually?
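One way to cross-check mcelog's text against the raw STATUS value is to decode the bits by hand. The sketch below is only my reading of the architectural IA32_MCi_STATUS layout in Intel's Software Developer's Manual (flag bits 63 down to 57, MCA error code in bits 15:0, and the "000F 0001 RRRR TTLL" pattern for cache hierarchy errors). The function name and the tables are mine, not part of mcelog, and model-specific details may differ on this CPU.

```python
# Sketch: decode an IA32_MCi_STATUS value by hand.
# Assumption: architectural bit layout as described in Intel's SDM;
# decode_status() and these tables are illustrative, not mcelog's code.

STATUS_FLAGS = {
    63: "VAL",    # register contents valid
    62: "OVER",   # error overflow
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting enabled
    59: "MISCV",  # MCi_MISC register valid
    58: "ADDRV",  # MCi_ADDR register valid
    57: "PCC",    # processor context corrupt
}

CACHE_LEVEL = {0: "L0", 1: "L1", 2: "L2", 3: "generic"}
TRANSACTION = {0: "instruction", 1: "data", 2: "generic"}
REQUEST = {
    0: "generic error", 1: "generic read", 2: "generic write",
    3: "data read", 4: "data write", 5: "instruction fetch",
    6: "prefetch", 7: "eviction", 8: "snoop",
}


def decode_status(status):
    """Return a dict describing the flags and MCA error code in `status`."""
    flags = [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]
    mca = status & 0xFFFF
    info = {
        "flags": flags,
        "mca_code": hex(mca),
        "model_specific_code": hex((status >> 16) & 0xFFFF),
    }
    # Cache hierarchy errors match 000F 0001 RRRR TTLL (F = bit 12).
    if (mca & 0xEF00) == 0x0100:
        info["kind"] = "cache hierarchy"
        info["corrected_filtering"] = bool(mca & 0x1000)
        info["request"] = REQUEST.get((mca >> 4) & 0xF, "reserved")
        info["transaction"] = TRANSACTION.get((mca >> 2) & 0x3, "reserved")
        info["level"] = CACHE_LEVEL[mca & 0x3]
    return info


if __name__ == "__main__":
    # STATUS value copied from the log above (identical for banks 6 to 9).
    for key, value in decode_status(0xEE0000000040110A).items():
        print(f"{key}: {value}")
```

For the STATUS value in the log this gives the same flags mcelog printed (overflow, uncorrected, MISC and ADDR valid, processor context corrupt) and the same generic L2 cache error with corrected filtering. It also shows that bit 60 (EN) is clear in this value, which mcelog does not print. I don't know how much weight to put on that.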

Update: The machine passes all of the Dell Windows diagnostics, including the extended CPU and memory tests. I’m stumped.

Update 2: I found a really interesting article about mcelog on IBM’s website (https://www.ibm.com/support/home/docdisplay?lndocid=migr-5084973). Apparently mcelog can detect errors even when the system’s firmware deems them non-critical and doesn’t normally pass them on to the OS. Still, it’s weird that Fedora and Ubuntu won’t install.
