Probably the weirdest hardware issue I've ever seen

I had the pfSense build running with 3+ days of uptime until yesterday morning, when I found it dead in the water. All the case fans and the Noctua fan were spinning, but it would not POST; the keyboard LEDs wouldn’t even light up. Nothing.

Even the Asus mobo’s error LEDs didn’t show any issues. As a last resort I pulled the RAM out. RAM error LED came on.

Replaced the RAM. Boom, it POSTed.

Has this ever happened to you?

What kind of monitoring/alerts do you have in place?

What do you mean?

Is there anything in place to notify you if you get scary messages from the kernel or SMART or anywhere else?

For instance, here is a snippet from a FreeBSD kernel log that prompted me to replace a RAM module. In this case it was a FreeNAS system, which will automatically report these things if you set up email alerts.

> MCA: Bank 8, Status 0x88000040000200cf
> MCA: Global Cap 0x0000000000001c09, Status 0x0000000000000000
> MCA: Vendor "GenuineIntel", ID 0x206c2, APIC ID 53
> MCA: CPU 23 COR (1) MS channel ?? memory error
> MCA: Misc 0xa0a2d08000010245
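If the box doesn't report these for you, a small script can do a rough version of the same check. This is only a minimal sketch in Python, not how FreeNAS does it: the names `scan_mca` and `BANK_STATUS` are made up for illustration, and it assumes the log lines look like the ones quoted above. The bit positions (bit 61 for "uncorrected", bits 38 to 52 for the corrected error count) follow the Intel machine-check status register layout as I understand it, so treat the decoding as a hint rather than a diagnosis.

```python
import re

# Matches per-bank lines such as "MCA: Bank 8, Status 0x88000040000200cf".
# The "Global Cap ..., Status ..." line does not match, which is intended.
BANK_STATUS = re.compile(r"MCA: Bank (\d+), Status (0x[0-9a-fA-F]+)")


def scan_mca(log_text):
    """Return one dict per MCA bank report found in log_text.

    Assumes the FreeBSD-style format shown in the post above; other
    kernels or versions may print these messages differently.
    """
    events = []
    for line in log_text.splitlines():
        match = BANK_STATUS.search(line)
        if match:
            status = int(match.group(2), 16)
            events.append({
                "bank": int(match.group(1)),
                "status": status,
                # Bit 61 (UC): set when the error was NOT corrected.
                "uncorrected": bool((status >> 61) & 1),
                # Bits 38..52: how many corrected errors were counted.
                "corrected_count": (status >> 38) & 0x7FFF,
                "memory_error": False,
            })
        elif events and "MCA:" in line and "memory error" in line:
            # The human-readable summary line follows its bank line.
            events[-1]["memory_error"] = True
    return events


# The log snippet from the post, used here as sample input.
SAMPLE = """\
MCA: Bank 8, Status 0x88000040000200cf
MCA: Global Cap 0x0000000000001c09, Status 0x0000000000000000
MCA: Vendor "GenuineIntel", ID 0x206c2, APIC ID 53
MCA: CPU 23 COR (1) MS channel ?? memory error
MCA: Misc 0xa0a2d08000010245
"""

for event in scan_mca(SAMPLE):
    kind = "uncorrected" if event["uncorrected"] else "corrected"
    print(f"bank {event['bank']}: {kind}, memory error: {event['memory_error']}")
```

To use it on a live system you would feed it the text of `dmesg` (or the kernel log file) instead of `SAMPLE` and send yourself a mail whenever the returned list is not empty. A run of corrected memory errors is usually the early warning you want to act on before the module fails outright.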

Do you see anything suspect in dmesg?

Ah good point, I need to set that up on the pfSense box - Cheers for the reminder.

