WRX90 BMC logging useless

I can’t seem to find any way to catch why my computer crashes. Today it literally just turned off. The BMC was unresponsive, and I had to power cycle the PSU to get back into it, even though the computer would still boot to the OS.

I’d really like to be able to use the server management functions to debug my hardware issues, but nothing ever seems to get logged in the BMC, no matter what it is: recoverable WHEA errors, non-recoverable ones, crashes, etc.

I’ve had this problem with a 5995WX on an Asus Sage WRX80E Wifi-II board, only under high load, and haven’t been able to root-cause it. It’s a fairly rare occurrence, but I’ve been able to eliminate it so far using the two strategies below.

This board logs AER errors in Windows. Monitoring with HWInfo, I can see that those errors typically originate from the NVMe devices mounted on the Hypercard. To mitigate that, I set the link to PCIe Gen3; combined with BIOS 1602 and disabling spread spectrum, that eliminated the errors. It has improved stability, and I’ve not had an incident in months.
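For anyone who wants to track those errors outside HWInfo, here is a rough sketch that counts WHEA-Logger events per event ID from the Windows System log. It assumes the errors show up under the `Microsoft-Windows-WHEA-Logger` provider and that `wevtutil` is on the PATH. The function names and the 200-event limit are my own choices, not anything the board or BIOS prescribes.

```python
import subprocess
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace used by Windows event XML.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}
# XPath filter: only events from the WHEA-Logger provider (assumed source of AER/WHEA reports).
QUERY = "*[System[Provider[@Name='Microsoft-Windows-WHEA-Logger']]]"


def fetch_whea_xml(max_events=200):
    """Run wevtutil (Windows only) and return the newest events as raw XML fragments."""
    cmd = ["wevtutil", "qe", "System", f"/q:{QUERY}", "/f:xml",
           f"/c:{max_events}", "/rd:true"]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def count_event_ids(xml_fragments):
    """Count events per EventID.

    wevtutil prints bare <Event> elements with no root, so wrap them first.
    """
    root = ET.fromstring(f"<Events>{xml_fragments}</Events>")
    counts = Counter()
    for event in root.findall("e:Event", NS):
        event_id = event.findtext("e:System/e:EventID", default="?", namespaces=NS)
        counts[event_id] += 1
    return counts


if __name__ == "__main__":
    for event_id, n in count_event_ids(fetch_whea_xml()).most_common():
        print(f"WHEA-Logger event {event_id}: {n}")
```

Running it before and after a BIOS or link-speed change gives a quick way to compare error counts, though it only sees what Windows logged, not what the BMC did or did not record.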

The other mitigation was to disable hyper-threading (SMT), which is what I used until I re-evaluated with BIOS 1602. While this did not resolve the AER errors, it did improve stability similarly.

Unfortunately, I’m not able to run PCIe Gen4 on the Hypercard without introducing stability issues. I’m using Solidigm drives, the Pro variants, on the card.

A theory about why the BMC becomes unresponsive: unlike for its SPR boards, Asus doesn’t provide a wiring diagram for this board. Assuming the BMC is wired similarly, it would connect through the chipset (WRX80, in my case). If that’s true, then I assume the chipset locked up. How to test this, and what to do with the information, is something I’d be interested in learning.
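One cheap way to gather evidence would be to watch the BMC and the host from a second machine and log when each stops answering. The sketch below is only that, a sketch: the IP addresses are placeholders, and it assumes the BMC web UI listens on TCP 443 and the Windows host answers on 3389 (RDP). Adjust both to whatever your setup actually exposes.

```python
import socket
import time
from datetime import datetime, timezone


def probe(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def status_line(targets, now=None):
    """Build one timestamped log line such as '2024-01-01T00:00:00+00:00 bmc=up host=DOWN'."""
    now = now or datetime.now(timezone.utc)
    states = [f"{name}={'up' if probe(host, port) else 'DOWN'}"
              for name, (host, port) in targets.items()]
    return f"{now.isoformat(timespec='seconds')} " + " ".join(states)


if __name__ == "__main__":
    # Placeholder addresses and ports; replace with your own BMC and host.
    targets = {"bmc": ("192.0.2.10", 443), "host": ("192.0.2.11", 3389)}
    while True:
        with open("bmc_watch.log", "a") as log:
            log.write(status_line(targets) + "\n")
        time.sleep(10)
```

If the log shows the BMC going DOWN before the host does, or at the same moment, that at least narrows down whether the two failures are linked. It doesn’t prove the chipset theory either way.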
