ASUS Pro WS WRX80E-SAGE - Endless Woes!

Hey guys, sorry for the long post but I need someone to talk to about this!

I’ve been running the ASUS Pro WS WRX80E-SAGE motherboard for 2 years and have had to RMA it on 3 separate occasions. This most recent incident has resulted in 3 ‘dead’ motherboards within a couple of weeks. I have been without my main workstation for almost a month. I was hoping to seek some counsel on whether what is happening is strictly due to user error or a product design error.

Winding the clock back to when this issue started: I had bought a new NVMe drive to store camera rushes for a film I was about to commence work on. I figured that I could populate the last free M.2 slot on the motherboard, so I plugged it in. Without any fanfare, the machine started booting to this screen:

I could not proceed from here. No F1 to proceed to BIOS settings. The error regarding PCIe power was confusing, as the power was plugged in and had been working fine for months.

I tried clearing CMOS and accessing the mobo via the BMC, with no luck. Thinking the new drive might be a dud, I tested it in some other systems and it’s totally fine. It works great, so no apparent problem there. I removed all other PCIe devices, swapped the power supply, and even swapped the power supply cables. Nothing helped.

So I RMA the motherboard and in about a week I get my machine back working with a new board.

I set everything up again how I like it. I didn’t put back the 3rd drive that killed the machine the first time because I was superstitious. Everything seemed stable though, so after a day I work up the courage to add it back.

Boom, same problem. Motherboard appears to be bricked again with the same error screen.

I RMA the board again immediately. This time I supplied the computer shop with all my drives, GPUs and PCIe devices so that they could test the board in situ and get it working before handing the machine back. They spent some time with it, and everything appeared to be working. All drives were attached and showing in the BIOS.

The technicians felt like it was just a bad batch of motherboards, and possibly a problematic old BIOS. I’m skeptical but have no reason to doubt them, so I happily leave with a working machine.

On this machine I use Proxmox as a hypervisor and run all my work in VMs. I decide to do a fresh install of Proxmox and restore my VMs from backups, as UEFI boot seemed to be acting weird.

Not all my USB devices were being seen by Proxmox (though they were seen in the BIOS), so I thought maybe there was some IOMMU setting I forgot to turn on. I boot into the BIOS and change a couple of settings on the PCI subsystem settings page.

Namely:
SR-IOV - Enabled
BME DMA Mitigation - Enabled
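
(Side note for anyone comparing notes: whether the IOMMU is actually active can be checked from the running Linux/Proxmox host without touching the BIOS at all. Below is a rough Python sketch, not something from my actual setup. It assumes the standard Linux sysfs layout under /sys/kernel/iommu_groups, and the function name list_iommu_groups is just made up for illustration.)

```python
from pathlib import Path


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group_number: [pci_address, ...]} read from sysfs.

    Assumption: the standard Linux layout <root>/<group>/devices/<pci_address>.
    Returns an empty dict if the directory is missing, which usually means
    the IOMMU is disabled in firmware or on the kernel command line.
    """
    base = Path(root)
    groups = {}
    if not base.is_dir():
        return groups
    for group_dir in base.iterdir():
        # Group directories are named with plain integers; skip anything else.
        if not group_dir.name.isdigit():
            continue
        devices_dir = group_dir / "devices"
        if not devices_dir.is_dir():
            continue
        groups[int(group_dir.name)] = sorted(p.name for p in devices_dir.iterdir())
    # Sort by group number so the output is stable and easy to compare.
    return dict(sorted(groups.items()))


if __name__ == "__main__":
    found = list_iommu_groups()
    if not found:
        print("No IOMMU groups found (IOMMU may be disabled).")
    for number, devices in found.items():
        print(f"group {number}: {', '.join(devices)}")
```

(If this prints groups, the IOMMU is already on and the missing USB devices are probably a different problem. The PCI addresses can be matched against lspci output to see which controller sits in which group.)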

I reboot and the motherboard is dead again. Exact same problem.

These settings would surely have been set in the BIOS the previous two times the motherboard died. These BIOS settings were also working totally fine with my initial hardware configuration before adding the 3rd NVMe drive. So it seems to be some odd alchemy between the connected devices and these BIOS settings?

My question to the people: Is this a known, dangerous setting for me to be turning on in the BIOS? I have turned it on in many motherboards before without any pause for concern and have had no problems. Maybe dumb luck??

Based on this and my previous RMA experiences with this motherboard, this product seems like a dud. The problem for me is that there are no alternative WRX80 motherboards in the country. I like the feature set of the board, but there are just so many problems. I am feeling trapped. Is this something I can take up with ASUS? Could they supply me with an alternate vendor’s motherboard?

The only other recourse I have feels like buying a new Threadripper 7000 machine and avoiding ASUS motherboards, which I’m not too happy about.
Having been burned on my Threadripper 3970X and now feeling burned by Threadripper Pro, my faith in the platform has dramatically diminished. I think I would rather just buy a couple of Ryzen machines for the price of a single Threadripper and deal with the lack of PCIe lanes and cores.

TL;DR: I enabled BME DMA Mitigation and SR-IOV in the BIOS settings and it keeps killing the motherboard. What’s the deal??

We have the same motherboard, and had a vaguely similar issue. After plugging in a PCIe card which allows connecting external PCIe devices, the computer suddenly wouldn’t boot.

It would get as far as the screen you show, with the Fatal Error message and no other useful info (other than the DRAM Q-LED being lit even though nothing was wrong with the RAM).

Clearing the CMOS or removing the device would not get it to boot. It seems like whatever causes that fatal error basically bricks the BIOS. We couldn’t even enter the BIOS setup, as it still gave the same Fatal Error screen.

Fortunately we got ours back up and running by using the BIOS Flashback button on the back of the motherboard. Power off, plug in a USB stick with a copy of the BIOS and press and hold the button for 3 seconds.

After the BIOS was reflashed, with the external card removed, the computer was fully functional again.

We tried plugging the PCIe card back in, and bang, same problem again. BIOS bricked with the Fatal Error screen. So back to the BIOS Flashback button, and again the computer was resurrected.

In our case we’ve narrowed it down to PCIe Slot 3 on the motherboard. If we plug our external PCIe card into any other slot, everything works fine. But the moment it is plugged into Slot 3, the BIOS bricks itself.
