ASUS Pro WS WRX80E-SAGE - Endless Woes!

Hey guys, sorry for the long post but I need someone to talk to about this!

I’ve been running the ASUS Pro WS WRX80E-SAGE motherboard for 2 years and have had to RMA it on 3 separate occasions. This most recent incident has resulted in 3 ‘dead’ motherboards within a couple of weeks. I have been without my main workstation for almost a month. I was hoping to seek some counsel on whether what is happening is strictly down to user error or a product design flaw.

Winding the clock back to when this issue started: I had bought a new NVMe drive to store camera rushes for a film I was about to start work on. I figured I could populate the last free M.2 slot on the motherboard, so I plugged it in. Without any fanfare, the machine started booting to this screen:

I could not proceed from here. No F1 to get into the BIOS settings. The error regarding PCIe power was confusing, as the power was plugged in and had been working fine for months.

I tried clearing CMOS and accessing the mobo via the BMC, with no luck. Thinking the new drive might be a dud, I tested it in some other systems and it’s totally fine. Works great, so no apparent problem there. I removed all other PCIe devices, swapped the power supply, even swapped the power supply cables. Nothing helped.

So I RMA’d the motherboard, and in about a week I got my machine back, working, with a new board.

I set everything up again how I like it. I was superstitious, so I didn’t put back the 3rd drive that had killed the machine the first time. Everything seemed stable though, so after a day I worked up the courage to add it back.

Boom-- same problem. Motherboard appears to be bricked again with the same error screen.

I RMA’d the board again immediately. This time I supplied the computer shop with all my drives, GPUs and PCIe devices so that they could test the board in situ and get it working before handing the machine back. They spent some time with it, and everything appeared to be working. All drives were attached and showing in the BIOS.

The technicians felt it was just a bad batch of motherboards, and possibly a problematic old BIOS. I was skeptical but had no reason to doubt them, so I happily left with a working machine.

On this machine I use Proxmox as a hypervisor and run all my work in VMs. UEFI boot seemed to be acting weird, so I decided to do a fresh install of Proxmox and restore my VMs from backups.

Not all my USB devices were being seen by Proxmox (though they were seen in the BIOS), so I thought maybe there was some IOMMU setting I forgot to turn on. I booted into the BIOS and changed a couple of settings on the PCI Subsystem Settings page.

Namely:
SR-IOV - Enabled
BME DMA Mitigation - Enabled

I rebooted and the motherboard was dead again. Exact same problem.

These settings would surely have been set in the BIOS the previous two times the motherboard died. The same BIOS settings were also working totally fine with my initial hardware configuration, before I added the 3rd NVMe drive. So it seems to be some odd alchemy between the connected devices and these BIOS settings?

My question to the people: are these known, dangerous settings for me to be turning on in the BIOS? I have turned them on in many motherboards before without a second thought and have had no problems. Maybe dumb luck??

Based on this and my previous RMA experiences with this motherboard, this product seems like a dud. The problem for me is that there are no alternative WRX80 motherboards in the country. I like the feature set of the board, but there are just so many problems. I am feeling trapped. Is this something I can take up with ASUS? Could I ask them to supply me with another vendor’s motherboard?

The only other recourse I seem to have is buying a new Threadripper 7000 machine and avoiding ASUS motherboards, which I’m not too happy about.
Having been burned on my Threadripper 3970X and now feeling burned by Threadripper Pro, my faith in the platform has dramatically diminished. I think I would rather just buy a couple of Ryzen machines for the price of a single Threadripper and deal with having fewer PCIe lanes and cores.

TL;DR: I enabled BME DMA Mitigation and SR-IOV in the BIOS settings and it keeps killing the motherboard. What’s the deal??