ASUS WRX80E SAGE SE WIFI II PCIe error on multi-GPU Ubuntu boot up

I am working on a home server that is currently running Windows 11 with the following setup:

  • Asus WRX80E Sage SE WIFI II mobo
  • AMD Threadripper PRO 5995WX
  • 512 GB ECC RAM
  • 7x RTX 4090 + 1 HBA card: PCIe slot 4 is bifurcated into x8/x8 using a PCIe Gen 4 splitter. PCIe slot 6 holds the HBA card with a Gen 3 BIOS setting; all other slots are set to Gen 4 in the BIOS.
  • Samsung 980 Pro 2 TB NVMe in the M.2_1 slot
  • All SATA ports filled
  • U.2_1 and U.2_2 slots also filled for extra storage

The above is a working setup under Windows 11.

My BIOS settings are at defaults except for the PCIe settings described above. Resizable BAR and Above 4G Decoding are enabled.

Windows is installed on the NVMe drive, and I installed Ubuntu on an 8 TB drive on one of the SATA ports.

Objective
Get Ubuntu running with the same setup as Windows

Setup

  1. Installed Ubuntu using the minimal setup. I think I left a couple of GPUs seated in their PCIe slots and removed everything else.
  2. After installation, I purged the default NVIDIA drivers and installed the CUDA toolkit along with the display driver that comes with it. Verified with nvidia-smi and nvcc --version (a rough sketch of steps 2 and 3 follows this list).
  3. Removed quiet splash from the GRUB config for verbose boot output.
  4. Shut down, plugged everything back into the PCIe slots, and booted into Ubuntu.
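For context, here is roughly what steps 2 and 3 look like in shell form. This is a sketch, not the exact commands I ran: the package names assume NVIDIA's CUDA apt repository is already configured, and yours may differ depending on which installer you use.

```bash
# Step 2: purge the distro-packaged NVIDIA drivers, then install the CUDA
# stack (which pulls in its own display driver). Package names are
# illustrative; adjust for your CUDA repo/version.
sudo apt-get purge 'nvidia-*' 'libnvidia-*'
sudo apt-get autoremove
sudo apt-get install cuda

# Verify the driver and toolkit are visible
nvidia-smi       # should list every GPU the driver can see
nvcc --version   # prints the installed CUDA toolkit version

# Step 3: remove "quiet splash" for verbose boot output
sudo nano /etc/default/grub   # change GRUB_CMDLINE_LINUX_DEFAULT="quiet splash" to ""
sudo update-grub
```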

Error screenshots
The following is what I see once I load Ubuntu.

Steps taken

  • Tried various kernel parameters: pci=nomsi, pci=biosirq, pci=noaer, amd_iommu=off, and several others.
  • Interestingly, with pci=realloc or pci=assign-busses, Ubuntu boots, but the bifurcated GPUs do not show up in nvidia-smi (see the diagnostic sketch after this list).
  • It has something to do with the third PCIe slot: if I unplug the GPU from it, everything is fine and there are no errors during boot. The way I determined this was by starting with one GPU in slot 5 and adding one card at a time until boot failed.
  • Tried plugging the HBA card into slot 3 and the GPU into slot 6 (with BIOS settings adjusted), but got the same error.
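For anyone debugging something similar, these are generic checks (my own addition, not specific to this board) that show whether the bifurcated cards are even enumerating on the bus and whether AER is flooding the log; comparing the device count from lspci against nvidia-smi is what tells you the splitter-fed GPUs are missing.

```bash
# How many NVIDIA devices did the kernel enumerate on the PCIe bus?
lspci -nn | grep -i nvidia

# How many does the driver actually see?
nvidia-smi -L

# Any AER / bus-enumeration complaints during boot?
sudo dmesg | grep -iE 'AER|pcieport|BadTLP|BadDLLP'

# Kernel parameters such as pci=realloc were tested by adding them to
# GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and then running:
sudo update-grub
```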

Current status
I can boot into Ubuntu with six GPUs and the HBA card; slot 3 is empty. The expectation is to be able to boot into either OS regardless of the PCIe configuration, since the full setup is proven to work on Windows. Any help would be greatly appreciated.

(post deleted by author)

@Shadowbane Thank you for your response. I took some time to figure things out before hastily leaving a reply, and also to see if I could fix the issue myself. First of all, when you say you passed my query to your “hardware bot”, do you mean an LLM trained on hardware topics?

Coming to the problem at hand, I went through another thread here on the L1T forums, and full credit to @Nefastor for explaining how the PCIe slots are configured on this mobo. If anyone is facing issues similar to mine, I'd suggest reading the linked thread; I can't explain it better than he does. Basically, any unusual configuration you want should go on the first three slots, since they have no redrivers and, being closer to the CPU, the signal does not need any amplification.

So I moved my two splitter cards to slots 2 and 3 and had four GPUs coming out of them. Mind you, I had to change those slots to Gen 3 speeds in the BIOS, or else I got endless AER errors. I kept the bottom four slots for three GPUs running at Gen 4 and one HBA card at Gen 3. I didn't want to mess with the redriver settings in the BIOS, so I kept it simple, but anyone wanting to further split those lanes would need to start tweaking the BIOS redriver settings to get it to work.
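If you want to confirm what each slot actually negotiated after a shuffle like this, a quick check (my own addition, not from the linked thread) is to compare LnkCap against LnkSta for every NVIDIA PCIe function and to make sure the AER spam has stopped:

```bash
# Print capability vs. negotiated link state for every NVIDIA PCIe function
# (8 GT/s = Gen 3, 16 GT/s = Gen 4)
for dev in $(lspci -d 10de: | awk '{print $1}'); do
    echo "== $dev =="
    sudo lspci -s "$dev" -vv | grep -E 'LnkCap:|LnkSta:'
done

# Confirm the AER errors are gone after moving the splitters
sudo dmesg | grep -ic 'AER:'
```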

I did not need any special kernel parameters for Linux: as soon as I changed the ordering of the PCIe slots and adjusted the BIOS settings, Linux booted up fine, so @Shadowbane's hint about the PCIe configuration was correct. Windows must be pretty flexible if it was able to run the original layout without issues where Linux didn't like it. As of now, this setup allows me to run four 4090s at Gen 4 and four at Gen 3 speeds, plus 24 SATA drives (on-board + U.2 + HBA) and 1 NVMe drive.
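As a final sanity check, nvidia-smi can confirm that all eight cards are visible and report each card's current PCIe generation, which should line up with the Gen 3 / Gen 4 split described above:

```bash
# All eight 4090s should be listed
nvidia-smi -L

# Per-GPU negotiated PCIe generation and link width
nvidia-smi --query-gpu=index,name,pcie.link.gen.current,pcie.link.width.current --format=csv
```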
