I am working on a home server that is currently running Windows 11 with the following setup:
- Asus WRX80E Sage SE WIFI II mobo
- AMD Threadripper PRO 5995WX
- 512 GB ECC RAM
- 7x4090 + 1 HBA card: PCIE slot 4 is bifurcated into x8/x8 using a PCIE Gen 4 splitter. PCIE slot 6 is used for the HBA card with Gen 3 BIOS setting. All other lanes are set to Gen 4 from BIOS.
- Samsung 980 Pro 2 TB NVMe on M2_1 slot.
- All SATA ports filled
- U2_1 and U2_2 slot also filled for extra storage
The above is a working setup under Windows 11.
My BIOS settings are at their defaults except for the PCIe settings described above. Resizable BAR and Above 4G Decoding are enabled.
Windows is installed on the NVMe drive and I installed Ubuntu on an 8TB drive on the SATA port.
Objective
Get Ubuntu running with the same setup as Windows
Setup
- Installed Ubuntu using the minimal setup. I think I left a couple of GPUs seated in the PCIe slots and removed everything else.
- After installation, I purged the default Nvidia drivers and installed the CUDA drivers along with the display driver that came with them. Verified it with `nvidia-smi` and `nvcc --version`.
- Removed `quiet splash` from the GRUB loader for verbose output.
- Shut down, plugged everything back into the PCIe slots, and booted into Ubuntu.
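For anyone reproducing the boot-parameter changes, here is a minimal sketch of editing the kernel command line programmatically. It assumes the parameters live in the `GRUB_CMDLINE_LINUX_DEFAULT` line of `/etc/default/grub`, as on a stock Ubuntu install. The helper name `edit_cmdline` is hypothetical, and it only transforms a string; it does not touch the file.

```python
def edit_cmdline(line, remove=(), add=()):
    """Return a GRUB_CMDLINE_LINUX_DEFAULT line with tokens removed and added.

    `line` is the full config line, e.g. GRUB_CMDLINE_LINUX_DEFAULT="quiet splash".
    """
    key, sep, value = line.strip().partition("=")
    if key != "GRUB_CMDLINE_LINUX_DEFAULT" or not sep:
        raise ValueError("not a GRUB_CMDLINE_LINUX_DEFAULT line")

    # Drop the surrounding double quotes and split into individual parameters.
    tokens = value.strip().strip('"').split()

    # Remove unwanted parameters (e.g. quiet, splash).
    tokens = [t for t in tokens if t not in remove]

    # Append new parameters, skipping any that are already present.
    for t in add:
        if t not in tokens:
            tokens.append(t)

    joined = " ".join(tokens)
    return f'{key}="{joined}"'


if __name__ == "__main__":
    original = 'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"'
    print(edit_cmdline(original, remove=("quiet", "splash"), add=("pci=noaer",)))
    # prints: GRUB_CMDLINE_LINUX_DEFAULT="pci=noaer"
```

After changing the real file, `sudo update-grub` has to be run on Ubuntu for the new command line to take effect on the next boot.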
Error screenshots
The following is what I see once I load Ubuntu.
- There is an incessant feed of messages from the AER system
- PCIE error, legacy PCI end point error from vendor_id 0x10de (Nvidia GPU)
- APEI Generic Hardware Error Source 512 from vendor_id 0x1022 and device_id 0x1483 (AMD Starship/Matisse GPP Bridge).
- Frequent instances of BadDLLP errors from AER.
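Since the errors above only exist as screenshots, a text summary may be easier to share. The following is a rough sketch that tallies AER messages per device from saved `dmesg` output. The regular expressions assume the usual shape of kernel AER lines, and the sample text is illustrative only: the bus address `0000:41:00.0` is made up and the lines are not copied from the screenshots. The function name `tally_aer` is hypothetical.

```python
import re
from collections import Counter

# Header line, e.g.
# "nvidia 0000:41:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)"
HEADER = re.compile(
    r"(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"PCIe Bus Error: severity=(?P<sev>[^,]+), type=(?P<layer>[^,]+)"
)

# Detail line naming the error bit, e.g. "nvidia 0000:41:00.0:    [ 7] BadDLLP"
DETAIL = re.compile(
    r"(?P<bdf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):\s+"
    r"\[\s*\d+\]\s+(?P<name>\w+)"
)


def tally_aer(log_text):
    """Count AER reports per (device, severity) and per (device, error name)."""
    by_device = Counter()
    by_name = Counter()
    for line in log_text.splitlines():
        m = HEADER.search(line)
        if m:
            by_device[(m["bdf"], m["sev"].strip())] += 1
            continue
        m = DETAIL.search(line)
        if m:
            by_name[(m["bdf"], m["name"])] += 1
    return by_device, by_name


# Illustrative sample only (assumed format, made-up bus address).
SAMPLE = """\
pcieport 0000:40:01.1: AER: Corrected error received: 0000:41:00.0
nvidia 0000:41:00.0: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Receiver ID)
nvidia 0000:41:00.0:   device [10de:2684] error status/mask=00000080/0000a000
nvidia 0000:41:00.0:    [ 7] BadDLLP
"""

if __name__ == "__main__":
    devices, names = tally_aer(SAMPLE)
    for (bdf, sev), count in devices.most_common():
        print(f"{bdf} severity={sev}: {count}")
    for (bdf, name), count in names.most_common():
        print(f"{bdf} {name}: {count}")
```

To run it on real output, save the log with something like `sudo dmesg > dmesg.txt` and pass the file contents to `tally_aer`. The device address with the highest count can then be matched against `lspci` to see which physical slot it belongs to.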
Steps taken
- Tried various kernel parameters: `pci=nomsi`, `pci=biosirq`, `pci=noaer`, `amd_iommu=off`, and various others.
- What is interesting is that with `pci=realloc` or `pci=assign-busses`, Ubuntu boots, but the bifurcated GPUs do not show up in `nvidia-smi`.
- It has something to do with the 3rd PCIe slot. If I unplug the GPU from it, everything is fine, with no errors during boot. **The way I determined this was by starting with one GPU in slot 5 and adding one card at a time until it failed.**
- Tried moving the HBA card to slot 3 and the GPU to slot 6 (BIOS settings adjusted), but got the same error.
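When adding cards one at a time as described above, it can help to record what link each GPU actually negotiated at every step. Below is a small sketch that reads the standard Linux sysfs PCI attributes (`vendor`, `class`, `current_link_speed`, `max_link_speed`, `current_link_width`, `max_link_width`). The function name `link_report` and its `root` parameter are hypothetical; `root` exists only so the function can be pointed at a test directory.

```python
from pathlib import Path

NVIDIA_VENDOR = "0x10de"  # vendor_id reported in the AER messages


def read_attr(dev, name):
    """Read one sysfs attribute, returning '' if it is missing or unreadable."""
    try:
        return (dev / name).read_text().strip()
    except OSError:
        return ""


def link_report(root="/sys/bus/pci/devices", vendor=NVIDIA_VENDOR):
    """List display-class devices of `vendor` with their negotiated PCIe link."""
    rows = []
    root_path = Path(root)
    if not root_path.is_dir():
        return rows
    for dev in sorted(root_path.iterdir()):
        if read_attr(dev, "vendor") != vendor:
            continue
        # Skip the GPU's audio function; keep display controllers (class 0x03xxxx).
        if not read_attr(dev, "class").startswith("0x03"):
            continue
        cur = (read_attr(dev, "current_link_speed"), read_attr(dev, "current_link_width"))
        best = (read_attr(dev, "max_link_speed"), read_attr(dev, "max_link_width"))
        rows.append({
            "address": dev.name,
            "current": cur,
            "max": best,
            "below_max": cur != best,
        })
    return rows


if __name__ == "__main__":
    for row in link_report():
        flag = "  <-- below max" if row["below_max"] else ""
        speed, width = row["current"]
        print(f'{row["address"]}: {speed} x{width}{flag}')
```

A link below its maximum is not necessarily a fault. GPUs on the bifurcated x8/x8 slot are expected to report a width of 8, and an idle GPU may drop its link speed for power saving. A GPU that is missing from the list entirely, or one whose link changes when a neighbouring slot is populated, is the more interesting signal.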
Current status
I can boot into Ubuntu with 6 GPUs and the HBA card. Slot 3 is empty. My expectation is to be able to boot into either OS regardless of the PCIe configuration, since the full configuration is proven to work on Windows. Any help would be greatly appreciated.