Ubuntu Asrock WRX80 Multi-GPU Rig Boot Issues

New account, first post - I need some help, please. I’ve got five 4070 Ti GPUs that I want to put to use mostly for science applications. I learned the hard way that you need a high-end CPU and mobo for this: my first build attempt with all of the GPUs used a Threadripper 1920X and an Asus Prime X399-A. Not enough PCIe lanes, I discovered, among other things.

Now I’ve got the following:
CPU: used Threadripper Pro 5995WX
Mobo: ASRock WRX80 Creator (5 PCIe x16 slots + 2 x8)
RAM: 256 GB OWC 3200 MHz DDR4 RDIMM (4 x 64 GB)
PSU: 2 x 1200 W Thermaltake
AIO: Silverstone Icegem 360
SSD: Kingston NV2 1TB M.2

Everything is mounted on a mining rig frame.
The PSUs are connected with a Silverstone dual 24-pin adapter.
One PSU powers the two 12 V CPU connectors, the ATX connector, and 2 GPUs. The other powers 3 GPUs and the additional 6-pin connector on the mobo.

I initially installed Ubuntu 22. The SSD hadn’t been wiped from the previous build attempt, which I think led to my initial difficulties. I completely wiped it, installed Ubuntu 24, and was able to boot with 1 GPU.

I installed the latest Nvidia drivers (560) and the CUDA toolkit, which are apparently very unstable with my components. I downgraded to 535 and everything seemed smooth, but nvidia-smi showed “Err!” under the fan column for every visible GPU. Ultimately I settled on the 550 driver, which got rid of the Err message.

Multiple GPUs led to problems. After messing around with drivers, deleting some xconfig file, and trying each GPU one at a time and each of the PCIe x16 slots individually, I started adding one GPU at a time and somehow made it to booting the system with all 5 GPUs recognized under nvidia-smi (this was after almost 2 days of struggle). I’ll add that other times when I’ve booted up with 2 or more GPUs, one or more does not appear in the nvidia-smi output, but those invisible ones do appear in the lspci output.
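In case it helps anyone chasing the same symptom, here’s a rough sketch of the comparison I’ve been doing by hand: count the NVIDIA GPUs the PCIe bus reports (lspci, filtered to NVIDIA’s vendor ID 10de) versus the GPUs the driver actually initializes (nvidia-smi -L). The commands are standard; the script is just a convenience, not exactly what I ran.

```python
#!/usr/bin/env python3
"""Compare GPUs visible on the PCIe bus (lspci) with GPUs visible to the driver (nvidia-smi)."""
import subprocess

def gpus_on_bus():
    # 10de is NVIDIA's PCI vendor ID; keep only the VGA/3D controller functions
    # so each card's audio function isn't counted twice.
    out = subprocess.run(["lspci", "-d", "10de:", "-nn"],
                         capture_output=True, text=True, check=True).stdout
    return [l for l in out.splitlines()
            if "VGA compatible controller" in l or "3D controller" in l]

def gpus_in_driver():
    # nvidia-smi -L prints one "GPU n: ..." line per GPU the driver brought up.
    out = subprocess.run(["nvidia-smi", "-L"],
                         capture_output=True, text=True, check=True).stdout
    return [l for l in out.splitlines() if l.startswith("GPU ")]

bus, drv = gpus_on_bus(), gpus_in_driver()
print(f"GPUs on the PCIe bus (lspci):            {len(bus)}")
print(f"GPUs visible to the driver (nvidia-smi): {len(drv)}")
if len(bus) != len(drv):
    print("Mismatch -- the missing cards enumerated on the bus but the driver never brought them up.")
```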

After trying to optimize the distribution of the power cables, nothing worked again, so I returned to the cable configuration that worked that one time, and still nothing. Now I can’t get anything to work with more than 1 GPU. I also cannot boot after entering the BIOS; I have to clear CMOS any time I change any BIOS setting in order to POST again.

I can’t find any obvious pattern for why this isn’t working. My thinking now is that it’s the PCIe 3.0 riser cables: 3 of them are 20 cm and 2 are 30 cm. I’ve tried setting the PCIe slots to “Gen 3” in BIOS, but then I can’t POST and get a “71” code on the mobo. The other times I can’t POST, it reads “42” or “94” (if I remember correctly). At this point I’m thinking I need to try Gen 4 riser cables.
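When the system does boot with at least one card visible, I’ve also started checking what link speed each GPU actually trained at, since a marginal riser can silently drop to Gen 1 or 2 instead of failing outright. A rough sketch using nvidia-smi’s standard PCIe query fields (just saves squinting at sudo lspci -vv):

```python
#!/usr/bin/env python3
"""Report the negotiated PCIe generation per GPU (marginal risers often train below their rating)."""
import subprocess

out = subprocess.run(
    ["nvidia-smi",
     "--query-gpu=index,name,pcie.link.gen.current,pcie.link.gen.max",
     "--format=csv,noheader"],
    capture_output=True, text=True, check=True).stdout

for line in out.strip().splitlines():
    idx, name, gen_now, gen_max = [field.strip() for field in line.split(",")]
    print(f"GPU {idx} ({name}): linked at Gen {gen_now} (max possible Gen {gen_max})")
```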

If anybody has any solutions, you will lead a fulfilling life. I did see this interesting thread (Help with WRX80E-Sage SE Render server - #12 by Nefastor) about PCIe redriver settings, but would rather not have to experiment with that.

The PCIe riser cables are unstable. I went through similar issues on the Tyan S8030 platform, and in the end chose to use SlimSAS 8i to PCIe 4.0 riser cards.

Why use PCIe 3.0 risers on PCIe 4.0 cards? Don’t you want the bandwidth?

Check the mainboard manual regarding this.

Thanks, I’m not familiar with those. Do those connect to SATA ports or something? It looks like if you want to hook into the PCIe slot you’d need converters on both ends?

I wanted to save a few bucks. Never ran the numbers but I don’t think the data transfer speed is going to have a major effect on performance for what I want to do.

I didn’t see anything specific about splitting the power supply. Relevant line is probably, “* Install the PSU’s power cable to this connector (6 pin) when 4 graphics cards are installed”

I’m assuming I can still make it work with 5 graphics cards.

You need 3 components to make it work:
a) a PCIe 4.0 x16 to 2x SlimSAS 8i adapter card (insert this PCB into the motherboard’s PCIe slot)
b) a 2x SlimSAS 8i to PCIe 4.0 x16 adapter card (this PCB attaches to the GPU)
c) two SlimSAS 8i cables to connect a) and b).

These components are available on eBay and Amazon.

In addition, if your motherboard supports bifurcation (x8x8), you can purchase an additional component b, and support 2 GPUs from 1 PCIe slot: “component a” has two SlimSAS 8i ports, and you connect a SlimSAS 8i cable to one “component b”, and the other cable to the other “component b”.

The tradeoff is that each GPU only gets x8 bandwidth.
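If you do go the x8x8 route, it’s easy to confirm what width each card actually linked at once the system is up. A quick sketch using nvidia-smi’s PCIe width query fields (the fields are standard; the script itself is just for illustration):

```python
#!/usr/bin/env python3
"""Confirm the negotiated PCIe link width per GPU (x8 is expected with x8x8 bifurcation)."""
import subprocess

out = subprocess.run(
    ["nvidia-smi",
     "--query-gpu=index,name,pcie.link.width.current,pcie.link.width.max",
     "--format=csv,noheader"],
    capture_output=True, text=True, check=True).stdout

for line in out.strip().splitlines():
    idx, name, width_now, width_max = [field.strip() for field in line.split(",")]
    print(f"GPU {idx} ({name}): running x{width_now} (card supports up to x{width_max})")
```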

If you’d like to stick with the PCIe 3.0 riser cables, you may want to change the PCIe slot’s bandwidth from Auto to GEN 3.0 in the BIOS.

Thanks @andy12. It seems like it’s quite a bit cheaper to just get Gen 4 cables. I tried the Gen 3 BIOS setting but it won’t POST. I can’t seem to make any changes in BIOS that result in a successful boot. Fingers crossed that new cables will work.

I don’t want to discourage you, as your use case might be different from mine.

I’d like to share my experience with PCIe 4.0 riser cables. I tried at least 10 of them (bought from Amazon), and it’s very difficult to get a stable 4.0 connection to the GPU. Most of the time it does NOT POST; for some cables, lspci works but dmesg shows “GPU has fallen off the bus”; for a couple of cables, nvidia-smi works, but it’s not stable.

Downgrading to PCIe Gen 3.0 in the BIOS gives me a much higher probability of success with these PCIe riser cables (meaning nvidia-smi works and the GPU functions), but I prefer the Gen 4.0 bandwidth, which is why I chose to go with the SlimSAS approach (it’s definitely more expensive, though).
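For what it’s worth, this is roughly how I kept an eye on those failures: scan the kernel log for the driver’s “fallen off the bus” message and NVRM Xid errors. A sketch only; it reads journalctl -k, so run it with permission to see kernel messages:

```python
#!/usr/bin/env python3
"""Scan the kernel log for NVIDIA messages that usually point at a flaky PCIe link."""
import subprocess

# Strings the NVIDIA kernel driver logs when a card misbehaves on the bus.
PATTERNS = ("fallen off the bus", "NVRM: Xid")

# journalctl -k -b shows kernel messages (same content as dmesg) for the current boot.
log = subprocess.run(["journalctl", "-k", "-b"],
                     capture_output=True, text=True, check=True).stdout

hits = [line for line in log.splitlines() if any(p in line for p in PATTERNS)]
if hits:
    print(f"Found {len(hits)} suspicious kernel messages:")
    for line in hits:
        print("  " + line)
else:
    print("No 'fallen off the bus' or Xid messages in this boot's kernel log.")
```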


My initial impulse was to just follow your suggestion, but I’m still trying to limit the financial pain. At this point, nickel-and-diming on cables seems dumb, I’ll admit. I’ll try the SlimSAS route next if the new cables don’t work. Do you recommend any specific vendors or brands? For what it’s worth, I ordered name brands for the PCIe 4.0 risers instead of the companies with names produced by a cat playing with Scrabble tiles.

Totally understand. It’s unbelievable that these riser cables/cards are so expensive.

I believe there is only 1 vendor (JMT) selling these SlimSAS cards on Amazon. I do see a few choices on eBay, but I didn’t try them (I think they should work as long as the SlimSAS ports/cables are 85 ohm).

I believe the further down the board you go, the more trouble you’ll find with signal integrity.

What’s the trick for everybody with a 4+ GPU rig?

Name-brand PCIe 4.0 cables solved the problem. I mixed and matched MSI, Phanteks, and Cooler Master. Everything is running smoothly. Thanks @andy12 for making it clear that the cables were the issue.
