Computer Shuts Off With More than 4 GPUs

I’m trying to fix a problem with my company’s multi-GPU computer, and I’m at my wit’s end. Whenever I start the computer with 5+ GPUs hooked up (we’re trying to use 7), it shuts off before POSTing. If I plug in 4 GPUs, it works just fine. I have had it running with 7 cards before, but all of a sudden it started doing this.

(I do have those two 6-pins on the side of the mobo plugged in)

Motherboard: WRX80 Sage II
CPU: Threadripper PRO 5955WX
GPUs: Zotac RTX 4090 (7x) (all of them have Gen 3 riser cables)
RAM: G.Skill Ripjaws 8x 32GB
HD: Samsung EVO Plus 500GB
PSU: Corsair HX1500i Platinum 80 Plus (4x: one PSU per 2 cards, and another for a single card plus the mobo)
UPS: CyberPower 950W (4x)

Here is what I’ve tried so far with no luck:

  • I have replaced the motherboard, RAM, CPU, and all the PSUs.

  • I unplugged each GPU (leaving the other 6 in) and powered it on to test all the PCIe slots, GPUs, and riser cables.

  • I tried turning the 4 switches on the motherboard on and off in various configurations. Not sure what they do, but it seemed worth a shot.

  • I updated the BIOS to the newest version

  • I tried plugging the PSUs directly into the wall to bypass the battery backups

  • I tried running the computer from the batteries on the UPS (unplugging the UPSs from the wall) to rule out the electrical system.

At one point I did manage to get 5 GPUs running, but I wasn’t getting full performance out of the 5th card. 4 cards had a score of ~5600; 5 cards had a score of 6300, which means the 5th card was only delivering about 50% of the performance it should have. Now it’s back to only running with 4 or fewer cards.

I’m open to any and all ideas.

Above 4G Decoding enabled in BIOS/UEFI?

Those on separate breakers? 1500 watts pushes the limits of a single 15 amp breaker.
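Rough math behind that, assuming 120 V mains and the usual 80% continuous-load rule (illustrative numbers, not measurements from this build):

```python
# Back-of-the-envelope breaker math (assumed 120 V mains, NEC 80% rule).
psu_rating_w = 1500                    # one HX1500i at full rated output
mains_v = 120
breaker_a = 15
continuous_limit_a = 0.8 * breaker_a   # ~12 A allowed for continuous loads

draw_a = psu_rating_w / mains_v        # ~12.5 A, ignoring PSU efficiency losses
print(f"One fully loaded PSU: {draw_a:.1f} A vs {continuous_limit_a:.0f} A continuous limit")
```

At the wall it's a bit worse still, since even a Platinum unit is only roughly 90% efficient at full load.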

Are those all just splitting the green wire to the motherboard? There’s a certain amount of current involved with that, and you’re asking the motherboard to handle 5x the current of one PSU. Likely that circuit (on the motherboard) has a current-limiting resistor to avoid blowing out that circuit if overloaded; you might be overloading that.

You could get an automotive relay to turn on the PSUs that aren’t connected to the motherboard; trigger it with 12 volts from the PSU that the motherboard turns on.

My question is why do you need so many cards?

TLDR: Disregard most of what's below regarding PSU overload, I didn't see the 4x PSUs there in the OP. Your setup is hardcore and so are you, my man! I will keep the post unedited for shits and giggles. I really should read posts slowly, less misunderstanding that way.

No, better question: do you undervolt and power-limit those cards? Each 4090 in stock, non-overclocked configuration can pull up to 450W on its own! Your PSU is comically undersized given the potential power load.

Since you bought a good Platinum PSU, you should be able to run up to two 4090s safely; three is a big maybe depending on the workload, and four is a big gamble that only works because Platinum PSUs can handle transient overloads well.

Running five is folly, since you can get power peaks of up to 2.5 kW! You can easily check how much the cards are pulling from software readouts in tools like HWiNFO.

This is a 4090 FE direct power measurement vs. workload (source):

If you really want to use that many cards, you will either have to upsize your PSUs to handle the combined peak power OR play around with power limits.

Personally, I would try power-limiting first, since the 4090 is amazingly efficient. You should be able to get ~90% of stock performance at a 60% power limit, or almost full performance at 75%. Depends on the workload, though. Second reference from der8auer (gaming only, sadly).

If I remember correctly, power-limiting is done via the vendor GPU control utility, so you have to successfully boot the workstation to apply the changes first (a scripted way to do the same is sketched after the list):

  • boot with 1 GPU first
  • do stress testing if you want, and log data if able
  • apply the target PL
  • do stress testing again to compare the effect
  • reboot and add the second GPU
  • repeat for each additional GPU
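If you would rather script it than click through a vendor utility, here is a rough sketch of doing the same thing via NVML (assumes the nvidia-ml-py package and admin/root rights; the 270 W target is just an example, roughly 60% of the 450 W stock limit):

```python
# Sketch: apply the same power limit to every detected GPU via NVML.
# Assumes: pip install nvidia-ml-py, plus admin/root privileges.
# Note: the limit resets on reboot / driver reload, so reapply it each boot.
import pynvml

TARGET_W = 270  # example target, ~60% of the 4090's 450 W stock limit

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        # min/max limits the firmware will accept, in milliwatts
        lo, hi = pynvml.nvmlDeviceGetPowerManagementLimitConstraints(handle)
        target_mw = max(lo, min(hi, TARGET_W * 1000))
        pynvml.nvmlDeviceSetPowerManagementLimit(handle, target_mw)
        print(f"GPU {i}: power limit set to {target_mw / 1000:.0f} W")
finally:
    pynvml.nvmlShutdown()
```

nvidia-smi's power-limit option does the same thing from the command line if you prefer not to script it.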

Reference measurements (game benchmarks only, sadly) here:

If you go the upsized-PSU route, do the following (assuming Windows):

  • run your target workload with a single card, then two cards, and write down the peak values from the CPU package power readout and the GPU power readout in HWiNFO or equivalent
  • verify the values with the two-card setup
  • extrapolate to the target number of cards, plus some padding, ideally +20%.

If you go with the safe-and-sound calculation, assuming no power limits and no OC, then for your setup I would budget roughly (a quick script version follows the list):

  • CPU + board: 400 W
  • RAM + peripherals: 50 W
  • per GPU: 480 W

That works out to:

  • 5-GPU build: 2800-2900 W
  • 7-GPU build: 3800-3900 W
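The same estimate as a quick script, in case you want to play with the per-card figure (these are my assumed round numbers from the list above, not measurements):

```python
# Rough power budget using the assumed figures from the list above.
cpu_and_board_w = 400
mem_and_peripherals_w = 50
per_gpu_w = 480          # stock 4090 plus a little headroom

for gpus in (5, 7):
    total_w = cpu_and_board_w + mem_and_peripherals_w + gpus * per_gpu_w
    print(f"{gpus}-GPU build: ~{total_w} W sustained, before transient spikes")
```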

Which test/benchmark was this? Are you sure it scales linearly?

Were all cards running at the expected PCIe speed (Gen 3)? How much power was each card pulling in the benchmark?

Silly question, but since the GPUs can each pull up to 75W from the slot (525W total), are both PCIe 6-pin power connectors on the board plugged in? The PSU serving 1 card + mobo may be getting a lot of load (525W via PCIe on the board + CPU + 1 full card).
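Rough worst case for that one PSU, using assumed round figures (75 W per slot is the PCIe spec maximum; real slot draw is usually well below that):

```python
# Worst-case load sketch for the PSU feeding the board + one card directly.
# All figures are assumptions for illustration, not measurements.
slot_w = 75              # PCIe spec max per x16 slot
slots = 7
cpu_and_board_w = 400
direct_gpu_w = 450       # one stock 4090 on its own power cables

total_w = slots * slot_w + cpu_and_board_w + direct_gpu_w
print(f"Worst case on that PSU: ~{total_w} W of a 1500 W unit")
```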

Have you set PCIe speeds to gen 3 manually to make sure there are no signaling issues due to the risers?

You could try to spread the load a bit more over the PSUs, i.e.: one providing board & CPU only + 1 card, one providing the PCIe power to the board + 1 card, and the other 2 PSUs with 2 cards each. You leave 1 card out then, but worth a try?

Have you tested each of the risers separately?

He has 4 1500W PSUs: 3 serving 2 cards each, and one for the board + the 7th card!

Oh sorry, didn't see that. Well, this build got even more insane.

Hmm, I still think power-limiting might be a good way to test whether this is workload related.

Otherwise, maybe check the PSUs individually? Maybe one of them has developed a fault in the meantime.

The PSU serving 1 GPU + mobo may still be an issue though, since each slot may pull up to 75W from that single one, so 525W + CPU/mobo + 1 GPU direct.

Most of the load will fall on the 12V rail, yes? It should handle that with room to spare, even on 120V mains.

However, that performance anomaly with the 5th GPU in the 5-GPU setup might be worth investigating (if it can still be booted).

Test scenarios (a small logging script is sketched after the list):

  • How much power does the underperforming card pull, and what clocks does it reach under load?
  • Which PSU powers it (from here on, the suspect PSU), and is it shared with another card? Is that other card performing OK?
  • If the underperforming card is connected to a different PSU, does it still underperform?
  • If an OK card is connected to the suspect PSU, does it underperform? If yes, how does it behave power-wise and clock-wise?
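For the logging part, something like this works as a starting point (a sketch using the nvidia-ml-py package; run it in a second terminal while the benchmark is going):

```python
# Sketch: log per-GPU power draw, core clock, and PCIe link state every 2 s,
# to spot which card (and which PSU behind it) misbehaves under load.
# Assumes: pip install nvidia-ml-py
import time
import pynvml

pynvml.nvmlInit()
try:
    handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
               for i in range(pynvml.nvmlDeviceGetCount())]
    while True:
        for i, h in enumerate(handles):
            power_w = pynvml.nvmlDeviceGetPowerUsage(h) / 1000   # reported in mW
            core_mhz = pynvml.nvmlDeviceGetClockInfo(h, pynvml.NVML_CLOCK_GRAPHICS)
            gen = pynvml.nvmlDeviceGetCurrPcieLinkGeneration(h)
            width = pynvml.nvmlDeviceGetCurrPcieLinkWidth(h)
            print(f"GPU {i}: {power_w:6.1f} W  {core_mhz:4d} MHz  PCIe gen{gen} x{width}")
        print("-" * 48)
        time.sleep(2)
finally:
    pynvml.nvmlShutdown()
```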

Quick additional question: how do you manage those PSUs? As in, how do you stagger or synchronize power-on? Some custom doodads or a manual modification?

One thing I think everyone is missing (unless I’m wrong on this point’s significance, which is certainly possible) is that the computer is shutting off during the boot sequence. It doesn’t even POST, so it’s using relatively low amounts of power at that point. On to the replies:

Above 4G Decoding enabled in BIOS/UEFI?

Yep. CSM disabled as well.

Those on separate breakers? 1500 watts pushes the limits of a single 15 amp breaker.

The 4 PSUs are spread across two 20A breakers. Should be more than enough to allow the computer to boot up.

Which test/benchmark was this? Are you sure it scales linearly?

I’m running Octane, and yep it scales linearly. When I briefly had 7 cards going I was getting a score of nearly 10k.

are both PCIe 6-pin power connectors on the board plugged in?

Yep

Have you set PCIe speeds to gen 3 manually to make sure there are no signaling issues due to the risers?

Yep, did that in the BIOS

You leave 1 card out then, but worth a try?

I tried removing each card from the mobo. In other words, I unplugged each card, leaving the other 6 in. Did that for all 7 cards.

The PSU serving 1 GPU + mobo may still be an issue though, since each slot may pull up to 75W from that single one, so 525W + CPU/mobo + 1 GPU direct.

It’s shutting off during the boot up sequence, so the system isn’t pulling much wattage at that point. The UPS doesn’t make any noise, and if I remember correctly the UPS says it’s only drawing ~350W at that point.

Also, it’s worth mentioning that I daisy-chain these to get the PSUs working:

Dual PSU 24-Pin ATX Motherboard Cable 20+4 Pin Dual PSU Cable for Mining Adapter ATX Motherboard Adapter Extension Cable 30cm.

Forum doesn’t let me include links.

Yeah, that's suspiciously low wattage.

General thinking is that a shutdown during POST is power related and initiated by the PSU's protective circuitry.

Most hardware also runs at full power during the earliest boot stages, since power management is not yet initialized.

Default behavior might have changed since then, though. Nobody but Nvidia knows what their firmware does.

Edit: So you have handled PSU synchronization via power adapters. I don't know how that handles potential frequency differences between PSUs, but it might not be important.
PSUs that explicitly allow daisy-chaining by design are directly connected via a sync cable. They are also the only consumer/specialty PSUs I have ever seen supporting this.

Like this:

So you use 3 of those to go 1 → 2 → 4 ATX connectors?

Are you using this too when booting with 4 GPUs? In this case, you have 1 GPU per PSU?

If yes, I guess we could eliminate PSU sync issues…

Did you also test each GPU and riser separately to confirm it’s working?

Key here is testing upwards from no GPU / a single GPU and a single PSU, and also including all possible combinations. Your setup is very complex and very off the beaten path.

Just the multi-PSU setup is pretty much outside anything I have ever seen in my career.

So have you tried:

  • booting without any GPU, just to test whether the platform itself is healthy
  • memtest, just in case
  • booting with a single GPU on a single PSU
  • etc …

What if it's just one failing PSU? Maybe do a visual inspection of the cabling and the board itself.


Are you using this too when booting with 4 GPUs? In this case, you have 1 GPU per PSU?

I am using them, but it’s still 2 GPUs per power supply.

Did you also test each GPU and riser separately to confirm it’s working?

As in only having one plugged in at a time? No, I didn't do that. It's worth a shot, I suppose. I'll give that a try tomorrow.

trying to boot without GPU just to test whether platform itself is healthy

I mean, it boots with 4 or fewer GPUs without issue, so I'm sure the platform is healthy. Also keep in mind that I have so far tried swapping out the mobo, CPU, RAM, and PSUs, and it behaves exactly the same way. It shuts down at the exact same moment. When I watch the lights on the motherboard, it normally goes orange, red, orange, red, white, green. With 5+ cards it goes orange, red, then 2 seconds later it shuts off.

No error codes on diag display (if there is one)?

Might be worth emailing Asus. UEFI is basically an OS, enough complexity to fuck things up.

No error codes on diag display (if there is one)

Well, it cycles through the numbers so fast and then immediately shuts down, so it's impossible to tell if it was an error code or just part of the cycle.

Might be worth emailing Asus. UEFI is basically an OS, enough complexity to fuck things up.

Yeah, perhaps I’ll do that. I’m going to replace the riser cables with Gen 4 ones to see if that helps. I’m getting desperate haha.