Solved: WRX90E-SAGE SE trouble POSTing with more than 4 GPUs

Question: Has anyone had success getting ASUS WRX90E-SAGE SE to pass POST with more than 4 GPUs, particularly nvidia 4090?

Solution: Same as the thread “WRX90e won’t boot with 6 GPUs”. It recommends setting Discrete USB4 Support = Disabled, and this worked for me on BIOS v0404 (July 2024).

Background: I built something like this earlier this year with ASRock TRX90 and had no problem, but it only had 3 GPUs. This DIY deep learning workstation build has 2 PSUs (2800W total, on different dedicated mains circuits in the USA) and uses the method described in the ASUS manual, which uses the three cables provided by ASUS (a Y-cable and two 8-pin cables) to connect the PSUs to the mobo. We undervolt the 4090s to 375W to stay under the power budget.

4 GPUs install without any problems (the system passes POST, BIOS 0404 is happy, it boots fine to Ubuntu Server 24.04, and nvidia-smi shows the expected 4 GPUs). Two GPUs are in slots 1 and 3. Two GPUs are on PCIe extenders (with the BIOS set to PCIe Gen 3) in slots 5 and 6. These GPUs work as expected, are visible in the OS, etc. The 4 GPUs are spread across the 2 PSUs, and there is enough PSU power available to supply a 5th GPU.

All GPUs have been proven to operate properly on another PC running Windows. (Aesthetic tip: while testing on Windows, I used Mystic Light to turn off the LEDs, and the GPUs keep that state when moved to the Ubuntu workstation.) I use HDMI to the monitor (I have not tried DP).
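For anyone sanity-checking a similar power budget, here is a rough sketch of the arithmetic. The PSU ratings and the 375W per-GPU figure come from this post; the CPU and "other" numbers are my assumptions for illustration, and the sketch only looks at the combined total, not at how the load is split between the two PSUs or at transient spikes.

```python
# Rough power-budget check for a dual-PSU, multi-GPU build.
# PSU ratings and the 375 W per-GPU figure are taken from the post.
# CPU_WATTS and OTHER_WATTS are assumptions for illustration only.

PSU_WATTS = [1600, 1200]   # be quiet! 1600 W + SilverStone 1200 W
GPU_LIMIT_WATTS = 375      # per-GPU limit stated in the post
CPU_WATTS = 350            # assumption: rated TDP of the 7975WX
OTHER_WATTS = 150          # assumption: drives, fans, pumps, RAM, board


def headroom_watts(num_gpus: int) -> int:
    """Return combined PSU capacity minus the estimated steady-state load."""
    load = num_gpus * GPU_LIMIT_WATTS + CPU_WATTS + OTHER_WATTS
    return sum(PSU_WATTS) - load


for n in (4, 5):
    print(f"{n} GPUs: about {headroom_watts(n)} W of combined headroom")
```

Under these assumptions the combined headroom stays positive with 5 GPUs, which matches the observation that power was not the cause of the POST failure. A per-PSU version of the same check would need to know which GPU is cabled to which PSU.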

Symptom: Adding a fifth GPU (in slot 7, on a riser or extender, or in any other slot combination) on the ASUS WRX90E-SAGE SE results in the system never passing POST.

Q-Code cycles through codes without stopping. The Q-LEDs light up the various options (CPU, DRAM, VGA, BOOT) without stopping.

The impact is that neither the Q-Code nor the Q-LED provides diagnostic information. The system never passes POST (even after a few hours) and therefore does not boot. BIOS is also not available in this state.

I have tried to isolate the symptom by using different slots and different known-good GPUs, including an nvidia 1050 to rule out the power question. Any combination of 4 GPUs, risers, slots, PCIe Gen settings in BIOS, etc. works fine. The symptom occurs when adding the 5th GPU, regardless of where or how.

About me: I read manuals before buying parts and write a work plan before building. I am not a pro, but I have experience building DIY systems: many gaming PCs with Intel and AMD. A few Deep Learning Workstations with current Threadripper CPUs using multiple nvidia 4090 GPUs.

About the hardware:
https://www.asus.com/motherboards-components/motherboards/workstation/pro-ws-wrx90e-sage-se/techspec/

  • ASUS Pro WS WRX90E-SAGE SE EEB Workstation motherboard

    • FYI: This BIOS has overclock enabled by default and temperature capped at ~96C. This was confusing at first because, under heavy load, more cooling created higher clocks instead of lower temps. BIOS has controls for this.
  • Threadripper PRO 7975WX (32c/64t)

  • Corsair CPU liquid cooler

    • Plan to add 360mm server grade AIO CPU cooler (not on TR 7000 QVL, but specs are good): Amazon.com
  • be quiet! Dark Power PRO 13 1600W ATX Titanium

    • FYI: to obtain “single rail” behavior for this PSU, enable “overclock” with bridge “key” or physical switch
    • plus extra: be quiet! BC072 12VHPWR Adapter Cable for Dark Power Pro
  • SilverStone Technology Extreme 1200R Platinum Cybenetics Platinum 1200W SFX-L

    • FYI: This is a single rail PSU and single rail is required
    • FYI: Use bridged cables that came with mobo to connect to 2nd PSU headers
    • plus extra: SilverStone Technology PP14-EPS Dual EPS 8 pin (PSU) to 12+4 pin (GPU) 12VHPWR PCIe Gen5
  • NEMIX RAM 256GB (4X64GB) DDR5 5600MHZ PC5-44800

  • 2 x 4TB Gen5 Team Group T-FORCE GE PRO M.2 2280 4TB PCIe Gen 5.0x4 with NVMe 2.0

  • 2 x 4TB Gen5 Crucial T705 NVMe M.2 SSD

  • 4 x MSI RTX 4090 SUPRIM LIQUID X 24G

  • 1 x air cooled 4090 GPU

Note: the PSUs are different due to a space limitation in the case, but both are high end, single rail, devices that I have worked with before. This works fine.

Manuals:
https://dlcdnets.asus.com/pub/ASUS/mb/SocketsTR5/Pro_WS_WRX90E-SAGE_SE/E23789_Pro_WS_WRX90E-SAGE_SE_EM_V2_WEB.pdf?model=Pro%20WS%20WRX90E-SAGE%20SE
https://dlcdnets.asus.com/pub/ASUS/mb/SocketsTR5/Pro_WS_WRX90E-SAGE_SE/E22761_AMD_TR5_Series_BIOS_manual_EM_WEB.pdf?model=Pro%20WS%20WRX90E-SAGE%20SE

BIOS is version 0404 – 2024/01/03 (nothing newer is available on the ASUS website at this time, July 2024). I note the BIOS manual does not exactly match the BIOS as seen on-screen, but this does not appear to be significant.

Edits made to bring post up to date with operational experience in case it helps someone else.

Would you mind sharing some pics of your build? I’m considering using more than three 4090 cards with the WRX90E-SAGE but cannot figure out how to squeeze them into the case. Thank you!

Here is a picture of the workstation build in a fully working state (I still have some clean up to do).

Disclaimer: My goal was something I could get done in a weekend, but the USB4 issue prevented that until I found the answer in this forum. I relied on All-In-One coolers to speed up assembly, but if I were to do this again (and had more time), I would use single-slot custom GPU blocks and a custom cooling loop with a very thick radiator exhausting up. There were a lot of little annoyances in this build (e.g. the HD-Audio and bottom USB3 headers collide with the GPU in slot 7), so it is hard to recommend. My use case is a deep learning workstation, so I don’t need audio, extra USB, etc.

The case is a Corsair Obsidian 1000D which my brother in law gave me (it seemed big enough at the time, but not sure I’d use it again - maybe 9000D or Phanteks). My original plan was to mount 4x 2-slot MSI AIO GPUs on the mobo and a 5th air cooled GPU. Unfortunately, this simplicity was not feasible because the “2-slot” GPUs are actually slightly thicker, so the 4th one is a challenge and I didn’t want to break the PCIe slot/card.

Modifications:

  • GPU: MSI AIO GPUs: removed the decorative metal plate to gain space between cards
  • Case: removed the HDD caddy to accommodate the air cooled GPU in slot 7
  • Case: bought second radiator trays for 8x120mm
  • Tray: drill 8 screw holes in the 8-fan tray to mount the radiators sideways

GPU Method: mount a double stack of GPUs to the case:

  • 4” metal bracket from Lowes as a “shelf” to support 2x GPU (AIO)
  • plus 1.5” neoprene pad (150C melting point, tested first) as a spacer needed to clear the mobo power cables
  • Velcro straps to hold double stacked MSI AIO 4090 GPUs in place
  • run PCIe extenders from slot 5 and 6 to connect the 2 vertically hung GPUs
  • air cooled GPU in slot 7 hangs off the bottom of the board so its thickness is irrelevant

Temperature Results of testing under heavy load

  • GPUs ~60C with all 5 at 99% utilization
  • CPU ~95C: the CPU runs hot, but performs well enough to serve all 5 GPUs (i.e. the CPU is not the bottleneck even though it is right at its thermal throttle point)
    • Threadripper 7000 series can handle this, but I plan to change the CPU cooling strategy to a front-mounted 360mm radiator as intake, replacing 3 of the 8 front fans. This will add heat inside the case, so I will measure, test, and tweak as needed. I might add more intake (with air filter) from the back with 2x120mm fans.
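In case it helps anyone reproduce numbers like these, here is a minimal sketch of how GPU temperature and utilization could be sampled on Linux with nvidia-smi's CSV query mode. This is not necessarily how the figures above were collected; the function names are mine, and the parser assumes GPU names contain no commas.

```python
import subprocess

# Fields requested from nvidia-smi, in this order.
QUERY = "index,name,temperature.gpu,utilization.gpu"


def parse_gpu_csv(text: str) -> list[dict]:
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into dicts."""
    rows = []
    for line in text.strip().splitlines():
        index, name, temp, util = [field.strip() for field in line.split(",")]
        rows.append(
            {
                "index": int(index),
                "name": name,
                "temp_c": int(temp),
                "util_pct": int(util),
            }
        )
    return rows


def read_gpus() -> list[dict]:
    """Run nvidia-smi once and return one dict per visible GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_csv(result.stdout)
```

Calling read_gpus() in a loop with a sleep between samples gives a simple log; len(read_gpus()) is also a quick way to confirm that all 5 GPUs are visible after boot.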

Remaining work includes:

  • improve GPU anti-sag from Legos to something permanent
  • cable management
  • odds-and-ends

If you are interested in this, then I can share more info about extenders.

Edits to match testing and operational results


@ww26 See above

Thank you jackc for the details. Even with your detailed explanation, it took me quite some time to absorb. Great effort and thoughts in here!

The 1000D case is amazing in that it allows four 240 radiators mounted the way you did it, which I had not thought about before. Regarding the MSI AIO GPUs, if you also remove the decorative metal plate, would you be able to put them on the mobo in slots 5 and 7 as well? Of course this way you cannot put a 5th 4090 there. The only thing worth considering could be the cooling loop length of the MSI 4090 itself, i.e. whether the 3rd and 4th cards’ radiators can reach the top rack for mounting.

Does the 1000D come with only one 8x120 mount tray so you have to buy a second one for top mounting radiators?

One thing worth considering, as far as I can think of, is to swap one 4090 radiator’s position with the CPU cooler’s position. This way you can have two full sets of fans on the CPU radiator for push and pull. Or, use a 480 or 360 radiator for the CPU (I’m not seeing a readily available 480 AIO for the sTR5 socket yet) on the top. You would then have two 4090 radiators on top, one on the rear and one on the front panel taking in air. This would run less hot than using the CPU radiator on the front.

Best @Jackc

@ww26 Thanks for your thoughts on this!

  1. MSI AIO fit: removing the front plates will comfortably allow 3x GPUs adjacent (slots 1, 3, 5). I tried this, however, they seem to lean a bit and by the time the 4th GPU tries to get into slot 7, it is too tight for my comfort zone. I suspect someone could make it work, but I did not pursue it because it would not help a 5th GPU.
  2. MSI AIO tube length: even in slot 1, the tube is too short to reach the “front / top” position for the radiator.
  3. 1000D tray: I had to purchase a second 8x120mm fan tray from Corsair’s website. It was about $25 delivered in US. Obsidian 1000D 8x 120mm Fan Tray
  4. 1000D rear exhaust: I tested the rear exhaust (2x120mm) for the AIO radiator and it will work. In my case, it did not gain me anything. The main issue I had was that the Corsair video claims the rear case fan mount can hold 2x140mm, and that is how I planned to cool the Threadripper (slightly underpowered, but still about 90% of 3x120mm radiators). Unfortunately, in my case, the 2x140mm did not fit in the space due to the size of the mobo heatsinks, top radiators, etc. I suspect that it is not really designed with that in mind and that it would take a lot of modifications to put a 2x140mm fan/radiator in the rear.
  5. Push/Pull Radiator on top: good idea. Tests show this helps only 1C or 2C, but I am looking for anything I can get!
  6. 480 Radiator on top: Yes, that device would fit on top, if it existed or if you make one.
  • Unfortunately, two of the 2x120mm AIO radiators cannot fit end to end in one row because the little tanks on both ends of each radiator take up too much space. So, the hack was to make four radiators fit side-by-side (which required drilling 8 screw holes in the new mounting tray)

As I mentioned, this works fine, but if I were going to do this again, I would probably use a custom cooling loop and single slot GPUs.


@Jackc your information was very helpful. I plan to build a similar machine in the 1000D case with the same MSI 4090s in about a month. Finding a case that accommodates more than 2 native 4090s is very hard. The Thermaltake Core W200 could be an option, but it is too big and I am not sure whether the last slot can be used.

In the meantime, here is some fresh discussion on the AIO for 7985WX on reddit in case you find it useful: https://www.reddit.com/r/threadripper/s/9xOmZdERiL


@ww26 Thanks for the CPU cooler link! The w200 is a monster!

Hypothesis: under GPU-heavy machine-learning workloads, the CPU can keep the five GPUs at 99% utilization while consuming less power and generating less heat.

Test:

  • “Disabled” the mobo’s default CPU Overclock (ASUS PBO Precision Boost Overclock) and
  • set the thermal ceiling to 90C

I’ll report back when the results start coming in.

FYI: For comparison, here is the thermal performance prior to making this change (GPUs at 99% usage). CPU is top (blue line) at 96C. The hottest GPU (green line) is the air cooled and at 70C it is still plenty cool.
Goal: pull down the CPU temp while keeping the GPUs at 99%.
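For tracking the CPU side of this goal from the OS, here is a minimal sketch that reads the CPU temperature from the Linux hwmon interface. It assumes the k10temp driver (the usual hwmon driver for AMD CPUs) is loaded and that its first sensor is the one of interest; whether that reading matches the value the BIOS uses for its thermal cap is also an assumption. The function name is hypothetical.

```python
from pathlib import Path
from typing import Optional


def read_cpu_temp_c(
    hwmon_root: str = "/sys/class/hwmon", driver: str = "k10temp"
) -> Optional[float]:
    """Return temp1 of the first hwmon device with the given driver name.

    hwmon reports temperatures in millidegrees Celsius, so the raw value
    is divided by 1000. Returns None if no matching device is found.
    """
    for dev in sorted(Path(hwmon_root).glob("hwmon*")):
        name_file = dev / "name"
        temp_file = dev / "temp1_input"
        if not (name_file.is_file() and temp_file.is_file()):
            continue
        if name_file.read_text().strip() == driver:
            return int(temp_file.read_text().strip()) / 1000.0
    return None
```

Sampling this alongside the GPU readings during a training run gives a simple way to compare runs before and after a BIOS change.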

Results:

  • Turning off OC and lowering thermal limit to 90C limited the CPU too much and it could not keep the 5 GPUs fully busy

BIOS Settings:

  • Change: Removing thermal limit on CPU – back to “Auto” (i.e. no thermal cap)
  • Same: “Disable” Precision Boost Overclock (i.e. no OC)

10-Hour Run Report:

  • CPU temperature dropped from 96C to 91C (top/blue line)
  • GPU peak temperature dropped from 70C to 65C (lower/other 5 colors between 55 and 65C)
  • still 99% utilization of 5 GPUs and less energy used
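A long run like this can be boiled down to the same three numbers (CPU peak, GPU peak, lowest GPU utilization) with a small helper. This is only a sketch: the sample layout and the 95% "busy" threshold are my assumptions, not something from the actual test setup.

```python
def summarize_run(samples: list[dict]) -> dict:
    """Reduce a list of samples to peak temperatures and minimum utilization.

    Each sample is assumed to be a dict with:
      'cpu_temp_c'    : float
      'gpu_temps_c'   : list of float, one per GPU
      'gpu_utils_pct' : list of float, one per GPU
    """
    if not samples:
        raise ValueError("no samples to summarize")
    return {
        "cpu_peak_c": max(s["cpu_temp_c"] for s in samples),
        "gpu_peak_c": max(max(s["gpu_temps_c"]) for s in samples),
        "gpu_min_util_pct": min(min(s["gpu_utils_pct"]) for s in samples),
    }


def gpus_stayed_busy(summary: dict, threshold_pct: float = 95.0) -> bool:
    """True if no GPU ever dropped below the (assumed) busy threshold."""
    return summary["gpu_min_util_pct"] >= threshold_pct
```

Comparing the summaries of two runs (for example with and without PBO) shows at a glance whether a lower CPU temperature came at the cost of starving the GPUs.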

Early Conclusion: Turning off OC is good enough to run as-is until the larger CPU cooler arrives.

Update with larger cooler:

  • CPU cooler: Silverstone XE360-TR5 cooler for AMD socket sTR5/SP6
  • installed this CPU cooler as front intake, dropped CPU to 70C (from 91C)
  • also added rear fans + filters for more intake (might flip later), goal is slight over pressure in case to keep out dust
  • GPUs at 99% utilization and some are ~5C warmer (max is 70C)
  • see image with thermals before and after
  • happy and calling this build “Done” and getting back to work
    ✅
