Running into what looks like a Gen 5 link training issue with dual RTX PRO 6000 Blackwell cards on an ASUS Pro WS WRX90E-SAGE SE. Wondering if anyone else has hit this or has insights.
The Setup
- ASUS Pro WS WRX90E-SAGE SE
- AMD Threadripper PRO 9975WX
- 2x NVIDIA RTX PRO 6000 Blackwell Workstation Edition
- Ubuntu 24.04, NVIDIA driver 580.105.08
- BIOS 1203 (latest as of now)
The Problem
Both GPUs boot at PCIe Gen 1 (2.5GT/s) instead of Gen 5 (32GT/s). Full x16 width is there, just severely degraded speed.
LnkCap: Port #0, Speed 32GT/s, Width x16
LnkSta: Speed 2.5GT/s (downgraded), Width x16
Interestingly, EqualizationComplete+ shows on both, so the Gen 5 equalization handshake appears to be happening, but the link then seems to fail and fall back to Gen 1.
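For anyone who wants to compare the negotiated link against the capability without picking through lspci output, here is a rough sketch that reads the kernel's sysfs link attributes instead. The helper name link_report is made up for illustration, and the device address in the usage line is only a guess based on the bus numbers in this thread, so substitute whatever lspci shows on your system.

```shell
#!/bin/sh
# link_report DIR: print current vs. maximum PCIe link speed and width for one
# device, where DIR is its sysfs directory (/sys/bus/pci/devices/<address>).
# Illustrative helper only, not part of any existing tool.
link_report() {
    dev=$1
    cur=$(cat "$dev/current_link_speed")    # e.g. "2.5 GT/s PCIe"
    max=$(cat "$dev/max_link_speed")        # e.g. "32.0 GT/s PCIe"
    cur_w=$(cat "$dev/current_link_width")  # e.g. "16"
    max_w=$(cat "$dev/max_link_width")
    printf '%s: %s x%s (max %s x%s)\n' "${dev##*/}" "$cur" "$cur_w" "$max" "$max_w"
}

# Hypothetical usage (the address is an assumption):
# link_report /sys/bus/pci/devices/0000:11:00.0
```

This is a single snapshot of the link at the moment it is read, so it is worth sampling both at idle and under load before drawing conclusions.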
What I’ve Tried
BIOS settings are all correct: PCIe Gen 5 enabled, Resizable BAR, SR-IOV, IOMMU, Above 4G Decoding, 10-bit tag support - all enabled.
Manual link retraining via setpci partially works:
sudo setpci -s 10:01.1 CAP_EXP+30.b=40 # GPU 0's upstream port
sudo setpci -s f0:01.1 CAP_EXP+30.b=40 # GPU 1's upstream port
After this:
- GPU 0 (bus 11, root complex 10): comes up to Gen 4 (16GT/s)
- GPU 1 (bus f1, root complex f0): stays stuck at Gen 1 (2.5GT/s)
Both GPUs are on separate root complexes going direct to CPU - no switches in the path.
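To check the result on the same upstream ports without reading the full lspci dump, the Link Status register can be read with setpci and decoded by hand. The sketch below assumes the standard PCIe layout (Link Status at offset 0x12 of the PCI Express capability, current link speed in bits 3:0, negotiated width in bits 9:4). The function name decode_lnksta is made up for illustration.

```shell
#!/bin/sh
# decode_lnksta HEX: decode a 16-bit PCIe Link Status value given as hex
# without a 0x prefix (the form setpci prints).
#   bits 3:0 = current link speed code (1 = 2.5GT/s ... 5 = 32GT/s)
#   bits 9:4 = negotiated link width
decode_lnksta() {
    val=$((0x$1))
    speed=$(( $val & 0xf ))
    width=$(( ($val >> 4) & 0x3f ))
    case $speed in
        1) rate=2.5GT/s ;;
        2) rate=5GT/s ;;
        3) rate=8GT/s ;;
        4) rate=16GT/s ;;
        5) rate=32GT/s ;;
        *) rate=unknown ;;
    esac
    printf 'Gen %s (%s) x%s\n' "$speed" "$rate" "$width"
}

# Usage on the two ports from this thread (read-only, but needs root):
# decode_lnksta "$(sudo setpci -s 10:01.1 CAP_EXP+12.w)"
# decode_lnksta "$(sudo setpci -s f0:01.1 CAP_EXP+12.w)"
```

For example, a register value of 1101 decodes to Gen 1 (2.5GT/s) x16, and 1104 decodes to Gen 4 (16GT/s) x16.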
The Weird Part
LLM inference performance seems unaffected, even on models >96GB that shard across both cards. Makes me think this may have been present from day one.
Questions
- Anyone running dual Blackwell on WRX90 (or TRX50) who can confirm their link speeds?
- Saw that Wendell mentioned weird P2P bandwidth on Sapphire Rapids with these cards. Known Blackwell firmware issue?
- Any kernel parameters or workarounds beyond the setpci retrain trick?
Planning to do a physical GPU swap test to determine if the issue follows a specific card or stays with the slot. Will report back.
Have you tried running nvtop during inference? It shows the link speed there. Power management may drop the link to lower rates when it’s idle.
PCIe Gen 5 with dual Blackwell has never failed to work on any of the WRX90 and TRX50 boards I have.
Wendell - honoured to get a response from the zen-master himself, thank you! Never miss a YT episode!
and yes you were absolutely right. It’s aggressive power management. I was checking lspci at idle like a fool and seeing Gen 1, panicking.
Fired up nvtop as you suggested and watched it during actual load (DeepSeek 671B on Ollama). During model loading, both cards jumped to Gen 4 x16. During inference the speeds flip around constantly: Gen 4 under heavy transfer, dropping back down when idle between token batches.
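The same behaviour can be logged from a script rather than watched in nvtop. A minimal sketch, assuming the nvidia-smi query fields pcie.link.gen.current, pcie.link.gen.max and pcie.link.width.current (check nvidia-smi --help-query-gpu on your driver version); flag_links is a made-up helper name.

```shell
#!/bin/sh
# flag_links: read "index, current_gen, max_gen, width" CSV lines on stdin and
# mark each GPU whose current PCIe generation is below the reported maximum.
flag_links() {
    awk -F', *' '{
        state = ($2 + 0 < $3 + 0) ? "below max" : "at max"
        printf "GPU %s: Gen %s of %s, x%s (%s)\n", $1, $2, $3, $4, state
    }'
}

# Example: sample once per second while the model loads (Ctrl-C to stop).
# while sleep 1; do
#     nvidia-smi --query-gpu=index,pcie.link.gen.current,pcie.link.gen.max,pcie.link.width.current \
#                --format=csv,noheader | flag_links
# done
```

Run it in a second terminal during a model load so the samples cover both the busy and the idle phases.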
Peak throughput I’m seeing is around 26 GiB/s RX on GPU 0, which is ~82% of Gen 4 theoretical max. Both cards working, both hitting Gen 4 under load, system is healthy.
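For reference, the percentage can be reproduced with a couple of lines of arithmetic. This sketch computes the raw one-direction link bandwidth from the transfer rate, lane count and 128b/130b encoding, ignoring packet and protocol overhead, so real transfers will land somewhat lower. The function names pcie_gbs and pct_of are made up for illustration.

```shell
#!/bin/sh
# pcie_gbs RATE LANES: raw one-direction bandwidth in GB/s (10^9 bytes/s) for
# a Gen 3 or later link, which uses 128b/130b encoding. RATE is in GT/s.
pcie_gbs() {
    awk -v r="$1" -v l="$2" 'BEGIN { printf "%.2f\n", r * l * 128 / 130 / 8 }'
}

# pct_of MEASURED MAX: MEASURED as a percentage of MAX (same units for both).
pct_of() {
    awk -v m="$1" -v x="$2" 'BEGIN { printf "%.1f\n", m / x * 100 }'
}

# pcie_gbs 16 16      # Gen 4 x16
# pcie_gbs 32 16      # Gen 5 x16
# pct_of 26 31.51     # measured figure read as GB/s
# pct_of 27.92 31.51  # measured figure read as GiB/s (26 GiB/s = 27.92 GB/s)
```

Gen 4 x16 comes out at 31.51 GB/s and Gen 5 x16 at 63.02 GB/s. Taking the 26 figure as GB/s gives 82.5% of the Gen 4 number; if nvtop really is reporting GiB/s, the same reading works out to about 88.6%.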
One question though - are you actually seeing Gen 5 (32GT/s) on your WRX90/TRX50 setups, or is Gen 4 the practical ceiling right now? nvtop never shows higher than Gen 4 for me even under maximum load. Wondering if Gen 5 negotiation is still a firmware/driver maturity thing across the board, or if I should be investigating further.
Thanks again for the quick sanity check - saved me from unnecessary RMA conversations.
Those cards should do Gen 5. Maybe load optimized defaults and make sure you’re on the latest BIOS. There’s going to be yet another ASUS BIOS update to fix some bugs I found, which they’re promising to fix… in a few weeks probably…
Load optimized defaults and roll back any customizations you made while troubleshooting the issue, and it should pick right up.
If you’re on Ubuntu 24.04 I recommend installing the oem-c kernel, which is a bit newer than the latest on 24.04 and can help squeeze out every last bit of perf.
Thanks Wendell - really appreciate you confirming Gen 5 should be achievable on this platform. Good to know it’s config/BIOS rather than a hardware ceiling.
Will load optimized defaults and roll back my troubleshooting tweaks. Currently stable at Gen 4 (~26 GiB/s) so I’ll probably wait for that incoming BIOS fix you mentioned before pushing further - no sense chasing it twice.
Will also get the oem-c kernel installed on the Ubuntu 24.04 side.
Thanks again for taking the time.
UPDATE: Followed Wendell’s advice and reset to factory BIOS defaults. Both RTX PRO 6000 Blackwells are now running at PCIe Gen 5 x16. DeepSeek 671B humming along. Thanks Wendell!