Welcome to the clubs, Minnie ! And by that I mean Level1Techs and owners of Asus boards demanding to use ALL of the PCIe
I’m having a problem very similar to yours. If you don’t mind my angry ranting, you can check out my build on this thread :
YATPRO : Yet Another Threadripper Pro Build!
My problem comes from trying to use an RTX 3090 in the last slot using a PCIe 4.0 riser. The riser and the 3090 both worked fine in the same configuration on an ASRock ROMED8-2T EPYC motherboard. So of course I am pretty disappointed at this plot twist.
Here’s what my setup looks like. The idea is to get the 3090 clear off the motherboard so that it doesn’t heat it up and doesn’t block any PCIe slot :
When I first started it up, I got the same kind of symptoms you did. Sometimes it would boot, but the graphics would stutter a lot. Sometimes it wouldn’t boot at all. Clearly a signal integrity issue. I forced this slot to PCIe 3.0 in the BIOS and, just like you, this “solved” the problem, assuming I were willing to settle for that. Not my style, though; I like to get what I pay for.
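If you happen to run Linux, here is a rough sketch for checking what speed a slot actually negotiated. It assumes the kernel exposes the usual `current_link_speed`, `max_link_speed` and `current_link_width` files under `/sys/bus/pci/devices/`; the function names and the device address passed on the command line are placeholders of mine, not anything specific to this board.

```python
from pathlib import Path
import sys


def parse_gts(text):
    """Return the leading number of a sysfs link-speed string.

    Kernels format this as e.g. '16.0 GT/s PCIe' or '8 GT/s'; only the
    first whitespace-separated token is used.
    """
    return float(text.split()[0])


def link_report(bdf, sysfs_root="/sys/bus/pci/devices"):
    """Read negotiated vs. maximum link speed for one PCI device.

    `bdf` is the device address as shown by lspci -D, for example
    '0000:41:00.0' (hypothetical address, use your own).
    """
    dev = Path(sysfs_root) / bdf
    current = parse_gts((dev / "current_link_speed").read_text())
    maximum = parse_gts((dev / "max_link_speed").read_text())
    width = (dev / "current_link_width").read_text().strip()
    return {
        "current_gts": current,
        "max_gts": maximum,
        "width": width,
        "below_max": current < maximum,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    print(link_report(sys.argv[1]))
```

Keep in mind that GPUs commonly drop their link speed at idle to save power, so a reading below the maximum only means something while the card is under load.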
I’m an electrical engineer by trade; I actually design boards. So I decided to see what can be done about this. Since this is looking like it’s going to be “A Project ™” I figured I’d share. Maybe you can try things on your side and we can compare how our respective machines respond.
So here’s the start of my investigation.
The first thing to know is that the Asus (by which I mean the “Pro WS WRX80E-SAGE SE WIFI”) uses PCIe redrivers on several slots as well as the U.2 connectors. I’m talking about those little things packed between the DIMMs and the chipset heatsink :
Those are interesting chips, and I don’t mean from a technical standpoint. The packages read PI3EQX16000. You can find them on Digikey, Mouser, etc… but the one place you can’t find them is on their manufacturer’s website. For some reason that exact reference isn’t listed. But there’s a functionally identical one, PI3EQX16904.
Here’s what they look like inside :
This is for one half (TX or RX pair) of one PCIe lane. The IC contains four such channels, which means you need 8 of those chips to equip a single x16 slot (16 lanes x 2 pairs = 32 pairs, at 4 pairs per chip = 8 chips).
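The same count can be worked out for other slot widths. This is just the arithmetic above as a small sketch; the constant and function names are mine.

```python
import math

CHANNELS_PER_REDRIVER = 4  # from the post: one IC carries four channels
PAIRS_PER_LANE = 2         # one TX pair plus one RX pair per PCIe lane


def redrivers_needed(lanes):
    """Number of 4-channel redriver ICs needed to cover every pair of a slot."""
    pairs = lanes * PAIRS_PER_LANE
    return math.ceil(pairs / CHANNELS_PER_REDRIVER)


for width in (4, 8, 16):
    print(f"x{width} slot: {redrivers_needed(width)} chips")
```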
PCIe is tough as you probably know. The equalizer / amplifier / buffer combo isn’t a luxury, it’s a necessity. It’s part of the PCIe specification, as a matter of fact. For your viewing pleasure, here’s how equalization is supposed to work when we boot our machines :
This is only to go from 8 GT/s (PCIe 3.0) to 16 GT/s (PCIe 4.0).
If you’re into DSP, the coefficients that are mentioned in this figure are FIR filter coefficients. Exactly how those coefficients are calculated takes about a hundred pages of headache-inducing PCIe spec to explain. It’s all part of a process called link training whereby both devices at the ends of a PCIe link learn to talk to each other with as few errors as possible.
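To give a feel for what such coefficients do, here is a toy sketch of a 3-tap transmit FIR (pre-cursor, main cursor, post-cursor). The tap values are made up for illustration and are not spec presets or anything a real link negotiated; the function name is mine as well.

```python
def tx_fir(bits, c_pre=-0.1, c_main=0.7, c_post=-0.2):
    """Apply a 3-tap FIR to a bit stream mapped to +1 / -1 symbols.

    out[n] = c_pre * s[n+1] + c_main * s[n] + c_post * s[n-1]

    The default taps are illustrative assumptions; their absolute values
    sum to 1 so the peak output stays within the full swing. At the ends
    of the stream the missing neighbour is taken equal to the current
    symbol.
    """
    symbols = [1.0 if b else -1.0 for b in bits]
    out = []
    for n, cur in enumerate(symbols):
        prev = symbols[n - 1] if n > 0 else cur
        nxt = symbols[n + 1] if n + 1 < len(symbols) else cur
        out.append(c_pre * nxt + c_main * cur + c_post * prev)
    return out


levels = tx_fir([0, 0, 0, 1, 1, 1, 0])
print([round(v, 2) for v in levels])
```

The bit right after a transition is sent at a larger amplitude than the repeated bits that follow it. That boost of the high-frequency content is what compensates for the losses in the traces, the riser and the connectors.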
The reason I bring that up is that those redrivers, right smack in the middle of the link between our Threadripper Pro and our GPUs, are actually dumb devices : they do not learn how to work with the PCIe devices on both sides of them. All you can do is set their coefficients for your own specific hardware and that’s it.
I’m not a betting man, but I’m willing to bet that Asus set those redrivers to average values because they thought that should work in most cases. As if Threadripper Pro users fall in the “most cases” category ! Bottom line : our crazy contraptions won’t work.
This is a long post, so I’m going to stop there, post it and leave you in suspense as to what we can try in order to fix this.