I was looking for a case that could support two PSUs like that… Is it a custom job?
No, it’s the original Lian Li O11 XL, but they don’t make it anymore unfortunately. It was awesome for dual PSU builds.
Officially the server edition supports 400W to 600W. Given that it was first available before the Max-Q (on eBay, anyway), I'm surprised no one has come forward yet.
I'm having some issues with the Blackwell Max-Q on a Gigabyte MZ73-LM0 in the two lower PCIe slots (#1 and #2), both of which are connected to an on-board retimer. Detection is very spotty at a link speed of 5.0 (Auto) in those slots; setting the link speed to 4.0 got the cards detected. This happens with multiple Max-Q cards.
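For reference, here's the quick sketch I use to watch what each slot actually negotiates (a minimal sketch assuming Linux sysfs and that the GPUs enumerate with the standard NVIDIA vendor ID; adjust the paths if your setup differs):

```python
#!/usr/bin/env python3
# Sketch: dump current vs. max PCIe link speed/width for every NVIDIA device.
# Assumes Linux; these sysfs attributes are standard for PCIe devices.
import glob, os

def read(path):
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return "n/a"

for dev in sorted(glob.glob("/sys/bus/pci/devices/*")):
    if read(os.path.join(dev, "vendor")) != "0x10de":  # 0x10de = NVIDIA
        continue
    bdf = os.path.basename(dev)
    cur = f'{read(os.path.join(dev, "current_link_speed"))} x{read(os.path.join(dev, "current_link_width"))}'
    top = f'{read(os.path.join(dev, "max_link_speed"))} x{read(os.path.join(dev, "max_link_width"))}'
    print(f"{bdf}: {cur} (max {top})")
```

A card that trains at 16 GT/s in a direct slot but drops to 8 GT/s (or disappears) behind the retimed slots points at link training rather than the card itself.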
Are the cards slotted straight in, or via risers? I haven't tested it, but I avoided the MZ73 and preferred the MZ33 only because of the crappy built-in retimers. It's always better to use your own retimers, redrivers, or switches. That said, plugging them in straight should still work; that's what it was designed to do.
I plugged them in straight, one card after another, to test them all after the riser didn't work. These two slots just don't work with the Blackwell. Do you have any reference for the "crappy" retimers? And is it true that only the LM0 has them on these two slots, while the LM2 doesn't have any?
No reference, just bad experience with other brands' motherboards that try to "help" by baking them into the board with no bypass. So I shouldn't generalize; maybe the MZ73 has decent ones and something else is going on. Sorry about that.
I think you're right: the LM2 doesn't have the retimers, but I've never tested it personally. The LM1 variant is older, I think, and only does Gen4.
I also think the MCIO ports don't have retimers. If you have MCIO-to-PCIe-slot adapters, you can try those to isolate the issue. You should be able to gang both x8 ports into a single PCIe Gen5 x16 port in the BIOS, though that's only one x16, not two.
I'm also not 100% sure the ganged MCIOs will actually do x16. On the MZ33, for instance, the x8s are not truly bifurcated: they go through a hidden PCIe x8 switch and won't do x16 even when combined, despite the BIOS/block diagram saying they should.
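If you want to check for a hidden switch without trusting the block diagram, you can walk the sysfs path from the card up to the root port and count the bridges in between (a rough sketch; the default BDF below is a placeholder for your own card's address, and `lspci -t` shows the same thing interactively):

```python
#!/usr/bin/env python3
# Sketch: list the PCI bridges above a device; extra hops between the
# device and the root port usually mean a switch is in the path.
import os, sys

bdf = sys.argv[1] if len(sys.argv) > 1 else "0000:c1:00.0"  # placeholder BDF
path = os.path.realpath(f"/sys/bus/pci/devices/{bdf}")

# The resolved sysfs path encodes the hierarchy: every component shaped
# like a BDF is a hop (root port, switch upstream/downstream port, device).
hops = [p for p in path.split("/") if p.count(":") == 2]
for hop in hops:
    print(hop)
print(f"{len(hops) - 1} bridge(s) above the device; more than 1 usually means a switch")
```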
It's sad that Gigabyte (or any other company, for that matter) doesn't expose all the PCIe lanes on dual-socket boards available on the open market; it's probably an issue with ATX form factors and space. Other than something like the MZ73, you have to buy a full-blown, expensive machine with overpriced bundled-in components to get the good proprietary boards from SuperMicro and the like, with decent lane specs. Or go single-socket with the MZ33, or maybe the LM2 variant.
Do you have an email contact at PNY?
I was looking at the MZ73 but wow, that block diagram is wild. I guess if you’re looking to do 4 x GPUs for an ML workstation, a single F chip and 12 DIMMs on an MZ33 w/ good risers is probably the way to go?
The sad state of Epyc motherboards is why I ended up with (another) TR Pro. The boards are just so bad that it's not worth dealing with the headaches for an extra 4 RAM channels. Especially if you need a tower/workstation, Epyc seems like more trouble than it's worth. It definitely works well in racks though.
I got a reply with links to the vBIOS update files in less than 24 hours by filing a support ticket on their website.
The web page is [PNY website]/support
Awesome, thanks for that!
Flashed my card too and it went without issues. Though I was nervous as hell.
Has anyone gotten an updated vBIOS for the Max-Q cards? I've asked them for that as well.
I was looking at the MZ73 but wow, that block diagram is wild. I guess if you're looking to do 4 x GPUs for an ML workstation, a single F chip and 12 DIMMs on an MZ33 w/ good risers is probably the way to go?
Yeah, that's what I ended up with after trying some other options. With a PCIe switch or bifurcation you can run far more than 4 GPUs off a single processor as well. Also, the MZ33 supports 2DPC, so you can use 24 DIMM sticks for more RAM (or reduce cost per stick with lower density). You only lose the extra 12 channels you'd get with dual processors.
I do miss the monster-core-count processors when building/compiling huge repos, though; the lower-core F-series chips can be a bit slower.
The sad state of Epyc motherboards is why I ended up with (another) TR Pro. The boards are just so bad that it's not worth dealing with the headaches for an extra 4 RAM channels. Especially if you need a tower/workstation, Epyc seems like more trouble than it's worth. It definitely works well in racks though.
I had the opposite journey, giving up after trying TR boards. The signal integrity on the farther PCIe slots, when all are used at full capacity, is the same as or worse than server boards. The ASUS SAGE boards, for instance, are where I first encountered the poor-quality, unbypassable redrivers. Once they mess the signal up it's really hard to fix, and TR boards seldom give you MCIO ports directly.
The two TR boards I tried (both ASUS) also have hardware bifurcation bugs I've documented in other threads, whereas the EPYC boards are usually better tested for these use cases. Maybe SuperMicro TR boards are better?
I agree the TR boards have more features, but for lane integrity and the like I'm not sure they're actually better in practice.
EDIT: TR is probably good for driving 4 or even 5 GPUs, though; I have another setup of 4090s/Ada 6000s on one. I don't think I noticed the 8- vs. 12-channel difference in my workloads either. Being prepared for some bad slots and not relying on bifurcation still makes TR workable.
Wasn't the gist that the Max-Q version works if you use the Display Mode Selector on a card that was stuck in compute mode? Or, the other way around, that you activate compute mode if yours is in display mode, depending on whether you want display output or MIG features?
So I found that the July 25 Display Mode Selector is the ONLY one that works reliably. I had one from my earlier attempts and wrongly assumed that was all I needed; I had to redownload the Display Mode Selector. Important point. (The redownload was dated July 25, FWIW, even though I redownloaded it at the beginning of August.)
What version of the vBIOS did they provide? (Just to see if it's the same one we got from our vendors too.)
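If anyone wants to compare against what's currently flashed, `nvidia-smi` can report it directly (assumes the NVIDIA driver is installed; the query field is a standard one):

```python
#!/usr/bin/env python3
# Sketch: print the VBIOS version of each visible NVIDIA GPU.
import subprocess

out = subprocess.run(
    ["nvidia-smi", "--query-gpu=index,name,vbios_version", "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
)
print(out.stdout.strip())
```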
Which ASUS TR boards specifically are you referring to? I've been hitting the latest-revision WRX90E Sage board pretty hard the last few weeks with multi-GPU and I haven't seen any abnormal behavior; I also tried it with retimers in the bottom four slots, just to be sure. (That said, I wasn't doing bifurcation; these are straight x16 slots, or x8 for slot 6, with and without retimers.)
I'd love to understand how/where you saw the fault and whether it's reproducible, so I can try to replicate it, but thus far it's been working pretty well overall.
I think the WRX90E was fine, other than the forced x8 on one of the slots. Both revs of the WRX80E had the poor-quality redrivers on the bottom slots, and retimers wouldn't work cascaded with them. It varied from board to board (I tried 2 or 3, IIRC, across both revs). Sometimes I could fix it by messing with the EQ settings in the BIOS for some ports, sometimes not.
For the P2P bug: you'd need to bifurcate x8/x8 (which until recently wasn't even possible on the WRX90E, I think, and maybe still isn't). All the usual tests seemingly pass, but when I checked the tensor values moved between the two GPUs across the bifurcated ports on the same slot, occasional bit flips would creep in. I could replicate this across boards, on both the 3995WX and 5995WX chips, across AGESA versions, and under both Windows TCC and bare-metal Linux (WDDM and WSL didn't show it, but then those don't support P2P in the first place). Moving the GPUs to a switch without bifurcation solved the issue, and swapping in an EPYC board also solved it.
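In case anyone wants to try reproducing it, my check boiled down to something like this minimal PyTorch sketch (my actual harness was longer; the device indices, tensor size, and iteration count here are placeholders):

```python
#!/usr/bin/env python3
# Sketch: hammer GPU0 -> GPU1 copies and flag silent corruption.
# Assumes two CUDA GPUs; the copy is only true P2P when the driver
# enables it for your topology (check with `nvidia-smi topo -m`).
import torch

print("P2P access:", torch.cuda.can_device_access_peer(0, 1))
src, dst = torch.device("cuda:0"), torch.device("cuda:1")

torch.manual_seed(0)
for i in range(1000):
    a = torch.randn(16 * 1024 * 1024, device=src)  # ~64 MB of float32
    b = a.to(dst)                                  # cross-GPU copy
    back = b.to(src)
    if not torch.equal(a, back):                   # any mismatch = bit flip in transit
        print(f"iter {i}: {(a != back).sum().item()} corrupted element(s)")
```

On the bifurcated TR slots this kind of loop would fire intermittently; behind a switch, or on the EPYC boards, it never did.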