As you point out, switching with speed shifting is provided to consumers via chipsets. Broadcom’s switch niche seems too small for Intel to care much about (see the Stratix-Agilex scalable switch) and misaligned with ASMedia.
There are a couple of other ways to address this.
- Update 3.0 x8 devices to 4.0 x4 or 5.0 x2. Or similar.
- Provide a 4.0 x8 chipset slot from DMI x8 (Intel Z) or 5.0 x4 (Promontory).
That neither of these has happened seems to me to indicate a market deemed too small to be a priority, as does the general lack of PCIe 4.0 x1, x2, and non-NVMe x4 devices. The only ones I can think of are Aquantia’s x1 10 Gb NICs.
There’s Intel’s x8/x8, the x8 switch on AMD Taichis, and past that I can only think of one motherboard offhand offering x16 + x4(16) + x4(16) slots. I’m pretty sure that, if Intel or AMD supported it, there’d be 5.0 x16 + 4.0 x8 motherboards for dual GPU, which could also be used for GPU + legacy server NIC.
I think it’s also worth noting that x8 or even x4 isn’t necessarily much of a GPU handicap. Scaling tests often find negligible differences, and −2% (x8) or −6-7% (x4 equivalent) on a 4090 is fairly minor. So dropping to x8/x8 to run a 50 or 100 Gb NIC at full rate isn’t terrible.
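For reference, a quick sketch of usable per-direction PCIe bandwidth after 128b/130b encoding (my arithmetic; the generation/width combinations are illustrative and TLP/DLLP protocol overhead is ignored):

```python
# Usable per-direction PCIe bandwidth, after 128b/130b line encoding.
# Real-world throughput runs a bit lower once packet overhead is counted.
GT_PER_S = {"3.0": 8.0, "4.0": 16.0, "5.0": 32.0}

def lane_gb_s(gen: str) -> float:
    """GB/s per lane: GT/s * (128/130 encoding efficiency) / 8 bits per byte."""
    return GT_PER_S[gen] * (128 / 130) / 8

for gen, lanes in [("4.0", 8), ("5.0", 4), ("5.0", 8)]:
    gb_s = lane_gb_s(gen) * lanes
    print(f"PCIe {gen} x{lanes}: {gb_s:5.1f} GB/s = {gb_s * 8:3.0f} Gb/s")
# PCIe 4.0 x8: 15.8 GB/s = 126 Gb/s  -> covers one 100 Gb NIC
# PCIe 5.0 x4: 15.8 GB/s = 126 Gb/s
# PCIe 5.0 x8: 31.5 GB/s = 252 Gb/s  -> headroom for dual 100 Gb ports
```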
Not sure about this. If HEDT were making good money, Intel wouldn’t have exited the segment, or shrunk it to Xeon W, depending on how you define things. AMD also presumably wouldn’t be as on-again, off-again with X (versus WX) Threadrippers. From what I read, a lot of the peasants are unhappy with what LGA1700 and AM5 motherboards charge for PCIe and aren’t going to pay for LGA 4677, sTR5, or RDIMMs.
It seems to me AMD and Intel have something of a bundling problem here, but either haven’t really recognized it or are overestimating workstation upsellability. For example, Storm Peak’s non-Pro IO die (quad-channel memory, 48 PCIe 5.0 + 24 PCIe 4.0 lanes) doesn’t need to be tied to a package and socket capable of 12 CCDs, 8 channels, and 128 lanes.
Based on phrasing and content I’m getting an LLM vibe from this post. If that’s in the right direction, a reminder of the AI rule seems appropriate. Anyways…
I’ve not exhaustively surveyed Intel boards, but my impression is x8/x8 and x8/x4/x4 support is typical of upper-end Z boards. Every recent-ish AMD board I’ve checked supports several x16 bifurcation patterns, including x4/x4/x4/x4. I suspect the OP’s more concerned with x16 GPU + x8 NIC, or maybe x8/x8, than with the NVMe possibilities, though.
Widening’s often unnecessary. My experience is that moving ~7 GB/s with a 4.0 x4 NVMe easily saturates 12-16 core CPUs and dual-channel DDR if the data needs even fairly minimal processing or effort to produce. That’s mostly because it’s seldom practical to keep code single-pass or maintain close enough locality that data flows through L3 only once. For example, a single Zen 5 core clocked at 5.2 GHz demands ~1 TB/s of data bandwidth to fully utilize its 2×512-bit load + 1×256-bit store units but has less than 250 GB/s available when operating on 64+ kB of data.
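Putting rough numbers on that demand (a sketch; whether the store path is counted as 256 or 512 bits wide shifts the total somewhat):

```python
# Peak L1 data demand of one core: bytes moved per cycle times clock.
CLOCK_HZ = 5.2e9  # clock speed from the paragraph above

def demand_gb_s(loads: int, load_bits: int, stores: int, store_bits: int) -> float:
    bytes_per_cycle = (loads * load_bits + stores * store_bits) / 8
    return bytes_per_cycle * CLOCK_HZ / 1e9

print(demand_gb_s(2, 512, 1, 256))  # ~832 GB/s with a 256-bit store path
print(demand_gb_s(2, 512, 1, 512))  # ~998 GB/s with a 512-bit store path
# Either way it's several times the <250 GB/s actually available once
# the working set spills past 64+ kB.
```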
If it’s pure IO, though, then ~84 GB/s of copy bandwidth is possible with dual-channel DDR5-5600. DMAing full rate 100 GbE onto a 5.0 x4 NVMe takes ~24 GB/s (12 GB/s from the source plus 12 GB/s to the sink). 84 GB/s would be more like full rate DirectStorage between three 5.0 x4 NVMes and three dGPUs.
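The budget math behind those figures (a sketch; the drive and NIC rates are my rounded assumptions):

```python
# Dual-channel DDR5-5600 theoretical bandwidth: 2 channels x 8 B x 5600 MT/s.
peak = 2 * 8 * 5600e6 / 1e9
print(peak)        # 89.6 GB/s, so ~84 GB/s of copy traffic is ~94% of peak

# 100 GbE DMA'd to a 5.0 x4 NVMe: one read stream plus one write stream.
nic = 12.0         # GB/s, 100 GbE rounded down for overhead
print(2 * nic)     # ~24 GB/s of memory traffic

# Three 5.0 x4 NVMes streaming to three dGPUs at ~14 GB/s each:
print(3 * 14 * 2)  # 84 GB/s, i.e. the whole copy budget
```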