I’m not sure there’s quite as much available bandwidth as you’re thinking, since it tends to get tied up as lanes are allocated to devices. For PCIe itself (ignoring protocol overhead):
PCIe3: ~1GB/s @ x1, ~2GB/s @ x2, ~4GB/s @ x4
PCIe4: ~2GB/s @ x1, ~4GB/s @ x2, ~8GB/s @ x4
AM4 has 20 expansion lanes from the CPU, usually arranged as x4 feeding an M.2 slot, with the remaining 16 going to either a single x16 graphics slot or a pair of slots running at x8 each. There’s also an additional x4 link to the chipset, which may provide more PCIe slots, but everything behind it is multiplexed through that single x4 uplink.
Fast NVMe is PCIe x4, and in each generation (PCIe3 and PCIe4) there are NVMe drives that come close to saturating it. So e.g. that ASUS x16 card with four M.2 slots I linked earlier would pretty much max out the unconstrained storage bandwidth for the entire system. Lower-end NVMe might be closer to x2 in terms of actual throughput, but at x1 pretty much any of them would be bottlenecked; you might as well use SATA at that point. Similarly, a single 10Gbps USB3.2 Gen 2 port needs an x2 link on PCIe3. (I haven’t seen any PCIe4 USB controllers yet.)
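To make the lane math concrete, here’s a rough back-of-the-envelope sketch using the per-lane figures above (approximate, and ignoring protocol overhead, same as the table):

```python
# Approximate per-lane PCIe bandwidth in Gb/s (~1 GB/s for gen3,
# ~2 GB/s for gen4, ignoring protocol overhead).
GBPS_PER_LANE = {3: 8.0, 4: 16.0}

def link_gbps(gen: int, lanes: int) -> float:
    """Approximate bandwidth of a PCIe link in Gb/s."""
    return GBPS_PER_LANE[gen] * lanes

def is_bottlenecked(gen: int, lanes: int, device_gbps: float) -> bool:
    """True if the device can push more than the link can carry."""
    return device_gbps > link_gbps(gen, lanes)

# A fast PCIe3 NVMe drive (~3.5 GB/s = 28 Gb/s) behind an x1 link:
print(is_bottlenecked(3, 1, 28))   # True -- badly bottlenecked
# A single 10 Gb/s USB3.2 Gen2 port on a PCIe3 x2 link:
print(is_bottlenecked(3, 2, 10))   # False -- fits
```

The ~3.5 GB/s drive figure is just a stand-in for a typical fast PCIe3 NVMe drive; swap in your own device numbers.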
In this scenario I’d probably focus on optimizing for expansion of connection types instead of PCIe in general. So if you have x8 free, let’s see what could be done with a pair of x4 slots.
For USB, you could look for a card that has unconstrained individual controllers for each port. The idea is to take as much PCIe bandwidth as possible to feed some 10Gb/s USB3.2 Gen2 ports, and attach a good hub to each. Then you can connect USB devices to the hubs, and be able to handle a few without much in the way of bottlenecks.
Unfortunately I can’t find ideal hardware examples at the moment. For USB cards, this one is a PCIe3 x4 card but only uses x2 with a single controller, so it doesn’t make the most of the available lanes, and this one is too big at x8. Sonnet’s other cards are all slower, but that 8-port card has the kind of spec you’d want in an x4 card: dual ASMedia controllers. There should be at least a couple of options out there; I just didn’t bookmark them and can’t recall any names at the moment.
For hubs, I only have this reference for USB4/TB4. USB4 carries USB3.2 Gen2 by design, so those will work just fine, but they’re also a bit overkill in this case. Again nothing else bookmarked; the goal is just to find a solid USB3.2 Gen 2 hub.
The cards I mentioned can also do USB3.2 Gen2x2 20Gbps on the ports, but I don’t know if any of the hubs can. That would be the ideal combination for maximizing aggregate bandwidth to the devices on each hub, although you’d fit fewer ports (and hubs) on the card before hitting its fully-loaded maximum.
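That trade-off can be sketched with the same approximations as before (PCIe3 x4 at ~4 GB/s, i.e. ~32 Gb/s, ignoring overhead):

```python
# Back-of-the-envelope: how many fully-loaded USB ports fit behind
# a PCIe3 x4 card before the PCIe link itself saturates?
CARD_GBPS = 32.0  # PCIe3 x4, approximate, ignoring overhead

def max_unconstrained_ports(port_gbps: float) -> int:
    """Ports that can all run flat out without hitting the PCIe limit."""
    return int(CARD_GBPS // port_gbps)

print(max_unconstrained_ports(10.0))  # Gen2 10 Gb/s ports -> 3
print(max_unconstrained_ports(20.0))  # Gen2x2 20 Gb/s ports -> 1
```

In practice you’d rarely have every port fully loaded at once, which is why an 8-port dual-controller card can still make sense on x4.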
For direct storage, you could start with SAS as the base interconnect, but probably use SATA drives for cost. For example, an LSI SAS2308-based card providing 8 SAS lanes would have a max of ~6GB/s across all of them, but in an x4 slot that would be down to ~4GB/s. The lanes can be connected to individual drives directly, but you also have the option of using SAS expanders (same concept as USB hubs), either internally or in a separate chassis via the cards that have ports for external cabling. Combine a bunch of relatively slower SATA drives with software RAID, and you could make the most of the bandwidth in aggregate. For more on these components, I’d suggest this YouTube channel.
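Rough aggregate math for that setup, with a hypothetical ~250 MB/s spinning SATA drive as the building block (ignoring protocol and RAID overhead):

```python
# 8 SAS2 lanes at ~6 Gb/s (~0.75 GB/s) each, behind a PCIe3 x4 slot.
SAS_LANES = 8
SAS_LANE_GBYTES = 0.75   # ~6 Gb/s per SAS2 lane, approximate
PCIE3_X4_GBYTES = 4.0    # slot limit from the table above

# The HBA's usable ceiling is whichever side is narrower.
hba_max = min(SAS_LANES * SAS_LANE_GBYTES, PCIE3_X4_GBYTES)
print(hba_max)  # 4.0 -- the x4 slot is the limiting factor

# How many ~250 MB/s SATA drives (e.g. behind a SAS expander)
# would it take to saturate that in aggregate?
print(hba_max / 0.25)  # 16.0
```

So with software RAID striping across that many drives you’d be bumping into the slot limit, not the drives.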
Of course this still requires picking up a good x8-to-x4/x4 bifurcation riser somewhere, and a BIOS that offers a bifurcation option covering the slot. If you don’t need the SATA connectivity, then maybe that single x8 USB card would cover enough.
Both of these approaches still require some up-front investment, but they allow low-cost end devices, so it might work out better from that standpoint.
Not sure any of this really addresses what you’re aiming for, but maybe there’s at least one useful idea in here.