If I’m reading this right, you’re talking about splitting the 1x16 into 1x8 + 2x4 [3 peripherals]?
Part of me wants to say play it safer and look into ATX boards, since the chipset will usually host an x4 slot (physically x16) at the bottom of the mainboard [leaving the CPU lanes to split from that 1x16 into 2x8s].
I’d specifically like x8x4x4 to use with a bifurcator (that, frustratingly, I can’t link to). It allows one GPU at x8, a low-profile GPU at x4, and another NVMe drive at x4.
The NVMe boot drive is already tied to the CPU. If you’re looking at accommodating dual M.2 drives, do study all the upper-tier ITX options. Some only support one M.2 slot [on the CPU], while others may support a second slot off the chipset [whether it’s SATA, NVMe, or accepts both signals].
You can have something like this as your lane summary with the bifurcation:
CPU (20 lanes) = NVMe (x4) + PCIe (x8) + PCIe (x8)
Chipset (4 lanes) = NVMe (x4)
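
If it helps to sanity-check the arithmetic, here is a rough Python sketch of the lane budget. The 20-lane CPU and 4-lane chipset figures are taken from the summary above; the function names and device labels are made up for illustration. It only checks that the widths add up. Whether a given board actually exposes x8/x8 or x8/x4/x4 in its firmware is something you would still have to confirm in the manual.

```python
# Lane budgets from the summary above (assumed, not read from any real board).
CPU_LANES = 20
CHIPSET_LANES = 4

def lanes_used(layout):
    """Sum the link widths of every device in a layout dict."""
    return sum(layout.values())

def fits(layout, budget):
    """True if the layout does not ask for more lanes than the budget."""
    return lanes_used(layout) <= budget

# The x8/x8 split from the summary: CPU M.2 plus the x16 slot halved.
cpu_x8_x8 = {"NVMe (boot)": 4, "PCIe slot A": 8, "PCIe slot B": 8}

# The x8/x4/x4 split asked about: CPU M.2 plus the x16 slot cut three ways.
cpu_x8_x4_x4 = {"NVMe (boot)": 4, "GPU": 8, "low-profile GPU": 4, "NVMe (extra)": 4}

# The chipset side carries a single x4 M.2 slot in this summary.
chipset = {"NVMe": 4}

for name, layout, budget in [
    ("CPU x8/x8", cpu_x8_x8, CPU_LANES),
    ("CPU x8/x4/x4", cpu_x8_x4_x4, CPU_LANES),
    ("Chipset", chipset, CHIPSET_LANES),
]:
    print(f"{name}: {lanes_used(layout)}/{budget} lanes, fits={fits(layout, budget)}")
```

Both splits land on the same 20 CPU lanes, so on paper the x8/x4/x4 layout costs nothing extra; the open question is board support, not lane count.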