PCI Express 4.0 x4 card with 4x M.2 NVMe slots (with a switch chip)

So I know PCIe x16 cards exist that simply split the x16 into 4x x4 via bifurcation and break it out into 4 M.2 slots. I have also found a PCIe 3.0 card from Sabrent that does what I want and fits 4x NVMe drives. However, PCIe 3.0 leaves a ton of performance on the table, and for a card that shares its 4 lanes between 4 drives it matters a lot whether it can do PCIe 4.0. I’m assuming the missing 4.0 support comes down to the bandwidth of the underlying switch chip, because I was unable to find any cards that offer PCIe 4.0.
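To put rough numbers on why the uplink generation matters (napkin math with approximate usable per-lane throughput, not spec-exact figures):

```python
# Rough per-drive bandwidth when four NVMe drives share one switched x4 uplink.
# Per-lane values are approximate usable throughput after encoding/protocol
# overhead, so treat the output as ballpark figures only.
USABLE_GBPS_PER_LANE = {"PCIe 3.0": 0.985, "PCIe 4.0": 1.969}
UPLINK_LANES = 4
DRIVES = 4

for gen, per_lane in USABLE_GBPS_PER_LANE.items():
    uplink = per_lane * UPLINK_LANES   # total GB/s through the switch uplink
    per_drive = uplink / DRIVES        # when all four drives are busy at once
    print(f"{gen} x{UPLINK_LANES}: ~{uplink:.1f} GB/s uplink, "
          f"~{per_drive:.1f} GB/s per drive under full load")
```

So behind a 3.0 switch the four drives together top out around what a single Gen3 drive manages on its own, while a 4.0 uplink roughly doubles that.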

Does anyone know if they exist?

For context, I have an extra x4 slot on my home server build, and instead of breaking that out to SATA I was wondering if I can make better use of its bandwidth (and in a smaller footprint) with NVMe drives. There are plenty of all-NVMe NAS units out there, but I want this to live in the same case as the rest of my server.

If you can fit an x16 card then this might work:

I’m guessing this would also work on just x4 lanes, but obviously check with HighPoint first!

They also have both PCIe 4.0 and PCIe 5.0 cards with 8 M.2 slots, apparently:

It’s expensive, though. Compare it to the cost of just buying larger NVMe drives instead before you commit.

Are you sure your workload would actually leave a ton of performance on the table by being restricted to PCIe 3.0? Just because some 4.0 or 5.0 drive can hit its interface speed during a single specific synthetic benchmark doesn’t mean it’s working anywhere close to that under real-world conditions.

The only time you’d actually feel that is when copying large files.

When dealing with Q1T1 random 4k I/O, my best NVMe drives (Samsung 990 Pro, Solidigm P44) are barely 3-4x as fast as a SATA SSD from ~15 years ago, not 20x like we’d expect. And none of them are even approaching the SATA interface bandwidth limit either.
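To put that in perspective, the back-of-envelope conversion looks like this (the IOPS figures are made-up but representative, not measurements):

```python
# Convert 4K random Q1T1 IOPS into MB/s to show how far below interface
# bandwidth single-queue random I/O sits. The IOPS figures are illustrative
# assumptions, not benchmark results.
def q1t1_mb_s(iops: float, block_bytes: int = 4096) -> float:
    """Throughput in MB/s at a fixed block size and IOPS rate."""
    return iops * block_bytes / 1e6

examples = {
    "old SATA SSD  (~5k IOPS)": 5_000,
    "good NVMe SSD (~18k IOPS)": 18_000,
}

for label, iops in examples.items():
    print(f"{label}: ~{q1t1_mb_s(iops):.0f} MB/s "
          f"(SATA limit ~550 MB/s, PCIe 3.0 x4 limit ~3500 MB/s)")
```

Even the NVMe case lands around 75 MB/s, nowhere near any interface limit.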

It highly depends on what you’ll be using the drives for. Sequential Q8 benchmarks are sensational, misleading garbage for day-to-day use. Ignore them unless that’s all you’ll be using the drive for: pointlessly moving data back and forth between two or more NVMe drives.


+1. A Rocket 1504’s not too much more than an 8 TB SN850X, but problems with HighPoint stuff seem common. So with an x4 uplink I’d rather just have two 8 TB drives than put, say, four 2 TBs behind a 1504.

Similarly, for about the same money as a Rocket 1608A you can buy an X870E board with two PEG M.2 slots and two 4.0 x4 8 TB NVMes to put in them. Or two 5.0 x4 4 TB drives.

Mmm, no. SEQ1M Q8T1 correlates well with common IOs like SEQ 128k Q1T1 for loading files. So it’s a pretty decent measure of drive performance, really.

Point’s more that few workloads read or write enough data fast enough for even PCIe 3.0 x4 to make much of a difference.

Primary use case for x16 NVMe RAIDs is fast swap for tasks too large to fit in memory. Not the OP’s situation but it’s usually a pure large sequential workload, the motion is necessary to task completion rather than pointless, and it’s not between NVMes.

Wouldn’t call robocopy /mt or similar drive-to-drive syncs pointless either, particularly as we often move hundreds of GB to a few TB that way. The most common use case there is incremental off-hours backup, where 3.5″ HDDs are probably fine performance-wise, but it’s not hard for the time to pSLC fill and the subsequent cache folding rate to end up mattering to task completion time.
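A rough feel for why, with entirely hypothetical drive numbers (the cache size and speeds below are assumptions, not specs for any particular drive):

```python
# Estimate wall-clock time for a large drive-to-drive copy when part of it
# lands in pSLC cache and the rest runs at the post-cache folding rate.
# All figures are hypothetical round numbers, not specs of any real drive.
def copy_minutes(total_gb, cache_gb, cache_gb_s, folded_gb_s):
    in_cache = min(total_gb, cache_gb)
    after_cache = max(total_gb - cache_gb, 0)
    return (in_cache / cache_gb_s + after_cache / folded_gb_s) / 60

# Example: 2 TB sync, ~200 GB dynamic pSLC cache, ~5 GB/s while the cache
# lasts, ~1.2 GB/s once the drive is folding.
print(f"{copy_minutes(2000, 200, 5.0, 1.2):.0f} min with folding "
      f"vs {2000 / 5.0 / 60:.0f} min if the whole copy ran at cache speed")
```

In that made-up case the folding rate, not the headline sequential number, decides most of the job time.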

Low-latency, high-rate 4k IOPS can be important, but workloads that issue enough of those to matter are more niche than high-rate large sequential ones. If there were a lot of IOPSy need, Optane’d still be in production.


I guess we will find out: I ordered a Viking U20040, which is PCIe 3.0 x4. From reviews it looks amply performant; my only concern at this point is heat (I will report back once it arrives).

I got 4x Lexar NM790 4TB drives for roughly the price of an 8TB SN850X (700 EUR). With the Viking U20040 that in theory gives me ~3500 MB/s of bandwidth and 16 TB of raw storage (with configurable redundancy!) at 750 EUR total cost. No single NVMe drive comes close to that deal, and it would be slower and more expensive with e.g. a 5-6x SATA breakout + SATA drives.
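The cost math, using the prices quoted above:

```python
# Cost-per-TB comparison using the EUR prices quoted above.
configs = {
    "4x Lexar NM790 4TB + Viking U20040 (total)": (750, 16),
    "1x WD SN850X 8TB": (700, 8),
}

for name, (cost_eur, tb) in configs.items():
    print(f"{name}: {cost_eur} EUR for {tb} TB raw -> ~{cost_eur / tb:.0f} EUR/TB")
```

Roughly 47 EUR/TB versus 88 EUR/TB, before accounting for any redundancy overhead.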

I don’t actually care about extreme high-end performance. I do have a Windows VM on this rig for cloud gaming, but that’s pretty much the only thing that can take advantage of more than 1000 MB/s drive access, and my network runs at 2x2.5G out of this machine, so that tops out around 600 MB/s max.

Nice. Four 4TB NM790s is €1150 here, same as SN7100 and 990 Evo at the moment, and they’ve only occasionally gone to €1100. Four 4TB US75 is €930. I picked up two 8TB SN850Xes when they dropped to €625 each for a bit but they’re back up around 700 now.

We do overnight network copies and mainly run locally during the workday. IO varies widely with workload, but 5 GB/s is slow for initial data ingestion. We’re also starting to move to 20 Gb USB enclosures for transfer drives as we get more machines built with 20 Gb ports.

PCIe 5.0 x4 is still too far into diminishing returns for us. 7 GB/s data ingestion is fast enough that the potential time savings at 12-14 GB/s is only a few minutes, and all of our current machines are likely to go core-bound before 12 GB/s anyway. E26 drives run a 25% price premium, I don’t think there are any E28 drives yet, and SM2508 or Presto is like double.
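In numbers, with a hypothetical 2 TB ingestion job (the size is an assumption for illustration, not one of our actual datasets):

```python
# Ingestion wall-clock time at different sustained rates, to show how small
# the Gen5 win gets. The 2 TB job size is a hypothetical round number.
DATASET_GB = 2000

for rate_gb_s in (5, 7, 12, 14):
    print(f"{rate_gb_s:>2} GB/s: ~{DATASET_GB / rate_gb_s / 60:.1f} min")
```

At that scale, going from 7 to 12-14 GB/s only buys back a couple of minutes.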

Easiest improvements for now are merging and rethreading ingestion operations to use drive bandwidth more efficiently.

The problem with PCIe switches / PLX chips is that they are expensive (especially these days), introduce latency, sap performance, and use a bunch of power.

Best avoided unless you have no other option.

In the configuration you propose, you’d obviously be limited to a grand total of x4 Gen4 bandwidth shared between all four drives, and it will be slightly slower than that due to the added latency from the switch.

Sigh. I miss the days of being able to get Consumer+ (HEDT) systems with 40 PCIe lanes. The 28 currently available on consumer platforms are nowhere near enough.

Where you will benefit, performance-wise, from this setup is in conditions where the drives slow down and are unable to max out x4 Gen4 on their own, like sustained writes.

Please do update after some testing: that seems like a good price for something with 4x M.2 slots and a PCIe switch chip. You might watch for potential thermal throttling; the manufacturer says they certify certain drives for “thermal behavior”.
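If it helps once the card arrives, here’s a minimal sketch for keeping an eye on drive temperatures, assuming a Linux host with the kernel’s NVMe hwmon support enabled (sysfs layout varies by kernel and drive, so treat it as a starting point rather than a finished tool):

```python
# Minimal sketch: print NVMe temperatures from the Linux hwmon interface.
# Assumes the kernel exposes NVMe sensors under /sys/class/hwmon (the
# CONFIG_NVME_HWMON option); layout can vary, so adapt as needed.
from pathlib import Path

for hwmon in sorted(Path("/sys/class/hwmon").glob("hwmon*")):
    name = (hwmon / "name").read_text().strip()
    if not name.startswith("nvme"):
        continue
    for temp_input in sorted(hwmon.glob("temp*_input")):
        label_file = hwmon / temp_input.name.replace("_input", "_label")
        label = label_file.read_text().strip() if label_file.exists() else temp_input.name
        millideg = int(temp_input.read_text().strip())
        print(f"{name} {label}: {millideg / 1000:.1f} °C")
```

Run it under load (or just watch nvme smart-log) and you’ll see quickly whether the card’s placement needs extra airflow.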

How did this go? I’m thinking of buying one of these. Perhaps PCIe 4.0 x4 to use with an OCuLink…