A Neverending Story: PCIe 3.0/4.0/5.0 Bifurcation, Adapters, Switches, HBAs, Cables, NVMe Backplanes, Risers & Extensions - The Good, the Bad & the Ugly

Didn’t know there was a 21H2 LTSC; I think I’ll look at it, maybe it behaves differently than the regular Windows 10 Enterprise 21H2.

Hope I get through the weekend without needing an exorcist:

Does somebody have more in-depth knowledge of how to properly document as much information as possible from BSoDs?

My past personal experience handling BSoDs:

  1. BSoD appears with new component. Is the part at fault?

  2. Remove component → issue gone

  3. Check component for driver and firmware updates → issue gone, great

Now I’m at a point where Broadcom still doesn’t see anything wrong on their end so I’d like to give them as much evidence as possible.

@Illumous

Thanks again for more information!

Hope I didn’t overlook it: Have you ever had BSoDs with the P411W-32P when using it with Windows?

@aBav.Normie-Pleb

Officially, that version is called “Windows 10 Enterprise LTSC 2021”. Behavior-wise, as far as I’m aware, it is not that different from the standard 21H2, besides a longer patch support window than the regular releases (5 years; the previous LTSC had a 10-year window) and less preinstalled garbage.

There are pre-installation scripts around for customizing Windows 10.
The Original:

An updated fork that I’ve personally used:

As always, review code before use…

I think you just might need to hire someone with a background in exorcism… jk lmao lol :joy::rofl:
Really though, what all is in this pic? I see the P411W-32P in the top left… Case? Also, where’d you find your SAS expander? I’m interested in acquiring one.

I miss the days of Windows 7 and earlier BSoDs, where they were much more descriptive about what’s going on. Personally, I promptly add a registry key to all new installations to bring back some of the previous behavior.
Copy-paste the following, save it as NameThisFileWhatEverYouWant.reg, then execute the file.

Windows Registry Editor Version 5.00
[HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\CrashControl]
"DisplayParameters"=dword:00000001

Another place to look is Windows Event Viewer after the fact.
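If you want to pull that same information afterwards without clicking through the Event Viewer UI, something along these lines should work in an elevated PowerShell window (a rough sketch from memory, so double-check before relying on it; the registry path is the same one as in the .reg file above):

# Verify the CrashControl value actually took effect
Get-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\CrashControl' -Name DisplayParameters
# The last few "rebooted from a bugcheck" events; the message includes the stop code and its four parameters
Get-WinEvent -FilterHashtable @{ LogName = 'System'; ProviderName = 'Microsoft-Windows-WER-SystemErrorReporting'; Id = 1001 } -MaxEvents 5 |
    Format-List TimeCreated, Message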

Yes! Only when attempting to install the latest drivers with the latest working firmware. Fails every time.
*Note: I learned a long time ago never to put any Windows machine to sleep or into hibernation, because it caused way too many issues that could only be solved by a reboot. The odd exception to the rule is laptops… I haven’t had the nerve nor the desire to test the sleep/hibernate ability since.

Ugh… Time to go support agent surfing on my end…
Where did you start poking the bear?

As always, a pleasure… :grin:

2 Likes

The 2nd card from the left is the Broadcom HBA 9500-16i, which I also acquired. The SAS expander seems to be the Intel RES3TV360 (the nice thing about it is that it doesn’t require a PCIe slot just for power).

2 Likes

@Illumous

Some anecdotal experience regarding differences between Windows versions that ought to behave the same:

I have never gotten SMB Multichannel (multiple Ethernet connections between client and server are used even during a single file transfer, effectively doubling transfer speed) to work on client Windows versions ever since Windows 8 in 2012. I checked all the requirements, but the two client Windows machines would just use a single Ethernet path.

Just by swapping out the boot drives and booting Server 2012, 2016, or 2019 with the exact same Ethernet adapter settings, SMB Multichannel would just work between the two Windows Server systems.

Maybe other stuff is also subtly different…
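If anyone wants to check this on their own machines, two PowerShell commands show it pretty quickly (hedged, written from memory; run the second one while a large transfer is in progress):

# Is the client configured for multichannel at all?
Get-SmbClientConfiguration | Select-Object EnableMultiChannel
# During a big copy: multiple rows per server mean multichannel is actually in use
Get-SmbMultichannelConnection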

That’s a Bingo, and the exact reason I chose it: you can just install it anywhere in a case without blocking a PCIe slot.

The system in the pic is my still experimental home server I intend to use with ZFS hopefully sometime soon.

My daily-driver home server is still a Windows system with 9361 hardware RAID that just works perfectly without BSoDs etc.

The intended specs of the new system:

  • 5900X

  • 128 GB ECC

  • 8 CPU PCIe lanes (PCIe x16 #1): Pure PCIe Switch HBA for 8 x U.2 NVMe SSDs

  • 8 CPU PCIe lanes (PCIe x16 #2): SATA/SAS HBA (currently testing the 9500-16i here because of PCIe Gen4), 8 x SATA SSDs directly connected to that HBA, two x4 SAS connectors get adapted to two separate external SAS3 expanders that in turn handle up to 24 x mechanical HDDs each

  • 8 PCIe lanes from X570 chipset (PCIe x16 #3, unique feature of the ASUS Pro WS X570-ACE): Intel XL710 2 x 40 GbE ethernet adapter

  • Chipset x1 PCIe slot: ASRock Rack PAUL IPMI dGPU

  • Case: SilverStone GD07

  • SSDs are directly located in the SilverStone case, traditional HDDs are in an old Lian Li cube case I’ve turned into a 24 x 3.5" disk shelf that just has a SAS3 expander and a PSU in it:

Back to the initial pic:

  • Broadcom P411W-32P (PCIe Switch NVMe-only HBA, PCIe Gen4)
  • Broadcom HBA 9500-16i (Tri-Mode HBA, PCIe Gen4)
  • 2 x Broadcom HBA 9400-8i8e (Tri-Mode HBA, PCIe Gen3)
  • SAS3 Expander Intel RES3TV360
  • Currently installed in PCIe x16 #1: Delock 90504 (PCIe Switch NVMe-only HBA, PCIe Gen3)

Regarding my Broadcom support case:

  • Gave them an update that the replacement P411W-32P behaves the exact same way (BSoD with the latest driver unless the latest firmware is installed, while the latest firmware itself has no functionality at all). They just shrugged it off and said I should test it on different platforms.

Wendell is currently also looking at Broadcom: he built a NAS for Gamers Nexus with a 9400-16i some time ago, and there are serious issues, although I don’t know their exact scope:

I asked him to throw a bit of shade at Broadcom, and he did in Moore’s Law Is Dead podcast episode 157, where he was a guest (go to 1:45:41) :wink:

After seeing how engaged Broadcom is in addressing the issues I’ve been encountering, I think the only way to achieve something here is to mount some kind of public pressure, even if that is somewhat difficult with such a niche topic… :frowning:

I try to engage with comments on some YouTube channels that focus on DIY servers and HBAs but unfortunately I seem to have angered YouTube’s machine learning algorithms and many of my comments vanish after a short time period :frowning:

2 Likes

What are the prospects of using an M.2 slot to connect a PCIe 4.0 U.2 or U.3 drive these days? I’ve seen some adapters from Delock for PCIe 3.0 and some PCIe cards of questionable origin. Are there good options without going for an expensive Tri-Mode HBA? I’d like to get a Micron MAX U.3 for my home server, and I’m also considering options for a future daily driver. While the home server is totally fine with PCIe 3.0, my future workstation needs full PCIe 4.0 capability.

  • There are no “elegant” solutions at the moment. When I use the term “elegant” with PCIe, I mean functionality without any PCIe Bus Errors at all;

  • Note: Most systems have PCIe Advanced Error Reporting (AER) disabled by default, or it isn’t even accessible in the UEFI;

  • On AMD systems AER only works on CPU PCIe lanes, not chipset ones;

  • If PCIe AER is disabled, chances are you will never become aware of any PCIe Bus Errors, since Gen4 SSDs still deliver their expected performance (a quick way to check on Linux is sketched after this list);

  • I’ve had the least amount of PCIe Bus Errors with a Delock 62721, an old SFF-8643-to-SFF-8643 SAS3 cable, and Icy Dock’s first-gen U.2 NVMe backplane, the ToughArmor MB699VP-B;

  • Another route is suppressing PCIe signal issues with an active M.2 adapter, but these rather mask potential cable issues, which I don’t like very much :frowning:

  • Also, the only such adapter I know of and have tested so far is from Micro SATA Cables, and its cable connector faces the “wrong” way, making cable routing much more difficult if your regular PCIe slots are occupied;
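If you’re on Linux (or just boot a live USB to test the cabling), a rough way to check whether errors are happening at all looks like this; the bus address is a placeholder, substitute your own device:

# Does the device/port expose the AER capability (and its error status registers)?
sudo lspci -vvv -s 01:00.0 | grep -A6 "Advanced Error Reporting"
# Corrected/uncorrectable PCIe errors logged by the kernel (only appears if AER is enabled)
sudo dmesg | grep -iE "aer|pcie bus error"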

5 Likes

That Delock adapter looks promising, and Delock is EU-based. For a server the red PCB isn’t a problem, but it is for a consumer board paired with a windowed case (I’m dodging that stuff as much as I can), and then there are the giant M.2 heatsink shrouds to work around. I guess I have to plan more carefully. Delock also has a fitting SFF-8643 > SFF-8639 + SATA power cable that is “compatible with PCIe 4.0”.

The Micro SATA Cables stuff doesn’t seem to have distribution in Europe, and I really hate dealing with customs and all that mess if I can avoid it. But a green PCB plus re-driver seems like a higher-quality solution.

Thanks so far. I think I can make this fly: a Micron 7400 MAX 800 GB as SLOG for the server, and 2x Micron 7450 Pro for my workstation. Always assuming the U.3-to-U.2 backwards compatibility works, and the adapter + cable do too.
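The SLOG part itself should be a one-liner once the drive shows up; a minimal sketch, assuming a pool called tank and whatever by-id name the Micron ends up with (both are placeholders):

# Attach the NVMe device as a dedicated SLOG vdev
zpool add tank log /dev/disk/by-id/nvme-Micron_7400_MAX_example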

1 Like

Just a word of caution: you shouldn’t trust the labels Delock puts on their cables that much - don’t ask me how I know :wink:

They just slap their logo on stuff from the usual Asian OEMs; they don’t really “develop” products.

When you test various stuff you start to see “old friends” from various brands like StarTech and Delock. That doesn’t have to be a bad thing, but once you realize it, you see the world in a slightly darker light :wink:

I trust corporations as much as my hard drive: useful and replaceable. I have a background in sales, and if it doesn’t work, Reichelt or wherever I buy the stuff from gets it back. Reichelt is a 20-minute drive away, but I’ve never had any trouble sending stuff back.

Try to test your component configuration within the 14-day return window.

Arguing with a manufacturer’s Level 0 tech support or a retailer about PCIe Bus Errors to get a refund (“What do you mean the cable’s quality is bad, the SSD shows up in Windows as PCIe Gen4 so everything is fine!”) is an experience no one needs to have…

2 Likes

Yo, yo, yo - let’s hear it for all the P411W-32P homies in da house!


A new driver, P23, seems to be coming soon; the file cannot be found yet :frowning:

Bets are still open: do you think Broadcom addressed the BSoDs even though “they never found anything wrong in their testing lab”?

But the shitty P14.3 firmware that completely breaks functionality is still there :frowning:

2 Likes

Good luck :slight_smile:
I upgraded the 9500-16i to firmware/driver P23 a while ago. No problems so far under Linux.

And two days ago I finally opened my box again to investigate why my old LSI card hadn’t worked anymore since I added the 9500-16i. Well, it turned out that when I installed the new card and reorganized all the PCIe cards and cabling, I somehow managed to insert the old card into its PCIe slot the wrong way. Since it’s only an x8 card, it could be seated in the back part of the slot :slight_smile:

I felt very stupid. :slight_smile:
Even dumber: I managed to put it in wrong twice, because the case is very crowded and the lighting was bad. I only suddenly realized that the card didn’t start and end at the same place as the other card. I got the old card second-hand and it only had a low-profile slot bracket, so I run it without one. That’s why I was able to insert it incorrectly :slight_smile:

Now both cards are working.

Don’t know if I should laugh or cry:

2 Likes

Are you still running the previous firmware (I don’t remember the version number off the top of my head :sweat_smile::joy:)?

I’m using the second-newest firmware version P14.2 on the P411W-32P.

Since Broadcom displays 3 different (!) versions for the very same firmware file, for example, I’ll always refer to the version from the original download package’s file name.

BTW: On their download site they state the version of firmware P14.3 to be 4.1.3.1; Broadcom’s G4xFLASH tool, on the other hand, reads the installed firmware version as 0.1.3.1.

Could it be that some stupid typo in the firmware source file disrupts the proper firmware version naming scheme and maybe, as another side effect, causes unspecific bugs?

Firmware P14.3 still doesn’t have any functionality: SSDs don’t show up during POST, and not in the OS either. I came back to it and checked with the “latest” drivers that have the certificate issue in Device Manager, thinking that maybe the P411W-32P would finally work with the latest drivers from within the OS and that potentially only boot support remained broken.

But I can at least boot with the P23 drivers with different firmware versions installed; the P18 drivers caused a BSoD during boot with any firmware other than P14.3 (which is funny since that firmware is completely broken).

Can you properly use the P23 drivers in Windows, or do you get the same certificate error message as me?

Has anyone here tried the PCIe 4.0 SlimSAS redrivers and breakouts made by Christian Peine? I stumbled on them last week but can’t justify the cost of testing them right now. https://c-payne.com/collections/slimline-pcie-adapters-host-adapters

On a separate note, does anyone know of an M.2 to PCIe 4.0 x4 adapter? I need to run a 3080 off an M.2 slot on my TRX40 motherboard, and shipping the ADT-Link M.2 to PCIe x16 4.0 risers takes a while.
If nothing else I’ll probably test one of the mining M.2-to-x4 cards, but I don’t have much hope for them at Gen4 speeds.

I’ve received this Delock 64106 (M.2 Key M to 1 x OCuLink SFF-8612 converter), but I haven’t had time to try it out yet. I can let you know once I’ve tested it, though my test will be with a U.2 SSD.

UPDATE: A first test wasn’t successful. It didn’t even recognize the drive at Gen3 speeds.

Since I don’t know what specifically to test next, I will probably look at this dude’s goods next. The big players have failed, so that would only be logical.

Hardware:

  • Motherboard: Gigabyte MZ72-HB0
  • CPU: 2x EPYC 7443
  • Drives: 4x Intel D5-P5316 U.2 PCIe 4.0
  • PCIe HBA: HighPoint SSD7580A (4x SlimSAS 8i ports)
  • Cables: 2x HighPoint Slim SAS SFF-8654 to 2x SFF-8639 NVMe Cable for SSD7580

I had a bit of a journey (explained below), but ultimately I was able to achieve a bandwidth of 28.4 GB/s with sequential reads in fio on a Linux md raid0 array. The key insight is that on Epyc (and Threadripper?) you may want to change the Nodes Per Socket (NPS) setting in the BIOS.
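For reference, the array itself is nothing exotic; roughly like this (a sketch, not necessarily the exact options I used, and the device names are the ones that show up in the fio disk stats below):

# Plain RAID 0 stripe across the four U.2 drives (no redundancy)
sudo mdadm --create /dev/md0 --level=0 --raid-devices=4 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1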

fio Results:

# /opt/fio/bin/fio --bs=2m --rw=read --time_based --runtime=60 --direct=1 --iodepth=16 --size=100G --filename=/dev/md0 --numjobs=1 --ioengine=libaio --name=seqread
seqread: (g=0): rw=read, bs=(R) 2048KiB-2048KiB, (W) 2048KiB-2048KiB, (T) 2048KiB-2048KiB, ioengine=libaio, iodepth=16
fio-3.30
Starting 1 process
Jobs: 1 (f=1): [R(1)][100.0%][r=26.5GiB/s][r=13.6k IOPS][eta 00m:00s]
seqread: (groupid=0, jobs=1): err= 0: pid=10378: Wed Jul 27 09:30:41 2022
  read: IOPS=13.5k, BW=26.4GiB/s (28.4GB/s)(1587GiB/60001msec)
    slat (usec): min=32, max=578, avg=63.60, stdev=23.46
    clat (usec): min=264, max=4962, avg=1117.58, stdev=79.42
     lat (usec): min=328, max=5369, avg=1181.28, stdev=79.04
    clat percentiles (usec):
     |  1.00th=[  963],  5.00th=[ 1057], 10.00th=[ 1074], 20.00th=[ 1090],
     | 30.00th=[ 1090], 40.00th=[ 1106], 50.00th=[ 1123], 60.00th=[ 1123],
     | 70.00th=[ 1123], 80.00th=[ 1139], 90.00th=[ 1139], 95.00th=[ 1172],
     | 99.00th=[ 1434], 99.50th=[ 1598], 99.90th=[ 2024], 99.95th=[ 2114],
     | 99.99th=[ 2343]
   bw (  MiB/s): min=23616, max=27264, per=100.00%, avg=27101.38, stdev=407.26, samples=119
   iops        : min=11808, max=13632, avg=13550.69, stdev=203.63, samples=119
  lat (usec)   : 500=0.01%, 750=0.12%, 1000=1.73%
  lat (msec)   : 2=98.02%, 4=0.11%, 10=0.01%
  cpu          : usr=1.45%, sys=88.36%, ctx=338808, majf=0, minf=8212
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=812351,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=26.4GiB/s (28.4GB/s), 26.4GiB/s-26.4GiB/s (28.4GB/s-28.4GB/s), io=1587GiB (1704GB), run=60001-60001msec

Disk stats (read/write):
    md0: ios=13175835/0, merge=0/0, ticks=8504656/0, in_queue=8504656, util=99.93%, aggrios=3249404/0, aggrmerge=50764/0, aggrticks=2083607/0, aggrin_queue=2083607, aggrutil=99.85%
  nvme3n1: ios=3249404/0, merge=203059/0, ticks=2283093/0, in_queue=2283093, util=99.85%
  nvme0n1: ios=3249404/0, merge=0/0, ticks=2059957/0, in_queue=2059957, util=99.85%
  nvme1n1: ios=3249404/0, merge=0/0, ticks=1889297/0, in_queue=1889297, util=99.85%
  nvme2n1: ios=3249404/0, merge=0/0, ticks=2102083/0, in_queue=2102083, util=99.85%

In initial testing of this setup I was only able to achieve between 7 and 9 GB/s depending on setup/options; each individual drive was able to achieve 6.9 GB/s on its own, so it was hitting a bottleneck somewhere.

What was also annoying was that I had asymmetric direct I/O access speeds: 7 GB/s from one CPU socket and 1 GB/s from the other. I expected some dip, but this was more than I expected. The OS was running on a Gen4 x4 M.2, which has similar access speeds from either socket.

I contacted HighPoint support and they were helpful. There was some back and forth of “is this plugged into an x16 slot?”, “are you pinning the process to a given NUMA node?”, “are our drivers installed?” etc. It turned out the magic option was to set the Nodes Per Socket (NPS) to NPS4 in the BIOS. This means I have a NUMA node per core complex (4 per socket, 8 in total). I had found some info on setting NPS in Epyc tuning guides, but no mention of its impact on PCIe bandwidth. We have another machine with 2x Epyc 7302 (PCIe 3.0) and a HighPoint SSD7540 with 7x M.2 drives. Previously I could only get 6 GB/s, which I presumed was some bottleneck/inefficiency from using an odd number of drives. After setting NPS4 in the BIOS on that machine, it bumped up to 13.3 GB/s.

If anyone knows why PCIe gets bottlenecked when NPS1 (the default) is used, I would be curious. My guess is that, to make the multiple core complexes appear as a single node per socket, the Infinity Fabric uses some lanes, but this is pure speculation. There is an old post by Wendell on “Fixing Slow NVMe Raid Performance on Epyc”; I didn’t want to bump it, but I wonder if the Nodes Per Socket setting would impact that as well.
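A quick way to see what the NPS setting actually changed (works on any recent distro, nothing specific to this board):

# With NPS4 on a dual-socket system this should list 8 NUMA nodes instead of 2
numactl --hardware
lscpu | grep -i "numa node"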

Back to the Epyc 7443 system: here are the bandwidth numbers when pinning to different CPUs (fio --cpus_allowed=<ids>).

The drives are connected on NUMA Node 1

$ cat /sys/block/nvme[0-3]n1/device/numa_node
1
1
1
1

Curiously, I get the best results if I allow any CPU on Socket 0 rather than just the node the drives are connected to.

Cores: 0-23: (Socket 0, NUMA nodes 0,1,2,3)
   READ: bw=26.3GiB/s (28.3GB/s), 26.3GiB/s-26.3GiB/s (28.3GB/s-28.3GB/s), io=1580GiB (1696GB), run=60001-60001msec
Cores: 24-47 (Socket 1, NUMA nodes 4,5,6,7)
   READ: bw=21.4GiB/s (23.0GB/s), 21.4GiB/s-21.4GiB/s (23.0GB/s-23.0GB/s), io=1283GiB (1377GB), run=60001-60001msec
Cores: 6-11: (Socket 0, NUMA node 1)
   READ: bw=24.8GiB/s (26.6GB/s), 24.8GiB/s-24.8GiB/s (26.6GB/s-26.6GB/s), io=1486GiB (1596GB), run=60001-60001msec
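An alternative to fio’s --cpus_allowed is pinning the whole process with numactl; roughly equivalent to the NUMA node 1 run above (a sketch, not the exact command I ran):

# Bind CPUs and memory allocations to NUMA node 1, where the drives are attached
numactl --cpunodebind=1 --membind=1 fio --name=seqread --filename=/dev/md0 --rw=read --bs=2m \
    --direct=1 --iodepth=16 --ioengine=libaio --time_based --runtime=60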

We went with the HighPoint HBA because we were originally going to use a Supermicro H12. The Gigabyte MZ72-HB0 motherboard actually has the ability to plug in 5x SlimSAS 4i directly, although 2 of them are on Socket 0 and 3 on Socket 1. The motherboard manual is quite sparse on details, but I can confirm all the SlimSAS connectors are 4i, although only 2 are labelled as such. It is quite hard to find information on the SFF connector specs/speeds, but my understanding is that MiniSAS is 12 Gb/s per lane across 4 lanes (so ~6 GB/s), while SlimSAS is 24 Gb/s per lane (so ~12 GB/s over 4 lanes). A PCIe Gen4 x4 SSD can move roughly 8 GB/s, more than MiniSAS-class cabling is rated for, so my understanding is that you should NOT use MiniSAS bifurcation for Gen4 SSDs if you want peak performance.

I found this forum thread before I set the Nodes Per Socket in the BIOS and decided to order and try some SlimSAS 4i to U.2 cables from Amazon (YIWENTEC SFF-8654 4i SlimSAS to SFF-8639). These were blue-ish cables, and given previous mentions of poor quality (albeit with different connector types, I think) I didn’t have much hope. However, testing these cables I saw the expected speeds and no PCIe bus errors; YMMV if connecting via a bifurcation card. Delock seems to be the only option there, but hopefully there will be more possibilities in the future, particularly with Gen5 incoming.

Edit: You need to change the BIOS setting for 3 of the SlimSAS 4i connectors from SATA to NVMe.

5 Likes

Just an itsy bitsy teenie tiny update, since there are still neither new, fixed drivers nor new firmware for the BSoDcom P411W-32P.

Since Optane has been killed, I’ve been looking at other NVMe storage that might be a nice SLOG/L2ARC etc. volume but isn’t extremely expensive.

The Micron 7450 with 3.84 TB or larger (the larger-capacity models are faster) looks good and doesn’t break the bank compared to Optane P5800X models, for example. If you only ever use 2 TB of the 3.84 TB as the maximum capacity, the NAND should last much longer than the warranted TBW.
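One low-tech way to do that under-provisioning is simply to never allocate the extra space; a rough sketch under the assumption the drive is empty (the device name is a placeholder, and blkdiscard erases the whole drive):

# Hand every block back to the controller first so the unused area really counts as spare
sudo blkdiscard /dev/nvme0n1
# Then only ever partition ~2 TB of the 3.84 TB
sudo parted -s /dev/nvme0n1 mklabel gpt mkpart slog 0% 52%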

3 Likes