Gigabyte MZ32-AR0 bifurcation (and PCIe in general) is very buggy

I’m looking for someone with experience to help me deal with this piece of not very good tech.

I’ve got this motherboard with the goal of having a lot of PCIe lanes, especially because (at least according to BIOS) motherboard supports bifurcating each slot into x4 lanes. I want to connect a bunch of NVMe SSDs and some other stuff.

The challenge is that motherboard is very picky/buggy.

I have 3 Wi-Fi cards installed and they don’t always all work unless installed into specific slots relative to each other. Otherwise they either disappear or prevent BMC (let alone BIOS) from booting. No idea why, BMC doesn’t provide any helpful debugging information about this.

The other issue is with NVMe SSDs in an x16 card that accepts 4 of them. I did at some point see all 4 of them working, but now no matter which slot I install the card in, the motherboard doesn’t see them. When I remove the Wi-Fi cards from the system this NVMe carrier card does light up for some reason (there is one activity LED for each SSD), but the motherboard still doesn’t seem to see it.

I’ve spent probably about 10 hours total trying different combinations without success so far.

I have latest BIOS installed (initially for ver 1.0 motherboard, but then switched to ver 3.0 BIOS because they are compatible and newer BIOS contains security fixes). I even tried to reinstall CPU.

There is no hope for Gigabyte’s support since it is a bit worse than useless (doesn’t help, slow to respond).

Open to ideas (current best idea I have is to get rid of this board and buy something more adequate).

In terms of BIOS settings I tried turning on/off ARI, AER, ACS, SR-IOV.

Virtualization and IOMMU are hard requirements for me, so I didn’t try to disable those.

Threadripper platforms in general seem to be a bit troublesome and Gigabyte “workstation/server” boards seem to be less than ideal in general. Have you tried lowering PCIe slots to like Gen 4 or Gen 3?

This is a server platform, I use it with an Epyc Rome 7302P processor.

I didn’t try to lower slots to Gen 4, primarily because all the devices I use there are Gen 3 or lower anyway, so it shouldn’t even try to negotiate Gen 4, right?

It might have trouble with PCIe training, which could explain why devices randomly disappear.

They actually don’t randomly disappear, they either work all the time or don’t work. I tried different permutations and I believe sometimes even the same permutation may work or not, but if it booted at least once with a particular config it continues to work going forward.

Last time I had to swap two Wi-Fi cards with each other because one of them wasn’t seen by the motherboard for whatever reason.

Fwiw I had much better luck overall with PCIe bifurcation setups on Asus-brand cards than any other. If you have a lot of noise on any board it will be super buggy. That goes for this board, but also for when I did that 23 NVMe video on an Asus board. One wonky connected PCIe device and everything gets weird.

I can believe that, but they cost quite a bit and I don’t need PCIe 4.0 or anything fancy like that. About half of my devices are not even PCIe 3.0, so paying over $50 for a simple adapter seemed wasteful.

Any debugging tips in this case? I’ve tried pulling cards out one by one and found that thing with the ordering of devices, but I don’t know why it might be happening. Also it takes a few minutes to POST on each attempt, so it is a very slow process.

A few minutes to POST is almost always lots and lots of PCIe handshake retries. Slots closer to the CPU likely work better. Just because it’s not Gen4 doesn’t mean Gen3 will be trouble-free.

I’d start with the slot closest to the CPU. GPUs tend to have better built-in signal drive capabilities, so those being farther away is OK.
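If the system does get as far as booting Linux, one way to spot links that trained below their capability is to compare the negotiated and maximum link speed and width that the kernel exposes in sysfs. The following is only a rough Python sketch, assuming the `current_link_speed`, `max_link_speed`, `current_link_width` and `max_link_width` attributes are present under `/sys/bus/pci/devices`; the function names and the `root` parameter are made up for illustration.

```python
from pathlib import Path


def read_attr(dev, name):
    """Return a sysfs attribute as a stripped string, or None if unreadable."""
    try:
        return (dev / name).read_text().strip()
    except OSError:
        return None


def degraded_links(root="/sys/bus/pci/devices"):
    """List devices whose negotiated link differs from their advertised maximum.

    `root` is a hypothetical parameter so the function can be pointed at a
    test directory; on a real Linux system the default sysfs path is used.
    """
    base = Path(root)
    if not base.is_dir():
        return []
    found = []
    for dev in sorted(base.iterdir()):
        cur_speed = read_attr(dev, "current_link_speed")
        max_speed = read_attr(dev, "max_link_speed")
        cur_width = read_attr(dev, "current_link_width")
        max_width = read_attr(dev, "max_link_width")
        if None in (cur_speed, max_speed, cur_width, max_width):
            continue  # not a PCIe device, or the attributes are not exposed
        if cur_speed != max_speed or cur_width != max_width:
            found.append(
                (dev.name, f"{cur_speed} x{cur_width}", f"{max_speed} x{max_width}")
            )
    return found


if __name__ == "__main__":
    for addr, current, maximum in degraded_links():
        print(f"{addr}: running at {current}, capable of {maximum}")
```

Root ports and bridges will show up here whenever a slower or narrower card sits behind them, which is normal, so the interesting entries are endpoints (SSDs, NICs) that report less than their own maximum. The same information appears as `LnkCap` and `LnkSta` in `lspci -vv` output.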

I tried decreasing link speed to Gen3 and even Gen2 and it didn’t help. I tried various slots, including those that are closer to CPU with the same result (though I must say the layout of the board is such that all slots are pretty close to the CPU).

I also ordered Asus Hyper M.2 X16 V2 (PCIe 3.0) for experiments and I was initially convinced it didn’t work because motherboard didn’t show anything in the BIOS in NVMe drives list (I did see my drives with the old cheap adapter with one of the permutations of cards), but when booted into OS I do see all 4 drives detected. Interestingly, on first boot motherboard saw only two drives, but after reboot two more showed up without me touching anything :man_shrugging:.
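For comparing what the OS detects against the BIOS NVMe list across reboots, something like the following Python sketch could be used. It assumes a Linux system where each NVMe controller appears under `/sys/class/nvme` with a `model` attribute and a `device` symlink pointing at its PCI device; the function name and the `root` parameter are illustrative only.

```python
import os
from pathlib import Path


def list_nvme_controllers(root="/sys/class/nvme"):
    """Return (controller, PCI address, model) for each NVMe controller found.

    `root` is a hypothetical parameter for testing; the default is the
    sysfs location used on Linux.
    """
    base = Path(root)
    if not base.is_dir():
        return []
    result = []
    for ctrl in sorted(base.iterdir()):
        try:
            model = (ctrl / "model").read_text().strip()
        except OSError:
            model = "unknown"
        link = ctrl / "device"
        if link.exists():
            # The symlink target's directory name is the PCI address.
            pci_addr = os.path.basename(os.path.realpath(link))
        else:
            pci_addr = "unknown"
        result.append((ctrl.name, pci_addr, model))
    return result


if __name__ == "__main__":
    for name, pci_addr, model in list_nvme_controllers():
        print(f"{name}: {model} at {pci_addr}")
```

Running it after each boot and diffing the output makes it easier to see which of the four drives went missing and which PCI address it normally sits at.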

The only problem now (except money apparently wasted) is that this motherboard has a big flaw: only the bottom 3 PCIe slots can support this ASUS card, and only if no heatsink is installed on the M.2 drive in the motherboard’s slot. In all other slots, longer cards like this collide with either the memory modules or the CPU waterblock.

Anything else I can experiment with?

Today I spent a few more hours with this motherboard. Basically I installed OCP 2.0 card recently and that replaced the need to have a few regular PCIe NICs. Interestingly, as I mentioned before about permutations, system didn’t boot anymore after this.

Specifically the issue seems to be with BMC, it doesn’t boot with some permutations of the PCIe devices and that causes the rest of the board to not boot either. I was eventually able to narrow it down to one device that I unplugged, waited for BMC to boot and then hotplugged it back.

As a result the system booted and works, but unfortunately it will not start on cold boot next time, so I’ll have to do the same trick again and again.

I’m wondering if there is a way to somehow work around this in a nicer way? I feel like at this point I have tried all the options in the BIOS.

So many PCIe lanes being wasted :disappointed:

P.S. Tried to install that cheap PCIe bifurcation card in all possible slots and the motherboard refuses to provide power for some reason :confused:

Upgraded main workstation to ASUS PRO WS TRX50-SAGE WIFI and those cheap bifurcation cards that Gigabyte MZ32-AR0 refuses to see work flawlessly there.

So yeah, the Gigabyte motherboard is to blame, as is often the case for me :confused: