Bifurcation Issues

So I finished building a server with the following relevant specs:

  • Motherboard: ASRock X570 Steel Legend (BIOS P5.01)
  • CPU: AMD 5700G
  • 4-Port M.2 NVMe Adapter: Amazon link
  • NVMe: Tried both a ZP1000GM3A023 & ZP1000GM3A013
  • OS: TrueNAS 22.12.2

Trying to figure out before the purchase whether I could get x4x4x4x4 bifurcation from the x16 slot was a bit of a crapshoot, but the CPU does provide 16 lanes for the x16 slot (unlike some previous APUs, a difference that seems to have caused confusion among some folks and first-line support), so I went for it.

Unfortunately, the BIOS only lists auto & 2x4 for the PCIe/GFX Lanes Configuration setting:

  • auto: only detected 1 or 2 drives, if I recall correctly

  • 2x4: to my surprise, this seems to run in x8x4x4 mode, as 3 of the 4 drives in the PCIe card are detected. I don’t know if that’s intended and I’m just misunderstanding what 2x4 means?

I was content with 3 drives, but sadly the NVMe drive in the x8 part of the card seems to randomly get removed by TrueNAS. It used to happen every few days, but this time it managed to last a month. This could be unrelated to the BIOS/motherboard, but since it only happens on the mystery slot, even with different drives, I’m a bit suspicious.
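
For anyone wanting to verify what width each drive actually negotiated, something like this should do it (a rough sketch; it assumes the drives show up in lspci as “Non-Volatile memory controller” devices):

# Sketch: print each NVMe controller's negotiated PCIe link width.
# LnkSta shows the live width, so an x8 on the adapter's first drive
# would confirm the slot is really running x8x4x4.
for dev in $(lspci -D | grep -i 'Non-Volatile memory' | cut -d' ' -f1); do
    echo "== $dev =="
    sudo lspci -s "$dev" -vv | grep -E 'LnkCap:|LnkSta:'
done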

I’ll try the TrueNAS forums and probably shoot some emails to ASRock/AMD too, but figured I’d ask here to see if anyone’s encountered anything like this before, or has a few ideas I could try to resolve this.

If reattaching the drive without a reboot is possible, that’d be great too.

PCIe/GFX Lanes Configuration Description

This is the description for the setting in the BIOS. I’m pretty confused by it, to be honest:

Configure J10 & J3600 Slot PCIe Lanes. Auto - If J3600 Slot is connected device, J10 and J3600 both are x8, otherwise J10 is x16; x8x4x4 - J10: X8, J3600: 4x4; x4x4x4x4 - J10: x4x4x4x4 (J3600 Slot can't connect any device).

Errors

CRITICAL
Pool SHODAN state is DEGRADED: One or more devices
has been removed by the administrator. Sufficient
replicas exist for the pool to continue functioning in a
degraded state.
The following devices are not healthy:
• Disk Seagate FireCuda 530 ZP1000GM3A013 is REMOVED
[16901.207108] nvme nvme1: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
[16901.303103] nvme 0000:0d:00.0: enabling device (0000 -> 0002)
[16901.303663] nvme nvme1: Removing after probe failure status: -19
[16901.331131] nvme1n1: detected capacity change from 1953525168 to 0
[16901.331136] blk_update_request: I/O error, dev nvme1n1, sector 259144416 op 0x1:(WRITE) flags 0x0 phys_seg 11 prio class 0
[16901.331139] blk_update_request: I/O error, dev nvme1n1, sector 259144656 op 0x1:(WRITE) flags 0x0 phys_seg 8 prio class 0
[16901.331149] zio pool=SHODAN vdev=/dev/disk/by-partuuid/59020ebe-79db-4852-b3d9-adaecce835e0 error=5 type=2 offset=130534514688 size=131072 flags=40080c80
[16901.331193] zio pool=SHODAN vdev=/dev/disk/by-partuuid/59020ebe-79db-4852-b3d9-adaecce835e0 error=5 type=2 offset=130534645760 size=131072 flags=40080c80
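
Side note: the moment of the drop can be caught live by following the kernel log. A minimal sketch, assuming journalctl is available (TrueNAS SCALE is Debian-based, so it should be):

# sudo journalctl -kf | grep -Ei 'nvme|blk_update_request'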

Unfortunately, ASRock doesn’t document its bifurcation support (well).

The description from the BIOS is comically bad.
  • J10 is the internal name for the first PCIe slot (PCIe 1 in the manual)
  • J3600 is the internal name for the second slot (PCIe 4 in the manual)
  • The first sentence suggests that if a card is in the second slot, both slots operate with 8 lanes. That runs counter to the manual, which suggests these slots operate at x16/x4.

Then it suggests that the BIOS/UEFI allows bifurcation configurations of:

  • x16 (slot 1) (nothing in slot 4)
  • x8 (slot 1), x8 (slot 4)
  • x8x4x4 (slot 1) (nothing in slot 4)
  • x4x4x4x4 (slot 1) (nothing in slot 4)

I could not find BIOS/UEFI documentation to back that up.

This CPU is likely the limiting factor.

ASUS (IMHO) still has the best-documented bifurcation support. If you find the chart for the X570 chipset boards on this unwieldy page and compare the columns for Ryzen 5000 series CPUs (1st data column) with the columns for 5000G CPUs (3rd data column), you will see that bifurcation support is more limited: max 4x M.2 cards in slot 1 with a regular Ryzen 5000, compared to 3x M.2 cards with a 5000G.

This is because some of the CPU’s PCIe lanes are reserved to interface with the iGPU; those lanes are then unavailable when trying to connect M.2 drives via bifurcation.

These limitations obviously apply to ASRock mobos as well.

Yeah their BIOS description is pretty garbage. I’m currently running with x8x4x4 (slot 1 - NVMes), x4 (slot 4 - HBA), which matches the manual somewhat, but not the BIOS description.

4x4 bifurcation

Looks like you’re right. It seems that while the CPU does supply 16 lanes, 4x4 bifurcation is not supported for APUs.

From Gigabyte
Not my motherboard vendor, but same difference

The CPU with integrated graphics doesn’t support PCIEx4x4.
Why the onboard graphics disables 4x4? You may check with AMD. It’s their CPU limitation.

From AMD

Thank you for the email.
I checked the information internally and got an update that 4x4 is not supported with AMD Ryzen 5750GE processor or APUs.
Thank you for contacting AMD.

Source: community.amd.com thread

x8x4x4

Still not sure why ASRock refuses to display this mode in the BIOS, yet somehow has flaky support for it anyway (but only in 2x4 mode, not auto). Seems like a bug on their end, tbh, especially since other motherboard vendors support 3-way bifurcation.

I haven’t managed to figure out why it occasionally disconnects; I’ll see what ASRock gets back to me with, but I did manage to make it a less painful issue by messing with some new commands 🙂 that seem to bring the drive back online without needing to reboot. As always, thanks for the help!

# ls /sys/class/nvme/
nvme0@  nvme1@  nvme3@  nvme4@

# sudo lshw | grep UNCLAIMED -A 11
*-storage UNCLAIMED
     description: Non-Volatile memory controller
     product: Seagate Technology PLC
     vendor: Seagate Technology PLC
     physical id: 0
     bus info: pci@0000:0f:00.0
     version: 01
     width: 64 bits
     clock: 33MHz
     capabilities: storage pciexpress msix msi pm nvm_express cap_list
     configuration: latency=0
     resources: memory:fce00000-fce03fff

# sudo sh -c "echo 1 > /sys/bus/pci/devices/0000\:0f\:00.0/remove"
# sudo sh -c "echo 1 > /sys/bus/pci/rescan"

# ls /sys/class/nvme/
nvme0@  nvme1@  nvme2@  nvme3@  nvme4@
  • ls /sys/class/nvme/ = check which NVMe drive was removed
  • sudo lshw = get the port for the removed drive from bus info
  • echo 1 > /sys/bus/pci/devices/$port/remove = remove the drive
  • echo 1 > /sys/bus/pci/rescan = find and reattach the drive
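
Putting those steps together, here’s a rough script version (just a sketch: it assumes the dropped drive shows up as an UNCLAIMED storage device in lshw, and that only one drive is down at a time):

#!/bin/sh
# Sketch: reattach a dropped NVMe controller without a reboot.
# Assumes the dropped drive is listed as UNCLAIMED by lshw and that
# its address appears on the "bus info" line (e.g. pci@0000:0f:00.0).
port=$(sudo lshw -class storage 2>/dev/null \
    | grep -A 11 UNCLAIMED \
    | grep 'bus info' \
    | sed 's/.*pci@//' \
    | head -n 1)
if [ -n "$port" ]; then
    echo "Reattaching $port"
    echo 1 | sudo tee "/sys/bus/pci/devices/$port/remove" > /dev/null
    sleep 1
    echo 1 | sudo tee /sys/bus/pci/rescan > /dev/null
else
    echo "No UNCLAIMED storage device found"
fi

If ZFS still lists the disk as REMOVED after the rescan, a zpool online SHODAN <device> on top should bring the vdev back.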
