Hi everyone!
I’ve been messing around with this board (X570D4U-2L2T) for a while now and I’ve run into a strange issue that I’m reasonably sure is a BIOS bug, and I’m a bit stumped on how to get any further.
I’m planning to use this board to build a li’l home server that’ll mostly be a virtualisation host. One of the VMs is going to act as our home router, running OpenWrt. I’m able to pass through the two I210 NICs, and a VF of each of the X550s (leaving the PF on the host) without issue, and ACS seems to be working fine, given the board’s IOMMU groups seem to be what I’d expect.
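For anyone wanting to compare groupings on their own board, here is a minimal sketch of how the IOMMU groups can be listed from sysfs. It assumes the usual Linux layout of /sys/kernel/iommu_groups/<group>/devices/<PCI address>; the function name iommu_groups is just an illustrative choice, not something from any tool mentioned above.

```python
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted PCI addresses it contains.

    Assumes the standard sysfs layout: <root>/<group>/devices/<address>.
    Returns an empty dict if the directory is missing (e.g. IOMMU disabled).
    """
    groups = {}
    base = Path(root)
    if not base.is_dir():
        return groups
    for group_dir in base.iterdir():
        # Group directories are named with plain decimal numbers.
        if not group_dir.name.isdigit():
            continue
        devices_dir = group_dir / "devices"
        if devices_dir.is_dir():
            groups[int(group_dir.name)] = sorted(
                entry.name for entry in devices_dir.iterdir()
            )
    return groups


if __name__ == "__main__":
    for number, devices in sorted(iommu_groups().items()):
        print(f"group {number}: {' '.join(devices)}")
```

Run on the host, this prints one line per group, which makes it easy to confirm that each device you intend to pass through sits in a group of its own.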
Where I run into issues is trying to assign my wireless adapter (QNAP QWA-AC2600). This card’s a little weird in that it has two entirely independent QCA9984 wireless interfaces (in separate IOMMU groups) behind an ASM1182e PCIe switch (which is in a third independent IOMMU group). If I assign just one of the two QCA9984s to the VM, everything works just fine. Similarly, if I attach the second QCA9984 to the same VM at runtime, it initialises in the guest without issue and all is well. Likewise, both cards work just fine on the host. For reference, this is how it shows up in lspci (on the host):
29:00.0 PCI bridge: ASMedia Technology Inc. ASM1182e 2-Port PCIe x1 Gen2 Packet Switch
2a:03.0 PCI bridge: ASMedia Technology Inc. ASM1182e 2-Port PCIe x1 Gen2 Packet Switch
2a:07.0 PCI bridge: ASMedia Technology Inc. ASM1182e 2-Port PCIe x1 Gen2 Packet Switch
2b:00.0 Network controller: Qualcomm Atheros QCA9984 802.11ac Wave 2 Wireless Network Adapter
2c:00.0 Network controller: Qualcomm Atheros QCA9984 802.11ac Wave 2 Wireless Network Adapter
However, I run into issues if both QCA9984s are attached to the VM (i.e. defined in the domain XML) when the VM is initially booted, or when it exits. In either case, the entire physical card (i.e. both wireless NICs and the PCIe switch on the card) seems to fall off the bus and ends up in an unusable state until the host is rebooted (removing the devices and triggering a bus rescan does not bring them back), with the following dmesg output:
Jun 01 13:33:39 safi-server kernel: pcieport 0000:2a:03.0: Unable to change power state from D0 to D3hot, device inaccessible
Jun 01 13:33:39 safi-server kernel: pcieport 0000:2a:07.0: Unable to change power state from D0 to D3hot, device inaccessible
Jun 01 13:33:39 safi-server kernel: pcieport 0000:29:00.0: Unable to change power state from D0 to D3hot, device inaccessible
Jun 01 13:33:39 safi-server kernel: pcieport 0000:29:00.0: Unable to change power state from D3cold to D0, device inaccessible
Jun 01 13:33:39 safi-server kernel: pcieport 0000:2a:03.0: Unable to change power state from D3cold to D0, device inaccessible
Jun 01 13:33:39 safi-server kernel: pcieport 0000:2a:07.0: Unable to change power state from D3cold to D0, device inaccessible
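To see at a glance which devices failed which power-state transitions, the log can be summarised with a short script. This is only a sketch: the regular expression is written against the exact message format shown above, and the function name failed_transitions is made up for illustration. Other kernel versions may word the message differently.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   pcieport 0000:2a:03.0: Unable to change power state from D0 to D3hot, device inaccessible
# The pattern is an assumption based on the log lines quoted in this post.
POWER_STATE_FAILURE = re.compile(
    r"(?P<address>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"Unable to change power state from (?P<src>\w+) to (?P<dst>\w+), "
    r"device inaccessible"
)


def failed_transitions(log_text):
    """Return {PCI address: [(from_state, to_state), ...]} in log order."""
    failures = defaultdict(list)
    for line in log_text.splitlines():
        match = POWER_STATE_FAILURE.search(line)
        if match:
            failures[match["address"]].append((match["src"], match["dst"]))
    return dict(failures)


if __name__ == "__main__":
    import sys

    # Example: journalctl -k | python3 this_script.py
    for address, transitions in failed_transitions(sys.stdin.read()).items():
        steps = ", ".join(f"{src} -> {dst}" for src, dst in transitions)
        print(f"{address}: {steps}")
```

Applied to the six lines above, each of the three switch ports (29:00.0, 2a:03.0 and 2a:07.0) shows a failed D0 to D3hot transition followed by a failed D3cold to D0 transition, while neither QCA9984 function appears in the failures at all.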
This happens immediately upon trying to boot the domain (or when it exits, if the second card was attached to it at runtime), long before the guest kernel is started (I don’t think it even makes it to OVMF - I’m pretty sure qemu fails to actually start), so I don’t think it has anything to do with the guest OS.
I’ve also tested this selfsame wifi card on a different motherboard and this issue does not occur there, suggesting that it is not an issue to do with the wifi card itself.
Likewise, I’ve also tested on the X570D4U-2L2T itself with both my actual OS & kernel, and the same test environment I used on the other board (a standard Ubuntu live environment) and the issue occurs identically with both, seeming to rule out an issue specific to my kernel / userspace configuration.
At this stage it seems quite likely to be some sort of bug with the board’s BIOS (I’m running version 1.78; 1.70 had issues running my RAM at 3200MHz, whereas 1.40 threw occasional MCEs and had spontaneous resets, and the 1.78 BIOS hasn’t had either of these issues), but I’m at a loss for how to proceed. It could well simply be a configuration issue in my board’s BIOS, but I’m not sure what I’d be looking for exactly.