X470 Taichi on-board USB controller passthrough

c1742243 · December 14, 2019, 2:40pm

TL;DR

Has anyone with an ASRock X470 Taichi (or any X470 board, for that matter) managed to successfully pass through an on-board USB controller to a guest VM without using the ACS kernel patch? If so, how did you go about doing it?

Due diligence

System:

Ryzen 3700X
X470 Taichi
BIOS 3.77 (beta) with AGESA Combo-AM4 1.0.0.4 Patch B
Linux kernel 5.3 (vanilla Ubuntu 19.10, no ACS or PCI 127 bug patches)

By following the steps in various guides (primarily the Pop!_OS How-To), and much trial and error, I’ve managed to cobble together a functioning Windows 10 VM with GPU passthrough. Now that a beta BIOS is available for this board containing AGESA Combo-AM4 1.0.0.4 Patch B (which addresses IOMMU group separation and Unknown PCI header type ‘127’ issues), I’m attempting to get passthrough of a USB controller working as well.

Relevant IOMMU groups:

IOMMU Group 18:
    02:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:43d0] (rev 01)
    02:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset SATA Controller [1022:43c8] (rev 01)
    02:00.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Bridge [1022:43c6] (rev 01)
    03:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
    03:02.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
    03:03.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
    03:04.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
    03:09.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 400 Series Chipset PCIe Port [1022:43c7] (rev 01)
    04:00.0 USB controller [0c03]: ASMedia Technology Inc. ASM2142 USB 3.1 Host Controller [1b21:2142]
    05:00.0 SATA controller [0106]: ASMedia Technology Inc. ASM1062 Serial ATA Controller [1b21:0612] (rev 02)
    06:00.0 PCI bridge [0604]: ASMedia Technology Inc. ASM1184e PCIe Switch Port [1b21:1184]
    07:01.0 PCI bridge [0604]: ASMedia Technology Inc. ASM1184e PCIe Switch Port [1b21:1184]
    07:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM1184e PCIe Switch Port [1b21:1184]
    07:05.0 PCI bridge [0604]: ASMedia Technology Inc. ASM1184e PCIe Switch Port [1b21:1184]
    07:07.0 PCI bridge [0604]: ASMedia Technology Inc. ASM1184e PCIe Switch Port [1b21:1184]
    08:00.0 Network controller [0280]: Intel Corporation Dual Band Wireless-AC 3168NGW [Stone Peak] [8086:24fb] (rev 10)
    0a:00.0 Ethernet controller [0200]: Intel Corporation I211 Gigabit Network Connection [8086:1539] (rev 03)
    0b:00.0 USB controller [0c03]: Renesas Technology Corp. uPD720202 USB 3.0 Host Controller [1912:0015] (rev 02)

IOMMU Group 24:
    11:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller [1022:149c]
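
For reference, a listing like the one above can be produced by walking /sys/kernel/iommu_groups. This is a minimal sketch, not the exact script from any guide; the function names and the optional root-directory argument are my own, added so the logic can be tried against a scratch directory.

```bash
#!/usr/bin/env bash
# Print "<group> <pci-address>" pairs, one per line.
# $1 (optional): root of the IOMMU group tree; defaults to the real sysfs path.
iommu_group_members() {
    local root="${1:-/sys/kernel/iommu_groups}" dev group
    for dev in "$root"/*/devices/*; do
        [ -e "$dev" ] || continue          # glob matched nothing (IOMMU off?)
        group=${dev%/devices/*}            # strip the trailing "/devices/<addr>"
        printf '%s %s\n' "${group##*/}" "${dev##*/}"
    done
}

# Pretty-print the pairs, using lspci for the device descriptions.
show_iommu_groups() {
    local group addr last=""
    iommu_group_members "$@" | sort -n | while read -r group addr; do
        [ "$group" = "$last" ] || echo "IOMMU Group $group:"
        last=$group
        printf '    %s\n' "$(lspci -nns "$addr")"
    done
}
```

Calling show_iommu_groups with no arguments prints every group on the host. A device is only a clean passthrough candidate when everything else in its group is either passed through with it or is a bridge.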

Before attempting to mess around with passing through any of the on-board USB controllers, I tried using a dedicated Sedna 2 port card (with the uPD720202 controller tested by @FurryJackman). The card itself played well with the host system, but no matter which non-GPU slot I put it in (either of the x1 slots or the bottom x8 slot), it would show up in group 18 along with nearly every other device attached to the chipset.

This left the USB controller sitting all by itself in group 24 as the prime candidate for passthrough.

My first try was to simply add the PCI device to the VM in virt-manager and hope for the best. Shortly after attempting to boot the VM, the host system became unresponsive and required a hard reboot. A less than ideal situation. At this point, I rolled up my sleeves and took a deep dive into kernel driver binding.

In its default state, the USB controller is bound to the xhci_hcd driver.

$ sudo lspci -s 11:00.3 -nnkv
11:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller [1022:149c] (prog-if 30 [XHCI])
        Subsystem: ASRock Incorporation Matisse USB 3.0 Host Controller [1849:7914]
        Flags: bus master, fast devsel, latency 0, IRQ 67
        Memory at f7800000 (64-bit, non-prefetchable) [size=1M]
        Capabilities: [48] Vendor Specific Information: Len=08 <?>
        Capabilities: [50] Power Management version 3
        Capabilities: [64] Express Endpoint, MSI 00
        Capabilities: [a0] MSI: Enable- Count=1/8 Maskable- 64bit+
        Capabilities: [c0] MSI-X: Enable+ Count=8 Masked-
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [150] Advanced Error Reporting
        Capabilities: [2a0] Access Control Services
        Capabilities: [370] Transaction Processing Hints
        Kernel driver in use: xhci_hcd

My assumption is that it should be bound to vfio-pci like the graphics card, so I added the device ID to the /etc/initramfs-tools/scripts/init-top/bind_vfio.sh script used in the Pop!_OS guide. Turns out this method only works for dynamically loaded kernel modules; xhci_hcd is built into the kernel.

$ grep xhci /lib/modules/$(uname -r)/modules.builtin
kernel/drivers/usb/host/xhci-hcd.ko
kernel/drivers/usb/host/xhci-pci.ko
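
The same check can be wrapped in a small helper. This is only a sketch; the function name and the optional second argument (an alternative modules.builtin path) are assumptions of mine, not something from the Pop!_OS guide.

```bash
#!/usr/bin/env bash
# Succeed if the named module is compiled into the running kernel.
# $1: module name, with '-' and '_' treated as interchangeable
# $2 (optional): path to a modules.builtin file
module_is_builtin() {
    local pattern="${1//[-_]/[-_]}"    # xhci_hcd -> xhci[-_]hcd
    local list="${2:-/lib/modules/$(uname -r)/modules.builtin}"
    # Entries look like kernel/drivers/usb/host/xhci-hcd.ko
    grep -Eq "/${pattern}\.ko\$" "$list"
}
```

For example, module_is_builtin xhci_hcd && echo "built in" reports whether a driver can be kept away from a device simply by loading vfio-pci first, or whether it has to be unbound after the fact.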

Some manual manipulation later (before scripting a more elegant solution)…

$ echo '0000:11:00.3' | sudo tee /sys/bus/pci/devices/0000\:11\:00.3/driver/unbind
$ echo '1022 149c' | sudo tee -a /sys/bus/pci/drivers/vfio-pci/new_id

…and the controller was bound to vfio-pci

$ sudo lspci -s 11:00.3 -nnkv
11:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller [1022:149c] (prog-if 30 [XHCI])
        Subsystem: ASRock Incorporation Matisse USB 3.0 Host Controller [1849:7914]
        Flags: fast devsel, IRQ 67
        Memory at f7800000 (64-bit, non-prefetchable) [size=1M]
        Capabilities: [48] Vendor Specific Information: Len=08 <?>
        Capabilities: [50] Power Management version 3
        Capabilities: [64] Express Endpoint, MSI 00
        Capabilities: [a0] MSI: Enable- Count=1/8 Maskable- 64bit+
        Capabilities: [c0] MSI-X: Enable- Count=8 Masked-
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [150] Advanced Error Reporting
        Capabilities: [2a0] Access Control Services
        Capabilities: [370] Transaction Processing Hints
        Kernel driver in use: vfio-pci
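
One caveat with new_id is that it applies to every device with that vendor:device pair. The kernel also exposes a per-device driver_override attribute in sysfs, which targets a single PCI address. The following is a minimal sketch of that alternative; the function name and the optional sysfs-root argument are hypothetical, and it assumes vfio-pci is already loaded and that it runs as root.

```bash
#!/usr/bin/env bash
# Rebind one PCI device to vfio-pci using driver_override.
# $1: full PCI address, e.g. 0000:11:00.3
# $2 (optional): sysfs root, defaults to /sys
bind_to_vfio() {
    local addr="$1" sysfs="${2:-/sys}"
    local dev="$sysfs/bus/pci/devices/$addr"
    if [ ! -d "$dev" ]; then
        echo "no such PCI device: $addr" >&2
        return 1
    fi
    # Only vfio-pci may claim this device from now on.
    echo vfio-pci > "$dev/driver_override"
    # Detach from the current driver (xhci_hcd in this thread), if any.
    if [ -e "$dev/driver" ]; then
        echo "$addr" > "$dev/driver/unbind"
    fi
    # Ask the PCI core to probe again; the override steers it to vfio-pci.
    echo "$addr" > "$sysfs/bus/pci/drivers_probe"
}
```

Usage would be bind_to_vfio 0000:11:00.3 from a root shell. This only changes which driver owns the device; it does nothing about how the device behaves when it is reset.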

Attempting to start the VM with the USB controller attached as a PCI device brought the system down again. syslog shows a number of events around the time of the crash, the most interesting of which seem to be these (right before a series of ^@ garbage characters):

kernel: [10358.050360] vfio-pci 0000:11:00.3: not ready 1023ms after FLR; waiting
kernel: [10360.094937] vfio-pci 0000:11:00.3: not ready 2047ms after FLR; waiting
kernel: [10363.166855] vfio-pci 0000:11:00.3: not ready 4095ms after FLR; waiting
kernel: [10368.286832] vfio-pci 0000:11:00.3: not ready 8191ms after FLR; waiting
kernel: [10377.503036] vfio-pci 0000:11:00.3: not ready 16383ms after FLR; waiting
libvirtd[2175]: Cannot start job (query, none, none) for domain win10; current job is (async nested, none, start) owned by (2187 remoteDispatchDomainCreate, 0 <null>, 2187 remoteDispatchDomainCreate (flags=0x0)) for (33s, 0s, 33s)
libvirtd[2175]: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainCreate)
kernel: [10395.678823] vfio-pci 0000:11:00.3: not ready 32767ms after FLR; waiting
libvirtd[2175]: Cannot start job (query, none, none) for domain win10; current job is (async nested, none, start) owned by (2187 remoteDispatchDomainCreate, 0 <null>, 2187 remoteDispatchDomainCreate (flags=0x0)) for (63s, 0s, 63s)
libvirtd[2175]: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainCreate)

The last thing I’ve tried is changing the “managed” attribute for the USB controller’s “hostdev” entry in the VM’s domain XML to “no”. I.e.

<hostdev mode='subsystem' type='pci' managed='no'>
  <source>
    <address domain='0x0000' bus='0x11' slot='0x00' function='0x3'/>
  </source>
  <address type='pci' domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/>
</hostdev>

The documentation states that

For PCI devices, when managed is “yes” it is detached from the host before being passed on to the guest and reattached to the host after the guest exits.

The description for “no” is a little more complicated and I don’t understand it completely. But I take it to mean that you are responsible for detaching/reattaching the device from the host system. Having manually bound the USB controller to vfio-pci, I thought I’d already met this requirement. Powering on the VM yet again caused the host to crash.
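
With managed='no', libvirt expects the device to be detached from its host driver before the guest starts. Besides manual sysfs writes, libvirt provides virsh nodedev-detach and virsh nodedev-reattach for this. A minimal sketch follows; the helper names are hypothetical, and nothing here changes how the device is reset, so it should not be expected to cure the host crash by itself.

```bash
#!/usr/bin/env bash
# Convert a PCI address to libvirt's node device name.
# 0000:11:00.3 -> pci_0000_11_00_3
pci_to_nodedev() {
    local addr="$1"
    printf 'pci_%s\n' "${addr//[:.]/_}"
}

# Detach the device from its host driver so a guest can use it.
detach_for_guest() {
    sudo virsh nodedev-detach "$(pci_to_nodedev "$1")"
}

# Hand the device back to the host once the guest has shut down.
reattach_to_host() {
    sudo virsh nodedev-reattach "$(pci_to_nodedev "$1")"
}
```

For the controller in this thread that would be detach_for_guest 0000:11:00.3 before starting the VM and reattach_to_host 0000:11:00.3 afterwards.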

Any insights into what’s happening here, or what I can do to get this working, are greatly appreciated.

iwouldntknow · December 14, 2019, 6:08pm

I have a Taichi X470 with the beta BIOS, and didn’t have any problems passing through a USB controller, even before the beta BIOS with the PCI quirk patch. I didn’t even need to recompile the kernel for xhci-hcd.
At VM start vfio-pci takes the device, and after shutdown it releases it.

Have you had success passing other devices? For example GPU or HD Audio Controller?

2bitmarksman · December 14, 2019, 6:25pm

I’m unsure about the beta BIOS. I run an X470 Taichi Ultimate board with the P3.30 BIOS atm with ESXi. There should be a USB controller on the board that has direct lanes from the CPU (on the Taichi Ultimate, these ports are the ones at the top of the back of the motherboard). Finally, at least for ESXi, I had to set up the USB controller to use d3d0 as its PCIe reset method instead of the legacy FLR reset method; with FLR I couldn’t restart the VM and USB devices would occasionally hitch, and d3d0 fixed it.

FurryJackman · December 15, 2019, 12:36am

X470 inherently has the IOMMU lumping issue for the chipset, and it’s nowhere near fixed. X570 doesn’t have this issue.

If you plan to boot off of a non primary slot GPU, I recommend Gigabyte boards because they are the only ones to consistently have a boot GPU picker option in the BIOS.

If you want to stick with X470, use an NVMe-to-PCIe riser to put an x1 card into an NVMe slot that goes directly to the CPU. That’s what I’d recommend: find the board’s block diagram and use the NVMe slot wired to the CPU. Otherwise, do a motherboard swap to X570 and the grouping issue goes away for the chipset.

Do not use the ASmedia controllers because it’s been a mixed bag for those controllers working in passthrough.

c1742243 · December 15, 2019, 7:14am

Yes, GPU passthrough works without issue. Other accounts I’ve found online also suggest that managed='yes' automatically handles driver rebinding for USB controllers, which is why I’m confused as to why I’m experiencing issues with my setup.

A couple of questions for you:

Which particular USB controller are you able to pass through? Can you please post your IOMMU groups?
Which distribution and kernel version is the host OS running?
By “beta BIOS”, I assume you mean 3.77 containing “AGESA Combo-AM4 1.0.0.4 Patch B”. Can you please confirm?
USB controller passthrough was something I never got around to testing until now because GPU passthrough was always a bigger issue. When you had it working before, do you recall which BIOS version you were using?

c1742243 · December 15, 2019, 7:35am

I’m pretty sure the X470 Taichi non-Ultimate has the same CPU-attached USB controller. In my setup, it appears in group 24 by itself (the giveaway being Matisse USB 3.0 Host Controller rather than the other AMD and ASMedia controllers trapped in group 18). Can you confirm it’s the same PCI vendor:product code as mine?

IOMMU Group 24:
	11:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller [1022:149c]

Through process of deduction (a.k.a. inserting/removing USB devices into every port on the motherboard and monitoring lsusb), this controller appears to be connected to the first 4 ports on the back of the motherboard. When passing through this controller to the VM, but before powering the VM on, I ensure nothing is physically plugged into the ports. Still, this seems to make no difference.
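
The plug-and-watch approach can be shortened by asking sysfs which PCI device sits behind each USB root hub. This is a rough sketch under the assumption that the controllers are PCI devices, as they are on this board; the function name and the optional sysfs-root argument are my own.

```bash
#!/usr/bin/env bash
# Print "<usb-root-hub> <parent-device>" for every USB bus.
# For PCI-attached controllers the parent is a PCI address.
# $1 (optional): sysfs root, defaults to /sys
usb_bus_to_pci() {
    local sysfs="${1:-/sys}" hub real
    for hub in "$sysfs"/bus/usb/devices/usb*; do
        [ -e "$hub" ] || continue      # no USB buses found
        # Resolves to something like .../0000:11:00.3/usb3
        real=$(readlink -f "$hub")
        printf '%s %s\n' "${hub##*/}" "$(basename "$(dirname "$real")")"
    done
}
```

Running usb_bus_to_pci and looking for 0000:11:00.3 gives the bus numbers that belong to the Matisse controller; lsusb -t then shows which plugged-in devices hang off those buses.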

I’m not familiar with the difference between these two methods, nor how to change from one to the other under a Linux setup. From what you’re describing, it addresses issues after you’ve successfully passed through a controller to the guest VM at least once. In which case, it’s currently a moot point for me. Still, good to know for future reference.

I can live with most of the IOMMU lumping issues, as long as I can get this seemingly single, isolated controller to pass through without issue. Anecdotal evidence from people here and elsewhere suggests that it’s possible and indeed a non-issue. I’m just hoping to get to the same point.

For a number of other reasons, I’m pretty much committed to this particular X470 board. So I’m making do with what I have. Good suggestion with the NVMe to PCIe riser card, but I’m already making use of the CPU-connected m.2 slot for an NVMe SSD.

2bitmarksman · December 15, 2019, 7:29pm

Look for the USB controller in the same IOMMU group as the encryption module. This is the one you need to pass through. The vendor ID for this USB controller on my Taichi Ultimate starts with 1022.

As for d3d0 instead of FLR, I don’t know if this is possible with KVM/QEMU. I just know I had to do this on ESXi to get it to release on VM restart or else it’d hang, among other things such as limiting the USB speed to 2.0 (even though it’s a USB 3.0 port). I’d be deeply interested in how to get around this in KVM/QEMU.
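
A note for later readers: as far as I know, Linux kernels from roughly 5.15 onward (so not the 5.3 kernel used in this thread) expose a per-device reset_method attribute in sysfs that lists the reset methods the kernel will try, in order, and accepts a new list. The sketch below assumes that attribute exists and that the "pm" method (a D3hot to D0 power transition) is the closest Linux counterpart of ESXi’s d3d0; both the function names and that equivalence are assumptions, and the write fails if the device does not support the requested method.

```bash
#!/usr/bin/env bash
# Show the reset methods the kernel will try for a device, in order.
# $1: full PCI address; $2 (optional): sysfs root, defaults to /sys
show_reset_methods() {
    local addr="$1" sysfs="${2:-/sys}"
    local attr="$sysfs/bus/pci/devices/$addr/reset_method"
    if [ ! -e "$attr" ]; then
        echo "no reset_method attribute for $addr (older kernel?)" >&2
        return 1
    fi
    cat "$attr"
}

# Restrict the device to the given method(s), e.g. "pm" or "pm bus".
# Needs root on a real system.
set_reset_methods() {
    local addr="$1" methods="$2" sysfs="${3:-/sys}"
    echo "$methods" > "$sysfs/bus/pci/devices/$addr/reset_method"
}
```

On such a kernel, show_reset_methods 0000:11:00.3 would reveal whether flr is listed first, and set_reset_methods 0000:11:00.3 pm would take FLR out of the picture before the VM starts. Whether that actually avoids the hang on this controller is untested here.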

FurryJackman · December 16, 2019, 7:43am

If you have to use the CPU M.2 slot for an SSD, then there’s no other way other than the ACS patch.

QXC · December 16, 2019, 3:37pm

I’ve had basically the same problem with my Asus Prime X470 Pro recently. Updated bios to AGESA 1.0.0.4, USB hub I was using changed address and now hangs the host system when I try to fire up qemu with it as a device. No problem changing the driver to the vfio-pci, or even switching it back and forth between that and xhci… or booting with it bound to vfio even. Just doesn’t want to attach to the VM.

I believe I was on the 1.0.0.3 ABBA bios before which didn’t boot on ryzen 3000 (might have been earlier, didn’t check just threw the old CPU in and updated) and I don’t believe my board can update from a non booting bios without switching out CPUs.

If X570 actually splits out the PCIe slots into their own groups I might just upgrade the board. Need to research that today, unless someone figures out this magic d3d0 reset thingamadoodle.

EDIT: And I don’t think the ACS patch would make a difference in this case… maybe I’ll give that a try tonight.

springl0aded · December 18, 2019, 12:19pm

Yeah, it’s one of the limitations of the 400 series platform. There are workarounds but those are his best options.

The only other things I think one can do are to install the USB card in the middle GPU slot rather than the bottom one, or hope the Matisse USB controller works better in a future kernel release.

FurryJackman · December 18, 2019, 1:01pm

Don’t use the 3.1 Gen 2 ports or ASmedia ports is basically the guideline. If it says the subsystem vendor ID is ASmedia, that will not work properly in passthrough. Only the AMD subvendor ID ones work.

thro · December 19, 2019, 12:39am

I bought a USB controller card for $20 (australian pesos) to avoid the issue entirely

QXC · December 21, 2019, 7:49am

The solution was to ACS patch the kernel and pass a PCIe card. Since OP is using ubuntu 19.10 (like me) I’ll post what I did here, even if OP doesn’t wanna ACS patch.

I used this script to patch, compile, and install the kernel. I also threw in the AGESA patch while I was there. If you’re paranoid about using a script, it’s not too difficult to go see the exact commands it’s running to do it by hand instead.

I also added make -j "$(nproc)" modules at line 366, with the same indentation as the line above it, while troubleshooting the nvidia driver. I left it in because the Module.symvers file wasn’t being created otherwise.

I did have to use the 5.3.X kernel instead of 5.4.X, as 5.4 broke my nvidia drivers and apt was not able to build any drivers against the 5.4.X headers. I was already using the 5.3.X kernel when I started this process so there’s a small chance it was using the same headers perhaps? I didn’t have to rebuild anything against the patched 5.3.X kernel, it ~just worked~.

After patching, I added pcie_acs_override=downstream,multifunction to the kernel cmdline. Just “downstream” should work in theory.
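
On Ubuntu the usual place for such a parameter is GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by sudo update-grub and a reboot. To confirm the parameter actually reached the running kernel, something like the following sketch can be used; the function name and the optional file argument are hypothetical. Keep in mind that an unpatched kernel simply ignores pcie_acs_override, so seeing it on the command line does not prove the override is active.

```bash
#!/usr/bin/env bash
# Print the value of a key=value kernel command-line parameter.
# Returns non-zero if the parameter is absent.
# $1: parameter name; $2 (optional): file to read, defaults to /proc/cmdline
cmdline_value() {
    local key="$1" file="${2:-/proc/cmdline}" word
    local -a words
    read -ra words < "$file"
    for word in "${words[@]}"; do
        if [ "${word%%=*}" = "$key" ]; then
            echo "${word#*=}"          # everything after the first '='
            return 0
        fi
    done
    return 1
}
```

After a reboot, cmdline_value pcie_acs_override should print downstream,multifunction (or just downstream); re-running the IOMMU group listing then shows whether the chipset devices were split into separate groups.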

Not an ideal solution, but the only reason I was using the onboard USB in the first place is because I couldn’t pass through the PCIe card on my old system. Now everything is all hunky-dory until the next update.

c1742243 · December 23, 2019, 7:01am

As mentioned, the PCIe x1 controller in my vanilla Ubuntu 19.10 system (paid for with Australian Dollarydoos) does not appear in its own IOMMU group. I’d be interested to hear how you got your controller to work, and if kernel modifications were required.

Thanks for posting your experiences with this. Having rummaged through various threads discussing this issue it seems as though, currently, patching the kernel is the only solution.

As @2bitmarksman alluded to, the issue lies with the USB controller advertising function level reset (FLR) capabilities, but not actually handling reset requests gracefully (bringing down the host system in the process). A more detailed description can be found in this reddit thread

Patching solutions range from hard-coding the PCI_DEV_FLAGS_NO_FLR_RESET flag for the USB controller (1022:149c) and audio device (1022:1487), to dynamically applying the flags to PCI devices supplied as kernel parameters; the latter being my preferred choice as it provides more flexibility.

Not having looked too closely at the ACS patch, I can’t say how this approach differs, or why you’d use one solution over the other. I suppose if I took the ACS route I could make use of the x1 PCIe controller as well. I’m still interested to hear how @iwouldntknow got this working.

In either case, until AMD fixes this issue in firmware/microcode (unlikely), or a patch for this specific issue is accepted into the kernel mainline (even less likely), fixing the problem is left as an exercise for the end user.

2bitmarksman · December 23, 2019, 6:59pm

The USB controller that I pass through is 1022:149c (VendorID:DeviceID format), so this is likely the solution I’d recommend trying.

For reference, this is what my passthru.map file looks like in VMware for the USB controllers:

# USB 3.0 (1700)

1022 145c d3d0 false
1022 145a d3d0 false
1022 1456 d3d0 false

# USB 3.0 (3700x)

1022 149c d3d0 false

QXC · December 26, 2019, 3:39pm

I’d be annoyed about having to re-patch the kernel to change device IDs, but ACS doesn’t work around the original problem of not wanting to use a separate PCIe device for USB/etc functionality. I suppose the patch could be written in a fashion where you pass device IDs through at boot like how ACS override works. Not that I know how any of that works from a code standpoint.

I’m curious why these aren’t mainlined, but I’m not at all familiar with how the process of getting something like this into the kernel proper works. Not sure if it’s an issue of code quality/standards or if the core concept of the patches is kinda hacky and a better solution would be required (i.e. fix the microcode).

From a hardware standpoint, it’s good that there’s SOME way to make the internal controller work for those that absolutely do not or can not use a separate add in card. Thanks for digging up that reddit post.

Goertzenator · February 13, 2020, 11:49pm

@c1742243 , did you ever get this working?

I have taichi x470, 3900x, BIOS 3.90 and I get the same results as you.

c1742243 · February 15, 2020, 5:28pm

Welcome to the forums.

As alluded to earlier in the thread (and in the linked reddit discussion of the workaround patch) the issue only seems to affect the particular hardware combination of

3000 series Ryzen processors (Matisse / Zen 2);
on X470 motherboards (EDIT: or X570);
running a BIOS with AGESA 1.0.0.4B

So you’re definitely in the right place. I had naively hoped that ASRock, taking as long as they did to (FINALLY) release BIOS 3.90, had used some of that time to work on this particular issue. As you’ve found, that wasn’t the case.

As for getting it working: yes, I did! And it works perfectly. The top 4 (leftmost) ports on the motherboard are attached to 1022:149c and, when passed through to a Windows 10 guest, work flawlessly.

The caveat being I ended up having to patch and build the kernel myself.
Something I was hoping to avoid, if only from an upstream support perspective.

That said, the whole process has been quite the learning experience. While there are plenty of guides for patching/building the mainline kernel and using it in an Ubuntu system, finding current (let alone accurate) documentation for building a kernel as close to the officially supplied one (with niceties like ZFS) and doing things The Ubuntu Way™ was a little more difficult. I’m still refining the process I use, as it’s likely something I’ll do every time a new upstream kernel becomes available. If there’s enough interest, perhaps I could write it up for others to follow.

Ideally this sort of fix should be upstreamed somewhere so we can all benefit (“there are dozens of us, DOZENS!”). Preferably directly in the mainline, but Ubuntu would also be a good candidate. Perhaps someone with prior experience in such things (*cough* @gnif, @wendell *cough*) could provide some guidance.

2bitmarksman · February 16, 2020, 6:06pm

Oh dang, it’s good to know that BIOS 3.90 enables 4 ports. I’m using the Taichi Ultimate and BIOS 3.40 only has the top 2 enabled. May have to see if that carries over to the Ultimate version

c1742243 · February 17, 2020, 6:25am

For clarity, the ports that appear to be attached to the controller are the pair beneath the PS/2 port and the pair beneath the ethernet port in this picture

I should probably double check that all four actually get passed through; I don’t have 4 devices plugged in simultaneously and initial testing was done while fumbling around under a dark desk.

I never tested the no FLR patch on a BIOS other than 3.90, so I can’t say if it’s always been 4 ports on the Taichi non-Ultimate. I vaguely recall seeing a bash script posted somewhere on the forums that enumerated USB controllers and reported which devices were attached to them (to which Wendell replied he was going to steal it for the Pop!_OS guide), but I can’t seem to find it again.

As an aside, 3.90 appears to have introduced strange boot problems for several users. There’s a bit of discussion of it on reddit:

I’ve experienced the “won’t boot until physically unplugging the power supply” scenario once, immediately after updating from 3.78 to 3.90. Having experienced weird behaviour at least once pretty much every time I’ve upgraded the BIOS, I didn’t think too much of it at the time. That said, I never had this particular issue in earlier BIOS versions; even beta ones.

In my particular situation, the machine is powered on most of the time. So I wouldn’t really notice if there was an issue. Still, being able to trust that your system will boot consistently is pretty important. If you do upgrade, I’d be interested to hear if you witness any weird behaviour.