Working PCIe passthrough on Threadripper but

Hello all,
Has anyone passed through a USB/XHCI card?
I have successfully passed through a GPU to a guest OS, and I’ve tried following the same steps to pass through a USB PCIe add-in card.
Namely, these are the steps outlined here (ubuntu-17-04-vfio-pcie-passthrough) to ensure that vfio_pci can use the device in question.
Then I use some Java to recover the PCIe bus.
However, despite adding the device ID to
/etc/initramfs-tools/modules
/etc/modules

so that they all include the list of devices I want to pass through, like
vfio-pci ids=10de:1b06,10de:10ef,1b21:1242
and
/etc/modprobe.d/vfio_pci.conf
/etc/modprobe.d/vfio.conf
options vfio_pci ids=10de:1b06,10de:10ef,1b21:1242

and in /etc/modprobe.d/softdep.conf
softdep xhci_hcd pre: vfio-pci

my device is still bound to xhci_hcd:

41:00.0 USB controller [0c03]: ASMedia Technology Inc. ASM1142 USB 3.1 Host Controller [1b21:1242] (prog-if 30 [XHCI])
        Subsystem: ASMedia Technology Inc. ASM1142 USB 3.1 Host Controller [1b21:1242]
        Flags: bus master, fast devsel, latency 0, IRQ 33
        Memory at dd600000 (64-bit, non-prefetchable) [size=32K]
        Capabilities: [50] MSI: Enable- Count=1/8 Maskable- 64bit+
        Capabilities: [68] MSI-X: Enable+ Count=8 Masked-
        Capabilities: [78] Power Management version 3
        Capabilities: [80] Express Endpoint, MSI 00
        Capabilities: [100] Virtual Channel
        Capabilities: [200] Advanced Error Reporting
        Capabilities: [280] #19
        Capabilities: [300] Latency Tolerance Reporting
        Kernel driver in use: xhci_hcd
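
For anyone following along, the "Kernel driver in use" line can also be checked straight from sysfs, which is handy after each config change and reboot. This is only a minimal sketch: the function name and the sysfs_root parameter are made up for illustration, and the 0000:41:00.0 address is simply the one from the lspci output above.

```python
import os


def bound_driver(pci_addr, sysfs_root="/sys"):
    """Return the name of the kernel driver bound to pci_addr, or None.

    sysfs exposes the binding as a symlink called "driver" inside the
    device directory; the link target's last path component is the
    driver name (for example xhci_hcd or vfio-pci).
    """
    link = os.path.join(sysfs_root, "bus", "pci", "devices", pci_addr, "driver")
    if not os.path.islink(link):
        # No symlink means no driver currently claims the device.
        return None
    return os.path.basename(os.readlink(link))
```

Calling bound_driver("0000:41:00.0") on the host should return "vfio-pci" once the binding has actually taken effect, and "xhci_hcd" while the USB driver still owns the card.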

To get around the PCIe reset issue I use some Java to reset the bus. It enumerates the devices under
/sys/bus/pci/drivers/vfio-pci
and even when I extended it to enumerate the devices in
/sys/bus/pci/drivers/xhci_hcd
the device is reset but the guest won’t start.
When I just try to pass through without recovering the PCIe bus, the guest starts but runs buggy. The host then has to be rebooted, and the device removed after the reboot, for things to run normally.
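
The Java tool itself isn't shown in this thread, so the following is not its code. It is only a rough Python sketch of the enumeration step described above, assuming the usual sysfs layout where each bound device shows up as an entry named after its PCI address inside the driver directory. The function name and the sysfs_root parameter are hypothetical.

```python
import os
import re

# Full PCI address: domain:bus:slot.function, e.g. 0000:41:00.0
PCI_ADDR = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def devices_for_driver(driver, sysfs_root="/sys"):
    """List the PCI addresses currently bound to the given driver.

    The driver directory also holds control files such as bind and
    unbind, so only entries that look like PCI addresses are kept.
    """
    drv_dir = os.path.join(sysfs_root, "bus", "pci", "drivers", driver)
    if not os.path.isdir(drv_dir):
        # Driver not loaded (or not present on this system).
        return []
    return sorted(e for e in os.listdir(drv_dir) if PCI_ADDR.match(e))
```

Comparing devices_for_driver("vfio-pci") with devices_for_driver("xhci_hcd") shows at a glance which driver actually owns the card before the guest is started.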

Anyone have any pointers on what else to do?

Many thanks

Have you patched the kernel with the TR reinit patch yet? That’s probably your issue.

I can certainly give that a go but I will need to learn how to do that first.

AFAIK it’s the fix for TR passthrough issues.

I think there are some Fedora third-party repos with a prepatched kernel (and Arch has an AUR package) if you don’t want to compile it yourself.

That might not be necessary depending on what motherboard he has. All manufacturers aside from ASUS have released BIOS updates that fix the reset issue without having to patch the kernel.

If you’ve got an Asus board, hopefully they put out an update soon considering TR2 is out next week…

The newest AGESA is actually causing problems with the KVM modules loading on TR on some manufacturers’ boards.

I’ve a Gigabyte X399 DESIGNARE EX and there are three newer BIOS revisions.
Has anyone tried the latest yet with KVM and can confirm that they play nice?

I’ve been investigating a bit today and if I
virsh nodedev-dettach pci_0000_41_00_0
I can then get the device to be in use by vfio-pci, and after restarting the Java tool I use I can see it’s monitoring three devices.
So I pass it through and see recovery messages as I start the guest.
However the guest crashes with a load of blurb I don’t understand on the KVM console.
So I’m just running with a GPU passed through. I’ll perhaps see if I can get it working by removing all references to the device and trying to get it onto pci-stub.
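
A small aside on the naming: the pci_0000_41_00_0 name passed to virsh is the same device as 41:00.0 in lspci, just written with underscores and the PCI domain in front. A tiny sketch of the conversion, in case it helps when scripting against sysfs (the function name is made up, and it assumes names follow the pci_DDDD_BB_SS_F pattern seen above):

```python
import re

# libvirt node device names for PCI devices look like pci_0000_41_00_0
_NODEDEV = re.compile(
    r"^pci_([0-9a-fA-F]{4})_([0-9a-fA-F]{2})_([0-9a-fA-F]{2})_([0-7])$"
)


def nodedev_to_pci_addr(name):
    """Convert a name like pci_0000_41_00_0 to the sysfs form 0000:41:00.0."""
    m = _NODEDEV.match(name)
    if m is None:
        raise ValueError(f"not a PCI node device name: {name!r}")
    domain, bus, slot, func = m.groups()
    return f"{domain}:{bus}:{slot}.{func}".lower()
```

The resulting address is the directory name to look for under /sys/bus/pci/devices when checking which driver holds the card after the detach.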

I’m not too keen on going really custom on the kernel, as I did that on my old system and had all sorts of issues. Those were, however, down to finding a kernel that had modules available for everything I was trying to add, which included some PCIe cards for Fibre Channel to other hosts, so I had to stay a fair way from bleeding edge. I could never quite get it 100% stable, and I still don’t know if it was the host’s CPU, RAM, main board or the storage controllers overheating, but my ZFS would drop out and hence the guests would go unresponsive.

On my Fedora install with a stock Fedora kernel, I use pci-stub all the time. Fedora’s kernel variants all build EHCI and XHCI into the kernel, so they load before vfio-pci.
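
Whether this applies to a given distro can be checked from the modules.builtin file that ships with each kernel under /lib/modules/<release>/. If xhci-hcd is listed there, it is compiled in and a modprobe softdep line can't reorder it, which is where pci-stub.ids= on the kernel command line comes in. A minimal sketch of that check follows. The helper name is made up, and it assumes the usual one-path-per-line format ending in .ko.

```python
def builtin_matches(name, builtin_text):
    """Return the modules.builtin lines whose module name equals `name`.

    Dashes and underscores are treated as equivalent, the same way
    modprobe treats them, so xhci_hcd matches xhci-hcd.ko.
    """
    wanted = name.replace("-", "_")
    hits = []
    for line in builtin_text.splitlines():
        line = line.strip()
        base = line.rsplit("/", 1)[-1]
        if base.endswith(".ko"):
            base = base[:-3]
        if base.replace("-", "_") == wanted:
            hits.append(line)
    return hits
```

To use it, read /lib/modules/<release>/modules.builtin for the running kernel (os.uname().release gives the release string) and pass the text in. An empty list for "xhci_hcd" suggests the driver is a loadable module on that kernel.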

Good passthrough cards have supplemental power from SATA power and a Renesas uPD720202. I have no problems resetting that controller. Some people have reported more recent firmwares of the FL1100 4 port chipset work too.

Well, the patch fixes your issue, and it won’t be upstreamed any time soon, so I’d recommend you go that route anyway.

Well, Gigabyte’s save-BIOS option works really well, no sarcasm! I backed up my current BIOS when I went to flash the board.
When I restored it, it had my profile saved and the last state was re-applied.
Same clock speeds, CAS and all the settings correct for virtualization.
YAY!

I know this works because:
New BIOS was unstable at default clocks, sometimes loading the host OS, sometimes hard locking while the OS was still loading.
New BIOS does not seem to present VT-d or whatever to the host OS, so guests don’t start. KVM/virt-manager displays a message about the required hardware features not being present.
New BIOS seems to have introduced a glitch so the ZFS pools are not there when I log in. They loaded OK manually, though, and lsblk showed the disks were present. I could also see all 20 drives attached to the controller in the BIOS, so really strange.
New BIOS does not shut down or restart the system/OS when the shutdown target is reached. Also the system takes ages to go down as some part of KVM’s prerequisite checks hangs.
Many of the options I had to set to get the system into a partially working state are not where they were, and no new menus exist that they could have been moved to.

Perhaps it’s just time to buy a Xeon or a Mac.

Edit: Not one to just give up, I tried again to get it working: flashed it back to f10 and got to Ubuntu first go at default clocks.
Same shutdown hang.
Checked BIOS for SVM/IOMMU, both enabled, and booted to Ubuntu again.
Tried Ubuntu again and got logged in OK. Again ZFS did not import the pools automatically at first, but while I was looking at what was going on, all pools imported on their own.
So with the storage now available I tried KVM.
NOPE!

So back to f1.