VFIO / PCI Passthrough Saga, a tragedy in 3 parts

Hey all- first post on l1t. Happy to have found a place with so many fellow VFIO / PCI-passthrough fans.

My saga is one that will sound familiar, I’m sure: I can’t get PCI passthrough to work to save my life. Well, that might be a bit of an exaggeration, but it really did all start about 4 years ago, with a Gigabyte board and AMD setup that was supposed to be one of the early combinations that could do it. No luck then.

Fast forward to now- I’m sick of dual booting (again), because my stints in Windows are always centered around gaming sessions that just don’t work in Linux under any combination of WINE, etc.

I have:

OS: Ubuntu 19.04 beta
Kernel: Linux phantomvirt 5.0.0-8-generic #9-Ubuntu SMP Tue Mar 12 21:58:11 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
MB: ASUS Prime x470 PRO
CPU: Ryzen 2700x
GPU (host): XFX RX 560
GPU (guest): XFX RX 580

IOMMU group isolation looks good, vfio-pci is bound to the guest card, and we’re all set (the quick sanity check I mean is shown just after the log). Fire up the VM and it immediately pauses itself, and dmesg shows:

[  164.663625] vfio-pci 0000:09:00.0: enabling device (0000 -> 0003)
[  164.663988] vfio_ecap_init: 0000:09:00.0 hiding ecap 0x19@0x270
[  164.663997] vfio_ecap_init: 0000:09:00.0 hiding ecap 0x1b@0x2d0
[  164.664004] vfio_ecap_init: 0000:09:00.0 hiding ecap 0x1e@0x370
[  164.683602] vfio-pci 0000:09:00.1: enabling device (0000 -> 0002)
[  165.903711] vfio_bar_restore: 0000:09:00.1 reset recovery - restoring bars
[  165.923867] vfio_bar_restore: 0000:09:00.0 reset recovery - restoring bars
[  165.934728] pcieport 0000:00:03.2: AER: Uncorrected (Non-Fatal) error received: 0000:00:00.0
[  165.934734] pcieport 0000:00:03.2: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID)
[  165.934738] pcieport 0000:00:03.2:   device [1022:1453] error status/mask=00200000/04400000
[  165.934741] pcieport 0000:00:03.2:    [21] ACSViol                (First)
[  165.934791] pcieport 0000:00:03.2: AER: Device recovery successful
[  166.068160] AMD-Vi: Completion-Wait loop timed out
[  166.196535] AMD-Vi: Completion-Wait loop timed out
[  166.321823] AMD-Vi: Completion-Wait loop timed out
[  166.926331] iommu ivhd0: Event logged [IOTLB_INV_TIMEOUT device=09:00.0 address=0xffe197760]
[  166.937521] pcieport 0000:00:03.2: AER: Uncorrected (Non-Fatal) error received: 0000:00:00.0
[  166.937528] pcieport 0000:00:03.2: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID)
[  166.937532] pcieport 0000:00:03.2:   device [1022:1453] error status/mask=00200000/04400000
[  166.937535] pcieport 0000:00:03.2:    [21] ACSViol                (First)
[  166.937597] pcieport 0000:00:03.2: AER: Device recovery successful
[  167.928209] iommu ivhd0: Event logged [IOTLB_INV_TIMEOUT device=09:00.0 address=0xffe197790]
[  167.929276] pcieport 0000:00:03.2: AER: Uncorrected (Non-Fatal) error received: 0000:00:00.0
[  167.929282] pcieport 0000:00:03.2: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID)
[  167.929287] pcieport 0000:00:03.2:   device [1022:1453] error status/mask=00200000/04400000
[  167.929290] pcieport 0000:00:03.2:    [21] ACSViol                (First)
[  167.929355] pcieport 0000:00:03.2: AER: Device recovery successful
[  168.930102] iommu ivhd0: Event logged [IOTLB_INV_TIMEOUT device=09:00.0 address=0xffe1977c0]
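
For reference, the sanity check above is a rough sketch: the group-listing loop most guides (e.g. the Arch wiki) pass around, plus lspci on the guest card. The 09:00.x addresses are my guest card; yours will differ.

#!/bin/bash
# list every IOMMU group and the devices in it
for g in /sys/kernel/iommu_groups/*; do
    echo "IOMMU group ${g##*/}:"
    for d in "$g"/devices/*; do
        echo -e "\t$(lspci -nns "${d##*/}")"
    done
done

# confirm the guest GPU and its HDMI audio function are bound to vfio-pci
lspci -nnk -s 09:00.0
lspci -nnk -s 09:00.1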

So my question is: Is this one of those hopeless cards?

And a follow-up question: is there a card on Amazon I can buy today (preferably AMD) that is KNOWN to work, in the $200-$300 range?

I’ve tried swapping the cards so the RX 560 is for the guest and I get the exact same results as above.

I’ve also tried an EVGA GeForce 750 Ti, a PNY GeForce 740, and an EVGA GeForce GT 630 for the guest, without success. To be fair to those cards, they at least let me install Windows, but then they don’t work for various reasons. I may circle back to those and try more configurations to see if I can get them working. They all showed issues related to a missing video card ROM and/or getting stuck in D3.

I also tried a PNY GTX1050 Ti that I’ve since returned to Best Buy.

I’ve also tried a mixture of running QEMU directly and using virt-manager.

But my first question is my main one: should I give up on this AMD card for guests?

Thanks for reading!

Depending on what boot parameters you are currently using, try switching to iommu=pt or iommu=on.
Also, which slots are your GPUs in?

I haven’t had any luck with passthrough on Ryzen either, using a Gigabyte x399 Designare, a 2950X, and an Asus GTX 1080.

I guess the usual questions apply: have you tried a newer BIOS, or an older one? Can you try Unraid or Proxmox? Those two seem to be “tuned” for passthrough success compared to a standard distro.

Try adding these values to your kernel command line:

pcie_acs_override=downstream and pci=noaer
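
On a stock Ubuntu/GRUB setup that would look something like this (the exact line is just an example - keep whatever IOMMU options you already have):

# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_iommu=on iommu=pt pcie_acs_override=downstream pci=noaer"

# regenerate the config and reboot
sudo update-grub
sudo reboot

# afterwards, confirm the parameters actually took
cat /proc/cmdline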

I missed that in the OP’s first post: ACSViol - so that’s an IOMMU group problem, right?

Is the ACS override baked into the mainline kernel now, or is it still a separate patch?

@futuretim did you try the other card, or shuffling around the slots?

IIRC this problem pops up when an AMD GPU is in the last slot, the one connected to the chipset. That was the case on X370 and it seems to still be present on X470.
Try putting the GPUs in the CPU-connected slots, or try the mentioned parameters iommu=on or iommu=pt.
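
A quick way to see what a slot actually hangs off of (using the OP’s 09:00.0 address as the example - on Ryzen the CPU lanes usually show up directly under the 00:01.x / 00:03.x root ports, while chipset slots sit behind an extra bridge):

# tree view of the PCIe topology
lspci -tv

# or show the full bridge path of the guest card
readlink /sys/bus/pci/devices/0000:09:00.0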

They aren’t, as far as I know, since I always include the ACS patch in my builds; I’d need to dig to find out whether that kernel parameter exists without the patch.

It could also have to do with the slot on the board, yes. I use a Vega 64 for my host, so it’s in the top slot, and I pass through a 1080 Ti in the middle slot. Granted, this is on Z270, but slot connections usually differ in lane count and could go through the PCH - you’d have to check the motherboard block diagram to confirm that.

Thanks for the help. I’m trying more things as I’m able this week, but I’ll definitely have more time this weekend. I’m more determined (and defeated) than ever to get this working.

The “pci=noaer” option seemed to make ALL the errors in dmesg go away except the “reset recovery - restoring bars” messages and some like “iommu ivhd0: Event logged [IOTLB_INV_TIMEOUT device=09:00.0 address=0xffe197760]”.

My current grub line is like so:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_iommu=on iommu=pt vfio-pci.ids=1002:67df,1002:aaf0 pcie_acs_override=downstream pci=noaer"

Also- attempting to install Windows just to see if everything works takes so long. Can you verify my thinking on this:

I plan on getting an Ubuntu Desktop image set up that I can attach to various VM configurations and boot, just to see whether the card shows up in lspci. That would validate that it’s being passed through properly, at least at a basic level, right?
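
i.e. boot the live image in the guest and run something like:

# inside the guest
lspci -nnk

# if the RX 580 [1002:67df] and its HDMI audio function [1002:aaf0] show up,
# the passthrough itself is working and the rest is driver trouble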

Still not working, though. After adding pci=noaer I was able to boot and install Windows, but then the Radeon drivers did not find any hardware.

The host GPU is in the slot next to the CPU and the guest GPU is in the slot next to that (so I guess the middle slot).

It looks like I might just be destined to keep dual-booting until this is actually a “legitimate” end-user technology (i.e., until I can buy a card that advertises this as a capability).

This is with my RX 580 as the card I’m trying to pass through. I added pci=nomsi to my GRUB options, which didn’t help.

I may try Unraid a little later since that seems to be a popular solution.

This is literally the second motherboard and processor I’ve bought with the explicit goal of doing PCI passthrough / VFIO. I may also try Ubuntu 18.04 with some known-good kernels, but I figured going with 5.0 (Ubuntu 19.04) would only yield better results.

At some point I just feel like I’m wasting a bunch of time on something that may or may not work, and may only work in a flaky way. sigh … Oh well.

Anyway, my latest attempt just yields this:

[   37.214659] vfio-pci 0000:09:00.0: enabling device (0000 -> 0003)
[   37.215010] vfio_ecap_init: 0000:09:00.0 hiding ecap 0x19@0x270
[   37.215019] vfio_ecap_init: 0000:09:00.0 hiding ecap 0x1b@0x2d0
[   37.215026] vfio_ecap_init: 0000:09:00.0 hiding ecap 0x1e@0x370
[   37.235054] vfio-pci 0000:09:00.1: enabling device (0000 -> 0002)
[   38.483620] vfio_bar_restore: 0000:09:00.1 reset recovery - restoring bars
[   38.483715] vfio_bar_restore: 0000:09:00.0 reset recovery - restoring bars
[   39.625795] vfio_bar_restore: 0000:09:00.0 reset recovery - restoring bars
[   39.626281] vfio_bar_restore: 0000:09:00.1 reset recovery - restoring bars
[   39.637687] vfio_bar_restore: 0000:09:00.0 reset recovery - restoring bars
[   39.638154] vfio_bar_restore: 0000:09:00.1 reset recovery - restoring bars
[   39.644505] vfio_bar_restore: 0000:09:00.0 reset recovery - restoring bars
[   39.644973] vfio_bar_restore: 0000:09:00.1 reset recovery - restoring bars
[   39.768943] AMD-Vi: Completion-Wait loop timed out
[   39.769333] vfio_bar_restore: 0000:09:00.0 reset recovery - restoring bars
[   39.773144] vfio-pci 0000:09:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff
[   39.773263] vfio_bar_restore: 0000:09:00.0 reset recovery - restoring bars
[   39.894727] AMD-Vi: Completion-Wait loop timed out
[   40.016538] AMD-Vi: Completion-Wait loop timed out
[   40.138075] AMD-Vi: Completion-Wait loop timed out
[   40.147364] vfio_bar_restore: 0000:09:00.0 reset recovery - restoring bars
[   40.151712] vfio_bar_restore: 0000:09:00.0 reset recovery - restoring bars
[   40.151769] vfio_bar_restore: 0000:09:00.0 reset recovery - restoring bars
[   40.151822] vfio_bar_restore: 0000:09:00.1 reset recovery - restoring bars
[   40.152071] vfio_bar_restore: 0000:09:00.1 reset recovery - restoring bars
[   40.152114] vfio_bar_restore: 0000:09:00.1 reset recovery - restoring bars
[   40.152427] vfio_bar_restore: 0000:09:00.0 reset recovery - restoring bars
[   40.152470] vfio_bar_restore: 0000:09:00.1 reset recovery - restoring bars
[   40.156205] vfio_bar_restore: 0000:09:00.0 reset recovery - restoring bars
[   40.156250] vfio_bar_restore: 0000:09:00.1 reset recovery - restoring bars
[   40.161748] vfio_bar_restore: 0000:09:00.0 reset recovery - restoring bars
[   40.161792] vfio_bar_restore: 0000:09:00.0 reset recovery - restoring bars
[   40.162344] vfio_bar_restore: 0000:09:00.1 reset recovery - restoring bars
[   40.162390] vfio_bar_restore: 0000:09:00.1 reset recovery - restoring bars
[   40.649540] iommu ivhd0: Event logged [IOTLB_INV_TIMEOUT device=09:00.0 address=0xffe1970e0]
[   41.651427] iommu ivhd0: Event logged [IOTLB_INV_TIMEOUT device=09:00.0 address=0xffe197110]
[   42.653314] iommu ivhd0: Event logged [IOTLB_INV_TIMEOUT device=09:00.0 address=0xffe197140]
[   43.655190] iommu ivhd0: Event logged [IOTLB_INV_TIMEOUT device=09:00.0 address=0xffe197170]
[   44.195684] vfio_bar_restore: 0000:09:00.0 reset recovery - restoring bars
[   44.195858] vfio_bar_restore: 0000:09:00.1 reset recovery - restoring bars
[   54.934912] vfio_bar_restore: 0000:09:00.0 reset recovery - restoring bars
[   54.934953] vfio_bar_restore: 0000:09:00.0 reset recovery - restoring bars
[   54.935237] vfio_bar_restore: 0000:09:00.1 reset recovery - restoring bars
[   54.935274] vfio_bar_restore: 0000:09:00.1 reset recovery - restoring bars
[   55.098763] AMD-Vi: Completion-Wait loop timed out
[   55.257811] AMD-Vi: Completion-Wait loop timed out
[   55.380436] AMD-Vi: Completion-Wait loop timed out
[   55.502130] AMD-Vi: Completion-Wait loop timed out
[   55.956714] iommu ivhd0: Event logged [IOTLB_INV_TIMEOUT device=09:00.0 address=0xffe1971a0]
[   56.078539] AMD-Vi: Completion-Wait loop timed out
[   56.200058] AMD-Vi: Completion-Wait loop timed out
[   56.321970] AMD-Vi: Completion-Wait loop timed out
[   56.443534] AMD-Vi: Completion-Wait loop timed out
[   56.564828] AMD-Vi: Completion-Wait loop timed out
[   56.686129] AMD-Vi: Completion-Wait loop timed out
[   56.807423] AMD-Vi: Completion-Wait loop timed out
[   56.928761] AMD-Vi: Completion-Wait loop timed out
[   56.958570] iommu ivhd0: Event logged [IOTLB_INV_TIMEOUT device=09:00.0 address=0xffe1971d0]
[   57.079923] AMD-Vi: Completion-Wait loop timed out
[   57.201659] AMD-Vi: Completion-Wait loop timed out
[   57.323493] AMD-Vi: Completion-Wait loop timed out
[   57.444911] AMD-Vi: Completion-Wait loop timed out
[   57.566255] AMD-Vi: Completion-Wait loop timed out
[   57.687565] AMD-Vi: Completion-Wait loop timed out
[   57.808841] AMD-Vi: Completion-Wait loop timed out
[   57.930186] AMD-Vi: Completion-Wait loop timed out
[   57.960450] iommu ivhd0: Event logged [IOTLB_INV_TIMEOUT device=09:00.0 address=0xffe197200]
[   58.091905] AMD-Vi: Completion-Wait loop timed out
[   58.213638] AMD-Vi: Completion-Wait loop timed out
[   58.334918] AMD-Vi: Completion-Wait loop timed out
[   58.456196] AMD-Vi: Completion-Wait loop timed out
[   58.577485] AMD-Vi: Completion-Wait loop timed out
[   58.698981] AMD-Vi: Completion-Wait loop timed out
[   58.962334] iommu ivhd0: Event logged [IOTLB_INV_TIMEOUT device=09:00.0 address=0xffe197230]
[   65.655068] kvm [2739]: vcpu1, guest rIP: 0xffffffff91273cf7 ignored rdmsr: 0x3a
[   65.655072] kvm [2739]: vcpu1, guest rIP: 0xffffffff91273cf7 ignored rdmsr: 0xd90
[   65.655082] kvm [2739]: vcpu1, guest rIP: 0xffffffff91273cf7 ignored rdmsr: 0x570
[   65.655083] kvm [2739]: vcpu1, guest rIP: 0xffffffff91273cf7 ignored rdmsr: 0x571
[   65.655085] kvm [2739]: vcpu1, guest rIP: 0xffffffff91273cf7 ignored rdmsr: 0x572
[   65.655086] kvm [2739]: vcpu1, guest rIP: 0xffffffff91273cf7 ignored rdmsr: 0x560
[   65.655087] kvm [2739]: vcpu1, guest rIP: 0xffffffff91273cf7 ignored rdmsr: 0x561
[   65.655089] kvm [2739]: vcpu1, guest rIP: 0xffffffff91273cf7 ignored rdmsr: 0x580
[   65.655090] kvm [2739]: vcpu1, guest rIP: 0xffffffff91273cf7 ignored rdmsr: 0x581
[   65.655091] kvm [2739]: vcpu1, guest rIP: 0xffffffff91273cf7 ignored rdmsr: 0x582

Here’s a fun twist for those following along at home. I’ve now seen PCI passthrough/VFIO work in the wild - but it’s probably not where you’d expect.

I just did it on my old hardware in the span of about 10 minutes. I believe I must have tried this hardware ages ago (around when people were just starting to do this) and had no success, so mentally I must have written it off. There was also the matter of the new gear letting me go from 32GB to 64GB of RAM so I could run two desktops at once (or at least have the capability to).

At any rate, I just passed through an XFX RX 580 on an ASRock Z97 Extreme6 motherboard using an Intel i7-4770.

I should be excited… I mean, I am; I’m just not sure exactly how I want to proceed. I wanted to use more AMD hardware, both because of cost and because their discrete GPUs are far more open-source and Linux-friendly.

I guess I’ll set up a temporary Windows 10 install and see if I can get Looking Glass to work, and work well. If I can, I may have to send the other gear back and reconsider an Intel-based platform.

This is Ubuntu 19.04 beta (5.0.0-11-generic kernel).

Cheers!

Why did you think the Z97 and i7-4770 could not work? The i7-4770K would not, as it did not have VT-d (Intel® Virtualization Technology for Directed I/O), and some Z97 boards did not support it in the BIOS.
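
That’s easy enough to check from a running Linux system, e.g.:

# VT-d detected and active? (DMAR lines only show up when the BIOS exposes it)
dmesg | grep -e DMAR -e IOMMU

# CPU virtualization flags (vmx on Intel, svm on AMD)
grep -E -o 'vmx|svm' /proc/cpuinfo | sort -u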

You may also be doing something wrong with the AMD build. Can you give more info? Best would be to provide the whole dmesg, lspci -k, and sudo virsh dumpxml <your-guest-name> output (commands below).
Also check your BIOS settings, and maybe update the BIOS. Make sure SVM Mode and anything IOMMU-related is enabled. The manual does not mention IOMMU anywhere.
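
Something like this would gather everything (use the name that virsh list --all shows for your VM):

dmesg > dmesg.txt
lspci -k > lspci-k.txt
sudo virsh dumpxml your-guest-name > guest.xml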

Also, pcie_acs_override=downstream will do nothing for you without a custom kernel carrying the ACS patches. And if you are not passing through anything connected to the chipset, it is not needed anyway.
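
If you want to check whether your kernel actually carries the patch: the commonly circulated version of it prints a warning at boot once the override is active (at least the variants I’ve seen do), and the group layout itself is easy to compare before/after:

dmesg | grep -i "acs override"
ls /sys/kernel/iommu_groups/ | wc -l   # number of IOMMU groups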

My ancient Asus X79 Sabertooth with a Xeon E5-2667 v2 runs VFIO / passthrough A LOT better than my new Ryzen 2950X and X399 board. Every single slot on the X79 is in its own IOMMU group, along with all the on-board components, and there’s no funny stuff like the “dummy PCIe bridge” that has crippled my X399 board.

I thought I said why I assumed it wouldn’t work… I tried it a LONG time ago when I first got it, and software support back then was a nightmare, so it seemed like I wasn’t going to be able to get a working setup. I realize the hardware theoretically always supported it (i.e. VT-d), but the software stack at that time was A LOT more immature.

Also- I’ve abandoned this build. I plan to start a thread shortly about where I ended up. I’m posting this from my Windows 10 install with an XFX RX 580 passed through on a Gigabyte Z390 Designare. I’m very pleased.

Didn’t read everything, but do your GPUs actually have different device IDs? I ran into a funny issue when I bought an RX 570 this week… it has the same ID as the RX 580 :smiley: so I had to follow this guide:

VFIO in 2019 -- Pop!_OS How-To (General Guide though) [DRAFT] (at least the part about assigning the vfio-pci driver to the specific card)
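
For anyone hitting the same thing: when both cards share the same vendor:device ID, vfio-pci.ids alone can’t tell them apart, so you bind by bus address instead. Roughly this (the guide wires it into the initramfs; the addresses here are the OP’s, adjust to your own):

#!/bin/sh
# bind only the guest card and its audio function to vfio-pci,
# leaving the identically-ID'd host card on amdgpu
DEVS="0000:09:00.0 0000:09:00.1"
for DEV in $DEVS; do
    echo "vfio-pci" > /sys/bus/pci/devices/$DEV/driver_override
done
modprobe -i vfio-pci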
