Trouble passing through an RX 580 to an Ubuntu desktop VM

I’m having some issues with GPU passthrough to a Linux VM for desktop use. I’m running Unraid 6.4.0_rc21b on a Ryzen system (R7 1700X + Asus Prime X370-Pro), and my GPU is an RX 580 (this one: https://www.newegg.ca/Product/Product.aspx?Item=N82E16814202293). When I pass the GPU through to a Windows 10 VM, I have almost no issues, except that the BIOS and boot screen (with the swirly dots) don’t show up, and I just get the login screen a moment later. I can reboot, shut down, etc. with no problem.

When I pass through to a Linux VM though (Ubuntu 17.10 for the sake of documenting this issue), things are a little different. I first install the VM with a VNC screen to confirm that the base system installed correctly, selecting to install updates and third-party drivers as I go, and everything installs fine. I then pass through my graphics card, alongside some other PCI devices like audio and USB controllers. On the first boot, everything works as well as it does with a Windows VM. I install any available updates via apt update and apt dist-upgrade. When I go to reboot or shut down though, something fails. Instead of shutting down, the VM gets stuck. If I run lspci looking for my graphics card at this point, before the VM has actually shut down (from what I can tell), all I find is this:

0a:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/580] (rev ff) (prog-if ff)
        !!! Unknown header type 7f
        Kernel driver in use: vfio-pci
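
For anyone who wants to check for this state without eyeballing lspci, here is a rough Python sketch. It assumes the card's config space is exposed under sysfs at /sys/bus/pci/devices/<address>/config, and that the full address is 0000:0a:00.0 (the domain 0000 is a guess, only 0a:00.0 comes from the output above). The function names are made up for illustration. A device that has stopped answering usually reads back as all 0xff bytes, which would match the "rev ff" and "prog-if ff" shown above.

```python
from pathlib import Path


def read_pci_config(address, sysfs_root="/sys/bus/pci/devices"):
    """Return the first 64 bytes (the standard header) of a device's config space.

    `address` is the full domain:bus:device.function string, e.g. "0000:0a:00.0"
    (assumed here; adjust to whatever your system reports).
    """
    with open(Path(sysfs_root) / address / "config", "rb") as f:
        return f.read(64)


def looks_unresponsive(config):
    """True if every byte read back as 0xff, i.e. nothing answered the reads."""
    return len(config) > 0 and all(b == 0xFF for b in config)
```

On the host this could be called as looks_unresponsive(read_pci_config("0000:0a:00.0")). It only reports the symptom; it does not attempt to reset or recover the card.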

Looking in /var/log/syslog, I see that it has been flooded with a somewhat randomly ordered set of messages like the following

Jan 11 18:32:43 core kernel: IOTLB_INV_TIMEOUT device=0a:00.0 address=0x00000007fb631070]
Jan 11 18:32:43 core kernel: AMD-Vi: Completion-Wait loop timed out

If I try to force stop the VM and then start it again I get the error

internal error: Unknown PCI header type '127'
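
Side note on the numbers: the '127' in this error and the '7f' in the lspci output are the same value. The header type register is one byte at offset 0x0E of PCI config space; bit 7 flags a multi-function device and the low 7 bits select the header layout. If every read from the card comes back as 0xff, masking off bit 7 leaves 0x7f, which is 127. A small sketch of that arithmetic (the function name is hypothetical, just for illustration):

```python
def decode_header_type(raw_byte):
    """Split the PCI header type byte into (layout, multifunction flag)."""
    layout = raw_byte & 0x7F               # low 7 bits: header layout
    multifunction = bool(raw_byte & 0x80)  # bit 7: multi-function device
    return layout, multifunction


# An all-ones read, as you would get from a device that is not responding.
print(decode_header_type(0xFF))  # (127, True)
```

Only layouts 0, 1 and 2 are defined, so 0x7f cannot be a real header, which is why both tools call it unknown.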

The only way I can get the graphics card usable again is to reboot the entire system.

Now I’m no expert, but from some research this looks very much like the infamous PCI reset bug. It’s strange, though, that it only manifests with Linux guests.

I’m curious whether anyone out there has had similar issues. Does anyone know a fix, or at least why Windows works and Linux doesn’t?

Wh- uhhhh, why not just run it straight up?

I can almost guarantee that’s a bus reset error. Some 580s have this issue. My MSI 580 Gaming X will pass through just fine.

That’s even more odd. I haven’t actually tried this with a Linux guest. I might have to do some testing when I get some time later. I think it’s because Windows more or less shuts down individual components and when you tell Linux to shut down it just tells the motherboard to switch off.