GPU passthrough stuck on TianoCore logo [solved] & I/O errors on VM shutdown [unresolved]

I am creating this topic mostly to ask whether anyone else has had this problem and, if so, how they fixed it. There is very little documentation showing that this problem even exists, so I thought other users facing the same problem might find this useful.

Anyway, the issue I am currently facing is with GPU passthrough to a VM using OVMF and QEMU: the passed-through GPU displays the TianoCore logo (from the OVMF UEFI firmware) but does not proceed any further. Using a Spice display adapter in virt-manager, I can see that the VM boots without problems, as I arrive at the login screen (Windows 8.1; I had the same problem with Linux and other Windows versions, so it is probably not related to the guest OS). In Windows Device Manager, the device is recognized by the VM but shows error code 31 ("The driver trying to start is not the same as the driver for the POSTed display adapter."). The problem has persisted on both my RX 480, which I am trying to pass through, and my GT 710, which I also tried after the RX 480 showed the problem.

Hardware and software: a Gigabyte Aorus Gaming 5 AM4 motherboard with an AMD Ryzen 1700; an MSI GT 710 for the host and a Sapphire RX 480 for the guest. The host OS is Arch Linux (installed with the Antergos installer). Tested with kernels 4.9, 4.10 and 4.11, and with BIOS revisions F6g (without the ACS patch) and F5 (with the ACS patch). The problem has also persisted across many installs.

I don't think it has anything to do with the Ryzen platform, as the few other cases I have found on the web involve older and different CPUs, both AMD and Intel. None of those threads have answers, though, so this may be a rare problem. I am hoping someone has an idea of what could cause this.

Thanks in advance!

Edit: here's a picture of the VM booted and the GPU stuck


Can't speak for Arch, but I followed this guide on Debian: https://youtu.be/dsDUtzMkxFk
with the mainline kernel (4.11.6 at the time: http://kernel.org) and this ACS patch: https://aur.archlinux.org/cgit/aur.git/tree/add-acs-overrides.patch?h=linux-vfio (from linux-vfio in the AUR).

Hardware:
- R7 1700
- Asus Prime X370 Pro
- GTX 1050ti (Primary)
- GT 710 (Passed through)

Worked fine with Windows 10

I had a similar problem with the GPU being stuck while the VM still booted.

Here is a workaround I found that may or may not work for you:

1 - boot the VM with only a software virtual GPU and enable RDP
2 - reboot the VM with only the real PCI GPU passed through and no virtual GPU
3 - it should be stuck but it will still boot so RDP in from the host and reinstall the GPU driver
4 - reboot and hopefully everything works fine.


Thanks! That seemed to work, and now my GPU displays video in the VM! The only problem I have now is that Arch seems to crash every time the VM shuts down. Switching TTYs does not work either. Sigh... Well, at least the GPU is working now :stuck_out_tongue_winking_eye:

When you reboot, is there a message in the journal that could help?

If you haven't already, edit /etc/systemd/journald.conf and change Storage=auto to Storage=persistent so you can keep logs of your previous boots.

Then you can access the previous boot by doing journalctl -xeb -1
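If you'd rather script that change, here is a minimal sketch, assuming the stock journald.conf layout where the setting appears as `Storage=auto` or commented out as `#Storage=auto`. The function name `set_journald_persistent` is made up for this example, and the file path is a parameter so you can try it on a copy first.

```bash
#!/usr/bin/env bash
# Sketch: switch journald to persistent storage so the journal of a boot that
# crashed (e.g. after the VM shut down) is still there after a hard reset.
# On a real system the file is /etc/systemd/journald.conf and needs root.

# Rewrite "Storage=..." (commented out or not) to "Storage=persistent".
# Returns 1 if the file has no Storage= line at all.
set_journald_persistent() {
    local conf="${1:-/etc/systemd/journald.conf}"
    if ! grep -qE '^[[:space:]]*#?[[:space:]]*Storage=' "$conf"; then
        echo "no Storage= line found in $conf" >&2
        return 1
    fi
    sed -i -E 's/^[[:space:]]*#?[[:space:]]*Storage=.*/Storage=persistent/' "$conf"
}

# Usage (as root):
#   set_journald_persistent /etc/systemd/journald.conf
#   systemctl restart systemd-journald
#   # after the next crash and reboot, read the previous boot's log:
#   journalctl -xeb -1
```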


A short poll of system config

This will help me figure out exactly what's going on with your system.

Kernel version and patches:

libvirt version:

qemu version:

output of the script below:

Do you have your 'vfio-pci' modules properly attached to the GT 710?

IOMMU groups script:

```bash
#!/bin/bash
shopt -s nullglob
for d in /sys/kernel/iommu_groups/*/devices/*; do
    n=${d#*/iommu_groups/*}; n=${n%%/*}
    printf 'IOMMU Group %s ' "$n"
    lspci -nns "${d##*/}"
done
```
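The version questions in the poll can be answered in one go with a small helper like the one below. It's a sketch: `report_versions` is a hypothetical name, and kernel patches can't be detected directly, but the kernel command line shows whether parameters such as `pcie_acs_override` (used by the ACS override patch) were passed.

```bash
#!/usr/bin/env bash
# Sketch: gather the kernel, libvirt and QEMU versions asked for in the poll.

report_versions() {
    printf 'Kernel: %s\n' "$(uname -r)"
    if [[ -r /proc/cmdline ]]; then
        # Shows boot parameters, e.g. an ACS override if one was set.
        printf 'Cmdline: %s\n' "$(</proc/cmdline)"
    fi
    local cmd
    for cmd in libvirtd qemu-system-x86_64; do
        if command -v "$cmd" >/dev/null 2>&1; then
            # Both tools accept --version; keep only the first line.
            printf '%s: %s\n' "$cmd" "$("$cmd" --version 2>/dev/null | head -n 1)"
        else
            printf '%s: not found\n' "$cmd"
        fi
    done
}

# Usage:
#   report_versions
```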

Thanks for the journald advice! I did not think of that. It seems to be the ATA driver doing something janky when the VM is shut down. A temporary fix that works without crashing the host is simply connecting the HDD via USB (the guest is installed on a separate HDD).

Also, the GPU seems to work fine now: I reinstalled the drivers in the guest and removed all other virtual GPUs, and it worked flawlessly.

An interesting note: without passing a physical SATA controller through to the VM, the host crashed every time the guest shut down, displaying a bunch of I/O errors. When passing a physical SATA controller through, though, the host is perfectly fine after the guest shuts down, but the guest suffers from heavy stuttering that makes it completely unusable.

More diagnosis is needed, I guess :stuck_out_tongue:

At least I have a working VM now without many issues.

Also a side note: the NAT "default" network is inactive after a host shutdown and does not come back if the device is removed and re-added to the VM. Any ideas there? And why doesn't libvirt just start it instead of throwing an error saying it has not been started? :stuck_out_tongue:
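For the NAT network, one likely cause (an assumption, not confirmed in this thread) is that libvirt's `default` network is defined but not marked to autostart. The sketch below parses `virsh net-list --all` output to check its state, then starts it with `virsh net-start` and marks it with `virsh net-autostart`. The helper names `net_status` and `ensure_net_up` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: check and fix the state of a libvirt network (default: "default").
# Assumes `virsh net-list --all` prints columns Name, State, Autostart, Persistent.

# Read `virsh net-list --all` output on stdin and print "state=... autostart=..."
# for the named network; exit status 1 if the network isn't listed.
net_status() {
    awk -v n="$1" '$1 == n { print "state=" $2, "autostart=" $3; found = 1 }
                   END { exit !found }'
}

ensure_net_up() {
    local name="${1:-default}" status
    status=$(virsh net-list --all | net_status "$name") || {
        echo "network '$name' is not defined" >&2
        return 1
    }
    # Start it now if it is inactive...
    if [[ "$status" == *"state=inactive"* ]]; then
        virsh net-start "$name" || return 1
    fi
    # ...and have libvirtd start it automatically from now on.
    if [[ "$status" == *"autostart=no"* ]]; then
        virsh net-autostart "$name" || return 1
    fi
}

# Usage (as root, i.e. against qemu:///system):
#   ensure_net_up default
```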

Anyways, thanks for the help!


libvirt and NetworkManager are not friends. I've had nothing but problems with the two of them in cahoots over the last year or so. I have no solution right now, but I'm digging into similar problems.

That's interesting. Are you passing the whole drive (/dev/sdx)? If so, my recommendation is to set it up with the virtio driver, like so:
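As a sketch of what such a setup could look like, libvirt's `<disk type='block'>` element can hand a whole block device to the guest on the virtio bus; the shell wrapper below writes that XML to a temporary file and attaches it with `virsh attach-device --config`. The domain name `win81`, the helper names and the by-id path are placeholders. Referencing the disk by `/dev/disk/by-id/...` instead of `/dev/sdX` is a suggestion here, since `sdX` names can change between boots.

```bash
#!/usr/bin/env bash
# Sketch: attach a whole host disk to a libvirt guest using the virtio bus.
# Placeholders: the domain name and the by-id path below are examples only.

# Print a <disk> element for a raw block device exposed to the guest as vdX.
virtio_disk_xml() {
    local src="$1" target="${2:-vdb}"
    cat <<EOF
<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='none' io='native'/>
  <source dev='$src'/>
  <target dev='$target' bus='virtio'/>
</disk>
EOF
}

attach_virtio_disk() {
    local domain="$1" src="$2" xml
    xml=$(mktemp) || return 1
    virtio_disk_xml "$src" > "$xml"
    # --config changes the persistent definition (applies on next guest boot).
    virsh attach-device "$domain" "$xml" --config
    local rc=$?
    rm -f "$xml"
    return "$rc"
}

# Usage (hypothetical names):
#   attach_virtio_disk win81 /dev/disk/by-id/ata-EXAMPLE_SERIAL
```

Note that a Windows guest needs the virtio storage driver (from the virtio-win ISO) installed before its boot disk is switched to the virtio bus, otherwise it will not find its system drive.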

Hope this helps.

Yes, I am passing /dev/sdb through directly. Same problem with the VirtIO driver. I am not sure what is causing this, but my guess is that it may be a bad SATA driver on the host.

Sometimes the system just crashes and reboots. Other times I am able to move the mouse and interact with things for a little while, though it will not load anything that is not already in RAM. And other times I get thrown to a black screen with errors about soft-resetting an ATA device. In both of the latter cases I have to hard-reset the computer, as a clean shutdown is not possible. I will keep an eye on newer kernels to see whether the issue gets resolved, and check whether something is just misconfigured (although that is unlikely, as the Arch installation is only four days old).

It is only an issue when the disk is connected through SATA, not when it is connected via USB.

One more thing: even though the HDD was connected to the SATA controller that was passed through to the VM, and that controller was bound to the vfio driver, the disk still showed up in the host system and was mountable. In that configuration the host didn't crash, but the VM was basically unusably slow.
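One way to check which kernel driver actually owns the controller at a given moment is to read the `driver` symlink in sysfs (`lspci -nnk -s 12:00.2` reports the same thing as "Kernel driver in use"). A minimal sketch; `pci_driver` is a hypothetical name, and the `SYSFS` variable is a hypothetical override so the function can be tested against a fake tree.

```bash
#!/usr/bin/env bash
# Sketch: report which driver is bound to a PCI device, e.g. "ahci" or "vfio-pci".
# SYSFS is a hypothetical override (defaults to /sys) used only for testing.

pci_driver() {
    local dev="${SYSFS:-/sys}/bus/pci/devices/$1"
    if [[ ! -e "$dev" ]]; then
        echo "no such device: $1" >&2
        return 1
    fi
    if [[ -e "$dev/driver" ]]; then
        # driver is a symlink into /sys/bus/pci/drivers/<name>
        basename "$(readlink -f "$dev/driver")"
    else
        echo "none"
    fi
}

# Usage, e.g. for a controller at 0000:12:00.2:
#   pci_driver 0000:12:00.2
```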

Side note: how do you quote text? I am fairly new to the forums :S

As I did above, all you do is highlight the text in the post and a little button shows up that says "Quote". Click that button and it opens a reply with the quoted text or places the preformatted quote at your cursor in the response box.

This thread should help you when it comes to using the forum features:


That would be my guess. Unless your OS is trying to mount the drive while it's in use.

That makes me believe it's most likely a bad sata driver.

Can you screenshot/copy-paste the errors and post them here? I'm really curious about the exact error so I can do some digging.

That sounds like the issue that Wendell has been talking about in the video he made recently discussing Ryzen passthrough.


I was not able to reproduce the errors this time but will try again later.

I have gone through the process of making it glitch out and I have taken some pictures which may or may not be useful. It is at least clear to me that the host is failing to read and/or write any data to the disk.

Note: I am running Arch on LVM, so that may or may not have something to do with the issues.

(Ignore the /dev/sdX name of the VM's disk; it changes on reboot depending on how the disk is connected to the system.)
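Because `sdX` names can move around between boots, the persistent symlinks under `/dev/disk/by-id/` are a steadier way to identify the VM's disk. A small sketch that lists which by-id names currently point at a given kernel name; `by_id_names` is a hypothetical helper and `DEV_ROOT` a hypothetical test override for `/dev`.

```bash
#!/usr/bin/env bash
# Sketch: list /dev/disk/by-id names that currently resolve to e.g. "sdb".
# DEV_ROOT is a hypothetical override (defaults to /dev) used only for testing.

by_id_names() {
    local want="$1" link
    for link in "${DEV_ROOT:-/dev}"/disk/by-id/*; do
        [[ -L "$link" ]] || continue
        # by-id entries are relative symlinks such as ../../sdb
        if [[ "$(basename "$(readlink -f "$link")")" == "$want" ]]; then
            echo "${link##*/}"
        fi
    done
}

# Usage:
#   by_id_names sdb
```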

The disk is visible and mountable even when connected to the SATA controller that is supposed to be using the vfio-pci driver.

Looks like the system is unable to get the icons it has not yet put into RAM, indicating it probably cannot read files from the filesystem.

Bash spitting out I/O errors when trying to execute scripts that have not been loaded into RAM yet. (lsblk was run before the VM was shut down, so it still works.)

A little while after that, GNOME Shell seems to have had enough and crashes/terminates.

About five minutes after GNOME terminates, the TTY outputs I/O errors from 'blk_update_request' on dev sda, which holds the Linux root file system.

After hard-resetting the computer, I had to run fsck manually on the Root filesystem.

I will try to reproduce the interesting errors a few more times to see whether they still occur.
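With the persistent journal enabled as suggested earlier in the thread, the kernel's storage errors from the crashed boot can be pulled out after the reset instead of photographing the screen. A sketch; `storage_errors` is a hypothetical name, and the patterns are a rough assumption about which lines matter (ATA messages, `blk_update_request`, generic I/O errors), not an exhaustive list.

```bash
#!/usr/bin/env bash
# Sketch: keep only storage-related kernel messages from a log on stdin.
# The pattern list is an assumption; extend it as needed.

storage_errors() {
    grep -E 'blk_update_request|I/O error|ata[0-9]+(\.[0-9]+)?:'
}

# Usage: kernel messages from the previous boot, filtered
#   journalctl -k -b -1 --no-pager | storage_errors
```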

You're passing through the secondary SATA controller, correct?

I've got a feeling that the system is initializing the controller with whatever the default driver is and once you get to a certain point in the boot process, it re-assigns it to vfio-pci. This is known to cause problems like the one you've got where you can see and mount a drive that's attached to a controller that's been passed through. I've encountered similar oddities with USB controllers.

Sounds like a udev rule or kernel command line is going to have to be used to get the drivers functioning happily.
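One common way to do that (a sketch under assumptions, not tested on this board) is to have vfio-pci claim the controller by its vendor:device ID before ahci loads: an `options vfio-pci ids=...` line plus a `softdep ahci pre: vfio-pci` line in a modprobe.d file, or equivalently `vfio-pci.ids=...` on the kernel command line. The helpers below read the ID from sysfs and print such a snippet; `pci_id`, `vfio_modprobe_conf` and the `SYSFS` test override are hypothetical. Caveats: binding by ID captures every device with that ID, the host loses any disk behind that controller (including its root disk, if it lives there), and on Arch the vfio modules usually also have to be added to MODULES in /etc/mkinitcpio.conf followed by `mkinitcpio -P`.

```bash
#!/usr/bin/env bash
# Sketch: build an early vfio-pci binding for a PCI device.
# SYSFS is a hypothetical override (defaults to /sys) used only for testing.

# Print "vendor:device" for a PCI address, e.g. 0000:12:00.2 -> 1022:7901.
pci_id() {
    local dev="${SYSFS:-/sys}/bus/pci/devices/$1" vendor device
    [[ -r "$dev/vendor" && -r "$dev/device" ]] || return 1
    vendor=$(<"$dev/vendor")   # sysfs stores these as e.g. "0x1022"
    device=$(<"$dev/device")
    printf '%s:%s\n' "${vendor#0x}" "${device#0x}"
}

# Print a modprobe.d snippet that hands the given IDs to vfio-pci and makes
# ahci load only after vfio-pci.
vfio_modprobe_conf() {
    local IFS=,
    printf 'options vfio-pci ids=%s\n' "$*"
    printf 'softdep ahci pre: vfio-pci\n'
}

# Usage (file name is an example):
#   vfio_modprobe_conf "$(pci_id 0000:12:00.2)" | sudo tee /etc/modprobe.d/vfio-sata.conf
# Kernel command line equivalent:
#   vfio-pci.ids=1022:7901
```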

@wendell, this might be interesting to you.

```
IOMMU Group 0 00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1452]
IOMMU Group 10 00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1460]
IOMMU Group 10 00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1461]
IOMMU Group 10 00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1462]
IOMMU Group 10 00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1463]
IOMMU Group 10 00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1464]
IOMMU Group 10 00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1465]
IOMMU Group 10 00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1466]
IOMMU Group 10 00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1467]
IOMMU Group 11 03:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b9] (rev 02)
IOMMU Group 11 03:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b5] (rev 02)
IOMMU Group 11 03:00.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b0] (rev 02)
IOMMU Group 11 04:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b4] (rev 02)
IOMMU Group 11 04:02.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b4] (rev 02)
IOMMU Group 11 04:03.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b4] (rev 02)
IOMMU Group 11 04:04.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b4] (rev 02)
IOMMU Group 11 04:05.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b4] (rev 02)
IOMMU Group 11 04:06.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b4] (rev 02)
IOMMU Group 11 04:07.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b4] (rev 02)
IOMMU Group 11 05:00.0 USB controller [0c03]: ASMedia Technology Inc. Device [1b21:1343]
IOMMU Group 11 06:00.0 Ethernet controller [0200]: Intel Corporation I211 Gigabit Network Connection [8086:1539] (rev 03)
IOMMU Group 11 07:00.0 Ethernet controller [0200]: Qualcomm Atheros Device [1969:e0b1] (rev 10)
IOMMU Group 11 09:00.0 Network controller [0280]: Broadcom Limited BCM4360 802.11ac Wireless Network Adapter [14e4:43a0] (rev 03)
IOMMU Group 12 0c:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK208 [GeForce GT 710B] [10de:128b] (rev a1)
IOMMU Group 12 0c:00.1 Audio device [0403]: NVIDIA Corporation GK208 HDMI/DP Audio Controller [10de:0e0f] (rev a1)
IOMMU Group 13 0d:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480] [1002:67df] (rev c7)
IOMMU Group 13 0d:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:aaf0]
IOMMU Group 1 00:01.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:1453]
IOMMU Group 2 00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1452]
IOMMU Group 3 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1452]
IOMMU Group 4 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:1453]
IOMMU Group 5 00:03.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:1453]
IOMMU Group 6 00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1452]
IOMMU Group 7 00:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1452]
IOMMU Group 7 00:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:1454]
IOMMU Group 7 11:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device [1022:145a]
IOMMU Group 7 11:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Device [1022:1456]
IOMMU Group 7 11:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:145c]
IOMMU Group 8 00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1452]
IOMMU Group 8 00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:1454]
IOMMU Group 8 12:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device [1022:1455]
IOMMU Group 8 12:00.2 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51)
IOMMU Group 8 12:00.3 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Device [1022:1457]
IOMMU Group 9 00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 59)
IOMMU Group 9 00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
```

The IOMMU groups on my system. I am passing through the SATA controller (and the other devices) in group 8, except for the PCI bridge and the host bridge, of course:

```
IOMMU Group 8 00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1452]
IOMMU Group 8 00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:1454]
IOMMU Group 8 12:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device [1022:1455]
IOMMU Group 8 12:00.2 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51)
IOMMU Group 8 12:00.3 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Device [1022:1457]
```

I don't know how to verify whether this is the secondary controller, though that would make the most sense given that it is on bus 12 (hexadecimal).

On further inspection, though, I think it might actually be the other way around, and this is the primary SATA controller. I will do some more testing on this.

Edit: This might be the primary SATA controller on the board, as it houses the SATA Express ports, though its devices are on SATA 4 and up, while the other controller (which seems to be an ASMedia one) houses SATA devices 0-3. The Linux host is running from what seems to be the ASMedia controller, on sata0.

```
$ lspci -tv
-[0000:00]-+-00.0  Advanced Micro Devices, Inc. [AMD] Device 1450
           +-00.2  Advanced Micro Devices, Inc. [AMD] Device 1451
           +-01.0  Advanced Micro Devices, Inc. [AMD] Device 1452
           +-01.3-[03-0b]--+-00.0  Advanced Micro Devices, Inc. [AMD] Device 43b9
           |               +-00.1  Advanced Micro Devices, Inc. [AMD] Device 43b5
           |               \-00.2-[04-0b]--+-00.0-[05]----00.0  ASMedia Technology Inc. Device 1343
           |                               +-02.0-[06]----00.0  Intel Corporation I211 Gigabit Network Connection
           |                               +-03.0-[07]----00.0  Qualcomm Atheros Device e0b1
           |                               +-04.0-[08]--
           |                               +-05.0-[09]----00.0  Broadcom Limited BCM4360 802.11ac Wireless Network Adapter
           |                               +-06.0-[0a]--
           |                               \-07.0-[0b]--
           +-02.0  Advanced Micro Devices, Inc. [AMD] Device 1452
           +-03.0  Advanced Micro Devices, Inc. [AMD] Device 1452
           +-03.1-[0c]--+-00.0  NVIDIA Corporation GK208 [GeForce GT 710B]
           |            \-00.1  NVIDIA Corporation GK208 HDMI/DP Audio Controller
           +-03.2-[0d]--+-00.0  Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480]
           |            \-00.1  Advanced Micro Devices, Inc. [AMD/ATI] Device aaf0
           +-04.0  Advanced Micro Devices, Inc. [AMD] Device 1452
           +-07.0  Advanced Micro Devices, Inc. [AMD] Device 1452
           +-07.1-[11]--+-00.0  Advanced Micro Devices, Inc. [AMD] Device 145a
           |            +-00.2  Advanced Micro Devices, Inc. [AMD] Device 1456
           |            \-00.3  Advanced Micro Devices, Inc. [AMD] Device 145c
           +-08.0  Advanced Micro Devices, Inc. [AMD] Device 1452
           +-08.1-[12]--+-00.0  Advanced Micro Devices, Inc. [AMD] Device 1455
           |            +-00.2  Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode]
           |            \-00.3  Advanced Micro Devices, Inc. [AMD] Device 1457
           +-14.0  Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller
           +-14.3  Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge
           +-18.0  Advanced Micro Devices, Inc. [AMD] Device 1460
           +-18.1  Advanced Micro Devices, Inc. [AMD] Device 1461
           +-18.2  Advanced Micro Devices, Inc. [AMD] Device 1462
           +-18.3  Advanced Micro Devices, Inc. [AMD] Device 1463
           +-18.4  Advanced Micro Devices, Inc. [AMD] Device 1464
           +-18.5  Advanced Micro Devices, Inc. [AMD] Device 1465
           +-18.6  Advanced Micro Devices, Inc. [AMD] Device 1466
           \-18.7  Advanced Micro Devices, Inc. [AMD] Device 1467
```
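To answer the "which controller is this disk on" question directly: each block device's sysfs path runs through the PCI address of the controller it hangs off. A sketch that prints every disk together with the last PCI address in its path (the SATA controller, or the USB host controller for a USB-attached disk); `disk_controllers` is a hypothetical name and `SYSFS` a hypothetical test override.

```bash
#!/usr/bin/env bash
# Sketch: map each block device to the PCI controller it sits behind.
# SYSFS is a hypothetical override (defaults to /sys) used only for testing.

disk_controllers() {
    local blk path part ctrl
    local -a parts
    for blk in "${SYSFS:-/sys}"/block/*; do
        [[ -e "$blk" ]] || continue
        # /sys/block/sdX links to .../0000:12:00.2/ata5/.../block/sdX
        path=$(readlink -f "$blk")
        IFS=/ read -ra parts <<< "$path"
        ctrl=""
        for part in "${parts[@]}"; do
            # Remember the last path component that looks like a PCI address.
            if [[ "$part" =~ ^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$ ]]; then
                ctrl="$part"
            fi
        done
        printf '%s %s\n' "${blk##*/}" "${ctrl:-none}"
    done
}

# Usage:
#   disk_controllers    # a line like "sdb 0000:12:00.2" would put sdb on the FCH controller
```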

I'd agree. I think the OP may need a kernel command line parameter to tell it to ignore the passed through SATA controller.
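If the kernel command line route is taken (for example appending `vfio-pci.ids=1022:7901`, the FCH SATA controller's ID from the IOMMU listing above, to the boot entry), a quick way to confirm after reboot that the running kernel actually received it is to parse `/proc/cmdline`. A sketch with a hypothetical function name; the kernel treats `-` and `_` in parameter names the same, so both spellings are matched.

```bash
#!/usr/bin/env bash
# Sketch: list the device IDs passed to vfio-pci on the kernel command line.
# Reads /proc/cmdline unless a command line string is given as $1.

cmdline_vfio_ids() {
    local cmdline arg
    local -a args
    if [[ $# -gt 0 ]]; then
        cmdline="$1"
    else
        cmdline=$(</proc/cmdline)
    fi
    read -ra args <<< "$cmdline"
    for arg in "${args[@]}"; do
        case "$arg" in
            vfio-pci.ids=*|vfio_pci.ids=*)
                # ids are comma-separated vendor:device pairs
                tr ',' '\n' <<< "${arg#*=}"
                ;;
        esac
    done
}

# Usage:
#   cmdline_vfio_ids            # prints nothing if no ids were passed
```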

As an aside, there are lots of potentially relevant fixes in kernel 4.11.9, which was just released.

Good work looking into that. I think you're on to something with the controllers being switched. Might be worth switching around which drives are connected to which physical ports and seeing if you can get the other controller passed through with the ACS override patch enabled.