Disappearing VirtIO disk image

I’ll try a repair and see if that works. Next time I try this, I’m just going to use the SATA driver. How bad is the performance hit? Does IDE or SCSI make more sense?

This keeps getting weirder and weirder. So I recovered my Debian install, reinstalled Windows 1709 using SATA emulation (instead of VirtIO) and made a backup of the 1709 disk image.

Then I applied the 1803 update and things went pear-shaped. The Windows update was taking a while, so I stepped away for a few minutes. When I came back the update was still installing on the Windows VM on my 2nd monitor. But on my primary Linux monitor I saw this message flashing over and over:

AMD-Vi completion-wait loop timed out

Debian was completely unresponsive, but the Windows VM kept chugging away with the update. It rebooted a couple of times, and finally came back to the Windows desktop with 1803 fully installed. Linux remained completely unresponsive with the above AMD-Vi error. So I shut down the VM from the Windows Start menu and powered the whole mess off.

Once I restarted, I found that the ext4 root partition was still intact, unlike the last lockup. The whole system booted up, and I now have a fully operational Windows 1803 install. Except…I have no idea if this is going to happen again.

Are you on the latest UEFI?


I’ve seen whole-system lockups, but never a host-only lockup before. This is weird.

Something like this isn’t just a “fluke,” so it’s likely to happen again. How often is an open question. You might want to open a ticket with AMD support and ask whether the AMD-Vi error points to a hardware fault and how to troubleshoot it.


At this point, it’s probably a good idea to list all your specs, including host distro, kernel version, libvirt version, etc.

uname -r

4.9.0-6-amd64

ls /var/cache/apt/archives/libvirt*

libvirt0_3.0.0-4+deb9u3_amd64.deb
libvirt-clients_3.0.0-4+deb9u3_amd64.deb
libvirt-daemon_3.0.0-4+deb9u3_amd64.deb
libvirt-daemon-system_3.0.0-4+deb9u3_amd64.deb
libvirt-glib-1.0-0_1.0.0-1_amd64.deb

System: Dell Inspiron 5675
CPU: Ryzen 7 1700
RAM: Kingston DDR4-2400 32GB (2x16GB)
GPU1: Gigabyte GTX 1050 Ti 4GB (host)
GPU2: Dell OEM Radeon RX 580 8GB (Guest - appears to be reference model)
HD: Seagate 1TB 7200RPM

Other notes: KVM presents the Ryzen as an ‘Opteron G3’ to the guest. Not sure whether that poses a problem. Sometimes when I install the Radeon graphics driver in the guest, the X server on the host crashes.

Some interesting errors in syslog:

May 10 15:26:41 deb5675 kernel: [13997.400079] IOTLB_INV_TIMEOUT device=0a:00.0 address=0x000000081ab50f20]

May 10 15:31:27 deb5675 kernel: [14282.783539] kvm [12849]: vcpu0, guest rIP: 0xfffff807b20916c9 unhandled rdmsr: 0xc0010071

May 10 15:52:16 deb5675 kernel: [15531.950198] AMD-Vi: Event logged [
May 10 15:52:16 deb5675 kernel: [15531.950206] IO_PAGE_FAULT device=02:00.1 domain=0x000e address=0x00000000f0f48000 flags=0x0050]

The IOTLB error may be related to the Radeon reset bug, but I’ve previously been able to restart the VM after a shutdown.
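To see which devices keep showing up in these errors, the syslog lines can be tallied per device. The following is only a rough sketch: the function name and the regular expression are illustrative, and the sample lines are the ones quoted above. Per the lspci listing later in the thread, 0a:00.0 is the Radeon and 02:00.1 is a SATA controller.

```python
import re
from collections import Counter

# Matches the two event types quoted in this thread, e.g.
# "IO_PAGE_FAULT device=02:00.1 domain=0x000e address=0x... flags=0x0050]"
EVENT_RE = re.compile(
    r"(?P<event>IOTLB_INV_TIMEOUT|IO_PAGE_FAULT)\s+"
    r"device=(?P<device>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])"
)


def count_iommu_events(lines):
    """Return a Counter keyed by (event, device) for the given log lines."""
    counts = Counter()
    for line in lines:
        match = EVENT_RE.search(line)
        if match:
            counts[(match["event"], match["device"])] += 1
    return counts


# Log lines copied from the post above.
SAMPLE_LOG = [
    "May 10 15:26:41 deb5675 kernel: [13997.400079] IOTLB_INV_TIMEOUT device=0a:00.0 address=0x000000081ab50f20]",
    "May 10 15:31:27 deb5675 kernel: [14282.783539] kvm [12849]: vcpu0, guest rIP: 0xfffff807b20916c9 unhandled rdmsr: 0xc0010071",
    "May 10 15:52:16 deb5675 kernel: [15531.950198] AMD-Vi: Event logged [",
    "May 10 15:52:16 deb5675 kernel: [15531.950206] IO_PAGE_FAULT device=02:00.1 domain=0x000e address=0x00000000f0f48000 flags=0x0050]",
]

if __name__ == "__main__":
    for (event, device), n in sorted(count_iommu_events(SAMPLE_LOG).items()):
        print(f"{device}  {event}  x{n}")
```

In practice the lines would come from the host's syslog file rather than a hard-coded list.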

UPDATE: Something weird just happened, similar to previous failed installs. Instead of booting to the TianoCore BIOS logo, I now just get a white screen accompanied in syslog by IO_PAGE_FAULT on device 02:00.1 and IOTLB_INV_TIMEOUT on 0a:00.0. And lspci gives me an Input/Output error.

Put host-passthrough in the CPU model field. It’s unrelated to the lockups, but you’ll get better performance this way.
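In the libvirt domain XML, that setting corresponds to the mode attribute of the cpu element. Here is a minimal sketch of the change, assuming a heavily trimmed, hypothetical domain definition (the name win10 and the function name are made up for illustration). On a real system the edit would normally be made through virt-manager or virsh edit.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed domain definition. A real one (as shown by
# `virsh dumpxml`) contains many more elements.
DOMAIN_XML = """<domain type='kvm'>
  <name>win10</name>
  <cpu mode='custom' match='exact'>
    <model fallback='allow'>Opteron_G3</model>
  </cpu>
</domain>"""


def set_host_passthrough(domain_xml):
    """Return the domain XML with <cpu> replaced by mode='host-passthrough'.

    Note: this discards any children of the old <cpu> element
    (for example <model> or <topology>).
    """
    root = ET.fromstring(domain_xml)
    old_cpu = root.find("cpu")
    if old_cpu is not None:
        root.remove(old_cpu)
    ET.SubElement(root, "cpu", {"mode": "host-passthrough"})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(set_host_passthrough(DOMAIN_XML))
```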

That’s odd. Sounds like you’re not properly vfio’ing your guest GPU. Run this so I can see a bit more info about your setup:

lspci -knn 

I’ve got a 580 that resets just fine. They’re hit or miss.

Might want to update your kernel. You’re nearly 10 minor versions behind.

I’m attaching the output of lspci -knn; it’s rather long. Looks like vfio has grabbed the Radeon…
lspci.txt (6.2 KB)

After a host reboot, the white flash I got at boot in the VM is gone, the TianoCore logo is back and the VM is booting again. I’m really starting to suspect I have some kind of partial reset bug. Would testing with a second Nvidia card in the guest be useful? I’ve got a GTX 670 and 970 I can test with.

Unfortunately 4.9.0-6-amd64 is the latest that Debian stable is offering.

Use ` backticks in the future to put it in a code block. It makes it easier for me to read:

00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1450]
	Subsystem: Dell Device [1028:07ee]
00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Device [1022:1451]
	Subsystem: Dell Device [1028:07ee]
00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1452]
00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:1453]
	Kernel driver in use: pcieport
	Kernel modules: shpchp
00:01.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:1453]
	Kernel driver in use: pcieport
	Kernel modules: shpchp
00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1452]
00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1452]
00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:1453]
	Kernel driver in use: pcieport
	Kernel modules: shpchp
00:03.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:1453]
	Kernel driver in use: pcieport
	Kernel modules: shpchp
00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1452]
00:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1452]
00:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:1454]
	Kernel driver in use: pcieport
	Kernel modules: shpchp
00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1452]
00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:1454]
	Kernel driver in use: pcieport
	Kernel modules: shpchp
00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 59)
	Subsystem: Dell FCH SMBus Controller [1028:07ee]
	Kernel driver in use: piix4_smbus
	Kernel modules: i2c_piix4, sp5100_tco
00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
	Subsystem: Dell FCH LPC Bridge [1028:07ee]
00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1460]
00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1461]
00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1462]
00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1463]
00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1464]
00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1465]
00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1466]
00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1467]
01:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 15)
	Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [1028:07ee]
	Kernel driver in use: r8169
	Kernel modules: r8169
02:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b9] (rev 02)
	Subsystem: ASMedia Technology Inc. Device [1b21:1142]
	Kernel driver in use: xhci_hcd
	Kernel modules: xhci_pci
02:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b5] (rev 02)
	Subsystem: ASMedia Technology Inc. Device [1b21:1062]
	Kernel driver in use: ahci
	Kernel modules: ahci
02:00.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b0] (rev 02)
	Kernel driver in use: pcieport
	Kernel modules: shpchp
03:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b4] (rev 02)
	Kernel driver in use: pcieport
	Kernel modules: shpchp
03:01.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b4] (rev 02)
	Kernel driver in use: pcieport
	Kernel modules: shpchp
03:02.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b4] (rev 02)
	Kernel driver in use: pcieport
	Kernel modules: shpchp
03:03.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b4] (rev 02)
	Kernel driver in use: pcieport
	Kernel modules: shpchp
03:04.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b4] (rev 02)
	Kernel driver in use: pcieport
	Kernel modules: shpchp
05:00.0 Network controller [0280]: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter [168c:003e] (rev 32)
	Subsystem: Dell QCA6174 802.11ac Wireless Network Adapter [1028:0310]
	Kernel driver in use: ath10k_pci
	Kernel modules: ath10k_pci
09:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] [10de:1c82] (rev a1)
	Subsystem: Gigabyte Technology Co., Ltd GP107 [GeForce GTX 1050 Ti] [1458:3729]
	Kernel driver in use: nvidia
	Kernel modules: nvidia
09:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:0fb9] (rev a1)
	Subsystem: Gigabyte Technology Co., Ltd Device [1458:3729]
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel
0a:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480] [1002:67df] (rev c7)
	Subsystem: Dell Ellesmere [Radeon RX 470/480] [1028:1701]
	Kernel driver in use: vfio-pci
	Kernel modules: amdgpu
0a:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:aaf0]
	Subsystem: Dell Device [1028:aaf0]
	Kernel driver in use: vfio-pci
	Kernel modules: snd_hda_intel
0b:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device [1022:145a]
	Subsystem: Advanced Micro Devices, Inc. [AMD] Device [1022:145a]
0b:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Device [1022:1456]
	Subsystem: Advanced Micro Devices, Inc. [AMD] Device [1022:1456]
	Kernel driver in use: ccp
	Kernel modules: ccp
0b:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:145c]
	Subsystem: Dell Device [1028:07ee]
	Kernel driver in use: xhci_hcd
	Kernel modules: xhci_pci
0c:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device [1022:1455]
	Subsystem: Advanced Micro Devices, Inc. [AMD] Device [1022:1455]
0c:00.2 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51)
	Subsystem: Dell FCH SATA Controller [AHCI mode] [1028:07ee]
	Kernel driver in use: ahci
	Kernel modules: ahci
0c:00.3 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Device [1022:1457]
	Subsystem: Dell Device [1028:07ee]
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel

Regardless, it’s clear you’ve properly configured it: both 0a:00.0 and 0a:00.1 show vfio-pci as the kernel driver in use. Hmm, this is puzzling.
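For a listing this long, the same check can be done mechanically. The sketch below parses lspci -knn text and reports which devices are bound to vfio-pci. The function names are illustrative, and the sample is a few entries excerpted from the output above with the Subsystem and Kernel modules lines dropped.

```python
def drivers_in_use(lspci_output):
    """Map each PCI address to its 'Kernel driver in use' value (or None)."""
    drivers = {}
    current = None
    for line in lspci_output.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # Device header line, e.g. "0a:00.0 VGA compatible controller ..."
            current = line.split()[0]
            drivers[current] = None
        elif current is not None and "Kernel driver in use:" in line:
            drivers[current] = line.split(":", 1)[1].strip()
    return drivers


def vfio_devices(lspci_output):
    """Return the addresses of all devices currently bound to vfio-pci."""
    return [addr for addr, drv in drivers_in_use(lspci_output).items()
            if drv == "vfio-pci"]


# Excerpt from the output posted above (detail lines trimmed).
SAMPLE = (
    "09:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] [10de:1c82] (rev a1)\n"
    "\tKernel driver in use: nvidia\n"
    "0a:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480] [1002:67df] (rev c7)\n"
    "\tKernel driver in use: vfio-pci\n"
    "0a:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:aaf0]\n"
    "\tKernel driver in use: vfio-pci\n"
    "0b:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device [1022:145a]\n"
)

if __name__ == "__main__":
    print(vfio_devices(SAMPLE))
```

Both functions of the guest card (VGA and HDMI audio) should appear in the result. A device with no driver line, such as 0b:00.0, maps to None.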

host-passthrough caused the VM to enter a boot loop. Back to Opteron G3 for the time being.

If it is a reset bug, could the Radeon be triggering the entire PCIe bus to reset, causing the host Nvidia card to reset as well? I’m not very familiar with the reset bug, so I don’t know if it would have that effect.

That’s interesting. Did it get past the BIOS initialization?

No. With the AMD reset bug, the card simply doesn’t reset properly when asked. Well… it does something, but not everything we need it to.

Yes, but the OS rebooted only a few seconds after POST.

Okay. That’s odd. I’m usually able to change the CPU model with no problem in KVM/W10.

I’m starting to run out of places to look here. I’m not sure why we’ve got so many problems on the system.

Just for fun I’m going to try dropping an Nvidia card in the Radeon’s slot. Maybe it’s a long shot, but I’m not seeing many other options.

Reconfiguring hardware is a good idea if you’re having weird problems.

Wanted to post a quick update before I hit the sack. Swapped in a GTX 970 (host) and transferred the 1050 Ti to the guest. Interesting results.

No more IOTLB or AMD-Vi errors or white screens of death, but I am still seeing this one:

May 10 15:31:27 deb5675 kernel: [14282.783539] kvm [12849]: vcpu0, guest rIP: 0xfffff807b20916c9 unhandled rdmsr: 0xc0010071

As per Gray Wolf’s tutorial, some extra configuration had to be added to the VM’s XML file to allow Nvidia cards to load without Code 43. For some reason, this also allowed the CPU to be recognized as a Ryzen 7 instead of an Opteron G3. Not really seeing a big jump in CPU performance, though.
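The thread doesn’t show what the tutorial actually added, so the following is an assumption rather than a record of that change. A commonly reported workaround for Code 43 is to set a custom Hyper-V vendor_id and hide the KVM signature in the features section of the libvirt domain XML. The sketch applies those two settings to a hypothetical, trimmed domain definition. The vendor_id value is an arbitrary placeholder, and the function name is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed domain definition.
DOMAIN_XML = """<domain type='kvm'>
  <name>win10</name>
  <features>
    <acpi/>
  </features>
</domain>"""


def _child(parent, tag):
    """Return the existing child element with this tag, or create it."""
    node = parent.find(tag)
    if node is None:
        node = ET.SubElement(parent, tag)
    return node


def hide_hypervisor(domain_xml, vendor_id="0123456789ab"):
    """Add <hyperv><vendor_id/> and <kvm><hidden/> under <features>.

    Assumption: these are the settings commonly suggested for Code 43,
    not necessarily the ones used in the tutorial mentioned above.
    """
    root = ET.fromstring(domain_xml)
    features = _child(root, "features")

    vendor = _child(_child(features, "hyperv"), "vendor_id")
    vendor.set("state", "on")
    vendor.set("value", vendor_id)  # arbitrary placeholder string

    hidden = _child(_child(features, "kvm"), "hidden")
    hidden.set("state", "on")

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(hide_hypervisor(DOMAIN_XML))
```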

3DMark indicates that the 1050 Ti has only half the power of the RX 580. However, in Assassin’s Creed Origins the 1050 Ti was hitting 49 fps, whereas the RX 580 only got 41.

Enough for now. I’ll try upgrading to 1803 tomorrow, or maybe overnight.

Well, the VM seems to work well enough, but I can’t convince it to upgrade to 1803. I keep getting error 0xc1900101, which seems to indicate a problem loading a driver or accessing the disk. Surprise, surprise.

This may not be an immediate problem - I have a disk image with 1803 installed, but it seems to point to an underlying problem accessing the disk image. I’ve currently got SATA selected instead of VirtIO in virt-manager, but the update keeps having problems accessing the disk. Any thoughts?

This gets stranger and stranger. If I try to boot the VM off the Windows 1803 ISO, the VM boot loops. It boots fine off a 1709 ISO. I’m totally befuddled at this point.

Maybe VirtualBox or Proxmox can do this better? OTOH, I may just move forward with 1709 and hope someone fixes compatibility with 1803 at some point.

Yeah, I’m having similar issues. I can’t tell you why 1803 is causing problems. It’s definitely a Windows problem though.

I had to give up on it and use 1709.

That’s what I’ve done too. A Win10 Pro install will let you defer the update for a year. Someone should have it figured out by then.

At least I’m not alone…

Over the last day or two, I’ve encountered at least 3 other people on the forums who have had this issue. I think it was another thread where I mentioned that it’s most likely a QA/QC problem. Since they don’t have that anymore…

I’ve seen in other threads that this problem can be worked around by choosing an older CPU model to pass to the Windows guest (e.g., Core2Duo). However, regardless of which CPU model I choose, it always gets passed through as a Ryzen 7 1700.

I followed this tutorial to get GPU passthrough working, but something in Gray Wolf’s process seems to enable CPU passthrough and disable any ‘downgrades’.
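One possible explanation, not confirmed anywhere in this thread, is that the domain XML ended up with the cpu element in host-passthrough mode, in which case the guest sees the host CPU whatever model is picked elsewhere. The sketch below reports what a domain definition asks for and switches it to a named model. The domain XML is hypothetical, the function names are illustrative, and core2duo is just the model suggested in those other threads.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed domain definition.
DOMAIN_XML = """<domain type='kvm'>
  <name>win10</name>
  <cpu mode='host-passthrough'/>
</domain>"""


def describe_cpu(domain_xml):
    """Return (mode, model) from the <cpu> element; either may be None."""
    cpu = ET.fromstring(domain_xml).find("cpu")
    if cpu is None:
        return (None, None)
    return (cpu.get("mode"), cpu.findtext("model"))


def use_named_model(domain_xml, model="core2duo"):
    """Switch <cpu> to mode='custom' with the given named model."""
    root = ET.fromstring(domain_xml)
    cpu = root.find("cpu")
    if cpu is None:
        cpu = ET.SubElement(root, "cpu")
    for old in cpu.findall("model"):
        cpu.remove(old)
    cpu.set("mode", "custom")
    cpu.set("match", "exact")
    model_el = ET.SubElement(cpu, "model", {"fallback": "allow"})
    model_el.text = model
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(describe_cpu(DOMAIN_XML))
    print(describe_cpu(use_named_model(DOMAIN_XML)))
```

If describe_cpu reports host-passthrough on the real definition, that would account for the guest always seeing a Ryzen 7 1700.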

It’s academic for the time being, since I can defer the upgrade for a year, but I would like to get this sorted out before my time runs out.