Disappearing VirtIO disk image

I had successfully set up a Win10 Pro VM with GPU passthrough. It booted up fine 5 or 6 times, then suddenly wouldn’t boot Windows. So I booted the Windows recovery partition and opened a command prompt. The virtual drive was just gone - no sign of it in DISKPART. So I pondered for a bit, and switched the drive controller in virt-manager from VirtIO to SATA. The VM still wouldn’t boot to Windows, but the drive was finally visible from the recovery environment.

I then tried to “refresh”/reinstall Windows, but the disk access speed was so unbearably slow that I’ve given up on that for the moment.

So what the bleep happened? Is there any way to recover from this?

For the record, I’m on Debian Stretch x64 with KVM+virt-manager and the host environment seems perfectly fine.

Sounds like your system isn’t loading the virtio drivers.

Sounds plausible. But they were loading fine yesterday and the day before. Why did they decide to crap out today?

Only thing that comes to mind is that an automatic update nuked them. But then I’d think other VM users would have suffered the same fate.

If that’s the case, is there any way to reload the virtio drivers from the recovery environment? Windows won’t boot at all. I tried waiting 45 minutes after Windows reinstalled via the emulated AHCI controller, but it still hadn’t booted by then.

I don’t know. I’m really not very good with Windows at all. I only posted because I know about virtio and passthrough.

RE: the reinstall, have you tried reinstalling it while using the virtio driver?

I was trying to reinstall from the Windows recovery environment, which has apparently lost track of the virtio driver. I have not tried reinstalling from the Windows ISO and reloading the driver from the virtio ISO. I can do that, I’m just afraid that Windows will mysteriously lose track of virtio again.

I’m not sure. I’ve never seen something like this happen before. I’m assuming it’s just Windows being Windows.

It was a Windows update. Apparently Windows version 1803 clobbers the VirtIO storage driver. I had a new VM set up and running perfectly fine until I clicked on Check for Updates in Windows. Once 1803 was installed, I got a “Please wait” screen for many minutes. Then I came to a login screen and logged in. After logging in I got a “Preparing Windows” screen for many more minutes. Finally a blank black desktop loaded with a message that Windows couldn’t find my Desktop folder.

After rebooting from this fiasco, Windows launched startup repair and failed. Once again, the command prompt in the recovery environment showed no drives attached. However, this time I could load the viostor.inf driver with “drvload viostor.inf”. I ran DISKPART and presto, the drive was visible.

But I’m really not sure how to recover from here. I’ll probably just reinstall. Since it’s Win 10 Pro I think I can defer upgrades for a few months, but eventually I’ll have to upgrade.

Has no one else run into this?

Awesome!

Thanks for updating us.

I haven’t been using passthrough for a while, so I don’t think I would have run into the issue. I’m slowly getting back into it though. :smiley:

Thanks for the information. I’ll have to keep this in mind. My Windows 10 VM updated to 1803 the other day, and seemed to reboot and install the update fine. Although, I am pretty sure I’m using SATA now instead of VirtIO. So apparently it doesn’t mess with that.

You’re welcome. One thing that may have complicated the issue was that I had two disk images attached to the VM, one for the OS and one for data. When I removed the data drive, Windows booted up but reported it was on the prior version (1709 I think.) I then tried to reapply the update…and the same thing happened. In addition, the VM locked up the host OS and corrupted the file system.

Sigh. Not having much luck here. I may try GPU passthrough with VirtualBox, though it seems few have tried it.

That doesn’t make sense. That shouldn’t have happened.

I can’t explain it either. The last time the VM shut down, X crashed and dumped me to the lightdm login screen. However, the login screen was completely locked up, and I couldn’t get to a console either. Since I didn’t have openssh set up, I couldn’t log in remotely and had to kill power. When I rebooted I got a message that inode xxxxxx was corrupted. I may try to boot with a gparted image and repair with fsck, but I’ll probably just nuke it and start over.

Hmm, maybe next time enable the SSH server.
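
For anyone following along, a minimal sketch of what enabling the SSH server could look like on a Debian host. It assumes sudo is configured and uses the stock openssh-server package and its ssh service; the helper names enable_ssh_server and port_is_listening are made up for this example. The second helper only inspects `ss -tln` output, so it is safe to run at any time.

```shell
#!/bin/sh
# Sketch only: the function names are hypothetical helpers, not part of any package.

# Install and start the SSH server on a Debian host (needs root via sudo).
# Defined here but not called automatically.
enable_ssh_server() {
    sudo apt-get install -y openssh-server
    sudo systemctl enable --now ssh
}

# Read `ss -tln` output on stdin and succeed if a TCP socket
# is listening on the given port.
port_is_listening() {
    awk -v port="$1" '
        $1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}

# Typical use on the host:
#   enable_ssh_server
#   ss -tln | port_is_listening 22 && echo "sshd is reachable"
```

With this in place, a locked-up X session can usually still be inspected from another machine, provided the kernel itself is alive.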

I’m not sure what caused that. :confused:

I’ll try a repair and see if that works. Next time I try this, I’m just going to use the SATA driver. How bad is the performance hit? Does IDE or SCSI make more sense?

This keeps getting weirder and weirder. So I recovered my Debian install, reinstalled Windows 1709 using SATA emulation (instead of VirtIO) and made a backup of the 1709 disk image.

Then I applied the 1803 update and things went pear-shaped. The Windows update was taking a while, so I stepped away for a few minutes. When I came back the update was still installing on the Windows VM on my 2nd monitor. But on my primary Linux monitor I saw this message flashing over and over:

AMD-Vi completion-wait loop timed out

Debian was completely unresponsive, but the Windows VM kept chugging away with the update. It rebooted a couple of times, and finally came back to the Windows desktop with 1803 fully installed. Linux remained completely unresponsive with the above AMD-Vi error. So I shut down the VM from the Windows Start menu and powered the whole mess off.

Once I restarted, I found that the ext4 root partition was still intact, unlike the last lockup. The whole system booted up, and I now have a fully operational Windows 1803 install. Except…I have no idea if this is going to happen again.

Are you on the latest UEFI?


I’ve seen whole-system lockups, but never a host-only lockup before. This is weird.

Something like this isn’t just a “fluke” so it’s likely to happen again. How often it’s happening, that’s a good question. You might want to start a ticket with AMD support and ask them if the AMD-Vi error is a hardware error or how to troubleshoot it.


At this point, it’s probably good for you to list all your specs. (including host distro, kernel version, libvirt version, etc…)

uname -r

4.9.0-6-amd64

ls /var/cache/apt/archives/libvirt*

libvirt0_3.0.0-4+deb9u3_amd64.deb
libvirt-clients_3.0.0-4+deb9u3_amd64.deb
libvirt-daemon_3.0.0-4+deb9u3_amd64.deb
libvirt-daemon-system_3.0.0-4+deb9u3_amd64.deb
libvirt-glib-1.0-0_1.0.0-1_amd64.deb
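
The same version information can probably be gathered in one go rather than by listing the apt cache, which only shows downloaded .deb files and not necessarily what is installed. A rough sketch, assuming a Debian-style host; host_versions is a made-up helper name, and the virsh and dpkg-query steps are skipped if those tools are missing.

```shell
#!/bin/sh
# Sketch only: host_versions is a hypothetical helper name.

host_versions() {
    # Running kernel release, same as `uname -r`.
    echo "kernel: $(uname -r)"

    # libvirt client version, if virsh is installed.
    if command -v virsh >/dev/null 2>&1; then
        echo "libvirt: $(virsh --version)"
    fi

    # Versions of libvirt packages known to dpkg
    # (packages that are known but not installed show an empty version).
    if command -v dpkg-query >/dev/null 2>&1; then
        dpkg-query -W -f='${Package} ${Version}\n' 'libvirt*' 2>/dev/null || true
    fi
    return 0
}

# Typical use:
#   host_versions
```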

System: Dell Inspiron 5675
CPU: Ryzen 7 1700
RAM: Kingston DDR4-2400 32GB(16x2)
GPU1: Gigabyte GTX 1050 Ti 4GB (host)
GPU2: Dell OEM Radeon RX 580 8GB (Guest - appears to be reference model)
HD: Seagate 1TB 7200RPM

Other notes: KVM presents the Ryzen as an ‘Opteron G3’ to the guest. Not sure if that poses a problem or not. Sometimes when I install the Radeon graphics driver in the guest, X Windows in the host will crash.

Some interesting errors in syslog:

May 10 15:26:41 deb5675 kernel: [13997.400079] IOTLB_INV_TIMEOUT device=0a:00.0 address=0x000000081ab50f20]

May 10 15:31:27 deb5675 kernel: [14282.783539] kvm [12849]: vcpu0, guest rIP: 0xfffff807b20916c9 unhandled rdmsr: 0xc0010071

May 10 15:52:16 deb5675 kernel: [15531.950198] AMD-Vi: Event logged [
May 10 15:52:16 deb5675 kernel: [15531.950206] IO_PAGE_FAULT device=02:00.1 domain=0x000e address=0x00000000f0f48000 flags=0x0050]

The IOTLB error may be related to the Radeon reset bug, but I’ve previously been able to restart the VM after a shutdown.
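
To make these easier to spot next time, here is a small sketch that filters a kernel log for the three AMD-Vi messages quoted in this thread and lists which PCI devices they name. The function names are invented for the example, and the log path in the usage comment is an assumption about a default Debian setup.

```shell
#!/bin/sh
# Sketch only: iommu_events and iommu_devices are hypothetical helper names.

# Keep only the AMD-Vi related lines seen in this thread (log text on stdin).
iommu_events() {
    grep -E 'IOTLB_INV_TIMEOUT|IO_PAGE_FAULT|completion-wait loop timed out'
}

# List the distinct PCI addresses named in those lines.
iommu_devices() {
    iommu_events | grep -oE 'device=[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]' | sort -u
}

# Typical use (path assumed; adjust for your logging setup):
#   iommu_devices < /var/log/syslog
#   dmesg | iommu_events
```

Run against the excerpts above, this would report 02:00.1 and 0a:00.0, which is a quick way to see that a non-passthrough device is faulting too.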

UPDATE: Something weird just happened, similar to previous failed installs. Instead of booting to the TianoCore BIOS logo, I now just get a white screen accompanied in syslog by IO_PAGE_FAULT on device 02:00.1 and IOTLB_INV_TIMEOUT on 0a:00.0. And lspci gives me an Input/Output error.

Toss host-passthrough in the cpu model field. Unrelated, but you’ll get better performance this way.
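
In the libvirt domain XML this corresponds to a `<cpu mode='host-passthrough'/>` element. A quick sketch for checking what a guest is currently set to; cpu_mode is a made-up helper, and the domain name win10 in the usage comment is only a placeholder.

```shell
#!/bin/sh
# Sketch only: cpu_mode is a hypothetical helper name.

# Read libvirt domain XML on stdin and print the mode attribute
# of the <cpu> element, if there is one.
cpu_mode() {
    sed -nE "s/.*<cpu[^>]* mode=['\"]([^'\"]+)['\"].*/\1/p"
}

# Typical use ("win10" is a placeholder domain name):
#   virsh dumpxml win10 | cpu_mode
# A guest that shows up as an Opteron G3 would likely print "custom";
# after switching the model it should print "host-passthrough".
```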

That’s odd. Sounds like you’re not properly vfio’ing your guest GPU. Run this so I can see a bit more info about your setup:

lspci -knn 

I’ve got a 580 that resets just fine. They’re hit or miss.

Might want to update your kernel. You’re nearly 10 minor versions behind.

I’m attaching the output of lspci -knn - it’s rather long. Looks like vfio has grabbed the Radeon…

lspci.txt (6.2 KB)

After a host reboot, the white flash I got at boot in the VM is gone, the TianoCore logo is back and the VM is booting again. I’m really starting to suspect I have some kind of partial reset bug. Would testing with a second Nvidia card in the guest be useful? I’ve got a GTX 670 and 970 I can test with.

Unfortunately 4.9.0-6-amd64 is the latest that Debian stable is offering.

Use ` backticks in the future to put it in a code block. It makes it easier for me to read:

00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1450]
	Subsystem: Dell Device [1028:07ee]
00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Device [1022:1451]
	Subsystem: Dell Device [1028:07ee]
00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1452]
00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:1453]
	Kernel driver in use: pcieport
	Kernel modules: shpchp
00:01.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:1453]
	Kernel driver in use: pcieport
	Kernel modules: shpchp
00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1452]
00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1452]
00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:1453]
	Kernel driver in use: pcieport
	Kernel modules: shpchp
00:03.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:1453]
	Kernel driver in use: pcieport
	Kernel modules: shpchp
00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1452]
00:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1452]
00:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:1454]
	Kernel driver in use: pcieport
	Kernel modules: shpchp
00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1452]
00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:1454]
	Kernel driver in use: pcieport
	Kernel modules: shpchp
00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 59)
	Subsystem: Dell FCH SMBus Controller [1028:07ee]
	Kernel driver in use: piix4_smbus
	Kernel modules: i2c_piix4, sp5100_tco
00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
	Subsystem: Dell FCH LPC Bridge [1028:07ee]
00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1460]
00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1461]
00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1462]
00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1463]
00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1464]
00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1465]
00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1466]
00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:1467]
01:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 15)
	Subsystem: Dell RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [1028:07ee]
	Kernel driver in use: r8169
	Kernel modules: r8169
02:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b9] (rev 02)
	Subsystem: ASMedia Technology Inc. Device [1b21:1142]
	Kernel driver in use: xhci_hcd
	Kernel modules: xhci_pci
02:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b5] (rev 02)
	Subsystem: ASMedia Technology Inc. Device [1b21:1062]
	Kernel driver in use: ahci
	Kernel modules: ahci
02:00.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b0] (rev 02)
	Kernel driver in use: pcieport
	Kernel modules: shpchp
03:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b4] (rev 02)
	Kernel driver in use: pcieport
	Kernel modules: shpchp
03:01.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b4] (rev 02)
	Kernel driver in use: pcieport
	Kernel modules: shpchp
03:02.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b4] (rev 02)
	Kernel driver in use: pcieport
	Kernel modules: shpchp
03:03.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b4] (rev 02)
	Kernel driver in use: pcieport
	Kernel modules: shpchp
03:04.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43b4] (rev 02)
	Kernel driver in use: pcieport
	Kernel modules: shpchp
05:00.0 Network controller [0280]: Qualcomm Atheros QCA6174 802.11ac Wireless Network Adapter [168c:003e] (rev 32)
	Subsystem: Dell QCA6174 802.11ac Wireless Network Adapter [1028:0310]
	Kernel driver in use: ath10k_pci
	Kernel modules: ath10k_pci
09:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] [10de:1c82] (rev a1)
	Subsystem: Gigabyte Technology Co., Ltd GP107 [GeForce GTX 1050 Ti] [1458:3729]
	Kernel driver in use: nvidia
	Kernel modules: nvidia
09:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:0fb9] (rev a1)
	Subsystem: Gigabyte Technology Co., Ltd Device [1458:3729]
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel
0a:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480] [1002:67df] (rev c7)
	Subsystem: Dell Ellesmere [Radeon RX 470/480] [1028:1701]
	Kernel driver in use: vfio-pci
	Kernel modules: amdgpu
0a:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:aaf0]
	Subsystem: Dell Device [1028:aaf0]
	Kernel driver in use: vfio-pci
	Kernel modules: snd_hda_intel
0b:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device [1022:145a]
	Subsystem: Advanced Micro Devices, Inc. [AMD] Device [1022:145a]
0b:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Device [1022:1456]
	Subsystem: Advanced Micro Devices, Inc. [AMD] Device [1022:1456]
	Kernel driver in use: ccp
	Kernel modules: ccp
0b:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:145c]
	Subsystem: Dell Device [1028:07ee]
	Kernel driver in use: xhci_hcd
	Kernel modules: xhci_pci
0c:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Device [1022:1455]
	Subsystem: Advanced Micro Devices, Inc. [AMD] Device [1022:1455]
0c:00.2 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51)
	Subsystem: Dell FCH SATA Controller [AHCI mode] [1028:07ee]
	Kernel driver in use: ahci
	Kernel modules: ahci
0c:00.3 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Device [1022:1457]
	Subsystem: Dell Device [1028:07ee]
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel

Regardless, it’s clear you’ve properly configured it. Hmm, this is puzzling.
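
Since the full listing is long, here is a small sketch for boiling `lspci -knn` output down to one line per device with its bound driver, which is the part that matters for checking the vfio-pci binding. driver_bindings is an invented name for this example; `lspci -nnk -s 0a:00` is the built-in way to limit the listing to one slot.

```shell
#!/bin/sh
# Sketch only: driver_bindings is a hypothetical helper name.

# Read `lspci -knn` output on stdin and print "<slot> <driver>" for every
# device that reports a "Kernel driver in use" line.
driver_bindings() {
    awk '
        /^[0-9a-f]/             { slot = $1 }        # unindented line starts a new device
        /Kernel driver in use:/ { print slot, $NF }  # last field is the driver name
    '
}

# Typical use:
#   lspci -knn | driver_bindings | grep vfio-pci
#   lspci -nnk -s 0a:00        # only the guest GPU and its audio function
```

For the listing above, both 0a:00.0 and 0a:00.1 should come out bound to vfio-pci.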