Debian GPU Passthrough Error [Solved]

Whilst attempting to pass through a Quadro P1000 to a Windows 10 guest, I get the following error:

Virtual Machine Manager output:

Unable to complete install: 'internal error: process exited while connecting to monitor: 2018-11-07T18:48:50.768157Z qemu-system-x86_64: -device vfio-pci,host=2f:00.0,id=hostdev0,bus=pci.0,addr=0x5: vfio error: 0000:2f:00.0: failed to setup container for group 17: No available IOMMU models'

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 88, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/create.py", line 2288, in _do_async_install
    guest.start_install(meter=meter)
  File "/usr/share/virt-manager/virtinst/guest.py", line 461, in start_install
    doboot, transient)
  File "/usr/share/virt-manager/virtinst/guest.py", line 396, in _create_guest
    self.domain = self.conn.createXML(install_xml or final_xml, 0)
  File "/usr/lib/python2.7/dist-packages/libvirt.py", line 3523, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: internal error: process exited while connecting to monitor: 2018-11-07T18:48:50.768157Z qemu-system-x86_64: -device vfio-pci,host=2f:00.0,id=hostdev0,bus=pci.0,addr=0x5: vfio error: 0000:2f:00.0: failed to setup container for group 17: No available IOMMU models

This error occurs during installation from an ISO. The VM uses OVMF and host-passthrough for the CPU configuration, has two PCI devices (GPU and its audio component) added, and all other display adapters removed (i.e. Display Spice, Video QXL).

Host specification:

  • Asrock X470 Taichi motherboard
  • AMD Ryzen 7 2700X 8 core 16 thread CPU
  • Crucial 16GB (2 x 8GB) DDR4 2666MHz ECC Unbuffered RAM
  • Samsung 970 EVO 500GB NVMe SSD
  • EVGA GeForce GTX1050Ti SC 4GB GPU
  • Leadtek Quadro P1000 4GB DDR5 GPU

This is on a clean Debian Stretch install. Regarding procedure, I have followed the Binding a GPU to vfio-pci in Debian recommendations. The only thing in my /etc/initramfs-tools/modules file is:

vfio_pci ids=10de:1cb1,10de:0fb9

The output from ls-iommu.sh shows that IOMMU groupings are present, so there is no issue with the host UEFI setup etc:

IOMMU Group 17 2f:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:1cb1] (rev a1)
IOMMU Group 17 2f:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:0fb9] (rev a1)
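
For context, ls-iommu.sh is just a small helper that walks /sys/kernel/iommu_groups and prints each group's devices; a minimal sketch of that kind of script (not necessarily the exact one I used) is:

#!/bin/bash
# List every IOMMU group and the PCI devices it contains
shopt -s nullglob
for g in /sys/kernel/iommu_groups/*; do
    echo "IOMMU Group ${g##*/}:"
    for d in "$g"/devices/*; do
        echo -e "\t$(lspci -nns "${d##*/}")"
    done
done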

Likewise the output from lspci -nnk is:

2f:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:1cb1] (rev a1)
	Subsystem: NVIDIA Corporation Device [10de:11bc]
	Kernel driver in use: vfio-pci
	Kernel modules: nouveau
2f:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:0fb9] (rev a1)
	Subsystem: NVIDIA Corporation Device [10de:11bc]
	Kernel driver in use: vfio-pci
	Kernel modules: snd_hda_intel

I note from the Ubuntu 17.04 – VFIO PCIe Passthrough & Kernel Update (4.14-rc1) post linked above that a lot more cruft is placed into /etc/initramfs-tools/modules etc, but I am reluctant to start trying things randomly without first understanding what is causing the current error. Can anyone shed some light on this for me please?

The GPU and GPU audio are the only things in group 17?

Is the GPU audio also using vfio-pci?

A couple of places recommended trying modprobe vfio_iommu_type1.
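
If it helps, the quick way to try it and confirm the module actually landed (a sketch):

# Load the VFIO IOMMU backend, then check that vfio, vfio_pci and vfio_iommu_type1 all show up
sudo modprobe vfio_iommu_type1
lsmod | grep vfio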

That is correct - the only things in group 17 are the GPU and its audio component. Original post updated to show that the audio component is also using vfio-pci.

Edit:

Having just run the modprobe vfio_iommu_type1 command, I am happy to say that I now have output on the monitor connected to the Quadro GPU, so thank you very much for the suggestion.

Next step, find out why it drops into a UEFI interactive shell after booting rather than installing Windows. I think this might be the non-EFI image problem mentioned in the Arch Linux PCI passthrough via OVMF guide.

You might already know this but to make the modprobe persistent, drop vfio_iommu_type1 in /etc/modules and update your initramfs.
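
In other words, something along these lines (a sketch of the usual Debian-style persistent setup):

# Load vfio_iommu_type1 at every boot and rebuild the initramfs
echo vfio_iommu_type1 | sudo tee -a /etc/modules
sudo update-initramfs -u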

I have had constant fits at times with trying to get it to boot to a Windows ISO. I have had the best luck with having boot menu disabled, the virtual CD as the first thing to boot from, and everything else but the HDD disabled.
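
In libvirt XML terms that boot setup ends up looking roughly like this (a sketch; virt-manager's boot options write something equivalent):

<os>
  <!-- keep your existing <type> and OVMF loader lines as they are -->
  <!-- only devices listed here are tried: virtual CD first, disk as the fallback -->
  <boot dev='cdrom'/>
  <boot dev='hd'/>
  <bootmenu enable='no'/>
</os>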

Windows 10 does work with OVMF and is EFI compatible. I would recommend getting the latest ISO from MS's Windows 10 ISO download page here:
https://www.microsoft.com/en-us/software-download/windows10

I am beginning to see what you mean about the fits. Using a Windows 10 Enterprise image in /var/lib/libvirt/images/, I am able to boot the VM and see the Windows logo for a few seconds, but then it blue screens on me. Not helpful.

I have tried to install Ubuntu 18.04 using the same VM parameters as were used with the Windows VM, just to see what happens, and it too fails to install. The installer starts, and then I get the error:

The installer encountered an unrecoverable error. A desktop session will now be run so that you may investigate the problem or try installing again.

Again, not helpful. What is interesting about the latter situation, however, is that once kicked to the desktop session, I am able to install Ubuntu without issue. Go figure.

Windows 10 version 1803 or later bluescreens in KVM unless you put options kvm ignore_msrs=1 in /etc/modprobe.d/kvm.conf
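
For reference, that file just needs the single option line; a reboot (or unloading and reloading the kvm modules) picks it up:

# /etc/modprobe.d/kvm.conf
options kvm ignore_msrs=1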

You beat me to it. I reverted to the 1709 Windows build and installation proceeded without issue. When I tried the exact same VM setup (including CPU configuration set to “host-passthrough”) using the 1803 build, I got the blue screen. Changing the CPU configuration to qemu64 resolved the issue however, and installation again proceeded without issue.

Making the change you suggested and rebooting allowed me to install the 1803 build using the original “host-passthrough” CPU configuration. Thanks again for your help.
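
For anyone comparing notes later, the two CPU configurations correspond to roughly the following in the domain XML (a sketch of what virt-manager writes for each choice):

<cpu mode='host-passthrough'/>
<!-- or the qemu64 fallback that also got 1803 installing before the ignore_msrs change: -->
<cpu mode='custom' match='exact'>
  <model fallback='allow'>qemu64</model>
</cpu>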

Hello there, very interesting topic.

I arrived here because I thought I had a similar problem with my VFIO setup.
I'm using a Dell 9560 with an Akitio Node attached over TB3, containing a recent Nvidia RTX 2060.

My lspci output shows everything connected:

06:00.0 PCI bridge [0604]: Intel Corporation DSL6340 Thunderbolt 3 Bridge [Alpine Ridge 2C 2015] [8086:1576]
	Kernel driver in use: pcieport
07:00.0 PCI bridge [0604]: Intel Corporation DSL6340 Thunderbolt 3 Bridge [Alpine Ridge 2C 2015] [8086:1576]
	Kernel driver in use: pcieport
07:01.0 PCI bridge [0604]: Intel Corporation DSL6340 Thunderbolt 3 Bridge [Alpine Ridge 2C 2015] [8086:1576]
	Kernel driver in use: pcieport
07:02.0 PCI bridge [0604]: Intel Corporation DSL6340 Thunderbolt 3 Bridge [Alpine Ridge 2C 2015] [8086:1576]
	Kernel driver in use: pcieport
08:00.0 System peripheral [0880]: Intel Corporation DSL6340 Thunderbolt 3 NHI [Alpine Ridge 2C 2015] [8086:1575]
	Subsystem: Device [2222:1111]
	Kernel driver in use: thunderbolt
	Kernel modules: thunderbolt
09:00.0 PCI bridge [0604]: Intel Corporation DSL6340 Thunderbolt 3 Bridge [Alpine Ridge 2C 2015] [8086:1576]
	Kernel driver in use: pcieport
0a:01.0 PCI bridge [0604]: Intel Corporation DSL6340 Thunderbolt 3 Bridge [Alpine Ridge 2C 2015] [8086:1576]
	Kernel driver in use: pcieport
0b:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:1f08] (rev a1)
	Subsystem: NVIDIA Corporation Device [10de:12fb]
	Kernel driver in use: vfio-pci
	Kernel modules: nvidiafb, nouveau
0b:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:10f9] (rev a1)
	Subsystem: NVIDIA Corporation Device [10de:12fb]
	Kernel driver in use: vfio-pci
	Kernel modules: snd_hda_intel
0b:00.2 USB controller [0c03]: NVIDIA Corporation Device [10de:1ada] (rev a1)
	Subsystem: NVIDIA Corporation Device [10de:12fb]
	Kernel driver in use: xhci_hcd
0b:00.3 Serial bus controller [0c80]: NVIDIA Corporation Device [10de:1adb] (rev a1)
	Subsystem: NVIDIA Corporation Device [10de:12fb]
	Kernel driver in use: nvidia-gpu
	Kernel modules: i2c_nvidia_gpu

Everything is in the same IOMMU group except for the TB3 NHI, which is in the previous group (which seems correct, as far as I can tell from other setups).

The problem is in virt-manager: when I try "Add Hardware" and then PCI device, my 0b:00.x devices do not appear in the PCI hardware list, but the TB3 node is displayed correctly.

The only explanation I can come up with for now is that my graphics card is new, and it normally needs at least the WHQL 417.71 driver for the RTX 2060's details to show up.

Could someone with more experience give me some advice here?

I'm happy to provide more information if anything else is needed.

Thanks anyway!

You could try editing the libvirt XML to manually add the host devices you want.

Run virsh edit NameOfVm, go down to the bottom of the <devices> section, then add entries as needed. See this page for the libvirt XML syntax:
https://libvirt.org/formatdomain.html#elementsHostDevSubsys
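
For the 2060's VGA function at 0b:00.0, the entry would look something like this (a sketch; double-check the bus/slot/function against your lspci output):

<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x0b' slot='0x00' function='0x0'/>
  </source>
</hostdev>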

You should also use vfio-pci for the 2060's USB and serial bus controllers (0b:00.2 and 0b:00.3).
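
If you end up binding by device ID the way the /etc/initramfs-tools/modules line earlier in the thread does, a sketch covering all four functions (IDs taken from your lspci output above) would be:

# VGA, audio, USB and serial bus functions of the RTX 2060
vfio_pci ids=10de:1f08,10de:10f9,10de:1ada,10de:1adb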