[SOLVED] VFIO/Passthrough Troubles: Ubuntu 18.04 -> Windows 10 w/ Nvidia Cards

RESOLVED: The guest GPU's kernel driver was stuck on nvidia.

The following edits were made:

/etc/default/grub

-Removed "iommu=1"
+Added "vfio-pci.ids=10de:1b80,10de:10f0"

GRUB_DEFAULT=0
GRUB_TIMEOUT_STYLE=menu
GRUB_TIMEOUT=10
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="amd_iommu=on vfio-pci.ids=10de:1b80,10de:10f0"
GRUB_CMDLINE_LINUX=""

/etc/modules

-Removed "vfio_pci ids=10de:1b80,10de:10f0"
+Added "vfio_pci" and "vfio_virqfd"

vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd

I think the issue was that I was pointing everything at the card's hardware IDs in several places, rather than declaring them once in the correct place and letting the vfio-pci module bind to the card early in the boot process.

sudo update-initramfs -u
sudo update-grub

reboot
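As a quick sanity check after the reboot, the active kernel command line should now include the vfio-pci.ids entry:

cat /proc/cmdline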

The 1080 hung during the load process, and the 710b took over as intended.

Verified with: lspci -k

0b:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1080] (rev a1)
Subsystem: eVga.com. Corp. GP104 [GeForce GTX 1080]
Kernel driver in use: vfio-pci
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
0b:00.1 Audio device: NVIDIA Corporation GP104 High Definition Audio Controller (rev a1)
Subsystem: eVga.com. Corp. GP104 High Definition Audio Controller
Kernel driver in use: vfio-pci
Kernel modules: snd_hda_intel
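(For a check that targets just these two functions, lspci can also filter by the vendor:device IDs; a small convenience, not a required step:)

lspci -nnk -d 10de:1b80
lspci -nnk -d 10de:10f0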

I can now proceed with the VM install.

New Problem:

Having used this guide to configure the VM Guest:

…starting the virtual machine results only in a hang. CPU utilization drops to zero after a few seconds, and the 1080 just shows a black screen.

To see what was going on, I added a VNC display and a virtual VGA video device to watch the VM boot.

When the virtual machine is started, the single monitor I now have connected to the 1080 flashes for a moment, then goes black, then loses signal.

The VM's graphical interface shows a TianoCore logo, then a script-loading screen, where the machine hangs.

VM Devices:

VM Boot:

VM Hangs after startup.nsh

This is as far as I have gotten after several hours of messing about with this. Has anyone encountered this pair of issues before?

  1. VM does not output to the GPU, but does seem to “grab” it.
  2. VM hangs after the TianoCore logo and will not register mouse release or progress to a boot device.

I am about out of ideas and patience for today. Thanks in advance for any assistance!

EDIT: Forgot to include contents of virt-manager.log

[Tue, 25 Dec 2018 02:36:22 virt-manager 5130] DEBUG (connection:788) domain lifecycle event: domain=Win10Guest event=0 reason=1
[Tue, 25 Dec 2018 02:36:38 virt-manager 5130] DEBUG (engine:1164) Starting vm 'Win10Guest'
[Tue, 25 Dec 2018 02:36:39 virt-manager 5130] DEBUG (connection:701) There are 1 node devices with vendorId: 0x24f0, productId: 0x0140
[Tue, 25 Dec 2018 02:36:39 virt-manager 5130] DEBUG (connection:701) There are 1 node devices with vendorId: 0x046d, productId: 0xc52b
[Tue, 25 Dec 2018 02:36:39 virt-manager 5130] DEBUG (connection:847) node device lifecycle event: device=net_vnet0_fe_54_00_8a_58_11 event=0 reason=0
[Tue, 25 Dec 2018 02:36:39 virt-manager 5130] DEBUG (engine:1164) Starting vm 'Win10Guest'
[Tue, 25 Dec 2018 02:36:39 virt-manager 5130] DEBUG (connection:701) There are 1 node devices with vendorId: 0x24f0, productId: 0x0140
[Tue, 25 Dec 2018 02:36:39 virt-manager 5130] DEBUG (connection:701) There are 1 node devices with vendorId: 0x046d, productId: 0xc52b
[Tue, 25 Dec 2018 02:36:43 virt-manager 5130] DEBUG (connection:847) node device lifecycle event: device=usb_1_8_4_1_0 event=1 reason=0
[Tue, 25 Dec 2018 02:36:43 virt-manager 5130] DEBUG (connection:847) node device lifecycle event: device=usb_1_8_4_1_1 event=1 reason=0
[Tue, 25 Dec 2018 02:36:43 virt-manager 5130] DEBUG (connection:847) node device lifecycle event: device=usb_1_8_4 event=1 reason=0
[Tue, 25 Dec 2018 02:36:43 virt-manager 5130] DEBUG (connection:1159) nodedev=usb_1_8_4_1_0 removed
[Tue, 25 Dec 2018 02:36:43 virt-manager 5130] DEBUG (connection:1159) nodedev=usb_1_8_4_1_1 removed
[Tue, 25 Dec 2018 02:36:43 virt-manager 5130] DEBUG (connection:1159) nodedev=usb_1_8_4 removed
[Tue, 25 Dec 2018 02:36:43 virt-manager 5130] DEBUG (connection:788) domain lifecycle event: domain=Win10Guest event=4 reason=0
[Tue, 25 Dec 2018 02:36:43 virt-manager 5130] DEBUG (connection:788) domain lifecycle event: domain=Win10Guest event=2 reason=0
[Tue, 25 Dec 2018 02:36:43 virt-manager 5130] DEBUG (console:721) Starting connect process for proto=vnc trans= connhost=127.0.0.1 connuser= connport= gaddr=127.0.0.1 gport=5900 gtlsport=None gsocket=None
[Tue, 25 Dec 2018 02:36:43 virt-manager 5130] DEBUG (error:99) error dialog message:
summary=Error starting domain: Requested operation is not valid: domain is already running
details=Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 89, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 125, in tmpcb
    callback(*args, **kwargs)
  File "/usr/share/virt-manager/virtManager/libvirtobject.py", line 82, in newfn
    ret = fn(self, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/domain.py", line 1508, in startup
    self._backend.create()
  File "/usr/lib/python2.7/dist-packages/libvirt.py", line 1062, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirtError: Requested operation is not valid: domain is already running

[Tue, 25 Dec 2018 02:36:43 virt-manager 5130] DEBUG (console:844) Viewer connected
[Tue, 25 Dec 2018 02:36:44 virt-manager 5130] DEBUG (connection:847) node device lifecycle event: device=usb_1_8_4 event=0 reason=0
[Tue, 25 Dec 2018 02:36:44 virt-manager 5130] DEBUG (connection:847) node device lifecycle event: device=usb_1_8_4_1_0 event=0 reason=0
[Tue, 25 Dec 2018 02:36:44 virt-manager 5130] DEBUG (connection:847) node device lifecycle event: device=usb_1_8_4_1_1 event=0 reason=0
[Tue, 25 Dec 2018 02:37:20 virt-manager 5130] DEBUG (engine:1134) Destroying vm ‘Win10Guest’
[Tue, 25 Dec 2018 02:37:20 virt-manager 5130] DEBUG (console:835) Viewer disconnected
[Tue, 25 Dec 2018 02:37:21 virt-manager 5130] DEBUG (connection:788) domain lifecycle event: domain=Win10Guest event=5 reason=1
[Tue, 25 Dec 2018 02:37:21 virt-manager 5130] DEBUG (connection:847) node device lifecycle event: device=net_vnet0_fe_54_00_8a_58_11 event=1 reason=0
[Tue, 25 Dec 2018 02:37:22 virt-manager 5130] DEBUG (connection:1159) nodedev=net_vnet0_fe_54_00_8a_58_11 removed

Well, this is starting to turn into a blog chronicling this particular VM passthrough struggle rather than a cry for help.

Regarding the boot hang issue, it seems the problem was with the configuration of the KVM UEFI settings. Switching to legacy BIOS fixed the hang, and Windows 10 was able to install successfully.

The virtual VGA adapter and SPICE server were used to perform the Windows 10 install.

Once running, Windows 10 was allowed to perform updates, and Remote Desktop was enabled in case the GPU passthrough fails and further changes are needed within the guest OS.

Moving forward, the following changes were made to the guest XML configuration in order to avoid the Code 43 error from the Nvidia GPU.

sudo virsh edit Win10Guest

<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
...
...
  </devices>
  <qemu:commandline>
      <qemu:arg value='-cpu'/>
      <qemu:arg value='host,hv_time,kvm=off,hv_vendor_id=null'/>
  </qemu:commandline>
</domain>
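(To verify the extra arguments actually reach QEMU, libvirt writes the full generated command line to the domain's log file on every start; a sketch, assuming the domain name Win10Guest from the log above:)

grep -- '-cpu' /var/log/libvirt/qemu/Win10Guest.log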

Fingers crossed the iommu capture works as intended and this experiment can move to maintenance.

Two steps forward, one step back!

The Nvidia drivers (368.25) installed successfully, which gave hope that the XML edits did their job. I shut down, removed all virtual GPU settings, and rebooted.

No image output, but remote desktop via Remmina worked flawlessly. Network and other devices appear to be shared by default… I will have to look into isolating devices from the VM.

The CPU still shows only two cores despite being set to 6 cores/12 threads in the VM settings.

Checked device manager. Lo! And behold, I beseech thee!

Error Code 43!!

Still a victory. The 1080 has been successfully isolated from the Linux host and accessed by the Windows guest. The lack of output is explained by the card recognizing that it is running in a VM, rather than by some obscure mis-setting. And we have a functional VM.

Time to check up on some workarounds for the Code 43 error. If anyone knows of a trick or two I would like to see it!

Odd thing to note: host I/O is being captured and released by the VM just fine, even though the guest has a dedicated keyboard and mouse, neither of which appears to be functioning on either the guest or the host.

Fundamentals.

Following the advice from https://passthroughpo.st/apply-error-43-workaround/ in an attempt to remedy the Code 43 error.

Further edits to the guest XML file:

    <features>
    ...
        <hyperv>
        ...
            <vendor_id state='on' value='nvidia43_fix'/>
        </hyperv>
        <kvm>
            <hidden state='on'/>
        </kvm>
    </features>

Run
sudo systemctl restart libvirtd
to ensure the changes take effect (this step had previously been left out of the edits above).

After a reboot, Code 43 remains.

See this tutorial and check the Nvidia-specific part. You need to edit the XML configuration.

I personally don’t use virt-manager but a plain and simple bash script with the qemu command - much easier to debug and edit.


Thanks for your reply!

I am actually following that guide for a number of the required steps. Unfortunately, even with the XML edits, Code 43 remains.

When you say the qemu command, what are you referring to? I am only aware of using virt-manager for this process.

I wrote this tutorial: https://heiko-sieger.info/running-windows-10-on-linux-using-kvm-with-vga-passthrough/

But you may have to redo your Windows installation. In any case, the script I show there needs adaptation so it matches your hardware and requirements.


@powerhouse I saw your tutorial during my research for this project; it is very well written and documented! I am glad to have it for reference, thank you.

I created a new VM following your tutorial and the result is the same. I am able to configure the VM and boot, but the card continues to show Code 43 in Device Manager.

It is known-good hardware, and the only difference is that my VMs boot using legacy BIOS rather than UEFI. I cannot get the VMs to boot using UEFI at all; they just hang at around 20% utilization.

The IOMMU changes appear to be working correctly. Shortly after boot, the vfio-pci vgaarb message (quoted further down) is displayed on the Windows VM's monitor.

The text hangs there until the VM is initialized, then the screen goes blank and turns off.

Code 43 appears on the Windows guest after a fresh install, prior to any device drivers and with its internet connection disabled.

I have tried every driver combination and/or driver cleaner mentioned on every forum I could find.

Using the current virt-manager, does anyone know of a way to verify that the changes to the XML files are actually taking effect?

I still show the following issues inside the VM:

  1. No CPU cores shown, only “3.xx GHz” and “Virtual CPU”
  2. RedHat HDD
  3. Virtio devices in Device Manager.

If the point of the XML edits is to hide the hypervisor, and the XML file is being used, shouldn’t the CPU in Task Manager show the given number of logical processors? Shouldn’t the references to virtualized hardware reflect the name changes in the XML file?

Is there a way to verify that the XML file is actually being referenced?
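(One such check, for what it's worth: virsh dumpxml prints the configuration libvirt actually holds for the domain, so saved edits should show up there. A sketch, assuming the domain name Win10Guest from the log earlier:)

sudo virsh dumpxml Win10Guest | grep -E 'vendor_id|hidden|qemu:arg'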

I am about two minutes away from just ordering the closest AMD equivalent to the 1080 I can find to get this working, but that seems like such a waste to get around a software issue.

I am thinking the use of legacy BIOS rather than UEFI in the VM settings may be causing the trouble.

Problem is, I can't get the VM to boot from UEFI. Bah, I am flummoxed!

I would just order an AMD graphics card with performance close to the Nvidia 1080. I have read that setting up GPU passthrough is easier if your graphics cards are from different manufacturers.

I think your underlying problem is that you are trying to set this up using legacy BIOS rather than UEFI. I believe the Nvidia 710b doesn't support a UEFI BIOS, which is forcing you to use legacy BIOS instead. While I have read that some people have managed to set up passthrough with legacy BIOS, I have never heard of anyone succeeding with a graphics card as old as the Nvidia 700 series.

In my humble opinion (remember, I have never attempted setting up GPU passthrough), I think you have only two options to complete your task successfully.

  1. Buy an AMD Radeon RX Vega 64 (which should give you performance close to an Nvidia 1080). Keep or sell the Nvidia 710, which will give you a choice of which graphics card you want for the guest.

  2. Set up dual booting.

I wish you luck with whichever option you choose. :smiley:

Leaning closer to that option.

Thing is, the 710b is running the monitors for the host system, not the VM. When booting, just after the TianoCore logo there is a countdown for a boot script. Once it reaches zero, the VM soft-locks.

And regardless of the state of the two video cards, I am still encountering the issue of the CPU not being passed through correctly. I can't seem to narrow down the cause; maybe the two are linked to the BIOS issue?

The XML file is set via Virtual Machine Manager to use the host CPU configuration (host-passthrough), with 1 socket and 6 cores of 2 threads each. Task Manager should be showing 12 logical cores, but instead shows a single virtual CPU.
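(For reference, the CPU block in the domain XML for that topology would look something like the sketch below, per libvirt's domain format; the counts mirror the settings described above.)

  <cpu mode='host-passthrough'>
    <topology sockets='1' cores='6' threads='2'/>
  </cpu>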

@apeBit I thought you had fixed your CPU problem; I don't know if it is related to the BIOS issue. Maybe @wendell, @Eden, @GrayBoltWolf, or someone else could figure this out. As I pointed out in my last post, I am not advanced enough in how computers or virtualization work to figure this out myself. The only other thing I can suggest is to google the problems you are having and see if you can find an answer. Good luck.


The CPU config can be a bit finicky, as certain configurations don't want to work properly. The config I tend to use, which has worked best for me with the most stable performance, is 1 socket, 3 cores, 2 threads. I would attempt a few different configs. Make sure you have allocated all the cores as well.

When it comes to the Code 43 error, the following always works for me, whether it is a fresh install or I am setting up my old disk image on a fresh Linux install.

  <qemu:commandline>
     <qemu:arg value='-cpu'/>
     <qemu:arg value='host,hv_time,kvm=off,hv_vendor_id=null'/>
  </qemu:commandline>

Also make sure that the opening <domain> tag at the top of the XML (edited with the virsh command) is replaced with <domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>.

Those are the only edits I use to fix Code 43 with Nvidia. I also use OVMF and the Q35 chipset, and I remove all the virtual monitors just like the tutorial does. I have never attempted an install with those left in, but I am pretty sure it should not make a difference.


Made a little video showing the Virt-Manager setup I use to get the VM up and running.


@kriss120 Thank you for putting that video together, I appreciate you taking the time.

I am at work at the moment, so my ability to test is limited to remote access. I will attempt to replicate your setup exactly when I get back to my desk.

Until then, I have double-checked my settings for IOMMU groups, GRUB, etc.

To recap:

OS & Kernel:

Ubuntu 18.04; 4.18.19-041819-generic

CPU & Mobo:

AMD Ryzen 7 2700X Eight-Core Processor
ASUS ROG Crosshair VI Hero X370

AMD-Vi:

[ 0.761880] AMD-Vi: IOMMU performance counters supported
[ 0.764946] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40
[ 0.764948] AMD-Vi: Extended features (0xf77ef22294ada):
[ 0.764953] AMD-Vi: Interrupt remapping enabled
[ 0.764954] AMD-Vi: virtual APIC enabled
[ 0.765058] AMD-Vi: Lazy IO/TLB flushing enabled

Guest GPU:

(PCI-E x16/x8 Slot 1)
0b:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104 [GeForce GTX 1080] [10de:1b80] (rev a1)

0b:00.1 Audio device [0403]: NVIDIA Corporation GP104 High Definition Audio Controller [10de:10f0] (rev a1)

IOMMU Group:

(The only devices in Group 16)
IOMMU Group 16 0b:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104 [GeForce GTX 1080] [10de:1b80] (rev a1)
IOMMU Group 16 0b:00.1 Audio device [0403]: NVIDIA Corporation GP104 High Definition Audio Controller [10de:10f0] (rev a1)
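(The group listing above can be reproduced with the usual loop over /sys/kernel/iommu_groups; a minimal sketch:)

#!/bin/bash
# List every IOMMU group and the devices it contains.
for g in /sys/kernel/iommu_groups/*; do
    echo "IOMMU Group ${g##*/}:"
    for d in "$g"/devices/*; do
        echo -e "\t$(lspci -nns "${d##*/}")"
    done
done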

Grub:

GRUB_CMDLINE_LINUX_DEFAULT="amd_iommu=on vfio-pci.ids=10de:1b80,10de:10f0"

/etc/initramfs-tools/modules:

softdep nvidia pre: vfio vfio_pci
vfio
vfio_iommu_type1
vfio_virqfd
options vfio_pci ids=10de:1b80,10de:10f0
vfio_pci ids=10de:1b80,10de:10f0
vfio_pci
nvidia

/etc/modules:

vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
#vfio_pci ids=10de:1b80,10de:10f0

/etc/modprobe.d/vfio.conf:

softdep nvidia pre: vfio vfio_pci
options vfio-pci ids=10de:1b80,10de:10f0

dmesg | grep -E "DMAR|IOMMU":

[ 0.761880] AMD-Vi: IOMMU performance counters supported
[ 0.764946] AMD-Vi: Found IOMMU at 0000:00:00.2 cap 0x40
[ 0.765958] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
[ 104.648187] vboxpci: IOMMU found

lspci -nnv | less:

0b:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104 [GeForce GTX 1080] [10de:1b80] (rev a1) (prog-if 00 [VGA controller])
Subsystem: eVga. Corp. GP104 [GeForce GTX 1080] [3842:6288]
Kernel driver in use: vfio-pci
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

0b:00.1 Audio device [0403]: NVIDIA Corporation GP104 High Definition Audio Controller [10de:10f0] (rev a1)
Subsystem: eVga. Corp. GP104 High Definition Audio Controller [3842:6288]
Kernel driver in use: vfio-pci
Kernel modules: snd_hda_intel


At this point I am fairly confident that the IOMMU config and passthrough prep are correct. Furthermore, when accessing a guest that has the 1080 assigned to it, the card appears in Windows Device Manager, just with Code 43.

I am having to connect to the VM via remote desktop to see any of this, as the guest card does not output video to its dedicated monitor from the VM. When the host boots, the monitor shows modules loading, then hangs on:

"vfio-pci 0000:0b:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem"

Upon starting the VM, this monitor goes black, then turns off with no signal.

Finally, I have tried various CPU configuration settings; none of them result in the desired “x logical cores” showing on the guest VM, just a single “virtual CPU”. The clock speed is fine, just the desired core/thread count is missing.

Thank you for reading this far. For some reason this process is a bit of a struggle for me; trying to power through it.

Also, does this forum support spoilers or other ways to hide lengths of text unless selected?

I have a feeling it is not outputting video because you have the virtual VGA card active for the VM. It will be set as the default video card and will be the one that initially outputs video from Windows. It could be something else, but I don't see any other reason this wouldn't work. Hope the setup I showed in the video helps you get up and running.


Alright. Set up exactly as in your video, minus the CPU and card differences. Guest card going to a single dedicated monitor.

Firmware: UEFI x86_64: /OVMF/OVMF_CODE.fd
Chipset: Q35

Start the VM and it takes the second mouse and keyboard, but there is no video output.

Check /var/log/libvirt/qemu/win10.log:

2018-12-27 01:27:12.960+0000: Domain id=8 is tainted: host-cpu
2018-12-27T01:27:15.575184Z qemu-system-x86_64: -device vfio-pci,host=0b:00.0,id=hostdev2,bus=pci.6,addr=0x0: Failed to mmap 0000:0b:00.0 BAR 3. Performance may be slow

Those are the last two lines after the VM loads. It seems unhappy with the CPU, and there is trouble with the GPU. Not sure what mmap or BAR 3 are, but it's a fine evening to learn. I feel like something fundamental is wrong somewhere.

Just curious whether the virtual video would work to at least get an image I could work with, I turned that on as well. Blank screen.

You probably already did, but it never hurts to check.
Did you remember to update your initramfs after editing modules and modprobe?

PS.
Hope you manage to solve it; I have to go to sleep.
Will look into this more tomorrow when I am off work.

I think the solution is here: Explaining CSM, efifb=off, and Setting the Boot GPU Manually.

Failed to mmap 0000:0b:00.0 BAR 3 is a sign that your graphics card uses a shadow BIOS. Try the remedies in the article linked above.

See also GPU Virtualization with KVM / QEMU under “Primary GPU Workaround”, as well as this post: romfile=/home/bender/gt1030.rom: VFIO 0000:01:00.0 BAR 3 mmap unsupported and the answer to it.

Another post that sheds light on the problem: https://bbs.archlinux.org/viewtopic.php?id=225481.

In essence, it’s a matter of your passthrough graphics card being initialized during system boot. There are several ways to prevent that, or you may need to load a romfile to supersede the shadow BIOS.
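(For the romfile route, libvirt supports attaching a ROM image to the passed-through device directly in the domain XML; a sketch only, with a placeholder path where the dumped or downloaded VBIOS would go:)

  <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
      <address domain='0x0000' bus='0x0b' slot='0x00' function='0x0'/>
    </source>
    <rom file='/path/to/gtx1080.rom'/> <!-- placeholder: path to the card's VBIOS dump -->
  </hostdev>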

Hope this helps.
