nVidia driver crash under Windows 10 KVM with GPU passthrough

I finally managed to get Windows 10 running under KVM, but the nVidia driver crashes at seemingly random times and leaves me with a black screen; the VM has never made it past 1 hour of uptime. Once that happens I can’t even RDP into the VM (I can before the crash).

Event Viewer reports “Event 4101, Display”:

Display driver nvlddmkm stopped responding and has successfully recovered.
It never recovers though; the screen stays black. The next error I get is “Event 41, Kernel-Power”, because I force the VM off from Virt-Manager. Event 4101 is logged every 4 seconds, so I end up with 10 to 20 errors in Event Viewer before I force off the VM. I disabled PCI-E power saving, disabled sleep, and set the display to never turn off. I have powered the VM on 4 or 5 times and the same thing happened every time.
PC specs
  • Pentium G4560 (2 threads passed to Windows)
  • 8 GB of RAM (4 GB for Windows)
  • nVidia GT1030 (passed to Windows)
  • Intel HD 610 (host GPU)
  • M.2 SSD where the host OS (Manjaro) is installed
  • SATA SSD where Windows is installed (also passed the whole drive to Windows VM)
  • PCI-E USB expansion card (passed to Windows)

I booted directly into Windows and did a clean install of the 431.36 WHQL drivers, then rebooted as a VM; the same thing happened. Back on bare metal, I turned off the internet, uninstalled that driver, rebooted, and installed the 431.36 DCH WHQL drivers (the new “UWP driver”). After another reboot (still on bare metal), everything seemed to work fine. I turned the internet back on, rebooted, then booted inside the VM. It lasted a little longer, but after around 1h the driver crashed again with the same error (Event 4101).

Any ideas? I appreciate any help. I’m getting desperate: I finally got the VM to work, only to run into instability.

Edit: I forgot to mention that when I’m booted straight into Windows, the GPU works as expected and I don’t encounter any crashes or instability. It only happens when I boot Windows in KVM.

Can you post a copy of your guest XML file?

Here’s my VM XML config, /etc/libvirt/qemu/win10.xml:

<domain type='kvm'>
  <name>win10</name>
  <uuid>1362b5eb-efc0-4e53-8831-49c30f07e3f3</uuid>
  <title>Win10</title>
  <description>Win10</description>
  <metadata>
    <libosinfo:libosinfo xmlns:libosinfo='http://libosinfo.org/xmlns/libvirt/domain/1.0'>
      <libosinfo:os id='http://microsoft.com/win/10'/>
    </libosinfo:libosinfo>
  </metadata>
  <memory unit='KiB'>4194304</memory>
  <currentMemory unit='KiB'>4194304</currentMemory>
  <vcpu placement='static'>3</vcpu>
  <os>
    <type arch='x86_64' machine='pc-q35-4.0'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/ovmf/x64/OVMF_CODE.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/win10_VARS.fd</nvram>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor_id state='on' value='123456789ab'/>
    </hyperv>
    <kvm>
      <hidden state='on'/>
    </kvm>
    <vmport state='off'/>
  </features>
  <cpu mode='host-model' check='partial'>
    <model fallback='allow'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
    <timer name='hypervclock' present='yes'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <target dev='sdb' bus='sata'/>
      <readonly/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw'/>
      <source dev='/dev/sda'/>
      <target dev='vdb' bus='sata'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <controller type='usb' index='0' model='qemu-xhci' ports='15'>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x10'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x11'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0x12'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0x13'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0x8'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0x9'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <interface type='network'>
      <mac address='52:54:00:11:6a:19'/>
      <source network='default'/>
      <model type='e1000e'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <input type='tablet' bus='usb'>
      <address type='usb' bus='0' port='1'/>
    </input>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </hostdev>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </memballoon>
  </devices>
</domain>

I already mentioned that I can boot and my GPU works fine for a while. :slightly_smiling_face:

I think I came across this problem recently. This was the fix for me, though I don’t know what version of QEMU you’re on. I wasn’t getting Code 43, but the drivers wouldn’t load and I was getting BSODs and black screens.

QEMU 4.0: Unable to load graphics drivers/BSOD after driver install using Q35

Starting with QEMU 4.0, the q35 machine type changes the default kernel_irqchip from off to split, which breaks some guest devices, such as nVidia graphics (the driver fails to load / black screen / code 43). Switch to full KVM mode instead with <ioapic driver='kvm'/> under libvirt’s <features> tag, or with kernel_irqchip=on in the -machine QEMU argument.

XML should look like so:

  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <vendor_id state="on" value="whatever"/>
    </hyperv>
    <kvm>
      <hidden state="on"/>
    </kvm>
    <vmport state="off"/>
    <ioapic driver="kvm"/>
  </features>
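
If you would rather script the change than edit the file by hand, something along these lines should do it. This is only a rough sketch, not an official libvirt tool: the function name ensure_kvm_ioapic and the trimmed-down sample XML are made up for illustration, and it assumes you feed it the output of virsh dumpxml and load the result back yourself (for example with virsh define).

```python
import xml.etree.ElementTree as ET


def ensure_kvm_ioapic(domain_xml: str) -> str:
    """Return the domain XML with <ioapic driver='kvm'/> set under <features>.

    Hypothetical helper for illustration; it only rewrites the XML text
    and does not talk to libvirt.
    """
    root = ET.fromstring(domain_xml)

    # Create <features> if the domain has none.
    features = root.find("features")
    if features is None:
        features = ET.SubElement(root, "features")

    # Reuse an existing <ioapic> element so repeated runs do not duplicate it.
    ioapic = features.find("ioapic")
    if ioapic is None:
        ioapic = ET.SubElement(features, "ioapic")
    ioapic.set("driver", "kvm")

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Trimmed-down stand-in for a real "virsh dumpxml win10" dump.
    sample = (
        "<domain type='kvm'>"
        "<name>win10</name>"
        "<features><acpi/><apic/><vmport state='off'/></features>"
        "</domain>"
    )
    print(ensure_kvm_ioapic(sample))
```

One thing to watch: ElementTree rewrites namespace prefixes it has not been told about, so the libosinfo metadata in a real dump may come back with a generic prefix unless you call ET.register_namespace first. Keep a backup of the original XML before defining the modified one.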

Thanks for the tip. Later today, I will try this and report back.

Further to the above: what does Device Manager in your guest show for your graphics card? Are the details and device ID correct?

As far as I can remember (I’m not at home at the moment), it showed the nVidia GT1030 correctly. I even played a game for half an hour before I got bored, and it worked fine; the display driver only crashed later.

I will do the XML change in around 4-5h, I’m very eager to tinker with it.

I tested it for a while: played a game for a few hours, opened some browsers, plugged and unplugged USB devices, and everything seems to work fine. I’d like to test it for a few days before I mark your answer as the solution, but this might have solved it; before the change, the VM never made it past 1h without crashing. Everything also seemed a little smoother.

Glad it’s working for you. Enjoy the sweet pleasure of dual stable OS’s.

Yep, I tested the solution, I had 1 day uptime. I’m still waiting for a second wireless keyboard to arrive. I am enjoying living in the future. Thank you, @exabits

Knew I would find what I needed to fix it. Was also pretty sure it would be from here! Thank you!
