[SOLVED] nVidia driver crash under Windows 10 KVM with GPU passthrough

I finally managed to get Windows 10 running under KVM, but after a while the nVidia driver crashes and I get a black screen. It seems mostly random, but it has never made it past 1 hour of uptime. Once this happens I can't even RDP into the VM, although RDP works fine before the crash.

Event Viewer reports “Event 4101, Display”:

Display driver nvlddmkm stopped responding and has successfully recovered.
It never actually recovers, though; the screen stays black. The next error logged is "Event 41, Kernel-Power", because I force the VM off from Virt-Manager. Event 4101 repeats every 4 seconds, so 10 to 20 of them pile up in Event Viewer before I force the VM off. I have disabled PCI-E power saving, disabled sleep, and set the display to never turn off. I have powered the VM on 4 or 5 times, and the same thing happened every time.

PC specs:
  • Pentium G4560 (2 threads passed to Windows)
  • 8 GB of RAM (4 GB for Windows)
  • nVidia GT1030 (passed to Windows)
  • Intel HD 610 (host GPU)
  • M.2 SSD where the host OS (Manjaro) is installed
  • SATA SSD where Windows is installed (the whole drive is passed to the Windows VM)
  • PCI-E USB expansion card (passed to Windows)

I booted directly into Windows, did a clean install of the 431.36 WHQL driver, and rebooted into the VM: the same thing happened. Back on bare metal, I disconnected from the internet, uninstalled that driver, rebooted, and installed the 431.36 DCH WHQL driver (the new "UWP driver"). After another reboot (still on bare metal), everything seemed to work fine. I turned the internet back on, rebooted, and then booted Windows inside the VM. It lasted a little longer, but after around 1 hour the driver crashed again with the same error (Event 4101).

Any ideas? I appreciate any help; I'm getting desperate after finally getting the VM to work, only to run into this instability.

Edit: I forgot to mention that when I’m booted straight into Windows, the GPU works as expected and I don’t encounter any crashes or instability. It only happens when I boot Windows in KVM.


Can you post a copy of your guest XML file?

Here's my VM XML config from /etc/libvirt/qemu/win10.xml:

  <domain type='kvm'>
    <name>win10</name>
    <uuid>1362b5eb-efc0-4e53-8831-49c30f07e3f3</uuid>
    <title>Win10</title>
    <description>Win10</description>
    <metadata>
      <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
        <libosinfo:os id="http://microsoft.com/win/10"/>
      </libosinfo:libosinfo>
    </metadata>
    <memory unit='KiB'>4194304</memory>
    <currentMemory unit='KiB'>4194304</currentMemory>
    <vcpu placement='static'>3</vcpu>
    <os>
      <type arch='x86_64' machine='pc-q35-4.0'>hvm</type>
      <loader readonly='yes' type='pflash'>/usr/share/ovmf/x64/OVMF_CODE.fd</loader>
      <nvram>/var/lib/libvirt/qemu/nvram/win10_VARS.fd</nvram>
      <boot dev='hd'/>
    </os>
    <features>
      <acpi/>
      <apic/>
      <hyperv>
        <relaxed state='on'/>
        <vapic state='on'/>
        <spinlocks state='on' retries='8191'/>
        <vendor_id state='on' value='123456789ab'/>
      </hyperv>
      <kvm>
        <hidden state='on'/>
      </kvm>
      <vmport state='off'/>
    </features>
    <cpu mode='host-model' check='partial'>
      <model fallback='allow'/>
    </cpu>
    <clock offset='localtime'>
      <timer name='rtc' tickpolicy='catchup'/>
      <timer name='pit' tickpolicy='delay'/>
      <timer name='hpet' present='no'/>
      <timer name='hypervclock' present='yes'/>
    </clock>
    <on_poweroff>destroy</on_poweroff>
    <on_reboot>restart</on_reboot>
    <on_crash>destroy</on_crash>
    <pm>
      <suspend-to-mem enabled='no'/>
      <suspend-to-disk enabled='no'/>
    </pm>
    <devices>
      <emulator>/usr/bin/qemu-system-x86_64</emulator>
      <disk type='file' device='cdrom'>
        <driver name='qemu' type='raw'/>
        <target dev='sdb' bus='sata'/>
        <readonly/>
        <address type='drive' controller='0' bus='0' target='0' unit='1'/>
      </disk>
      <disk type='block' device='disk'>
        <driver name='qemu' type='raw'/>
        <source dev='/dev/sda'/>
        <target dev='vdb' bus='sata'/>
        <address type='drive' controller='0' bus='0' target='0' unit='0'/>
      </disk>
      <controller type='usb' index='0' model='qemu-xhci' ports='15'>
        <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
      </controller>
      <controller type='sata' index='0'>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
      </controller>
      <controller type='pci' index='0' model='pcie-root'/>
      <controller type='pci' index='1' model='pcie-root-port'>
        <model name='pcie-root-port'/>
        <target chassis='1' port='0x10'/>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
      </controller>
      <controller type='pci' index='2' model='pcie-root-port'>
        <model name='pcie-root-port'/>
        <target chassis='2' port='0x11'/>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
      </controller>
      <controller type='pci' index='3' model='pcie-root-port'>
        <model name='pcie-root-port'/>
        <target chassis='3' port='0x12'/>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
      </controller>
      <controller type='pci' index='4' model='pcie-root-port'>
        <model name='pcie-root-port'/>
        <target chassis='4' port='0x13'/>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
      </controller>
      <controller type='pci' index='5' model='pcie-root-port'>
        <model name='pcie-root-port'/>
        <target chassis='5' port='0x8'/>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' multifunction='on'/>
      </controller>
      <controller type='pci' index='6' model='pcie-root-port'>
        <model name='pcie-root-port'/>
        <target chassis='6' port='0x9'/>
        <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
      </controller>
      <interface type='network'>
        <mac address='52:54:00:11:6a:19'/>
        <source network='default'/>
        <model type='e1000e'/>
        <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
      </interface>
      <serial type='pty'>
        <target type='isa-serial' port='0'>
          <model name='isa-serial'/>
        </target>
      </serial>
      <console type='pty'>
        <target type='serial' port='0'/>
      </console>
      <input type='tablet' bus='usb'>
        <address type='usb' bus='0' port='1'/>
      </input>
      <input type='mouse' bus='ps2'/>
      <input type='keyboard' bus='ps2'/>
      <hostdev mode='subsystem' type='pci' managed='yes'>
        <source>
          <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
        </source>
        <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
      </hostdev>
      <hostdev mode='subsystem' type='pci' managed='yes'>
        <source>
          <address domain='0x0000' bus='0x01' slot='0x00' function='0x1'/>
        </source>
        <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
      </hostdev>
      <hostdev mode='subsystem' type='pci' managed='yes'>
        <source>
          <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
        </source>
        <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
      </hostdev>
      <memballoon model='virtio'>
        <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
      </memballoon>
    </devices>
  </domain>

As I already mentioned, the VM boots and my GPU works fine for a while. :)

I think I ran into this problem recently. This was the fix for me, although I don't know which version of QEMU you're on. I wasn't getting Code 43, but the drivers wouldn't load and I was getting BSODs and black screens.

QEMU 4.0: Unable to load graphics drivers/BSOD after driver install using Q35

Starting with QEMU 4.0, the q35 machine type changes the default kernel_irqchip from off to split, which breaks some guest devices, such as nVidia graphics (the driver fails to load / black screen / code 43). Switch to full KVM mode instead with <ioapic driver='kvm'/> under libvirt's <features> tag, or with kernel_irqchip=on in the -machine QEMU argument.

The XML should look like this:

  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <vendor_id state="on" value="whatever"/>
    </hyperv>
    <kvm>
      <hidden state="on"/>
    </kvm>
    <vmport state="off"/>
    <ioapic driver="kvm"/>
  </features>
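Applied to the win10 config posted earlier in the thread, the change is one extra line at the end of the existing <features> block. This is a sketch: every line other than <ioapic> is copied from that config, so double-check it against your own file before saving.

```xml
<features>
  <acpi/>
  <apic/>
  <hyperv>
    <relaxed state='on'/>
    <vapic state='on'/>
    <spinlocks state='on' retries='8191'/>
    <vendor_id state='on' value='123456789ab'/>
  </hyperv>
  <kvm>
    <hidden state='on'/>
  </kvm>
  <vmport state='off'/>
  <!-- Added: use the in-kernel KVM IOAPIC (kernel_irqchip=on) instead of
       the split irqchip that q35 defaults to starting with QEMU 4.0 -->
  <ioapic driver='kvm'/>
</features>
```

Make the edit with `virsh edit win10` rather than editing the file under /etc/libvirt/qemu directly, then shut the VM down completely and start it again. A reboot from inside the guest keeps the same QEMU process, so the new setting would not take effect yet.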

Thanks for the tip. Later today, I will try this and report back.


Further to the above: what does Device Manager in your guest show for your graphics card? Are the details and device ID correct?

As far as I remember (I'm not at home at the moment), it showed the nVidia GT1030 correctly. I even played a game for half an hour before I got bored, and it worked fine; the display driver only crashed later.

I'll make the XML change in about 4 to 5 hours; I'm very eager to tinker with it.

I tested it for a while: played a game for a few hours, opened some browsers, plugged and unplugged USB devices, and everything seems to work fine. I'd like to test it for a few days before I mark your answer as the solution, but this might have solved it; before, it never made it past 1 hour without crashing. Also, everything feels a little smoother.


Glad it's working for you. Enjoy the sweet pleasure of two stable OSes.


Yep, I tested the solution and reached 1 day of uptime. I'm still waiting for a second wireless keyboard to arrive, but I'm enjoying living in the future. Thank you, @exabits!

I knew I would find what I needed to fix it, and I was pretty sure it would be here. Thank you!


So this hasn't worked for me even once. What do I do? :(

Strange, I don't get any crashes, although I installed the driver update through a driver updater rather than from the official site. I don't know whether that made a difference, but I downloaded the updater from this review: thinkmobiles.com/blog/best-free-driver-updater/, and as I understand it, it detected and downloaded the exact version we're talking about.

Sorry for reviving this topic, but I wanted to leave this information here because this thread is one of the very few relevant search results for this issue. I hope it saves someone else some frustration.

I had a working passthrough setup for an RTX 3090, and I tried every option or suggestion I could find here and elsewhere on the internet for qemu-kvm, the BIOS, etc. I also tried many different Nvidia driver versions, and no matter what I did, I got the display driver error (Event 4101) after running any graphical app for about 10 to 20 minutes (and sometimes while idle). Once the error had occurred, I could never fully reset the GPU without rebooting the host. If I didn't reboot the host, the same display driver crash/reset would recur within minutes, until the guest eventually crashed completely.

After a lot of searching, I came across a post about optimizations with a section on enabling MSI (Message Signaled Interrupts) in the Windows guest for better performance.

In my own Windows 10 VM, I noticed that this mode was already enabled for my GPU by default. But it led me to read further on this topic, and I found this explanation of the various interrupt modes:

I started experimenting with it, and it turns out the solution for me was to force the GPU device to use line-based interrupts. Since making the change, I've run hours of GPU testing under 100% load without a single Nvidia driver crash or Event 4101. I also haven't noticed any FPS drop in the tests I'm running.

I recommend being careful with this setting, because setting it wrong can make your VM unbootable. There may also be good reasons to keep certain devices (sound cards, etc.) in MSI mode for latency, and the associated driver must support the requested mode. But if you have this issue, it's worth checking out. In my case, the GPU has MSI disabled, and all other devices that support it have it enabled.

You can edit the registry keys/values directly or use this tool to switch it on/off.
(Search for MSI_util_v3.zip if the link breaks.)


For anyone still suffering from this after trying all of these options: removing the Nvidia HDMI driver solved it for me.

qemu/kvm Nvidia event 4101 Display driver nvlddmkm stopped responding and has successfully recovered.

Edit: Welcome to the forum!

I was using HDMI on my GT 1030, so I'm not sure that would have been the solution for me. As mentioned above, the solution was modifying the VM's XML file (adding <ioapic driver='kvm'/> under <features>).