VFIO GPU passthrough: AMD Adrenalin update freezes the Windows VM

Hi everyone,
I have an issue with my current Windows 11 VM.
I am passing through an RX6800XT to the Windows 11 VM. The system works absolutely fine until I try to update the AMD Adrenalin software. Then the VM freezes and reboots after a few seconds.
I have had the same issue several times with Windows 10 and Windows 11 VMs. In the past my workaround was to take a backup immediately after setting up the VM, so I could restore the backup and install the new Adrenalin software on the “fresh” system. As this is very time consuming, I would prefer a better solution.
Does anyone have the same problem, or know a solution for it?

Here is my current VM config:

<domain type="kvm">
  <name>win11</name>
  <uuid>0c5be3d9-b6cc-46aa-9f98-a792b4072eb3</uuid>
  <metadata>
    <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
      <libosinfo:os id="http://microsoft.com/win/11"/>
    </libosinfo:libosinfo>
  </metadata>
  <memory unit="KiB">16777216</memory>
  <currentMemory unit="KiB">16777216</currentMemory>
  <vcpu placement="static">16</vcpu>
  <cputune>
    <vcpupin vcpu="0" cpuset="8"/>
    <vcpupin vcpu="1" cpuset="9"/>
    <vcpupin vcpu="2" cpuset="10"/>
    <vcpupin vcpu="3" cpuset="11"/>
    <vcpupin vcpu="4" cpuset="12"/>
    <vcpupin vcpu="5" cpuset="13"/>
    <vcpupin vcpu="6" cpuset="14"/>
    <vcpupin vcpu="7" cpuset="15"/>
    <vcpupin vcpu="8" cpuset="24"/>
    <vcpupin vcpu="9" cpuset="25"/>
    <vcpupin vcpu="10" cpuset="26"/>
    <vcpupin vcpu="11" cpuset="27"/>
    <vcpupin vcpu="12" cpuset="28"/>
    <vcpupin vcpu="13" cpuset="29"/>
    <vcpupin vcpu="14" cpuset="30"/>
    <vcpupin vcpu="15" cpuset="31"/>
  </cputune>
  <sysinfo type="smbios">
    <bios>
      <entry name="vendor">LENOVO</entry>
    </bios>
    <system>
      <entry name="manufacturer">Microsoft</entry>
      <entry name="product">Windows11</entry>
      <entry name="version">22H2</entry>
    </system>
    <baseBoard>
      <entry name="manufacturer">LENOVO</entry>
      <entry name="product">20BE0061MC</entry>
      <entry name="version">0B98401 Pro</entry>
      <entry name="serial">W1KS427111E</entry>
    </baseBoard>
    <chassis>
      <entry name="manufacturer">Dell Inc.</entry>
      <entry name="version">2.12</entry>
      <entry name="serial">65X0XF2</entry>
      <entry name="asset">40000101</entry>
      <entry name="sku">Type3Sku1</entry>
    </chassis>
    <oemStrings>
      <entry>myappname:some arbitrary data</entry>
      <entry>otherappname:more arbitrary data</entry>
    </oemStrings>
  </sysinfo>
  <os firmware="efi">
    <type arch="x86_64" machine="pc-q35-8.1">hvm</type>
    <firmware>
      <feature enabled="no" name="enrolled-keys"/>
      <feature enabled="yes" name="secure-boot"/>
    </firmware>
    <loader readonly="yes" secure="yes" type="pflash">/usr/share/edk2/x64/OVMF_CODE.secboot.4m.fd</loader>
    <nvram template="/usr/share/edk2/x64/OVMF_VARS.4m.fd">/var/lib/libvirt/qemu/nvram/win11_VARS.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv mode="custom">
      <relaxed state="on"/>
      <vapic state="on"/>
      <spinlocks state="on" retries="8191"/>
      <vpindex state="on"/>
      <runtime state="on"/>
      <synic state="on"/>
      <stimer state="on"/>
      <reset state="on"/>
      <vendor_id state="on" value="123456789123"/>
      <frequencies state="on"/>
    </hyperv>
    <kvm>
      <hidden state="on"/>
    </kvm>
    <vmport state="off"/>
    <smm state="on"/>
    <ioapic driver="kvm"/>
  </features>
  <cpu mode="host-passthrough" check="none" migratable="on">
    <topology sockets="1" dies="1" cores="8" threads="2"/>
    <cache level="3" mode="emulate"/>
    <feature policy="disable" name="hypervisor"/>
    <feature policy="require" name="svm"/>
    <feature policy="require" name="invtsc"/>
    <feature policy="require" name="topoext"/>
    <feature policy="disable" name="aes"/>
    <feature policy="disable" name="rdtscp"/>
  </cpu>
  <clock offset="localtime">
    <timer name="rtc" tickpolicy="catchup"/>
    <timer name="hpet" present="no"/>
    <timer name="hypervclock" present="yes"/>
    <timer name="pit" tickpolicy="discard"/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled="no"/>
    <suspend-to-disk enabled="no"/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type="file" device="cdrom">
      <driver name="qemu" type="raw"/>
      <source file="/home/mark/Downloads/virtio-win-0.1.229.iso"/>
      <target dev="sdb" bus="sata"/>
      <readonly/>
      <address type="drive" controller="0" bus="0" target="0" unit="1"/>
    </disk>
    <disk type="file" device="disk">
      <driver name="qemu" type="qcow2"/>
      <source file="/mnt/vmdisk_0/win11.qcow2"/>
      <target dev="sdc" bus="sata"/>
      <boot order="1"/>
      <address type="drive" controller="0" bus="0" target="0" unit="2"/>
    </disk>
    <disk type="file" device="cdrom">
      <driver name="qemu" type="raw"/>
      <source file="/home/mark/Downloads/Win11_22H2_German_x64v2.iso"/>
      <target dev="sdd" bus="sata"/>
      <readonly/>
      <address type="drive" controller="0" bus="0" target="0" unit="3"/>
    </disk>
    <controller type="usb" index="0" model="qemu-xhci" ports="15">
      <address type="pci" domain="0x0000" bus="0x02" slot="0x00" function="0x0"/>
    </controller>
    <controller type="sata" index="0">
      <address type="pci" domain="0x0000" bus="0x00" slot="0x1f" function="0x2"/>
    </controller>
    <controller type="pci" index="0" model="pcie-root"/>
    <controller type="pci" index="1" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="1" port="0x10"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x0" multifunction="on"/>
    </controller>
    <controller type="pci" index="2" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="2" port="0x11"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x1"/>
    </controller>
    <controller type="pci" index="3" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="3" port="0x12"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x2"/>
    </controller>
    <controller type="pci" index="4" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="4" port="0x13"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x3"/>
    </controller>
    <controller type="pci" index="5" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="5" port="0x14"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x4"/>
    </controller>
    <controller type="pci" index="6" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="6" port="0x15"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x5"/>
    </controller>
    <controller type="pci" index="7" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="7" port="0x16"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x6"/>
    </controller>
    <controller type="pci" index="8" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="8" port="0x17"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x7"/>
    </controller>
    <controller type="pci" index="9" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="9" port="0x8"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x0" multifunction="on"/>
    </controller>
    <controller type="pci" index="10" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="10" port="0x9"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x1"/>
    </controller>
    <controller type="pci" index="11" model="pcie-to-pci-bridge">
      <model name="pcie-pci-bridge"/>
      <address type="pci" domain="0x0000" bus="0x09" slot="0x00" function="0x0"/>
    </controller>
    <controller type="pci" index="12" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="12" port="0xa"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x2"/>
    </controller>
    <input type="mouse" bus="ps2"/>
    <input type="keyboard" bus="ps2"/>
    <tpm model="tpm-crb">
      <backend type="emulator" version="2.0"/>
    </tpm>
    <audio id="1" type="none"/>
    <hostdev mode="subsystem" type="pci" managed="yes">
      <source>
        <address domain="0x0000" bus="0x03" slot="0x00" function="0x0"/>
      </source>
      <address type="pci" domain="0x0000" bus="0x03" slot="0x00" function="0x0"/>
    </hostdev>
    <hostdev mode="subsystem" type="pci" managed="yes">
      <source>
        <address domain="0x0000" bus="0x03" slot="0x00" function="0x1"/>
      </source>
      <address type="pci" domain="0x0000" bus="0x04" slot="0x00" function="0x0"/>
    </hostdev>
    <hostdev mode="subsystem" type="pci" managed="yes">
      <source>
        <address domain="0x0000" bus="0x15" slot="0x00" function="0x3"/>
      </source>
      <address type="pci" domain="0x0000" bus="0x05" slot="0x00" function="0x0"/>
    </hostdev>
    <hostdev mode="subsystem" type="usb" managed="yes">
      <source>
        <vendor id="0x13b1"/>
        <product id="0x0041"/>
      </source>
      <address type="usb" bus="0" port="1"/>
    </hostdev>
    <watchdog model="itco" action="reset"/>
    <memballoon model="none"/>
  </devices>
</domain>

Are you using the “online” installer? I’ve usually had to download the “offline” version and run that.

I tried both versions. I recently tried to update the driver directly in Device Manager. It worked, but I got an error notification.

As the driver installation works perfectly right after setting up Windows, maybe there is a Windows update that interferes with the installation of the AMD software?

Any coincidental kernel messages?

Anything in the Windows event log?

Hmm, I think I’ve experienced this… I think. In my case my card was shown as disabled in device manager and I had to re-enable it in device manager and then run the driver update and it worked without issues.

Sadly there is nothing in the host's kernel log.
The Windows event log just says that the system has been restarted because it was not shut down correctly.

Hmm, that indicates it’s (likely) not a hardware issue, since there’d be a kernel error message if the GPU hard crashed.

I’d recommend checking out what nondescriptmango suggested. Might be on to something. I’ve had some strange behavior with my AMD system a couple weeks ago, but installing offline with the virtio NIC disabled solved the problem for me.

Today I uninstalled the AMD driver with DDU.
I cannot reinstall the driver. The installation starts checking whether the system is compatible. Seconds later the Windows VM gets stuck. A few seconds after that the host gets stuck too and the whole system reboots.
I will try a fresh Windows installation with the same config file. If the problem persists on the new system, the problem must be on the host side. Otherwise the guest is the problem.


Hi,
I have a very similar issue, maybe even the same one. It has persisted for over a year now. It was working fine before.

First:
I’m using an RX6900XT for the guest, an RX6400 for the host and a Ryzen 9 5950X on an X570 motherboard with host-passthrough.

If I try to update the AMD driver in my VM, the VM crashes. This happens at the beginning of the installation, while the installer checks the system for compatibility. I observed two different behaviors on the host, depending on the VM config used:

  1. Host also crashes and reboots: This happens if I HAVE NOT set <feature policy="disable" name="hypervisor"/>
  2. Host is fine, VM reboots: This happens if I HAVE set <feature policy="disable" name="hypervisor"/>

I have been able to reproduce this behavior several times now.
Maybe that’s the reason why your host is now rebooting too.
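
For orientation, this is roughly where that element sits in the domain XML. It is only a sketch: the `<cpu>` attributes and the topology line are copied from the config in the first post, not from my own VM, so treat them as placeholders.

```xml
<cpu mode="host-passthrough" check="none" migratable="on">
  <!-- Topology taken from the first post's config; adjust to your own CPU -->
  <topology sockets="1" dies="1" cores="8" threads="2"/>
  <!-- Case 2 above: with this line present, only the VM reboots on my system.
       Case 1 above: with this line removed, the host crashes as well. -->
  <feature policy="disable" name="hypervisor"/>
</cpu>
```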

I also found a workaround, how to install a new driver:
If I change the CPU mode from host-passthrough to “Hypervisor Default” (I can’t look up the exact attribute value in the XML right now, so that’s the name shown in the virt-manager UI), the AMD driver update succeeds! After that, I change the CPU mode back to host-passthrough.
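
In the domain XML, the temporary change might look roughly like the sketch below. This rests on an assumption I have not verified: that “Hypervisor Default” in virt-manager corresponds to a `<cpu>` element without a `mode` attribute (or to no `<cpu>` element at all). The topology line is taken from the config in the first post. Check what your virt-manager version actually writes with `virsh dumpxml` before relying on it.

```xml
<!-- Temporary CPU block while installing the AMD driver.
     Assumption: this is what "Hypervisor Default" amounts to. -->
<cpu>
  <topology sockets="1" dies="1" cores="8" threads="2"/>
</cpu>

<!-- After the driver update, restore the original block: -->
<cpu mode="host-passthrough" check="none" migratable="on">
  <topology sockets="1" dies="1" cores="8" threads="2"/>
  <!-- the original <feature> lines go back here -->
</cpu>
```

Keeping a copy of the original `<cpu>` block before switching makes it easy to restore the exact feature list afterwards.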

Maybe you can confirm whether it works the same way for you?

Hi,
thanks for sharing the workaround. It works for me!
I will stick with this “solution” for now, because I have no time to investigate the problem more deeply.

Setting or not setting <feature policy="disable" name="hypervisor"/> makes no difference on my system.