VFIO Woes - GTX 1070 Reset Bug?

I’ve been having problems getting my VM working with VFIO passthrough. I can launch the VM, but I can’t get any video output over HDMI. Destroying the VM is the only way to stop it, and the GPU doesn’t get reset afterwards.

Motherboard: Asus CROSSHAIR VI Hero (x370)
BIOS Version: 7201
CPU: Ryzen 7 1700
GPU (Host): Nvidia RTX 2070
GPU (Guest): Nvidia GTX 1070

GRUB Config

# GRUB boot loader configuration

GRUB_DEFAULT=0
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="Arch"
GRUB_CMDLINE_LINUX_DEFAULT="loglevel=3 kvm kvm_amd amd_iommu=on vfio-pci.ids=10de:1b81,10de:10f0"
GRUB_CMDLINE_LINUX="cryptkey=rootfs:*** cryptdevice=*** root=***"

vfio.conf in modprobe.d

options vfio-pci ids=10de:1b81,10de:10f0

mkinitcpio.conf

....
MODULES=(vfio_pci vfio vfio_iommu_type1 vfio_virqfd)
...
HOOKS=(base udev autodetect modconf block filesystems keyboard mdadm_udev encrypt lvm2 fsck)
...

IOMMU Groups

IOMMU Group 0 00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 10 00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]
IOMMU Group 11 00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 59)
IOMMU Group 11 00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
IOMMU Group 12 00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 0 [1022:1460]
IOMMU Group 12 00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 1 [1022:1461]
IOMMU Group 12 00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 2 [1022:1462]
IOMMU Group 12 00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 3 [1022:1463]
IOMMU Group 12 00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 4 [1022:1464]
IOMMU Group 12 00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 5 [1022:1465]
IOMMU Group 12 00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 6 [1022:1466]
IOMMU Group 12 00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7 [1022:1467]
IOMMU Group 13 01:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] X370 Series Chipset USB 3.1 xHCI Controller [1022:43b9] (rev 02)
IOMMU Group 13 01:00.1 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] X370 Series Chipset SATA Controller [1022:43b5] (rev 02)
IOMMU Group 13 01:00.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] X370 Series Chipset PCIe Upstream Port [1022:43b0] (rev 02)
IOMMU Group 13 02:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port [1022:43b4] (rev 02)
IOMMU Group 13 02:02.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port [1022:43b4] (rev 02)
IOMMU Group 13 02:03.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port [1022:43b4] (rev 02)
IOMMU Group 13 02:04.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port [1022:43b4] (rev 02)
IOMMU Group 13 02:05.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port [1022:43b4] (rev 02)
IOMMU Group 13 02:06.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port [1022:43b4] (rev 02)
IOMMU Group 13 02:07.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] 300 Series Chipset PCIe Port [1022:43b4] (rev 02)
IOMMU Group 13 03:00.0 USB controller [0c03]: ASMedia Technology Inc. ASM1143 USB 3.1 Host Controller [1b21:1343]
IOMMU Group 13 04:00.0 Ethernet controller [0200]: Intel Corporation I211 Gigabit Network Connection [8086:1539] (rev 03)
IOMMU Group 13 06:00.0 Multimedia controller [0480]: YUAN High-Tech Development Co., Ltd. Device [12ab:0380]
IOMMU Group 14 0a:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU106 [GeForce RTX 2070 Rev. A] [10de:1f07] (rev a1)
IOMMU Group 14 0a:00.1 Audio device [0403]: NVIDIA Corporation TU106 High Definition Audio Controller [10de:10f9] (rev a1)
IOMMU Group 14 0a:00.2 USB controller [0c03]: NVIDIA Corporation TU106 USB 3.1 Host Controller [10de:1ada] (rev a1)
IOMMU Group 14 0a:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU106 USB Type-C Port Policy Controller [10de:1adb] (rev a1)
IOMMU Group 15 0b:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104 [GeForce GTX 1070] [10de:1b81] (rev a1)
IOMMU Group 15 0b:00.1 Audio device [0403]: NVIDIA Corporation GP104 High Definition Audio Controller [10de:10f0] (rev a1)
IOMMU Group 16 0c:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Raven/Raven2 PCIe Dummy Function [1022:145a]
IOMMU Group 17 0c:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Platform Security Processor [1022:1456]
IOMMU Group 18 0c:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) USB 3.0 Host Controller [1022:145c]
IOMMU Group 19 0d:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Renoir PCIe Dummy Function [1022:1455]
IOMMU Group 1 00:01.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
IOMMU Group 20 0d:00.2 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51)
IOMMU Group 21 0d:00.3 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) HD Audio Controller [1022:1457]
IOMMU Group 2 00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 3 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 4 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
IOMMU Group 5 00:03.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge [1022:1453]
IOMMU Group 6 00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 7 00:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]
IOMMU Group 8 00:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B [1022:1454]
IOMMU Group 9 00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge [1022:1452]

Target device is isolated to Group 15, so no conflicts there.
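
The listing above is sorted as text, so Group 10 lands before Group 2. As a rough sketch (not necessarily the script that produced the listing), the same grouping can be read straight from sysfs in numeric order. The function name `list_iommu_groups` and its optional directory argument are made up for illustration; /sys/kernel/iommu_groups is the standard kernel path.

```shell
#!/usr/bin/env bash
# Print one line per PCI device, ordered by numeric IOMMU group.
# $1: directory to scan (hypothetical parameter; defaults to the kernel's sysfs path)
list_iommu_groups() {
    local root="${1:-/sys/kernel/iommu_groups}"
    local group dev
    # Group directories are named 0, 1, 2, ...; sort -n puts 2 before 10.
    for group in $(ls -1 "$root" 2>/dev/null | sort -n); do
        for dev in "$root/$group"/devices/*; do
            [ -e "$dev" ] || continue   # empty group or unmatched glob
            echo "IOMMU Group $group ${dev##*/}"
        done
    done
}
```

Calling `list_iommu_groups` with no argument scans the running system. Each printed address can be passed to `lspci -nns <address>` to get the device names and IDs shown in the listing.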

VM XML

<domain type="kvm">
  <name>win10</name>
  <uuid>0466daf5-6678-4f1d-8d97-9851c0d8d5ba</uuid>
  <metadata>
    <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
      <libosinfo:os id="http://microsoft.com/win/10"/>
    </libosinfo:libosinfo>
  </metadata>
  <memory unit="KiB">8388608</memory>
  <currentMemory unit="KiB">8388608</currentMemory>
  <vcpu placement="static">4</vcpu>
  <os>
    <type arch="x86_64" machine="pc-q35-4.2">hvm</type>
    <loader readonly="yes" type="pflash">/usr/share/ovmf/x64/OVMF_CODE.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/win10_VARS.fd</nvram>
    <bootmenu enable="yes"/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state="on"/>
      <vapic state="on"/>
      <spinlocks state="on" retries="8191"/>
      <vendor_id state="on" value="fknvidia"/>
    </hyperv>
    <kvm>
      <hidden state="on"/>
    </kvm>
    <vmport state="off"/>
  </features>
  <cpu mode="custom" match="exact" check="partial">
    <model fallback="allow">EPYC</model>
    <topology sockets="1" cores="2" threads="2"/>
  </cpu>
  <clock offset="localtime">
    <timer name="rtc" tickpolicy="catchup"/>
    <timer name="pit" tickpolicy="delay"/>
    <timer name="hpet" present="no"/>
    <timer name="hypervclock" present="yes"/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled="no"/>
    <suspend-to-disk enabled="no"/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type="file" device="disk">
      <driver name="qemu" type="qcow2"/>
      <source file="/mnt/raid/VMs/win10.qcow2"/>
      <target dev="vda" bus="virtio"/>
      <boot order="2"/>
      <address type="pci" domain="0x0000" bus="0x04" slot="0x00" function="0x0"/>
    </disk>
    <disk type="file" device="cdrom">
      <driver name="qemu" type="raw"/>
      <source file="/home/kaiju/Downloads/Win10_1909_English_x64.iso"/>
      <target dev="sdb" bus="sata"/>
      <readonly/>
      <boot order="1"/>
      <address type="drive" controller="0" bus="0" target="0" unit="1"/>
    </disk>
    <disk type="file" device="cdrom">
      <driver name="qemu" type="raw"/>
      <source file="/home/kaiju/Downloads/virtio-win-0.1.171.iso"/>
      <target dev="sdc" bus="sata"/>
      <readonly/>
      <address type="drive" controller="0" bus="0" target="0" unit="2"/>
    </disk>
    <controller type="usb" index="0" model="qemu-xhci" ports="15">
      <address type="pci" domain="0x0000" bus="0x03" slot="0x00" function="0x0"/>
    </controller>
    <controller type="sata" index="0">
      <address type="pci" domain="0x0000" bus="0x00" slot="0x1f" function="0x2"/>
    </controller>
    <controller type="pci" index="0" model="pcie-root"/>
    <controller type="pci" index="1" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="1" port="0x8"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x0" multifunction="on"/>
    </controller>
    <controller type="pci" index="2" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="2" port="0x9"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x1"/>
    </controller>
    <controller type="pci" index="3" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="3" port="0xa"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x2"/>
    </controller>
    <controller type="pci" index="4" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="4" port="0xb"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x3"/>
    </controller>
    <controller type="pci" index="5" model="pcie-to-pci-bridge">
      <model name="pcie-pci-bridge"/>
      <address type="pci" domain="0x0000" bus="0x01" slot="0x00" function="0x0"/>
    </controller>
    <controller type="pci" index="6" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="6" port="0xc"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x4"/>
    </controller>
    <controller type="pci" index="7" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="7" port="0xd"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x5"/>
    </controller>
    <interface type="network">
      <mac address="52:54:00:42:9f:d3"/>
      <source network="default"/>
      <model type="e1000e"/>
      <address type="pci" domain="0x0000" bus="0x02" slot="0x00" function="0x0"/>
    </interface>
    <input type="mouse" bus="ps2"/>
    <input type="keyboard" bus="ps2"/>
    <hostdev mode="subsystem" type="usb" managed="yes">
      <source>
        <vendor id="0x1b1c"/>
        <product id="0x1b62"/>
      </source>
      <address type="usb" bus="0" port="1"/>
    </hostdev>
    <hostdev mode="subsystem" type="usb" managed="yes">
      <source>
        <vendor id="0x1532"/>
        <product id="0x0072"/>
      </source>
      <address type="usb" bus="0" port="2"/>
    </hostdev>
    <hostdev mode="subsystem" type="pci" managed="yes">
      <source>
        <address domain="0x0000" bus="0x0b" slot="0x00" function="0x0"/>
      </source>
      <address type="pci" domain="0x0000" bus="0x05" slot="0x01" function="0x0"/>
    </hostdev>
    <hostdev mode="subsystem" type="pci" managed="yes">
      <source>
        <address domain="0x0000" bus="0x0b" slot="0x00" function="0x1"/>
      </source>
      <address type="pci" domain="0x0000" bus="0x05" slot="0x02" function="0x0"/>
    </hostdev>
    <redirdev bus="usb" type="spicevmc">
      <address type="usb" bus="0" port="5"/>
    </redirdev>
    <redirdev bus="usb" type="spicevmc">
      <address type="usb" bus="0" port="6"/>
    </redirdev>
    <memballoon model="virtio">
      <address type="pci" domain="0x0000" bus="0x06" slot="0x00" function="0x0"/>
    </memballoon>
  </devices>
</domain>

There is a polkit policy that allows users in the kvm group to manage libvirt.

/* Allow users in kvm group to manage the libvirt
daemon without authentication */
polkit.addRule(function(action, subject) {
    if (action.id == "org.libvirt.unix.manage" &&
        subject.isInGroup("kvm")) {
            return polkit.Result.YES;
    }
});

Running virt-manager as root gives the same issue. I also can’t shut down the VM, only destroy it, which results in the reset bug.

uname -a

Linux owo 5.4.13-arch1-1 #1 SMP PREEMPT Fri, 17 Jan 2020 23:09:54 +0000 x86_64 GNU/Linux

Maybe an issue with the kernel? I haven’t done this since the 4.x kernels.

dmesg | grep -i -e IOMMU

[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-linux root=/dev/mapper/vg-root rw cryptkey=rootfs:/root/secrets/crypto_keyfile.bin cryptdevice=UUID=ef2b3292-2411-46c5-a4f2-38c3c963b011:cryptlvm root=/dev/vg/root loglevel=3 kvm kvm_amd amd_iommu=on vfio-pci.ids=10de:1b81,10de:10f0
[    0.188883] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-linux root=/dev/mapper/vg-root rw cryptkey=rootfs:/root/secrets/crypto_keyfile.bin cryptdevice=UUID=ef2b3292-2411-46c5-a4f2-38c3c963b011:cryptlvm root=/dev/vg/root loglevel=3 kvm kvm_amd amd_iommu=on vfio-pci.ids=10de:1b81,10de:10f0
[    1.161130] iommu: Default domain type: Translated 
[    1.416121] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[    1.416516] pci 0000:00:01.0: Adding to iommu group 0
[    1.416667] pci 0000:00:01.3: Adding to iommu group 1
[    1.416855] pci 0000:00:02.0: Adding to iommu group 2
[    1.417003] pci 0000:00:03.0: Adding to iommu group 3
[    1.417199] pci 0000:00:03.1: Adding to iommu group 4
[    1.417341] pci 0000:00:03.2: Adding to iommu group 5
[    1.417510] pci 0000:00:04.0: Adding to iommu group 6
[    1.417692] pci 0000:00:07.0: Adding to iommu group 7
[    1.417836] pci 0000:00:07.1: Adding to iommu group 8
[    1.418021] pci 0000:00:08.0: Adding to iommu group 9
[    1.418162] pci 0000:00:08.1: Adding to iommu group 10
[    1.418339] pci 0000:00:14.0: Adding to iommu group 11
[    1.418356] pci 0000:00:14.3: Adding to iommu group 11
[    1.418526] pci 0000:00:18.0: Adding to iommu group 12
[    1.418542] pci 0000:00:18.1: Adding to iommu group 12
[    1.418558] pci 0000:00:18.2: Adding to iommu group 12
[    1.418572] pci 0000:00:18.3: Adding to iommu group 12
[    1.418587] pci 0000:00:18.4: Adding to iommu group 12
[    1.418602] pci 0000:00:18.5: Adding to iommu group 12
[    1.418616] pci 0000:00:18.6: Adding to iommu group 12
[    1.418628] pci 0000:00:18.7: Adding to iommu group 12
[    1.418825] pci 0000:01:00.0: Adding to iommu group 13
[    1.418849] pci 0000:01:00.1: Adding to iommu group 13
[    1.418871] pci 0000:01:00.2: Adding to iommu group 13
[    1.418883] pci 0000:02:00.0: Adding to iommu group 13
[    1.418895] pci 0000:02:02.0: Adding to iommu group 13
[    1.418907] pci 0000:02:03.0: Adding to iommu group 13
[    1.418919] pci 0000:02:04.0: Adding to iommu group 13
[    1.418931] pci 0000:02:05.0: Adding to iommu group 13
[    1.418943] pci 0000:02:06.0: Adding to iommu group 13
[    1.418954] pci 0000:02:07.0: Adding to iommu group 13
[    1.418972] pci 0000:03:00.0: Adding to iommu group 13
[    1.418989] pci 0000:04:00.0: Adding to iommu group 13
[    1.419004] pci 0000:06:00.0: Adding to iommu group 13
[    1.419164] pci 0000:0a:00.0: Adding to iommu group 14
[    1.419195] pci 0000:0a:00.1: Adding to iommu group 14
[    1.419221] pci 0000:0a:00.2: Adding to iommu group 14
[    1.419248] pci 0000:0a:00.3: Adding to iommu group 14
[    1.419440] pci 0000:0b:00.0: Adding to iommu group 15
[    1.419464] pci 0000:0b:00.1: Adding to iommu group 15
[    1.419609] pci 0000:0c:00.0: Adding to iommu group 16
[    1.419799] pci 0000:0c:00.2: Adding to iommu group 17
[    1.419941] pci 0000:0c:00.3: Adding to iommu group 18
[    1.420132] pci 0000:0d:00.0: Adding to iommu group 19
[    1.420274] pci 0000:0d:00.2: Adding to iommu group 20
[    1.420459] pci 0000:0d:00.3: Adding to iommu group 21
[    1.420662] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
[    1.421829] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
[    1.469658] AMD-Vi: AMD IOMMUv2 driver by Joerg Roedel <jroedel@suse.de>

vfio-pci DMESG

[    1.591969] vfio-pci 0000:0b:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
[    1.607262] vfio_pci: add [10de:1b81[ffffffff:ffffffff]] class 0x000000/00000000
[    1.623962] vfio_pci: add [10de:10f0[ffffffff:ffffffff]] class 0x000000/00000000
[  120.577235] vfio-pci 0000:0b:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=none
[  676.827091] vfio-pci 0000:0b:00.0: enabling device (0000 -> 0003)
[  676.827322] vfio-pci 0000:0b:00.0: vfio_ecap_init: hiding ecap 0x19@0x900
[  676.847114] vfio-pci 0000:0b:00.1: enabling device (0000 -> 0002)
[  678.080497] vfio-pci 0000:0b:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  678.107162] vfio-pci 0000:0b:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  679.631525] vfio-pci 0000:0b:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  679.631780] vfio-pci 0000:0b:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  679.652353] vfio-pci 0000:0b:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  679.652607] vfio-pci 0000:0b:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  679.670734] vfio-pci 0000:0b:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  679.670989] vfio-pci 0000:0b:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  679.688988] vfio-pci 0000:0b:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  679.701300] vfio-pci 0000:0b:00.0: Invalid PCI ROM header signature: expecting 0xaa55, got 0xffff
[  679.701439] vfio-pci 0000:0b:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  679.724384] vfio-pci 0000:0b:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  679.724571] vfio-pci 0000:0b:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  679.724607] vfio-pci 0000:0b:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  679.724650] vfio-pci 0000:0b:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  679.724794] vfio-pci 0000:0b:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  679.724830] vfio-pci 0000:0b:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  679.724958] vfio-pci 0000:0b:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  679.724994] vfio-pci 0000:0b:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  679.759573] vfio-pci 0000:0b:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  679.759889] vfio-pci 0000:0b:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  679.767983] vfio-pci 0000:0b:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  679.769723] vfio-pci 0000:0b:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  679.769761] vfio-pci 0000:0b:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  679.770228] vfio-pci 0000:0b:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  679.781770] vfio-pci 0000:0b:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  679.782157] vfio-pci 0000:0b:00.1: vfio_bar_restore: reset recovery - restoring BARs

lspci -nnk

to check that the 1070 does not have any nvidia or nouveau drivers in use?

(as in, to make sure vfio-pci.ids= has grabbed both the card’s GPU and its audio function)

There are only two IDs for the 1070: 10de:1b81,10de:10f0

Both are grabbed by vfio-pci.

0b:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104 [GeForce GTX 1070] [10de:1b81] (rev ff)
        Kernel driver in use: vfio-pci
        Kernel modules: nouveau, nvidia_drm, nvidia
0b:00.1 Audio device [0403]: NVIDIA Corporation GP104 High Definition Audio Controller [10de:10f0] (rev ff)
        Kernel driver in use: vfio-pci
        Kernel modules: snd_hda_intel
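
The same check can be scripted from sysfs, where a bound device has a `driver` symlink pointing at its driver. A minimal sketch; `pci_driver` and its second argument are hypothetical names used only for illustration.

```shell
#!/usr/bin/env bash
# Print the kernel driver currently bound to a PCI device, or "none".
# $1: full PCI address including domain, e.g. 0000:0b:00.0
# $2: sysfs devices directory (hypothetical parameter; defaults to the standard path)
pci_driver() {
    local addr="$1"
    local root="${2:-/sys/bus/pci/devices}"
    local link="$root/$addr/driver"
    if [ -L "$link" ]; then
        # The symlink target ends in the driver name, e.g. .../drivers/vfio-pci
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}
```

On a host set up like the one in this thread, `pci_driver 0000:0b:00.0` and `pci_driver 0000:0b:00.1` should both print `vfio-pci` if the binding worked.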

Cool, I just looked thru my notes, and it looks like you got all the steps right.

Can you drop the card for a bit and run the VM with the SPICE console, to see if Windows Event Viewer has any errors logged?

I was attempting a fresh install, so I haven’t even gotten it to boot the installer yet.


I’m recreating my VM tonight, preinstalling Windows and cloning the disk before I add the host cards.

BdsDxe: loading Boot0004 "Windows Boot Manager" from HD(2,GPT,5AAF708D-96B0-413F-A0FB-06F7F9F27AC2,0x109000,0x31800)/\EFI\Microsoft\Boot\bootmgfw.efi
BdsDxe: starting Boot0004 "Windows Boot Manager" from HD(2,GPT,5AAF708D-96B0-413F-A0FB-06F7F9F27AC2,0x109000,0x31800)/\EFI\Microsoft\Boot\bootmgfw.efi

It does load with the GPU attached, but there’s still no video out of HDMI. Maybe a dead port? Gonna try DVI in a sec.


Another Edit:

Swapped around some NVIDIA GPUs; still the same result with the RTX 2070.

[  146.335968] vfio-pci 0000:0c:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  146.339053] vfio-pci 0000:0c:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  146.339714] vfio-pci 0000:0c:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  146.339892] vfio-pci 0000:0c:00.0: vfio_bar_restore: reset recovery - restoring BARs
[  146.345811] vfio-pci 0000:0c:00.2: vfio_bar_restore: reset recovery - restoring BARs
[  146.346390] vfio-pci 0000:0c:00.3: vfio_bar_restore: reset recovery - restoring BARs
[  146.354423] vfio-pci 0000:0c:00.1: vfio_bar_restore: reset recovery - restoring BARs
[  146.355196] vfio-pci 0000:0c:00.0: vfio_bar_restore: reset recovery - restoring BARs

dmesg is still showing reset recovery on the devices being passed through:

[  136.118242] vfio-pci 0000:0c:00.0: timed out waiting for pending transaction; performing function level reset anyway    

Maybe I’m still forgetting something. GPU BIOS ROM maybe?

Oh, and lspci shows

 !!! Unknown header type 7f

when the device fails to reset.
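
As far as I know, lspci prints that message when the header type byte in the device’s PCI config space reads back as all ones, which usually means the device has stopped answering on the bus. The `(rev ff)` in the lspci -nnk output earlier in the thread would be consistent with that. A small sketch that checks the same byte directly; `pci_config_alive` is a hypothetical helper name, and the path in the comment assumes the 1070’s address from this thread.

```shell
#!/usr/bin/env bash
# Check whether a PCI config-space file looks like a device that is still responding.
# $1: config-space file, e.g. /sys/bus/pci/devices/0000:0b:00.0/config
pci_config_alive() {
    local cfg="$1"
    local hdr
    # Offset 0x0e (decimal 14) in PCI config space holds the header type byte.
    hdr=$(od -An -tx1 -j14 -N1 "$cfg" | tr -d ' \n')
    if [ "$hdr" = "ff" ]; then
        echo "not responding (header type reads 0xff)"
        return 1
    fi
    echo "responding (header type 0x$hdr)"
}
```

If this reports "not responding" after the VM is destroyed, the card is likely stuck until the host reboots or the slot loses power, which matches the behaviour described above.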

Got an update on it: I’m fairly certain I have the BIOS update issue with Ryzen boards.

I’ll need to either find or build a kernel with the following patch.

https://clbin.com/VCiYJ

Source: Arch Wiki


Have you tried giving it a ROM file? You can dump one using GPU-Z and edit it with a hex editor to remove the NVIDIA flash header. I did so with my 2070.
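
A quick sanity check for a dumped or trimmed ROM file: a PCI option ROM image starts with the bytes 55 AA, which is the 0xaa55 signature the kernel message earlier in the thread says it expected. This is only a sketch of that one check, with a hypothetical function name; it does not locate or strip the header for you.

```shell
#!/usr/bin/env bash
# Succeed (exit 0) if the file starts with the PCI option ROM signature 55 AA.
# $1: path to a ROM image file
rom_has_signature() {
    local rom="$1"
    local sig
    # Read the first two bytes as hex, e.g. "55aa"
    sig=$(od -An -tx1 -N2 "$rom" | tr -d ' \n')
    [ "$sig" = "55aa" ]
}
```

For example, `rom_has_signature patched.rom && echo "looks like a ROM image"`, where `patched.rom` is a placeholder file name. A dump that still carries extra header bytes in front of the image would fail this check.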

Pretty sure it ain’t that, plus I don’t have a physical Windows partition to even run GPU-Z on. I nuked that the other week since it kept causing problems with GRUB.

More so since I know it worked before I did a BIOS update, and it had previously worked with both the 2070 and the 1070.

Two things that helped me get my dual-guest and boot-GPU passthrough working were giving it a proper ROM file and setting ROM bar on. If I remove ROM bar, my guests fail to start, though I think it’s on by default. Removing ROM bar caused what you are experiencing.

I’m just gonna chalk it up to having a cursed machine (or NVIDIA GPUs). I tried to dump the vBIOS off the 1070 via the method from Proxmox, and tried some different kernel flags. Still nothing.

If I remember right, the last time I even had VFIO passthrough working on this box was with an RX 580 as the guest GPU. Maybe it’s a sign I should just stick to passing through AMD GPUs.

Actually, I just checked the Asus website and there is a much newer BIOS update available, v 7704. I was having trouble with resetting guests, where my GPUs would blank-screen on restart; updating the BIOS solved that issue. There is also info I found on matheus website that says BIOS versions after November 2019 fix the AGESA problems that cause trouble with passthrough. Try flashing the latest BIOS for your mobo, unless there is a reason for using an old BIOS.

Edit: Here is the link to the info. Scroll down to the BIOS settings part where he highlights the problems with bios versions.

Picked up a discount 5700 XT off Newegg, one of MSI’s dreaded EVOKE OC cards. Absolutely no issues getting that one passed through to the VM.

Btw, I run Arch and Arch-based distros.


@Kaiju I’m seeing so much inconsistent information on this topic. I wanted to check back - you had a reset bug issue with an NVIDIA card, and now you’re passing through 5700 XT and you haven’t experienced any reset bug - correct?
Thanks.

I eventually got it working tonight with the GTX 1070 (I don’t think I can get it working with the 2070 yet). I had given up back in March after getting the 5700 XT working.

My particular motherboard was behind by many BIOS revisions, and those included many AGESA updates and some fixes for Linux compatibility.

Guess I had to dump and patch the 1070’s ROM as well. I presume I’ll need to do the same for the 2070, but I’m happy with just getting the 1070 working atm.

All that said, it ‘works’ but isn’t stable yet. I just had a blue screen that flashed by quickly before restarting. It does seem to boot now without even using the patched ROM.

The blue screen was video_dxgkrnl_fatal_error. This may be due to Corsair iCUE (edit: it was absolutely iCUE; it crashes when attempting to remove the driver). Fixed.

Update on the 2070 setup: Works like a charm too.
