Asus RX5700 (reference design) GPU pass-through displays frozen boot logo image - FIXED :D

Hello,

I managed to create a VM guest with GPU pass-through by following the incredible resources available on the Level1Techs forum. However, I've hit a wall and would appreciate some input and help.

When I start my VM guest and connect a monitor to the passed-through GPU, I get this frozen boot logo image on screen:

If I use the virt-manager viewer, however, the guest does display, and as you can see the drivers for my card are installed properly:

My setup:
Gigabyte Aorus X570 Master
AMD Ryzen R9 3950X
ASRock RX5700 (reference design) for my host machine
Asus RX5700 (reference design) for my VM guest

BIOS
CSM disabled, UEFI boot enabled

Host machine OS
GNU/Linux Xubuntu 20.04.1

Software
bridge-utils is already the newest version (1.6-2ubuntu1).
ovmf is already the newest version (0~20191122.bd85bf54-2ubuntu3).
libvirt-clients is already the newest version (6.0.0-0ubuntu8.5).
libvirt-daemon-system is already the newest version (6.0.0-0ubuntu8.5).
qemu-kvm is already the newest version (1:4.2-3ubuntu6.8).
qemu-utils is already the newest version (1:4.2-3ubuntu6.8).
virt-manager is already the newest version (1:2.2.1-3ubuntu2.1).

GRUB config

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_iommu=on iommu=pt kvm_amd.npt=1 kvm_amd.avic=1 video=efifb:off,vesafb:off pcie_acs_override=downstream pcie_aspm=off vfio_iommu_type1.allow_unsafe_interrupts=1 kvm.ignore_msrs=1 rd.driver.pre=vfio-pci vfio_pci.disable_vga=1 vfio_pci.disable_idle_d3=1 acpi_enforce_resources=lax amdgpu.ppfeaturemask=0xffffffff"
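
A long parameter list like this only takes effect once the GRUB configuration has been regenerated and the machine rebooted, so it can help to compare it with what the running kernel actually received. The following is a minimal sketch, not part of my setup: the function names are made up for illustration, and it assumes no parameter value contains quoted spaces.

```python
from pathlib import Path


def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {parameter: value}, value None for bare flags."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def missing_params(expected: str, actual: str) -> list:
    """Return expected parameters that are absent from, or differ in, the actual line."""
    want, have = parse_cmdline(expected), parse_cmdline(actual)
    return [key for key, value in want.items() if key not in have or have[key] != value]


def read_running_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line of the running kernel (Linux only)."""
    return Path(path).read_text()
```

Calling `missing_params("amd_iommu=on iommu=pt", read_running_cmdline())` on the host should return an empty list if both parameters were applied.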

IOMMU Group

IOMMU group 17
	03:01.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge [1022:57a3]
IOMMU group 35
	10:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 HDMI Audio [1002:ab38]
IOMMU group 7
	00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 25
	03:0a.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge [1022:57a4]
	0d:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51)
IOMMU group 15
	01:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808]
IOMMU group 43
	15:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller [1022:149c]
IOMMU group 33
	0f:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch [1002:1479]
IOMMU group 5
	00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
IOMMU group 23
	03:08.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge [1022:57a4]
	0b:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP [1022:1485]
	0b:00.1 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller [1022:149c]
	0b:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller [1022:149c]
IOMMU group 13
	00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 61)
	00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
IOMMU group 41
	15:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP [1022:1485]
IOMMU group 31
	09:00.0 PCI bridge [0604]: Tundra Semiconductor Corp. Tsi381 PCIe to PCI Bridge [10e3:8111] (rev 02)
	0a:00.0 Multimedia audio controller [0401]: Creative Labs CA0108/CA10300 [Sound Blaster Audigy Series] [1102:0008]
IOMMU group 3
	00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 21
	03:05.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge [1022:57a3]
IOMMU group 11
	00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 1
	00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
IOMMU group 38
	13:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] [1002:731f] (rev c4)
IOMMU group 28
	06:00.0 Network controller [0280]: Intel Corporation Wi-Fi 6 AX200 [8086:2723] (rev 1a)
IOMMU group 18
	03:02.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge [1022:57a3]
IOMMU group 36
	11:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch [1002:1478] (rev c4)
IOMMU group 8
	00:05.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 26
	04:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808]
IOMMU group 16
	02:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse Switch Upstream [1022:57ad]
IOMMU group 44
	15:00.4 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse HD Audio Controller [1022:1487]
IOMMU group 34
	10:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] [1002:731f] (rev c4)
IOMMU group 6
	00:03.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
IOMMU group 24
	03:09.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge [1022:57a4]
	0c:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51)
IOMMU group 14
	00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 0 [1022:1440]
	00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 1 [1022:1441]
	00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 2 [1022:1442]
	00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 3 [1022:1443]
	00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 4 [1022:1444]
	00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 5 [1022:1445]
	00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 6 [1022:1446]
	00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 7 [1022:1447]
IOMMU group 42
	15:00.1 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Cryptographic Coprocessor PSPCPP [1022:1486]
IOMMU group 32
	0e:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch [1002:1478] (rev c4)
IOMMU group 4
	00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 22
	03:06.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge [1022:57a3]
IOMMU group 12
	00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
IOMMU group 40
	14:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Function [1022:148a]
IOMMU group 30
	08:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller [10ec:8125] (rev 01)
IOMMU group 2
	00:01.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
IOMMU group 20
	03:04.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge [1022:57a3]
IOMMU group 10
	00:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484]
IOMMU group 39
	13:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 HDMI Audio [1002:ab38]
IOMMU group 29
	07:00.0 Ethernet controller [0200]: Intel Corporation I211 Gigabit Network Connection [8086:1539] (rev 03)
IOMMU group 0
	00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 19
	03:03.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge [1022:57a3]
IOMMU group 37
	12:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch [1002:1479]
IOMMU group 9
	00:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
IOMMU group 27
	05:00
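
The groups above come out in directory-listing order (17, 35, 7, ...), which makes them hard to scan. As a rough sketch, assuming the usual sysfs layout `/sys/kernel/iommu_groups/<group>/devices/<pci-address>`, something like this returns them sorted numerically. The function name is hypothetical, and it only reports PCI addresses, not the device descriptions that `lspci` adds.

```python
from pathlib import Path


def iommu_groups(root: str = "/sys/kernel/iommu_groups") -> dict:
    """Map each IOMMU group number to the sorted PCI addresses it contains."""
    groups = {}
    for group_dir in Path(root).iterdir():
        # Skip anything that is not a numbered group directory.
        if not group_dir.name.isdigit():
            continue
        devices = group_dir / "devices"
        groups[int(group_dir.name)] = sorted(d.name for d in devices.iterdir())
    # Sort by group number so group 2 comes before group 10.
    return dict(sorted(groups.items()))
```

Printing the result with `for group, devs in iommu_groups().items(): print(group, devs)` makes it easier to confirm that the guest GPU and its audio function sit in groups of their own.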

lspci -nnv

00:00.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Root Complex [1022:1480]
	Subsystem: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Root Complex [1022:1480]
	Flags: fast devsel
00:00.2 IOMMU [0806]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse IOMMU [1022:1481]
	Subsystem: Advanced Micro Devices, Inc. [AMD] Starship/Matisse IOMMU [1022:1481]
	Flags: fast devsel, IRQ 26
	Capabilities: <access denied>
00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
	Flags: fast devsel
00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483] (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0, IRQ 27
	Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
	I/O behind bridge: [disabled]
	Memory behind bridge: fcf00000-fcffffff [size=1M]
	Prefetchable memory behind bridge: [disabled]
	Capabilities: <access denied>
	Kernel driver in use: pcieport
00:01.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483] (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0, IRQ 28
	Bus: primary=00, secondary=02, subordinate=0d, sec-latency=0
	I/O behind bridge: 0000b000-0000dfff [size=12K]
	Memory behind bridge: fbe00000-fc7fffff [size=10M]
	Prefetchable memory behind bridge: [disabled]
	Capabilities: <access denied>
	Kernel driver in use: pcieport
00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
	Flags: fast devsel
00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
	Flags: fast devsel
00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483] (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0, IRQ 29
	Bus: primary=00, secondary=0e, subordinate=10, sec-latency=0
	I/O behind bridge: 0000f000-0000ffff [size=4K]
	Memory behind bridge: fcd00000-fcefffff [size=2M]
	Prefetchable memory behind bridge: 0000001100000000-00000013ffffffff [size=12G]
	Capabilities: <access denied>
	Kernel driver in use: pcieport
00:03.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483] (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0, IRQ 30
	Bus: primary=00, secondary=11, subordinate=13, sec-latency=0
	I/O behind bridge: 0000e000-0000efff [size=4K]
	Memory behind bridge: fcb00000-fccfffff [size=2M]
	Prefetchable memory behind bridge: 0000007fc0000000-0000007fd01fffff [size=258M]
	Capabilities: <access denied>
	Kernel driver in use: pcieport
00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
	Flags: fast devsel
00:05.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
	Flags: fast devsel
00:07.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
	Flags: fast devsel
00:07.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484] (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0, IRQ 32
	Bus: primary=00, secondary=14, subordinate=14, sec-latency=0
	I/O behind bridge: [disabled]
	Memory behind bridge: [disabled]
	Prefetchable memory behind bridge: [disabled]
	Capabilities: <access denied>
	Kernel driver in use: pcieport
00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
	Flags: fast devsel
00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Internal PCIe GPP Bridge 0 to bus[E:B] [1022:1484] (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0, IRQ 33
	Bus: primary=00, secondary=15, subordinate=15, sec-latency=0
	I/O behind bridge: [disabled]
	Memory behind bridge: fc800000-fcafffff [size=3M]
	Prefetchable memory behind bridge: [disabled]
	Capabilities: <access denied>
	Kernel driver in use: pcieport
00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 61)
	Subsystem: Gigabyte Technology Co., Ltd FCH SMBus Controller [1458:5001]
	Flags: 66MHz, medium devsel
	Kernel driver in use: piix4_smbus
	Kernel modules: i2c_piix4, sp5100_tco
00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
	Subsystem: Gigabyte Technology Co., Ltd FCH LPC Bridge [1458:5001]
	Flags: bus master, 66MHz, medium devsel, latency 0
00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 0 [1022:1440]
	Flags: fast devsel
00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 1 [1022:1441]
	Flags: fast devsel
00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 2 [1022:1442]
	Flags: fast devsel
00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 3 [1022:1443]
	Flags: fast devsel
	Kernel driver in use: k10temp
	Kernel modules: k10temp
00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 4 [1022:1444]
	Flags: fast devsel
00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 5 [1022:1445]
	Flags: fast devsel
00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 6 [1022:1446]
	Flags: fast devsel
00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 7 [1022:1447]
	Flags: fast devsel
01:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808] (prog-if 02 [NVM Express])
	Subsystem: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a801]
	Flags: bus master, fast devsel, latency 0, IRQ 68, NUMA node 0
	Memory at fcf00000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: <access denied>
	Kernel driver in use: nvme
	Kernel modules: nvme
02:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse Switch Upstream [1022:57ad] (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0, IRQ 24
	Bus: primary=02, secondary=03, subordinate=0d, sec-latency=0
	I/O behind bridge: 0000b000-0000dfff [size=12K]
	Memory behind bridge: fbe00000-fc7fffff [size=10M]
	Prefetchable memory behind bridge: [disabled]
	Capabilities: <access denied>
	Kernel driver in use: pcieport
03:01.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge [1022:57a3] (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0, IRQ 34
	Bus: primary=03, secondary=04, subordinate=04, sec-latency=0
	I/O behind bridge: [disabled]
	Memory behind bridge: fc700000-fc7fffff [size=1M]
	Prefetchable memory behind bridge: [disabled]
	Capabilities: <access denied>
	Kernel driver in use: pcieport
03:02.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge [1022:57a3] (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0, IRQ 35
	Bus: primary=03, secondary=05, subordinate=05, sec-latency=0
	I/O behind bridge: [disabled]
	Memory behind bridge: fc600000-fc6fffff [size=1M]
	Prefetchable memory behind bridge: [disabled]
	Capabilities: <access denied>
	Kernel driver in use: pcieport
03:03.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge [1022:57a3] (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0, IRQ 36
	Bus: primary=03, secondary=06, subordinate=06, sec-latency=0
	I/O behind bridge: [disabled]
	Memory behind bridge: fc500000-fc5fffff [size=1M]
	Prefetchable memory behind bridge: [disabled]
	Capabilities: <access denied>
	Kernel driver in use: pcieport
03:04.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge [1022:57a3] (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0, IRQ 37
	Bus: primary=03, secondary=07, subordinate=07, sec-latency=0
	I/O behind bridge: 0000d000-0000dfff [size=4K]
	Memory behind bridge: fc400000-fc4fffff [size=1M]
	Prefetchable memory behind bridge: [disabled]
	Capabilities: <access denied>
	Kernel driver in use: pcieport
03:05.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge [1022:57a3] (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0, IRQ 38
	Bus: primary=03, secondary=08, subordinate=08, sec-latency=0
	I/O behind bridge: 0000c000-0000cfff [size=4K]
	Memory behind bridge: fc300000-fc3fffff [size=1M]
	Prefetchable memory behind bridge: [disabled]
	Capabilities: <access denied>
	Kernel driver in use: pcieport
03:06.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge [1022:57a3] (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0, IRQ 39
	Bus: primary=03, secondary=09, subordinate=0a, sec-latency=0
	I/O behind bridge: 0000b000-0000bfff [size=4K]
	Memory behind bridge: fc200000-fc2fffff [size=1M]
	Prefetchable memory behind bridge: [disabled]
	Capabilities: <access denied>
	Kernel driver in use: pcieport
03:08.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge [1022:57a4] (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0, IRQ 40
	Bus: primary=03, secondary=0b, subordinate=0b, sec-latency=0
	I/O behind bridge: [disabled]
	Memory behind bridge: fbe00000-fbffffff [size=2M]
	Prefetchable memory behind bridge: [disabled]
	Capabilities: <access denied>
	Kernel driver in use: pcieport
03:09.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge [1022:57a4] (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0, IRQ 42
	Bus: primary=03, secondary=0c, subordinate=0c, sec-latency=0
	I/O behind bridge: [disabled]
	Memory behind bridge: fc100000-fc1fffff [size=1M]
	Prefetchable memory behind bridge: [disabled]
	Capabilities: <access denied>
	Kernel driver in use: pcieport
03:0a.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Matisse PCIe GPP Bridge [1022:57a4] (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0, IRQ 44
	Bus: primary=03, secondary=0d, subordinate=0d, sec-latency=0
	I/O behind bridge: [disabled]
	Memory behind bridge: fc000000-fc0fffff [size=1M]
	Prefetchable memory behind bridge: [disabled]
	Capabilities: <access denied>
	Kernel driver in use: pcieport
04:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808] (prog-if 02 [NVM Express])
	Subsystem: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a801]
	Flags: fast devsel, IRQ 41
	Memory at fc700000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: <access denied>
	Kernel driver in use: **vfio-pci**
	Kernel modules: nvme
05:00.0 USB controller [0c03]: VIA Technologies, Inc. VL805 USB 3.0 Host Controller [1106:3483] (rev 01) (prog-if 30 [XHCI])
	Subsystem: VIA Technologies, Inc. VL805 USB 3.0 Host Controller [1106:3483]
	Flags: bus master, fast devsel, latency 0, IRQ 49
	Memory at fc600000 (64-bit, non-prefetchable) [size=4K]
	Capabilities: <access denied>
	Kernel driver in use: xhci_hcd
06:00.0 Network controller [0280]: Intel Corporation Wi-Fi 6 AX200 [8086:2723] (rev 1a)
	Subsystem: Intel Corporation Wi-Fi 6 AX200 [8086:0084]
	Flags: bus master, fast devsel, latency 0, IRQ 143
	Memory at fc500000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: <access denied>
	Kernel driver in use: iwlwifi
	Kernel modules: iwlwifi
07:00.0 Ethernet controller [0200]: Intel Corporation I211 Gigabit Network Connection [8086:1539] (rev 03)
	Subsystem: Gigabyte Technology Co., Ltd I211 Gigabit Network Connection [1458:e000]
	Flags: bus master, fast devsel, latency 0, IRQ 24
	Memory at fc400000 (32-bit, non-prefetchable) [size=128K]
	I/O ports at d000 [size=32]
	Memory at fc420000 (32-bit, non-prefetchable) [size=16K]
	Capabilities: <access denied>
	Kernel driver in use: igb
	Kernel modules: igb
08:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller [10ec:8125] (rev 01)
	Subsystem: Gigabyte Technology Co., Ltd RTL8125 2.5GbE Controller [1458:e000]
	Flags: bus master, fast devsel, latency 0, IRQ 41
	I/O ports at c000 [size=256]
	Memory at fc370000 (64-bit, non-prefetchable) [size=64K]
	Memory at fc39c000 (64-bit, non-prefetchable) [size=16K]
	Capabilities: <access denied>
	Kernel driver in use: r8169
	Kernel modules: r8169
09:00.0 PCI bridge [0604]: Tundra Semiconductor Corp. Tsi381 PCIe to PCI Bridge [10e3:8111] (rev 02) (prog-if 00 [Normal decode])
	DeviceName: RTL8111EPV
	Flags: bus master, fast devsel, latency 0
	Memory at fc200000 (32-bit, non-prefetchable) [size=4K]
	Bus: primary=09, secondary=0a, subordinate=0a, sec-latency=32
	I/O behind bridge: 0000b000-0000bfff [size=4K]
	Memory behind bridge: [disabled]
	Prefetchable memory behind bridge: [disabled]
	Capabilities: <access denied>
0a:00.0 Multimedia audio controller [0401]: Creative Labs CA0108/CA10300 [Sound Blaster Audigy Series] [1102:0008]
	Subsystem: Creative Labs SB1550 Audigy 5/Rx [1102:1024]
	Flags: medium devsel, IRQ 43
	I/O ports at b000 [size=64]
	Capabilities: <access denied>
	Kernel driver in use: **vfio-pci**
	Kernel modules: snd_emu10k1
0b:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP [1022:1485]
	Subsystem: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP [1022:1485]
	Flags: fast devsel
	Capabilities: <access denied>
0b:00.1 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller [1022:149c] (prog-if 30 [XHCI])
	Subsystem: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller [1022:1486]
	Flags: bus master, fast devsel, latency 0, IRQ 50
	Memory at fbf00000 (64-bit, non-prefetchable) [size=1M]
	Capabilities: <access denied>
	Kernel driver in use: xhci_hcd
0b:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller [1022:149c] (prog-if 30 [XHCI])
	Subsystem: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller [1022:148c]
	Flags: bus master, fast devsel, latency 0, IRQ 43
	Memory at fbe00000 (64-bit, non-prefetchable) [size=1M]
	Capabilities: <access denied>
	Kernel driver in use: xhci_hcd
0c:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51) (prog-if 01 [AHCI 1.0])
	Subsystem: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901]
	Flags: bus master, fast devsel, latency 0, IRQ 70
	Memory at fc100000 (32-bit, non-prefetchable) [size=2K]
	Capabilities: <access denied>
	Kernel driver in use: ahci
	Kernel modules: ahci
0d:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901] (rev 51) (prog-if 01 [AHCI 1.0])
	Subsystem: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] [1022:7901]
	Flags: bus master, fast devsel, latency 0, IRQ 91
	Memory at fc000000 (32-bit, non-prefetchable) [size=2K]
	Capabilities: <access denied>
	Kernel driver in use: ahci
	Kernel modules: ahci
0e:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch [1002:1478] (rev c4) (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0, IRQ 45
	Memory at fce00000 (32-bit, non-prefetchable) [size=16K]
	Bus: primary=0e, secondary=0f, subordinate=10, sec-latency=0
	I/O behind bridge: 0000f000-0000ffff [size=4K]
	Memory behind bridge: fcd00000-fcdfffff [size=1M]
	Prefetchable memory behind bridge: 0000001100000000-00000013ffffffff [size=12G]
	Capabilities: <access denied>
	Kernel driver in use: pcieport
0f:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch [1002:1479] (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0, IRQ 46
	Bus: primary=0f, secondary=10, subordinate=10, sec-latency=0
	I/O behind bridge: 0000f000-0000ffff [size=4K]
	Memory behind bridge: fcd00000-fcdfffff [size=1M]
	Prefetchable memory behind bridge: 0000001100000000-00000013ffffffff [size=12G]
	Capabilities: <access denied>
	Kernel driver in use: pcieport
10:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] [1002:731f] (rev c4) (prog-if 00 [VGA controller])
	Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] [1002:0b36]
	Flags: bus master, fast devsel, latency 0, IRQ 164
	Memory at 1200000000 (64-bit, prefetchable) [size=8G]
	Memory at 1100000000 (64-bit, prefetchable) [size=2M]
	I/O ports at f000 [size=256]
	Memory at fcd00000 (32-bit, non-prefetchable) [size=512K]
	Expansion ROM at fcd80000 [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: amdgpu
	Kernel modules: amdgpu
10:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 HDMI Audio [1002:ab38]
	Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 HDMI Audio [1002:ab38]
	Flags: bus master, fast devsel, latency 0, IRQ 161
	Memory at fcda0000 (32-bit, non-prefetchable) [size=16K]
	Capabilities: <access denied>
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel
11:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch [1002:1478] (rev c4) (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0, IRQ 47
	Memory at fcc00000 (32-bit, non-prefetchable) [size=16K]
	Bus: primary=11, secondary=12, subordinate=13, sec-latency=0
	I/O behind bridge: 0000e000-0000efff [size=4K]
	Memory behind bridge: fcb00000-fcbfffff [size=1M]
	Prefetchable memory behind bridge: 0000007fc0000000-0000007fd01fffff [size=258M]
	Capabilities: <access denied>
	Kernel driver in use: pcieport
12:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch [1002:1479] (prog-if 00 [Normal decode])
	Flags: bus master, fast devsel, latency 0, IRQ 48
	Bus: primary=12, secondary=13, subordinate=13, sec-latency=0
	I/O behind bridge: 0000e000-0000efff [size=4K]
	Memory behind bridge: fcb00000-fcbfffff [size=1M]
	Prefetchable memory behind bridge: 0000007fc0000000-0000007fd01fffff [size=258M]
	Capabilities: <access denied>
	Kernel driver in use: pcieport
13:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] [1002:731f] (rev c4) (prog-if 00 [VGA controller])
	Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] [1002:0b36]
	Flags: fast devsel, IRQ 47
	Memory at 7fc0000000 (64-bit, prefetchable) [size=256M]
	Memory at 7fd0000000 (64-bit, prefetchable) [size=2M]
	I/O ports at e000 [size=256]
	Memory at fcb00000 (32-bit, non-prefetchable) [size=512K]
	Expansion ROM at fcb80000 [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: **vfio-pci**
	Kernel modules: amdgpu
13:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 HDMI Audio [1002:ab38]
	Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 HDMI Audio [1002:ab38]
	Flags: fast devsel, IRQ 59
	Memory at fcba0000 (32-bit, non-prefetchable) [size=16K]
	Capabilities: <access denied>
	Kernel driver in use: **vfio-pci**
	Kernel modules: snd_hda_intel
14:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Function [1022:148a]
	Subsystem: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Function [1022:148a]
	Flags: fast devsel
	Capabilities: <access denied>
15:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP [1022:1485]
	Subsystem: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP [1022:1485]
	Flags: fast devsel
	Capabilities: <access denied>
15:00.1 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Cryptographic Coprocessor PSPCPP [1022:1486]
	Subsystem: Advanced Micro Devices, Inc. [AMD] Starship/Matisse Cryptographic Coprocessor PSPCPP [1022:1486]
	Flags: bus master, fast devsel, latency 0, IRQ 140
	Memory at fc900000 (32-bit, non-prefetchable) [size=1M]
	Memory at fca08000 (32-bit, non-prefetchable) [size=8K]
	Capabilities: <access denied>
	Kernel driver in use: ccp
	Kernel modules: ccp
15:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller [1022:149c] (prog-if 30 [XHCI])
	Subsystem: Gigabyte Technology Co., Ltd Matisse USB 3.0 Host Controller [1458:5007]
	Flags: bus master, fast devsel, latency 0, IRQ 59
	Memory at fc800000 (64-bit, non-prefetchable) [size=1M]
	Capabilities: <access denied>
	Kernel driver in use: xhci_hcd
15:00.4 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse HD Audio Controller [1022:1487]
	Subsystem: Gigabyte Technology Co., Ltd Starship/Matisse HD Audio Controller [1458:a0cd]
	Flags: bus master, fast devsel, latency 0, IRQ 163
	Memory at fca00000 (32-bit, non-prefetchable) [size=32K]
	Capabilities: <access denied>
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel

As you can see, I've dedicated 4 PCIe devices to the VM guest: an NVMe drive, a sound card, my Asus GPU, and its HDMI audio device. They all show vfio-pci as the kernel driver in use.
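
Rather than scanning the whole listing by eye, the same check can be scripted. This is only a sketch: both function names are invented for illustration, it expects the text of `lspci -nnv` (or `lspci -nnk`) as shown above, and it does not handle addresses printed with a domain prefix such as `0000:04:00.0`. It also strips the `**` markers I added around vfio-pci for emphasis.

```python
import re

# A device header line starts with bus:device.function, e.g. "13:00.0 VGA ...".
HEADER = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) ")


def drivers_in_use(lspci_output: str) -> dict:
    """Map PCI address to its 'Kernel driver in use' value (None if no driver is bound)."""
    drivers = {}
    current = None
    for line in lspci_output.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            drivers[current] = None
        elif current and "Kernel driver in use:" in line:
            # Drop surrounding whitespace and any pasted emphasis asterisks.
            drivers[current] = line.split(":", 1)[1].strip().strip("*")
    return drivers


def not_bound_to_vfio(drivers: dict, addresses: list) -> list:
    """Return the addresses from the list that are not bound to vfio-pci."""
    return [addr for addr in addresses if drivers.get(addr) != "vfio-pci"]
```

For the four devices in this post, `not_bound_to_vfio(drivers, ["04:00.0", "0a:00.0", "13:00.0", "13:00.1"])` should come back empty when the listing above is passed through `drivers_in_use`.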

VM configuration

<domain type='kvm'>
  <name>Windows10</name>
  <uuid>8f0a6a18-4c68-44f3-a205-474367edbf2a</uuid>
  <metadata>
    <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
      <libosinfo:os id="http://microsoft.com/win/10"/>
    </libosinfo:libosinfo>
  </metadata>
  <memory unit='KiB'>16777216</memory>
  <currentMemory unit='KiB'>16777216</currentMemory>
  <memoryBacking>
    <hugepages/>
  </memoryBacking>
  <vcpu placement='static' cpuset='0-11'>12</vcpu>
  <iothreads>1</iothreads>
  <cputune>
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='4'/>
    <vcpupin vcpu='2' cpuset='1'/>
    <vcpupin vcpu='3' cpuset='5'/>
    <vcpupin vcpu='4' cpuset='2'/>
    <vcpupin vcpu='5' cpuset='6'/>
    <vcpupin vcpu='6' cpuset='3'/>
    <vcpupin vcpu='7' cpuset='7'/>
    <vcpupin vcpu='8' cpuset='8'/>
    <vcpupin vcpu='9' cpuset='12'/>
    <vcpupin vcpu='10' cpuset='9'/>
    <vcpupin vcpu='11' cpuset='13'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc-i440fx-4.2'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/Windows10_VARS.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vpindex state='on'/>
      <synic state='on'/>
      <stimer state='on'/>
      <reset state='on'/>
      <frequencies state='on'/>
    </hyperv>
    <kvm>
      <hidden state='on'/>
    </kvm>
    <vmport state='off'/>
    <ioapic driver='kvm'/>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='6' threads='2'/>
    <cache mode='passthrough'/>
    <feature policy='require' name='topoext'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
    <timer name='hypervclock' present='yes'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/common_data/Downloads/virtio-win-0.1.185.iso'/>
      <target dev='sdb' bus='sata'/>
      <readonly/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <controller type='usb' index='0' model='qemu-xhci' ports='15'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </controller>
    <interface type='network'>
      <mac address='52:54:00:38:c5:42'/>
      <source network='default'/>
      <model type='e1000e'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='spicevmc'>
      <target type='virtio' name='com.redhat.spice.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'>
      <address type='usb' bus='0' port='1'/>
    </input>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes'>
      <listen type='address'/>
    </graphics>
    <sound model='ich9'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </sound>
    <video>
      <model type='vga' vram='16384' heads='1' primary='yes'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
      </source>
      <boot order='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x13' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0a' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x13' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0b' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0c' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x15' slot='0x00' function='0x3'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0d' function='0x0'/>
    </hostdev>
    <redirdev bus='usb' type='spicevmc'>
      <address type='usb' bus='0' port='2'/>
    </redirdev>
    <redirdev bus='usb' type='spicevmc'>
      <address type='usb' bus='0' port='3'/>
    </redirdev>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </memballoon>
  </devices>
</domain>
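
To cross-check which host devices a domain definition actually hands to the guest, the hostdev entries can be pulled out of the XML and compared with the lspci output. The following is a minimal sketch, assuming the XML is available as a string; `DOMAIN_XML` is a trimmed-down stand-in for the configuration above and `hostdev_addresses` is a hypothetical helper name, not part of libvirt.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for the domain XML above (only the two GPU functions).
DOMAIN_XML = """
<domain type='kvm'>
  <devices>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x13' slot='0x00' function='0x0'/>
      </source>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x13' slot='0x00' function='0x1'/>
      </source>
    </hostdev>
  </devices>
</domain>
"""


def hostdev_addresses(domain_xml):
    """Return host PCI addresses of passed-through devices as 'dddd:bb:ss.f'."""
    root = ET.fromstring(domain_xml)
    addresses = []
    # Only the <source> address is the host-side device; the sibling
    # <address type='pci'> is where the device appears inside the guest.
    for addr in root.findall("./devices/hostdev/source/address"):
        domain = int(addr.get("domain"), 16)
        bus = int(addr.get("bus"), 16)
        slot = int(addr.get("slot"), 16)
        function = int(addr.get("function"), 16)
        addresses.append(f"{domain:04x}:{bus:02x}:{slot:02x}.{function:x}")
    return addresses


if __name__ == "__main__":
    for address in hostdev_addresses(DOMAIN_XML):
        print(address)
```

Run against the full configuration above, this would list five hostdev entries (04:00.0, 13:00.0, 13:00.1, 0a:00.0 and 15:00.3), so it is worth checking each one against the lspci output.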

Can anyone tell me what i am missing here? Kinda sad to have successfully reached all these steps only to be held up by something i have probably missed.
Thanks a ton in advance and enjoy the rest of your week.

I formatted your post.

For large code blocks, surround them with triple backticks ``` and they become much more readable.

As to your question, the RX5700 you are passing through is probably your problem. It has the FLR bug, i.e. the reset bug. It can be worked around with a kernel patch.

Reading:
https://www.reddit.com/r/Amd/comments/jehkey/will_big_navi_support_function_level_reset_flr/


Thank you for properly formatting my post, truly appreciated. I’m currently trying to patch my kernel and i will let you know if i am successful :slight_smile:

I’m getting an error when i try to apply the patch to the kernel.

patch -p1 < ~/linux-fix_navi_reset.patch
can't find file to patch at input line 5
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
|index 2a589b6d6ed8..1b84090db691 100644
|--- a/drivers/pci/quirks.c
|+++ b/drivers/pci/quirks.c

What are you trying to apply the patch to? apt-get source? Ubuntu kernel git? Mainline kernel git?

apt-get source currently

I’m not 100% sure of the file structure of that. You may need to cd to a subdirectory, or change the path to the file in the patch.

You may want to apt install linux-source-<version> instead. That will drop a tar file in /usr/src, which you can then unpack, and that should have the standard file structure.
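
The "can't find file to patch" error comes down to the -p (strip) value not matching the directory the command is run from: -pN removes N leading path components from the file name in the patch header before looking for the file. A minimal sketch of that lookup, assuming a hypothetical helper `find_strip_level` and a throwaway directory that only mimics the kernel layout:

```python
import os
import tempfile


def find_strip_level(patch_path, tree_root):
    """Return the smallest -p value for which patch_path exists under tree_root.

    patch_path is the file name from the diff header, e.g. 'a/drivers/pci/quirks.c'.
    Returns None when no strip level matches, which is the situation in which
    patch reports that it cannot find the file.
    """
    parts = patch_path.split("/")
    for strip in range(len(parts)):
        candidate = os.path.join(tree_root, *parts[strip:])
        if os.path.isfile(candidate):
            return strip
    return None


if __name__ == "__main__":
    # Build a throwaway tree that mimics the top of a kernel source tree.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "drivers", "pci"))
        open(os.path.join(root, "drivers", "pci", "quirks.c"), "w").close()
        print(find_strip_level("a/drivers/pci/quirks.c", root))
```

From the top of a source tree the answer is 1, which is why `patch -p1` only works when it is run from the directory that directly contains `drivers/`.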

After a bit of reading, i figured out the procedure for patching my GNU/Linux Ubuntu Focal kernel.

I followed the instructions provided here: https://askubuntu.com/questions/724900/how-to-apply-kernel-patches and https://www.maketecheasier.com/build-custom-kernel-ubuntu/

I installed all these packages:

sudo apt-get build-dep linux linux-image-$(uname -r)
sudo apt-get install libncurses-dev flex bison openssl libssl-dev dkms libelf-dev libudev-dev libpci-dev libiberty-dev autoconf
sudo apt-get install git build-essential kernel-package fakeroot libncurses5-dev
sudo apt-get install pkg-config-dbgsym

I selected the branch of the current kernel installed on my host machine:

git clone https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/focal

I used the suggested patch for fixing the FLR bug and saved it as linux-fix_navi_reset.patch on my computer:

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 2a589b6d6ed8..1b84090db691 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3969,17 +3969,517 @@ static int delay_250ms_after_flr(struct pci_dev *dev, int probe)
 	return 0;
 }
 
+/*
+ * AMD Navi 10 series GPUs require a vendor specific reset procedure.
+ * According to AMD a PSP mode 2 reset should be enough however at this
+ * time the details of how to perform this are not available to us.
+ * Instead we can signal the PSP to perform a mode 1 reset, which _is_
+ * available to us. Unfortunately, it probably takes more time.
+ */
+static int reset_amd_navi10(struct pci_dev *dev, int probe)
+{
+	static const u16 vega10_device_ids[] = { 0x6860, 0x6861, 0x6862, 0x6863,
+						 0x6864, 0x6867, 0x6868, 0x6869,
+						 0x686a, 0x686b, 0x686c, 0x686d,
+						 0x686e, 0x686f, 0x687f };
+
+	static const u16 vega12_device_ids[] = {
+		0x69A0, 0x69A1, 0x69A2, 0x69A3, 0x69AF,
+	};
+
+	static const u16 vega20_device_ids[] = {
+		0x66A0, 0x66A1, 0x66A2, 0x66A3, 0x66A4, 0x66A7, 0x66AF,
+	};
+
+	const int PPSMC_MSG_DisableSmuFeatures = 0x5; /* vega10 */
+	const int PPSMC_MSG_DisableAllSmuFeatures = 0x7; /* vega12/vega20 */
+	const int SMC_DPM_FEATURES = 0x30F;
+
+	const int mmMM_INDEX = 0x0000;
+	const int mmMM_DATA = 0x0001;
+	const int mmPCIE_INDEX2 = 0x000e;
+	const int mmPCIE_DATA2 = 0x000f;
+
+	const int MP0_BASE = 0x00016000L;
+	const int mmMP0_SMN_C2PMSG_33 = 0x0061 + MP0_BASE;
+	const int mmMP0_SMN_C2PMSG_35 = 0x0063 + MP0_BASE;
+	const int mmMP0_SMN_C2PMSG_64 = 0x0080 + MP0_BASE;
+	const int mmMP0_SMN_C2PMSG_81 = 0x0091 + MP0_BASE;
+	const int mmMP1_SMN_C2PMSG_66 = 0x0282 + MP0_BASE;
+	const int mmMP1_SMN_C2PMSG_82 = 0x0292 + MP0_BASE;
+	const int mmMP1_SMN_C2PMSG_90 = 0x029a + MP0_BASE;
+
+	const int GFX_CTRL_CMD_ID_MODE1_RST = 0x00070000L;
+	const int MP1_Public = 0x03b00000L;
+	const int smnMP1_PUB_CTRL = 0x3010b14;
+	const int MP1_SMN_PUB_CTRL__RESET_MASK = 0x00000001L;
+	int smnMP1_FIRMWARE_FLAGS = 0x3010024;
+	const int MP1_FIRMWARE_FLAGS__INTERRUPTS_ENABLED_MASK = 0x00000001L;
+	const int MP1_FIRMWARE_FLAGS__INTERRUPTS_ENABLED__SHIFT = 0x0;
+
+	const int mmRCC_DEV0_EPF0_RCC_CONFIG_MEMSIZE = 0x00c3 + 0x00000D20L;
+
+	// this is the case for navi but I'm not sure about vega yet
+	const int scratch_reg_offset = 0x4c;
+	const int ATOM_S3_ASIC_GUI_ENGINE_HUNG = 0x20000000L;
+
+	enum { VEGA10, VEGA12, VEGA20, NAVI } device_type = NAVI;
+
+	u16 cfg;
+	u32 tmp, smu_resp, sol, mp1_intr, psp_bl_ready;
+	resource_size_t mmio_base, mmio_size;
+	uint32_t __iomem *mmio;
+	unsigned int timeout;
+	spinlock_t pcie_lock;
+	int i;
+
+	/*
+	 * if the device has FLR return -ENOTTY indicating that we have no
+	 * device-specific reset method.
+	 */
+	if (pcie_has_flr(dev))
+		return -ENOTTY;
+
+	/* bus resets still cause navi to flake out */
+	dev->dev_flags |= PCI_DEV_FLAGS_NO_BUS_RESET;
+
+	/* if we're managed by amdgpu, do not reset */
+	if (dev->driver && !strcmp(dev->driver->name, "amdgpu"))
+		return -ENOTTY;
+
+	if (probe)
+		return 0;
+
+	spin_lock_init(&pcie_lock);
+
+	/* detect device */
+	for (i = 0; i < sizeof vega10_device_ids / sizeof(u16); ++i) {
+		if (dev->device == vega10_device_ids[i]) {
+			device_type = VEGA10;
+			goto done_detect;
+		}
+	}
+	for (i = 0; i < sizeof vega12_device_ids / sizeof(u16); ++i) {
+		if (dev->device == vega12_device_ids[i]) {
+			device_type = VEGA12;
+			goto done_detect;
+		}
+	}
+	for (i = 0; i < sizeof vega20_device_ids / sizeof(u16); ++i) {
+		if (dev->device == vega20_device_ids[i]) {
+			device_type = VEGA20;
+			goto done_detect;
+		}
+	}
+
+done_detect:
+
+	if (device_type != NAVI && device_type != VEGA20) {
+		smnMP1_FIRMWARE_FLAGS = 0x3010028;
+	}
+
+	/* map BAR5 */
+	mmio_base = pci_resource_start(dev, 5);
+	mmio_size = pci_resource_len(dev, 5);
+	mmio = ioremap(mmio_base, mmio_size);
+	if (mmio == NULL) {
+		pci_disable_device(dev);
+		pci_err(dev, "Navi10: cannot iomap device\n");
+		return 0;
+	}
+
+	/* save the PCI state and enable memory access */
+	pci_read_config_word(dev, PCI_COMMAND, &cfg);
+	pci_write_config_word(dev, PCI_COMMAND, cfg | PCI_COMMAND_MEMORY);
+
+	pci_set_power_state(dev, PCI_D0);
+
+#define RREG32(reg)                                                            \
+	({                                                                     \
+		u32 out;                                                       \
+		if ((reg) < mmio_size)                                         \
+			out = readl(mmio + (reg));                             \
+		else {                                                         \
+			writel((reg), mmio + mmMM_INDEX);                      \
+			out = readl(mmio + mmMM_DATA);                         \
+		}                                                              \
+		out;                                                           \
+	})
+
+#define WREG32(reg, v)                                                         \
+	do {                                                                   \
+		if ((reg) < mmio_size)                                         \
+			writel(v, mmio + (reg));                               \
+		else {                                                         \
+			writel((reg), mmio + mmMM_INDEX);                      \
+			writel(v, mmio + mmMM_DATA);                           \
+		}                                                              \
+	} while (0)
+
+#define WREG32_PCIE(reg, v)                                                    \
+	do {                                                                   \
+		unsigned long __flags;                                         \
+		spin_lock_irqsave(&pcie_lock, __flags);                        \
+		WREG32(mmPCIE_INDEX2, reg);                                    \
+		(void)RREG32(mmPCIE_INDEX2);                                   \
+		WREG32(mmPCIE_DATA2, v);                                       \
+		(void)RREG32(mmPCIE_DATA2);                                    \
+		spin_unlock_irqrestore(&pcie_lock, __flags);                   \
+	} while (0)
+
+#define RREG32_PCIE(reg)                                                       \
+	({                                                                     \
+		unsigned long __flags;                                         \
+		u32 __tmp_read;                                                \
+		spin_lock_irqsave(&pcie_lock, __flags);                        \
+		WREG32(mmPCIE_INDEX2, reg);                                    \
+		(void)RREG32(mmPCIE_INDEX2);                                   \
+		__tmp_read = RREG32(mmPCIE_DATA2);                             \
+		spin_unlock_irqrestore(&pcie_lock, __flags);                   \
+		__tmp_read;                                                    \
+	})
+
+#define SMU_WAIT()                                                             \
+	({                                                                     \
+		u32 __tmp;                                                     \
+		for (timeout = 100000;                                         \
+		     timeout &&                                                \
+		     (RREG32(mmMP1_SMN_C2PMSG_90) & 0xFFFFFFFFL) == 0;         \
+		     --timeout)                                                \
+			udelay(1);                                             \
+		if ((__tmp = RREG32(mmMP1_SMN_C2PMSG_90)) != 0x1)              \
+			pci_info(dev, "Navi10: SMU error 0x%x (line %d)\n",    \
+				 __tmp, __LINE__);                             \
+	})
+
+	/* it's important we wait for the SOC to be ready */
+	for (timeout = 100000; timeout; --timeout) {
+		sol = RREG32(mmMP0_SMN_C2PMSG_81);
+		if (sol != 0xFFFFFFFF)
+			break;
+		udelay(1);
+	}
+
+	if (sol == 0xFFFFFFFF) {
+		pci_warn(dev,
+			 "Navi10: Timed out waiting for SOL to be valid\n");
+		/* we will continue anyway because it's possible to do
+		 * a mode 1 reset on navi at least */
+	}
+
+	/*
+	 * okay there's three things we need to check:
+	 * sign-of-life, MP1 intr state (enabled means MP1 is probably live),
+	 * and finally PSP bootloader state (needs to be ready)
+	 */
+	smu_resp = RREG32(mmMP1_SMN_C2PMSG_90);
+	mp1_intr = (RREG32_PCIE(MP1_Public |
+				(smnMP1_FIRMWARE_FLAGS & 0xffffffff)) &
+		    MP1_FIRMWARE_FLAGS__INTERRUPTS_ENABLED_MASK) >>
+		   MP1_FIRMWARE_FLAGS__INTERRUPTS_ENABLED__SHIFT;
+	psp_bl_ready = !!(RREG32(mmMP0_SMN_C2PMSG_35) & 0x80000000L);
+	pci_info(
+		dev,
+		"Navi10: Device type %d, SMU response reg: %x, sol reg: %x, mp1 intr enabled? %s, bl ready? %s\n",
+		device_type, smu_resp, sol, mp1_intr ? "yes" : "no",
+		psp_bl_ready ? "yes" : "no");
+
+	/* check the sign of life indicator */
+	if (sol == 0x0 && !mp1_intr && psp_bl_ready) {
+		/* either already clean or in a state we can't fix */
+		goto out;
+	}
+
+	/* save the state around the reset */
+	pci_info(dev, "Navi10: Clear master\n");
+	pci_clear_master(dev);
+
+	pci_save_state(dev);
+
+	/* this tells the drivers nvram is lost and everything needs to be reset */
+	pci_info(dev, "Navi10: Clearing scratch regs 6 and 7\n");
+	WREG32(scratch_reg_offset + 6, 0);
+	WREG32(scratch_reg_offset + 7, 0);
+
+	/* it only makes sense to reset mp1 if it's running
+	 * XXX: is this even necessary? in early testing, I ran into
+	 * situations where MP1 was alive but not responsive, but in
+	 * later testing I have not been able to replicate this scenario.
+	 */
+	if (smu_resp != 0x01 && mp1_intr && device_type == NAVI) {
+		pci_info(dev, "Navi10: MP1 reset\n");
+		WREG32_PCIE(MP1_Public | (smnMP1_PUB_CTRL & 0xffffffff),
+			    1 & MP1_SMN_PUB_CTRL__RESET_MASK);
+		WREG32_PCIE(MP1_Public | (smnMP1_PUB_CTRL & 0xffffffff),
+			    1 & ~MP1_SMN_PUB_CTRL__RESET_MASK);
+
+		pci_info(dev, "Navi10: wait for MP1\n");
+		for (timeout = 100000; timeout; --timeout) {
+			tmp = RREG32_PCIE(MP1_Public |
+					  (smnMP1_FIRMWARE_FLAGS & 0xffffffff));
+			if ((tmp &
+			     MP1_FIRMWARE_FLAGS__INTERRUPTS_ENABLED_MASK) >>
+			    MP1_FIRMWARE_FLAGS__INTERRUPTS_ENABLED__SHIFT)
+				break;
+			udelay(1);
+		}
+
+		if (!timeout &&
+		    !((tmp & MP1_FIRMWARE_FLAGS__INTERRUPTS_ENABLED_MASK) >>
+		      MP1_FIRMWARE_FLAGS__INTERRUPTS_ENABLED__SHIFT)) {
+			pci_warn(dev,
+				 "Navi10: timed out waiting for MP1 reset\n");
+		}
+
+		SMU_WAIT();
+		smu_resp = RREG32(mmMP1_SMN_C2PMSG_90);
+		pci_info(dev, "Navi10: SMU resp reg: %x\n", tmp);
+	}
+
+	pci_info(dev, "Navi10: begin reset\n");
+
+	/*
+	 * again, this only makes sense if we have an SMU to talk to
+	 * some of these may fail, that's okay. we're just turning off as many
+	 * things as possible
+	 */
+	if (mp1_intr) {
+		SMU_WAIT();
+
+		switch (device_type) {
+		case NAVI:
+			/* stop SMC */
+			pci_info(dev, "Navi10: gfx off\n");
+			WREG32(mmMP1_SMN_C2PMSG_90, 0x00);
+			WREG32(mmMP1_SMN_C2PMSG_82, 0x00);
+			WREG32(mmMP1_SMN_C2PMSG_66, 0x2A);
+			SMU_WAIT();
+
+			/* stop SMC */
+			pci_info(dev, "Navi10: Prep Reset\n");
+			WREG32(mmMP1_SMN_C2PMSG_90, 0x00);
+			WREG32(mmMP1_SMN_C2PMSG_82, 0x00);
+			/* PPSMC_MSG_PrepareMp1ForReset */
+			WREG32(mmMP1_SMN_C2PMSG_66, 0x33);
+			SMU_WAIT();
+			break;
+
+		case VEGA10:
+
+			/*
+		 * frankly, I have no idea if this has to be done in a particular order
+		 * hence this long sequence of bits that comes from the hwmgr destroy
+		 */
+
+#define DISABLE_FEAT(bits)                                                     \
+	do {                                                                   \
+		WREG32(mmMP1_SMN_C2PMSG_90, 0x00);                             \
+		WREG32(mmMP1_SMN_C2PMSG_82, (bits));                           \
+		WREG32(mmMP1_SMN_C2PMSG_66, PPSMC_MSG_DisableSmuFeatures);     \
+		SMU_WAIT();                                                    \
+	} while (0)
+
+			/* disable power containment */
+			pci_info(dev, "Vega10: disable power containment\n");
+			DISABLE_FEAT(1L << 14);
+			DISABLE_FEAT(1L << 15);
+
+			/* disable avfs */
+			pci_info(dev, "Vega10: disable avfs\n");
+			DISABLE_FEAT(1L << 10);
+
+			/* stop dpm */
+			pci_info(dev, "Vega10: disable dpm\n");
+			DISABLE_FEAT(1L << 24);
+			DISABLE_FEAT(SMC_DPM_FEATURES);
+
+			/* disable deep sleep */
+			pci_info(dev, "Vega10: disable deep sleep\n");
+			DISABLE_FEAT(1L << 11);
+			DISABLE_FEAT(1L << 12);
+			DISABLE_FEAT(1L << 13);
+			DISABLE_FEAT(1L << 9);
+
+			/* disable ulv */
+			pci_info(dev, "Vega10: disable ulv\n");
+			DISABLE_FEAT(1L << 6);
+
+			/* disable acg */
+			pci_info(dev, "Vega10: disable acg\n");
+			DISABLE_FEAT(1L << 28);
+
+			/* disable PCC limit */
+			pci_info(dev, "Vega10: disable PCC\n");
+			DISABLE_FEAT(1L << 29);
+
+#undef DISABLE_FEAT
+			break;
+
+		case VEGA12:
+		case VEGA20:
+			pci_info(dev, "Vega12/20: disable GFX\n");
+			WREG32(mmMP1_SMN_C2PMSG_90, 0x00);
+			WREG32(mmMP1_SMN_C2PMSG_82, 0x00);
+			WREG32(mmMP1_SMN_C2PMSG_66,
+			       PPSMC_MSG_DisableAllSmuFeatures);
+			SMU_WAIT();
+			break;
+
+		default:
+			break;
+		}
+	}
+
+#define PSP_WAIT(reg_index, reg_val, mask, err_exit)                                                                         \
+	do {                                                                                                                 \
+		for (timeout = 1000000; timeout; --timeout) {                                                                \
+			tmp = RREG32(reg_index);                                                                             \
+			if ((tmp & mask) == reg_val)                                                                         \
+				break;                                                                                       \
+			udelay(1);                                                                                           \
+		}                                                                                                            \
+		if (((tmp & mask) != reg_val)) {                                                                             \
+			pci_err(dev,                                                                                         \
+				"Navi10: reg %08x (0x%08x, masked 0x%08x) did not reach needed value (0x%08x), (line %d)\n", \
+				reg_index, tmp, (tmp & mask), reg_val,                                                       \
+				__LINE__);                                                                                   \
+			if (err_exit)                                                                                        \
+				goto mode1_out;                                                                              \
+		}                                                                                                            \
+	} while (0)
+
+	pci_info(dev, "Navi10: begin psp mode 1 reset\n");
+
+	/* I don't know if I have the base register right for non-navi cards */
+	if (device_type == NAVI) {
+		/* mark amdgpu_atombios_scratch_regs_engine_hung */
+		tmp = RREG32(scratch_reg_offset + 3);
+		tmp |= ATOM_S3_ASIC_GUI_ENGINE_HUNG;
+		WREG32(scratch_reg_offset + 3, tmp);
+	}
+
+	/* check validity of PSP before reset */
+	pci_info(dev, "Navi10: PSP wait\n");
+	PSP_WAIT(mmMP0_SMN_C2PMSG_64, 0x80000000, 0x8000FFFF, false);
+
+	/* reset command */
+	pci_info(dev, "Navi10: do mode1 reset\n");
+	WREG32(mmMP0_SMN_C2PMSG_64, GFX_CTRL_CMD_ID_MODE1_RST);
+	msleep(500);
+
+	/* wait for ACK */
+	pci_info(dev, "Navi10: PSP wait\n");
+	PSP_WAIT(mmMP0_SMN_C2PMSG_33, 0x80000000, 0x80000000, true);
+
+	pci_info(dev, "Navi10: psp mode1 succeeded\n");
+
+	/* restore state here and wait */
+	pci_restore_state(dev);
+
+	for (timeout = 100000; timeout; --timeout) {
+		tmp = RREG32(mmRCC_DEV0_EPF0_RCC_CONFIG_MEMSIZE);
+
+		if (tmp != 0xffffffff)
+			break;
+		udelay(1);
+	}
+	pci_info(dev, "Navi10: memsize: %x\n", tmp);
+
+	if (device_type == NAVI) {
+		/* unmark amdgpu_atombios_scratch_regs_engine_hung */
+		tmp = RREG32(scratch_reg_offset + 3);
+		tmp &= ~ATOM_S3_ASIC_GUI_ENGINE_HUNG;
+		WREG32(scratch_reg_offset + 3, tmp);
+	}
+
+	/* this takes a long time :( */
+	for (timeout = 100; timeout; --timeout) {
+		/* see if PSP bootloader comes back */
+		if (RREG32(mmMP0_SMN_C2PMSG_35) & 0x80000000L)
+			break;
+
+		pci_info(dev, "Navi10: PSP bootloader flags? %x, timeout: %s\n",
+			 RREG32(mmMP0_SMN_C2PMSG_35), !timeout ? "yes" : "no");
+
+		msleep(100);
+	}
+
+	if (!timeout && !(RREG32(mmMP0_SMN_C2PMSG_35) & 0x80000000L)) {
+		pci_info(
+			dev,
+			"Navi10: timed out waiting for PSP bootloader to respond after reset\n");
+	} else {
+		pci_set_power_state(dev, PCI_D3hot);
+		pci_info(dev, "Navi10: PSP mode1 reset successful\n");
+	}
+
+mode1_out:
+	pci_restore_state(dev);
+
+#undef RREG32
+#undef WREG32
+#undef RREG32_PCIE
+#undef WREG32_PCIE
+#undef PSP_WAIT
+#undef SMU_WAIT
+
+out:
+	/* unmap BAR5 */
+	iounmap(mmio);
+
+	/* restore the state and command register */
+	pci_write_config_word(dev, PCI_COMMAND, cfg);
+	return 0;
+}
+
 static const struct pci_dev_reset_methods pci_dev_reset_methods[] = {
 	{ PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82599_SFP_VF,
-		 reset_intel_82599_sfp_virtfn },
-	{ PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_IVB_M_VGA,
-		reset_ivb_igd },
-	{ PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_IVB_M2_VGA,
-		reset_ivb_igd },
+	  reset_intel_82599_sfp_virtfn },
+	{ PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_IVB_M_VGA, reset_ivb_igd },
+	{ PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_IVB_M2_VGA, reset_ivb_igd },
 	{ PCI_VENDOR_ID_SAMSUNG, 0xa804, nvme_disable_and_flr },
 	{ PCI_VENDOR_ID_INTEL, 0x0953, delay_250ms_after_flr },
-	{ PCI_VENDOR_ID_CHELSIO, PCI_ANY_ID,
-		reset_chelsio_generic_dev },
+	{ PCI_VENDOR_ID_CHELSIO, PCI_ANY_ID, reset_chelsio_generic_dev },
+	{ PCI_VENDOR_ID_ATI, 0x6860, reset_amd_navi10 },
+	{ PCI_VENDOR_ID_ATI, 0x6861, reset_amd_navi10 },
+	{ PCI_VENDOR_ID_ATI, 0x6862, reset_amd_navi10 },
+	{ PCI_VENDOR_ID_ATI, 0x6863, reset_amd_navi10 },
+	{ PCI_VENDOR_ID_ATI, 0x6864, reset_amd_navi10 },
+	{ PCI_VENDOR_ID_ATI, 0x6867, reset_amd_navi10 },
+	{ PCI_VENDOR_ID_ATI, 0x6868, reset_amd_navi10 },
+	{ PCI_VENDOR_ID_ATI, 0x6869, reset_amd_navi10 },
+	{ PCI_VENDOR_ID_ATI, 0x686a, reset_amd_navi10 },
+	{ PCI_VENDOR_ID_ATI, 0x686b, reset_amd_navi10 },
+	{ PCI_VENDOR_ID_ATI, 0x686c, reset_amd_navi10 },
+	{ PCI_VENDOR_ID_ATI, 0x686d, reset_amd_navi10 },
+	{ PCI_VENDOR_ID_ATI, 0x686e, reset_amd_navi10 },
+	{ PCI_VENDOR_ID_ATI, 0x686f, reset_amd_navi10 },
+	{ PCI_VENDOR_ID_ATI, 0x687f, reset_amd_navi10 },
+	{ PCI_VENDOR_ID_ATI, 0x69A0, reset_amd_navi10 },
+	{ PCI_VENDOR_ID_ATI, 0x69A1, reset_amd_navi10 },
+	{ PCI_VENDOR_ID_ATI, 0x69A2, reset_amd_navi10 },
+	{ PCI_VENDOR_ID_ATI, 0x69A3, reset_amd_navi10 },
+	{ PCI_VENDOR_ID_ATI, 0x69AF, reset_amd_navi10 },
+	{ PCI_VENDOR_ID_ATI, 0x66A0, reset_amd_navi10 },
+	{ PCI_VENDOR_ID_ATI, 0x66A1, reset_amd_navi10 },
+	{ PCI_VENDOR_ID_ATI, 0x66A2, reset_amd_navi10 },
+	{ PCI_VENDOR_ID_ATI, 0x66A3, reset_amd_navi10 },
+	{ PCI_VENDOR_ID_ATI, 0x66A4, reset_amd_navi10 },
+	{ PCI_VENDOR_ID_ATI, 0x66A7, reset_amd_navi10 },
+	{ PCI_VENDOR_ID_ATI, 0x66AF, reset_amd_navi10 },
+	{ PCI_VENDOR_ID_ATI, 0x7310, reset_amd_navi10 },
+	{ PCI_VENDOR_ID_ATI, 0x7312, reset_amd_navi10 },
+	{ PCI_VENDOR_ID_ATI, 0x7318, reset_amd_navi10 },
+	{ PCI_VENDOR_ID_ATI, 0x7319, reset_amd_navi10 },
+	{ PCI_VENDOR_ID_ATI, 0x731a, reset_amd_navi10 },
+	{ PCI_VENDOR_ID_ATI, 0x731b, reset_amd_navi10 },
+	{ PCI_VENDOR_ID_ATI, 0x731f, reset_amd_navi10 },
+	{ PCI_VENDOR_ID_ATI, 0x7340, reset_amd_navi10 },
+	{ PCI_VENDOR_ID_ATI, 0x7341, reset_amd_navi10 },
+	{ PCI_VENDOR_ID_ATI, 0x7347, reset_amd_navi10 },
+	{ PCI_VENDOR_ID_ATI, 0x734F, reset_amd_navi10 },
+	{ PCI_VENDOR_ID_ATI, 0x7360, reset_amd_navi10 },
+	{ PCI_VENDOR_ID_ATI, 0x7362, reset_amd_navi10 },
 	{ 0 }
 };
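
The patch above prints one status line per reset attempt ("Navi10: Device type ..., sol reg: ..., mp1 intr enabled? ..., bl ready? ...") and then takes an early exit when `sol == 0x0 && !mp1_intr && psp_bl_ready`, which its own comment describes as "either already clean or in a state we can't fix". The sketch below mirrors that check in Python so a status line from dmesg can be interpreted quickly. `parse_status` and `skips_reset` are hypothetical helper names, and the sample line simply follows the patch's pci_info() format string.

```python
import re

# Matches the status line built by the patch's pci_info() format string.
STATUS_RE = re.compile(
    r"Navi10: Device type (\d+), SMU response reg: ([0-9a-f]+), "
    r"sol reg: ([0-9a-f]+), mp1 intr enabled\? (yes|no), bl ready\? (yes|no)"
)


def parse_status(line):
    """Extract the fields of a 'Navi10: Device type ...' line, or None."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    return {
        "device_type": int(match.group(1)),   # enum order: VEGA10, VEGA12, VEGA20, NAVI
        "smu_resp": int(match.group(2), 16),
        "sol": int(match.group(3), 16),
        "mp1_intr": match.group(4) == "yes",
        "psp_bl_ready": match.group(5) == "yes",
    }


def skips_reset(status):
    """Mirror of the patch's early exit: sol == 0x0 && !mp1_intr && psp_bl_ready."""
    return status["sol"] == 0 and not status["mp1_intr"] and status["psp_bl_ready"]


if __name__ == "__main__":
    sample = ("vfio-pci 0000:13:00.0: Navi10: Device type 3, SMU response reg: 0, "
              "sol reg: 0, mp1 intr enabled? no, bl ready? yes")
    status = parse_status(sample)
    print(status)
    print("reset skipped" if skips_reset(status) else "reset attempted")
```

A line with device type 3 (NAVI), sol 0, MP1 interrupts disabled and the bootloader ready hits the early exit, so the function jumps to `out` without issuing the mode 1 reset.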

I ran the following commands:

cd focal
patch -p1 < ~/linux-fix_navi_reset.patch

The patch command printed this message:

patching file drivers/pci/quirks.c
Hunk #1 succeeded at 3986 (offset 17 lines).
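
The offset message is harmless: it means the surrounding context was found 17 lines further down than the hunk header says, because the Ubuntu tree differs slightly from the tree the patch was written against. A small sketch of the arithmetic, with `parse_hunk_header` as a hypothetical helper; in this hunk the old and new start lines are both 3969, so the sum is the same either way.

```python
import re

HUNK_RE = re.compile(r"@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(header):
    """Return (old_start, old_len, new_start, new_len) from a unified diff hunk header."""
    match = HUNK_RE.match(header)
    if match is None:
        raise ValueError("not a unified diff hunk header")
    old_start, old_len, new_start, new_len = match.groups()
    # A missing length in a unified diff header means a length of 1.
    return (int(old_start), int(old_len or 1), int(new_start), int(new_len or 1))


if __name__ == "__main__":
    header = ("@@ -3969,17 +3969,517 @@ static int "
              "delay_250ms_after_flr(struct pci_dev *dev, int probe)")
    old_start, _, new_start, _ = parse_hunk_header(header)
    offset = 17  # taken from "Hunk #1 succeeded at 3986 (offset 17 lines)"
    print(old_start + offset)
```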

Then i proceeded with the kernel compilation:

cp /boot/config-`uname -r` .config
yes '' | make oldconfig

If you need to change the kernel configuration, run the following optional command. Otherwise, skip it.

make menuconfig

We are now in the endgame:

make clean
make -j `getconf _NPROCESSORS_ONLN` deb-pkg LOCALVERSION=-custom
sudo dpkg -i ../*.deb
sudo update-grub

I rebooted my computer, but unfortunately the graphics card still shows the same behaviour :’(

What is even weirder is that if i use the virt-manager viewer and launch GPU-Z, my GPU is clearly detected :confused:

That is a very nice write up.

Can you check that you booted into the patched kernel?
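
One quick way to check, sketched below: the build above used LOCALVERSION=-custom, so the running kernel release (what `uname -r` prints) should end in that suffix. `is_custom_kernel` is a hypothetical helper, and the suffix test is only a heuristic that assumes no other installed kernel uses the same local version string.

```python
import platform


def is_custom_kernel(release, suffix="-custom"):
    """Return True when the kernel release string carries the LOCALVERSION suffix."""
    return release.endswith(suffix)


if __name__ == "__main__":
    # platform.release() reports the same string as `uname -r` on Linux.
    release = platform.release()
    if is_custom_kernel(release):
        print(f"{release}: looks like the patched build")
    else:
        print(f"{release}: not the -custom build, check the GRUB default entry")
```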

In my experience, the reset bug usually manifests as no graphics rather than corrupted graphics. I have a relatively similar setup to yours (x570 but a ryzen 3700x instead), and I’m wondering if one of your kernel cmdline flags is causing the issue. The only thing similar I’m doing is comparable to rd.driver.pre=vfio-pci. Unless you specifically need all of those driver options, I’d try removing some of them and seeing if the problem disappears or changes.

The patches you applied will help avoid needing to reboot your host if you want to shutdown or restart a VM, or if your VM crashes.
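
To test the command-line options one at a time instead of removing several at once, it helps to generate each candidate line mechanically. A minimal sketch, assuming a simple whitespace-separated command line without quoted values; `parse_cmdline` and `without_option` are hypothetical helpers, and the sample string is a shortened version of the GRUB line from the first post.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into (name, value) pairs; bare flags get None."""
    options = []
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        options.append((name, value if sep else None))
    return options


def without_option(cmdline, name):
    """Return the command line with every occurrence of option `name` removed."""
    kept = [token for token in cmdline.split() if token.partition("=")[0] != name]
    return " ".join(kept)


if __name__ == "__main__":
    sample = ("quiet splash amd_iommu=on iommu=pt "
              "video=efifb:off,vesafb:off pcie_acs_override=downstream")
    # Print one candidate line per option, each with exactly one option dropped.
    for name, _ in parse_cmdline(sample):
        print(f"without {name}: {without_option(sample, name)}")
```

Each candidate still has to be written to GRUB_CMDLINE_LINUX_DEFAULT by hand, followed by update-grub and a reboot. Note that options such as amd_iommu=on are needed for pass-through to work at all, so dropping them is expected to break the VM rather than fix it.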

@belfrypossum I tried removing some options as suggested, ran update-grub each time and rebooted. Now, every time, both my screen and the virt-manager viewer are frozen on the boot screen.

I put all the options back, ran update-grub and rebooted, and i get the same behaviour: both the screen and the virt-manager viewer are frozen during the boot sequence.

Is the following information helpful?

[  657.436050] audit: type=1400 audit(1604714421.218:41): apparmor="STATUS" operation="profile_load" profile="unconfined" name="libvirt-436f8a1e-0693-41c4-b70c-62e94182ec1b" pid=3218 comm="apparmor_parser"
[  657.446186] xhci_hcd 0000:15:00.3: remove, state 4
[  657.446190] usb usb8: USB disconnect, device number 1
[  657.446338] xhci_hcd 0000:15:00.3: USB bus 8 deregistered
[  657.446344] xhci_hcd 0000:15:00.3: remove, state 4
[  657.446345] usb usb7: USB disconnect, device number 1
[  657.446626] xhci_hcd 0000:15:00.3: USB bus 7 deregistered
[  657.569888] audit: type=1400 audit(1604714421.350:42): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-436f8a1e-0693-41c4-b70c-62e94182ec1b" pid=3229 comm="apparmor_parser"
[  657.696462] audit: type=1400 audit(1604714421.478:43): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-436f8a1e-0693-41c4-b70c-62e94182ec1b" pid=3233 comm="apparmor_parser"
[  657.821873] audit: type=1400 audit(1604714421.602:44): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-436f8a1e-0693-41c4-b70c-62e94182ec1b" pid=3238 comm="apparmor_parser"
[  657.947547] audit: type=1400 audit(1604714421.726:45): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-436f8a1e-0693-41c4-b70c-62e94182ec1b" pid=3242 comm="apparmor_parser"
[  658.074831] audit: type=1400 audit(1604714421.854:46): apparmor="STATUS" operation="profile_replace" info="same as current profile, skipping" profile="unconfined" name="libvirt-436f8a1e-0693-41c4-b70c-62e94182ec1b" pid=3246 comm="apparmor_parser"
[  658.078081] virbr0: port 2(vnet0) entered blocking state
[  658.078082] virbr0: port 2(vnet0) entered disabled state
[  658.078118] device vnet0 entered promiscuous mode
[  658.078194] virbr0: port 2(vnet0) entered blocking state
[  658.078195] virbr0: port 2(vnet0) entered listening state
[  658.204883] audit: type=1400 audit(1604714421.986:47): apparmor="STATUS" operation="profile_replace" profile="unconfined" name="libvirt-436f8a1e-0693-41c4-b70c-62e94182ec1b" pid=3270 comm="apparmor_parser"
[  658.239883] audit: type=1400 audit(1604714422.022:48): apparmor="DENIED" operation="open" profile="libvirt-436f8a1e-0693-41c4-b70c-62e94182ec1b" name="/opt/amdgpu/lib/x86_64-linux-gnu/libgbm.so.1.0.0" pid=3272 comm="qemu-system-x86" requested_mask="r" denied_mask="r" fsuid=1000 ouid=0
[  658.240205] audit: type=1400 audit(1604714422.022:49): apparmor="DENIED" operation="open" profile="libvirt-436f8a1e-0693-41c4-b70c-62e94182ec1b" name="/opt/amdgpu/lib/x86_64-linux-gnu/libdrm.so.2.4.0" pid=3272 comm="qemu-system-x86" requested_mask="r" denied_mask="r" fsuid=1000 ouid=0
[  659.539898] vfio-pci 0000:04:00.0: vfio_ecap_init: hiding ecap 0x19@0x168
[  659.539906] vfio-pci 0000:04:00.0: vfio_ecap_init: hiding ecap 0x1e@0x190
[  659.542391] vfio-pci 0000:13:00.0: Navi10: Device type 3, SMU response reg: 0, sol reg: 0, mp1 intr enabled? no, bl ready? yes
[  659.542617] vfio-pci 0000:13:00.0: vfio_ecap_init: hiding ecap 0x19@0x270
[  659.542628] vfio-pci 0000:13:00.0: vfio_ecap_init: hiding ecap 0x1b@0x2d0
[  659.542632] vfio-pci 0000:13:00.0: vfio_ecap_init: hiding ecap 0x25@0x400
[  659.542633] vfio-pci 0000:13:00.0: vfio_ecap_init: hiding ecap 0x26@0x410
[  659.542635] vfio-pci 0000:13:00.0: vfio_ecap_init: hiding ecap 0x27@0x440
[  660.103864] virbr0: port 2(vnet0) entered learning state
[  660.792101] vfio-pci 0000:13:00.0: Navi10: Device type 3, SMU response reg: 0, sol reg: 0, mp1 intr enabled? no, bl ready? yes
[  662.119551] virbr0: port 2(vnet0) entered forwarding state
[  662.119556] virbr0: topology change detected, propagating
[ 1144.500198] traps: Web Content[3683] general protection fault ip:7f7976c2c202 sp:7ffe5d0b1640 error:0 in libxul.so[7f7973373000+4e7c000]
[ 1370.982622] virbr0: port 2(vnet0) entered disabled state
[ 1370.983887] device vnet0 left promiscuous mode
[ 1370.983890] virbr0: port 2(vnet0) entered disabled state
[ 1371.088831] vfio-pci 0000:13:00.0: Navi10: Device type 3, SMU response reg: 0, sol reg: 0, mp1 intr enabled? no, bl ready? yes
[ 1371.555563] audit: type=1400 audit(1604715135.337:50): apparmor="STATUS" operation="profile_remove" profile="unconfined" name="libvirt-436f8a1e-0693-41c4-b70c-62e94182ec1b" pid=4241 comm="apparmor_parser"
[ 1371.576523] xhci_hcd 0000:15:00.3: enabling device (0000 -> 0002)
[ 1371.576613] xhci_hcd 0000:15:00.3: xHCI Host Controller
[ 1371.576616] xhci_hcd 0000:15:00.3: new USB bus registered, assigned bus number 7
[ 1371.576720] xhci_hcd 0000:15:00.3: hcc params 0x0278ffe5 hci version 0x110 quirks 0x0000000000000410
[ 1371.577032] usb usb7: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 5.04
[ 1371.577033] usb usb7: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 1371.577034] usb usb7: Product: xHCI Host Controller
[ 1371.577035] usb usb7: Manufacturer: Linux 5.4.65-custom xhci-hcd
[ 1371.577035] usb usb7: SerialNumber: 0000:15:00.3
[ 1371.577133] hub 7-0:1.0: USB hub found
[ 1371.577141] hub 7-0:1.0: 4 ports detected
[ 1371.577334] xhci_hcd 0000:15:00.3: xHCI Host Controller
[ 1371.577336] xhci_hcd 0000:15:00.3: new USB bus registered, assigned bus number 8
[ 1371.577338] xhci_hcd 0000:15:00.3: Host supports USB 3.1 Enhanced SuperSpeed
[ 1371.577346] usb usb8: We don't know the algorithms for LPM for this host, disabling LPM.
[ 1371.577357] usb usb8: New USB device found, idVendor=1d6b, idProduct=0003, bcdDevice= 5.04
[ 1371.577358] usb usb8: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[ 1371.577358] usb usb8: Product: xHCI Host Controller
[ 1371.577359] usb usb8: Manufacturer: Linux 5.4.65-custom xhci-hcd
[ 1371.577359] usb usb8: SerialNumber: 0000:15:00.3
[ 1371.577431] hub 8-0:1.0: USB hub found
[ 1371.577436] hub 8-0:1.0: 4 ports detected
ceedii@prometheus:~$ 

Based on the logs, it should be working fine. However, I would try setting the vendor_id flag in the hyperv features, like so:

  <features>
    ...
    <hyperv>
      ...
      <vendor_id state="on" value="anything"/>
      ...
    </hyperv>
    ...
  </features>

Even though you may not get the Code 43 error like with Nvidia, AMD apparently also checks that field and can do weird things. Note that libvirt limits the vendor_id value to 12 characters.

@belfrypossum And that did the trick.

I actually had to add all the lines suggested on this page: https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF#Video_card_driver_virtualisation_detection

...
<features>
  ...
  <hyperv>
    ...
    <vendor_id state='on' value='whatever'/>
    ...
  </hyperv>
  ...
  <kvm>
    <hidden state='on'/>
  </kvm>
  ...
</features>
...
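
For anyone who wants to double-check that both settings ended up in the domain definition, here is a minimal sketch. The function name `check_hiding` is made up for illustration, and it only greps the text of a dumped XML file; it does not validate the XML or prove the guest driver is happy.

```shell
#!/bin/sh
# Hypothetical helper: report whether a dumped libvirt domain XML file
# contains the vendor_id and kvm hidden settings discussed above.
# This is a plain text search, not an XML parser.
check_hiding() {
    xml_file="$1"
    rc=0
    # Accept either single or double quotes around the attribute value.
    if grep -q "<vendor_id state=['\"]on['\"]" "$xml_file"; then
        echo "vendor_id: set"
    else
        echo "vendor_id: missing"
        rc=1
    fi
    if grep -q "<hidden state=['\"]on['\"]" "$xml_file"; then
        echo "kvm hidden: set"
    else
        echo "kvm hidden: missing"
        rc=1
    fi
    return "$rc"
}
```

Assuming the guest is named windows10, something like `virsh dumpxml windows10 > /tmp/windows10.xml` followed by `check_hiding /tmp/windows10.xml` should print two "set" lines once the edit has been saved.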

Thanks again for your great assistance and also the job you did on the kernel bug patch :slight_smile:

Have a great weekend <3

P.S.: In case some people are interested, here is my final configuration.

/etc/default/grub

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash acpi_enforce_resources=lax amd_iommu=on iommu=pt pcie_acs_override=downstream vfio_iommu_type1.allow_unsafe_interrupts=1 kvm.ignore_msrs=1 rd.driver.pre=vfio-pci amdgpu.ppfeaturemask=0xffffffff"

/etc/initramfs-tools/scripts/init-top/bind_vfio.sh

#!/bin/sh

PREREQ=""

prereqs()
{
        echo "$PREREQ"
}

case $1 in
prereqs)
        prereqs
        exit 0
        ;;
esac

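# PCI addresses of the devices to hand over to vfio-pci
DEVS="0000:0a:00.0 0000:04:00.0 0000:13:00.0 0000:13:00.1 0000:15:00.3"
for DEV in $DEVS
do
        # Make vfio-pci the only driver allowed to claim this device, then bind it
        echo "vfio-pci" > "/sys/bus/pci/devices/$DEV/driver_override"
        echo "$DEV" > /sys/bus/pci/drivers/vfio-pci/bind
done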

exit 0
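
After a reboot, a quick way to see whether the script above did its job is to look at the `driver` symlink each device has in sysfs. The following is only a sketch: the function name `show_bound_driver` and the `SYSFS_PCI` override are assumptions introduced for illustration (the override exists only so the function can be tried against a fake directory tree).

```shell
#!/bin/sh
# Hypothetical helper: print the kernel driver currently bound to each
# PCI address passed as an argument, or "none" if no driver is bound.
show_bound_driver() {
    # SYSFS_PCI is an illustrative override; by default the real sysfs is used.
    sysfs="${SYSFS_PCI:-/sys/bus/pci/devices}"
    for dev in "$@"; do
        link="$sysfs/$dev/driver"
        if [ -L "$link" ]; then
            # The symlink target ends in the driver name, e.g. .../drivers/vfio-pci
            echo "$dev $(basename "$(readlink "$link")")"
        else
            echo "$dev none"
        fi
    done
}
```

Calling `show_bound_driver 0000:13:00.0 0000:13:00.1` should list `vfio-pci` next to each address if the binding worked. `lspci -nnk -s 13:00.0` gives the same information in its "Kernel driver in use" line.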

Windows 10 Guest configuration

<domain type='kvm'>
  <name>windows10</name>
  <uuid>0571c6e9-e02a-4c92-b830-d5ab33cfadec</uuid>
  <metadata>
    <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
      <libosinfo:os id="http://microsoft.com/win/10"/>
    </libosinfo:libosinfo>
  </metadata>
  <memory unit='KiB'>16777216</memory>
  <currentMemory unit='KiB'>16777216</currentMemory>
  <memoryBacking>
    <hugepages/>
  </memoryBacking>
  <vcpu placement='static' cpuset='0-11'>12</vcpu>
  <iothreads>1</iothreads>
  <cputune>
    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='4'/>
    <vcpupin vcpu='2' cpuset='1'/>
    <vcpupin vcpu='3' cpuset='5'/>
    <vcpupin vcpu='4' cpuset='2'/>
    <vcpupin vcpu='5' cpuset='6'/>
    <vcpupin vcpu='6' cpuset='3'/>
    <vcpupin vcpu='7' cpuset='7'/>
    <vcpupin vcpu='8' cpuset='8'/>
    <vcpupin vcpu='9' cpuset='12'/>
    <vcpupin vcpu='10' cpuset='9'/>
    <vcpupin vcpu='11' cpuset='13'/>
    <emulatorpin cpuset='0-11'/>
  </cputune>
  <numatune>
    <memory mode='strict' nodeset='0'/>
  </numatune>
  <os>
    <type arch='x86_64' machine='pc-q35-4.2'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/windows10_VARS.fd</nvram>
    <bootmenu enable='yes'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor_id state='on' value='whatever'/>
    </hyperv>
    <kvm>
      <hidden state='on'/>
    </kvm>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='6' threads='2'/>
    <cache mode='passthrough'/>
    <feature policy='require' name='topoext'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
    <timer name='hypervclock' present='yes'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/common_data/Downloads/Win10_20H2_EnglishInternational_x64.iso'/>
      <target dev='sda' bus='sata'/>
      <readonly/>
      <boot order='1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/mnt/common_data/Downloads/virtio-win-0.1.185.iso'/>
      <target dev='sdb' bus='sata'/>
      <readonly/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <controller type='usb' index='0' model='qemu-xhci' ports='15'>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x10'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x11'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0x12'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0x13'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0x14'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x4'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0x15'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x5'/>
    </controller>
    <controller type='pci' index='7' model='pcie-to-pci-bridge'>
      <model name='pcie-pci-bridge'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='8' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='8' port='0x16'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x6'/>
    </controller>
    <controller type='pci' index='9' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='9' port='0x17'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x7'/>
    </controller>
    <controller type='pci' index='10' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='10' port='0x18'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </controller>
    <interface type='network'>
      <mac address='52:54:00:cd:11:ef'/>
      <source network='default'/>
      <model type='e1000e'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <input type='tablet' bus='usb'>
      <address type='usb' bus='0' port='1'/>
    </input>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='vnc' port='-1' autoport='yes'>
      <listen type='address'/>
    </graphics>
    <video>
      <model type='virtio' heads='1' primary='yes'>
        <acceleration accel3d='yes'/>
      </model>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
    </video>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x13' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x13' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
      </source>
      <boot order='2'/>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x01' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x15' slot='0x00' function='0x3'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
    </hostdev>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x09' slot='0x00' function='0x0'/>
    </memballoon>
  </devices>
</domain>