Attempting single GPU passthrough but only getting a black screen

Okay, so I’m able to stop GDM/GNOME by manually running sudo systemctl stop display-manager.service and then sudo pkill gdm-wayland-session. For some reason this doesn’t work in the hook script, though: the session doesn’t get killed ¯\_(ツ)_/¯. I can’t explain why it works when run manually but not from the hook script.

Anyway, if I run those it drops me to a black screen with a flashing cursor in the top-left (guessing this is a vconsole?). If I then run modprobe -r amdgpu over SSH it still fails with modprobe: FATAL: Module amdgpu is in use.

Output of lsmod | grep amdgpu (I can’t unload any of these modules):

amdgpu               3461120  21
chash                  16384  1 amdgpu
gpu_sched              28672  1 amdgpu
i2c_algo_bit           16384  1 amdgpu
ttm                   126976  1 amdgpu
drm_kms_helper        208896  1 amdgpu
drm                   495616  25 gpu_sched,drm_kms_helper,amdgpu,ttm
mfd_core               16384  1 amdgpu

If I try to run the VM with virsh start, I just get a black screen.

I’m starting to wonder if my virtual consoles just aren’t unbinding properly. Again, here’s the script, which does include a command to unbind the vconsoles:

#!/bin/bash
# Helpful to read output when debugging
set -x

# Stop display manager
systemctl stop display-manager.service
pkill gdm-wayland-session

# Unbind VTconsoles
echo 0 > /sys/class/vtconsole/vtcon0/bind

# Unbind EFI-Framebuffer
echo efi-framebuffer.0 > /sys/bus/platform/drivers/efi-framebuffer/unbind

sleep 5

# Unload AMD drivers
modprobe -r amdgpu

# Unbind the GPU from display driver
virsh nodedev-detach pci_0000_0b_00_0
virsh nodedev-detach pci_0000_0b_00_1

# Load VFIO kernel module
modprobe vfio-pci
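
One thing worth checking: the unbind step above only touches vtcon0, and on many systems there is also a vtcon1 (the framebuffer console). A sketch that loops over every vtconsole instead; the `base` parameter is only there so the logic can be dry-run against a fake directory, and on a real system it would be called with no argument:

```shell
#!/bin/bash
# Unbind every virtual console, not just vtcon0 (sketch).
unbind_vtconsoles() {
  local base="${1:-/sys/class/vtconsole}"
  local vtcon
  for vtcon in "$base"/vtcon*; do
    if [ -e "$vtcon/bind" ]; then
      echo 0 > "$vtcon/bind"
    fi
  done
}
```

In the hook script this would replace the single `echo 0 > /sys/class/vtconsole/vtcon0/bind` line (run as root).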

I’ve installed the vendor-reset DKMS as well to appease the VFIO gods. Just in case. But otherwise I’m lost on this.

Same card, same problem. I’ve been trying to solve this for days and can’t get it to work at all. When I run my script over SSH from my phone (it looks similar to yours, by the way, just without the console unbind steps), I can see that my Vega and its HDMI audio function are successfully detached and that my “Win10” domain starts, but all I get is a black screen.


Have you tried unbinding the audio driver as well?

Yes, that’s virsh nodedev-detach pci_0000_0b_00_1 in the startup hook script. Neither virsh nodedev-detach pci_0000_0b_00_0 nor virsh nodedev-detach pci_0000_0b_00_1 actually runs properly; even if I run them manually (after running the hook script) as individual commands they just hang until I Ctrl+C.

I think the problem is that those detach commands will not run until the amdgpu module is unloaded.

If I run the complete script as sudo manually, my screens completely go black but I still can’t unload the amdgpu module.

Here’s the relevant lspci output:

0b:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XL/XT [Radeon RX Vega 56/64] (rev c3)
0b:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 HDMI Audio [Radeon Vega 56/64]
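
For reference, here’s a quick way to check which driver actually owns each function before issuing the detach. This is a sketch; the optional `base` parameter only exists so it can be tested against a fake sysfs tree:

```shell
#!/bin/bash
# Report which kernel driver currently owns a PCI function (sketch).
pci_driver() {
  local dev="$1" base="${2:-/sys/bus/pci/devices}"
  if [ -L "$base/$dev/driver" ]; then
    basename "$(readlink -f "$base/$dev/driver")"
  else
    echo none
  fi
}

pci_driver 0000:0b:00.0   # expected: amdgpu while the module is still bound
pci_driver 0000:0b:00.1
```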

Have you started the PC, chosen not to log in to GNOME, switched to a text session (Ctrl+Alt+F3 or whatever), and then closed the GUI?

There are other ways to unbind and bind the PCI devices that might be worth a shot.

For testing, you could try killing GNOME/GDM and so on, logging in over SSH, and then, with sudo, trying a couple of the echo commands using your devices’ addresses?

Can’t believe I didn’t think of that. Yeah, exiting out of the display manager and ending it with systemctl stop display-manager.service killed off all the GNOME processes. So we can rule that out of the equation.

I’ll have a go with that script you mention, substituting the IDs with my own.

I did just notice there are a bunch of PCI entries other than just the GPU and its audio output. Do I need to unbind all of these? Apologies, this is my first VFIO attempt. Not sure if these are just from the motherboard or not.

user@userpc:~$ lspci | grep -e 'PCI\|\VGA\|Vega'
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge
00:01.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:03.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge
00:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:07.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:07.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-1fh) PCIe Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Internal PCIe GPP Bridge 0 to Bus B
02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 57ad
03:04.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 57a3
03:05.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 57a3
03:08.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 57a4
03:09.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 57a4
03:0a.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 57a4
09:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1470 (rev c3)
0a:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Device 1471
0b:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XL/XT [Radeon RX Vega 56/64] (rev c3)
0b:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 HDMI Audio [Radeon Vega 56/64]
0c:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Raven/Raven2 PCIe Dummy Function
0d:00.0 Non-Essential Instrumentation [1300]: Advanced Micro Devices, Inc. [AMD] Zeppelin/Renoir PCIe Dummy Function

Edit: fixed my lspci output.

You’d only need to unbind the GPU’s devices, and I guess anything else in the same IOMMU group?
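
Listing the members of the GPU’s IOMMU group answers that directly, since everything in the group has to be handed to vfio-pci together. A sketch; the second parameter is only there for dry runs against a fake tree:

```shell
#!/bin/bash
# List every PCI device sharing an IOMMU group with the given device.
iommu_group_members() {
  local dev="$1" base="${2:-/sys/bus/pci/devices}"
  ls "$base/$dev/iommu_group/devices" 2>/dev/null
}

iommu_group_members 0000:0b:00.0 || true
```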

Not sure if this is right, especially since it segfaults, but it does switch off my monitors as normal. I’m unsure if I’m supposed to change “10de 100a” or “10de 0e1a” to something else, since searching for either only turns up forum threads about 750 Tis.

#!/bin/bash

# VGA Controller
echo '0000:0b:00.0' > /sys/bus/pci/devices/0000:0b:00.0/driver/unbind
echo '10de 100a'    > /sys/bus/pci/drivers/vfio-pci/new_id
echo '0000:0b:00.0' > /sys/bus/pci/devices/0000:0b:00.0/driver/bind
echo '10de 100a'    > /sys/bus/pci/drivers/vfio-pci/remove_id

# Audio Controller
echo '0000:0b:00.1' > /sys/bus/pci/devices/0000:0b:00.1/driver/unbind
echo '10de 0e1a'    > /sys/bus/pci/drivers/vfio-pci/new_id
echo '0000:0b:00.1' > /sys/bus/pci/devices/0000:0b:00.1/driver/bind
echo '10de 0e1a'    > /sys/bus/pci/drivers/vfio-pci/remove_id

Interestingly, if I start up my VM with virsh start win10 after this and then try modprobe -r amdgpu, it results in:

modprobe: ERROR: ../libkmod/libkmod-module.c:793 kmod_module_remove_module() could not remove 'amdgpu': Device or resource busy

So it seems like something is happening. Although virsh start win10 still hangs, and there doesn’t seem to be a great deal of disk activity.

Never mind, those values are the vendor ID and device ID (the 10de ones were from an NVIDIA card). This should be right now, but it’s still segfaulting:

#!/bin/bash

# VGA Controller
echo '0000:0b:00.0' > /sys/bus/pci/devices/0000:0b:00.0/driver/unbind
echo '1002 687F'    > /sys/bus/pci/drivers/vfio-pci/new_id
echo '0000:0b:00.0' > /sys/bus/pci/devices/0000:0b:00.0/driver/bind
echo '1002 687F'    > /sys/bus/pci/drivers/vfio-pci/remove_id

# Audio Controller
echo '0000:0b:00.1' > /sys/bus/pci/devices/0000:0b:00.1/driver/unbind
echo '1002 aaf8'    > /sys/bus/pci/drivers/vfio-pci/new_id
echo '0000:0b:00.1' > /sys/bus/pci/devices/0000:0b:00.1/driver/bind
echo '1002 aaf8'    > /sys/bus/pci/drivers/vfio-pci/remove_id

Edit: I need some sleep. Forgot to change the device ID for the HDMI audio.
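
As an aside, kernels since 3.16 offer a per-device driver_override that avoids the new_id/remove_id juggling (and the vendor/device IDs) entirely. A sketch, not tested on this exact setup; the SYSFS variable is only parameterized so the logic can be dry-run against a fake tree:

```shell
#!/bin/bash
# Bind one PCI function to vfio-pci via driver_override (sketch).
SYSFS="${SYSFS:-/sys/bus/pci}"

bind_to_vfio() {
  local dev="$1"
  # From now on, only vfio-pci may claim this device
  echo vfio-pci > "$SYSFS/devices/$dev/driver_override"
  # Release whatever driver currently holds it (amdgpu, snd_hda_intel, ...)
  if [ -L "$SYSFS/devices/$dev/driver" ]; then
    echo "$dev" > "$SYSFS/devices/$dev/driver/unbind"
  fi
  # Re-probe the device; driver_override forces vfio-pci to take it
  echo "$dev" > "$SYSFS/drivers_probe"
}
```

Usage would be, as root with the vfio-pci module loaded: `bind_to_vfio 0000:0b:00.0` then `bind_to_vfio 0000:0b:00.1`.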


Yeah, you’d have to replace the hardware IDs with the ones from your card.

lspci -nnk

should give the hardware ID, with a colon, inside square brackets:

09:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM200 [GeForce GTX 980 Ti] [10de:17c8] (rev a1)
Kernel driver in use: vfio-pci
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

09:00.1 Audio device [0403]: NVIDIA Corporation GM200 High Definition Audio [10de:0fb0] (rev a1)
Kernel driver in use: vfio-pci
Kernel modules: snd_hda_intel

The bracketed pairs ([10de:17c8] and [10de:0fb0]) are the GPU and audio IDs for me.
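
If it helps, those pairs can be pulled out mechanically. A sketch; the `extract_ids` helper name is made up, and the guard is there in case lspci isn’t installed:

```shell
#!/bin/bash
# Extract [vendor:device] ID pairs from `lspci -nn` output (sketch).
# The class code (e.g. [0300]) has no colon, so it is not matched.
extract_ids() { grep -oE '\[[0-9a-f]{4}:[0-9a-f]{4}\]'; }

# On the thread's card at slot 0b:00:
command -v lspci >/dev/null && lspci -nn -s 0b:00 | extract_ids || true
```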

I find it a lot easier to use a $20 used GPU as a “main” GPU and to isolate the passthrough one. I had issues with a ROM BAR stopping the passthrough for me headless, and it was a pain getting it off the card.

Have you tried isolating the PCI IDs in GRUB and running the machine completely headless, to see if it would work that way?

Not yet. What’s the GRUB option? I’ll give it a shot via SSH.

On the same line in the GRUB config, add your hardware IDs, so it’s something like:

GRUB_CMDLINE_LINUX=" iommu=1 amd_iommu=on rd.driver.pre=vfio-pci vfio-pci.ids=10de:17c8,10de:0fb0"

Like this?

GRUB_CMDLINE_LINUX="iommu=1 amd_iommu=on iommu=pt rd.driver.pre=vfio-pci vfio-pci.ids=1002:687F,1002:aaf8"

Oddly my GPU is still being used as normal even after update-grub. These are 100% the right device IDs. See: PCI\VEN_1002&DEV_687F - Vega 10 XL/XT [Radeon RX Vega… | Device Hunt
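
In case the regeneration step is the issue: on Debian the config is rebuilt with update-grub and only takes effect after a reboot, and afterwards the live kernel command line can be checked directly. A sketch (the `has_vfio_ids` helper name is made up):

```shell
#!/bin/bash
# Did the vfio-pci.ids option actually reach the running kernel? (sketch)
has_vfio_ids() { grep -q 'vfio-pci\.ids=' <<<"$1"; }

if has_vfio_ids "$(cat /proc/cmdline)"; then
  echo "vfio-pci.ids is on the kernel command line"
else
  echo "vfio-pci.ids missing: re-run update-grub and reboot?"
fi
```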

That looks right, and on my system it reserves them for vfio, with driver in use: vfio.

Not sure what else to try.

Might it be something to do with the EFI framebuffer?
Any mention of the IDs or the PCI address in dmesg to say what is or isn’t working at boot time?
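
One quick check along those lines is whether efifb actually registered a framebuffer. A sketch; the helper name is made up, and the path parameter only exists for dry runs:

```shell
#!/bin/bash
# List registered framebuffer devices, e.g. efifb or amdgpudrmfb (sketch).
list_framebuffers() {
  local fbfile="${1:-/proc/fb}"
  if [ -s "$fbfile" ]; then
    cat "$fbfile"
  else
    echo "no framebuffer devices registered"
  fi
}

list_framebuffers
```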

I’m now 100% convinced my GPU is cursed.

Here’s some dmesg output, not sure if it’s what you’re looking for:

user@userpc:~$ sudo dmesg | grep 1002
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.19.0-13-amd64 root=/dev/mapper/userpc-lv_root ro iommu=1 amd_iommu=on iommu=pt rd.driver.pre=vfio-pci vfio-pci.ids=1002:687F,1002:aaf8 quiet loglevel=3 fsck.mode=auto
[    0.172355] Kernel command line: BOOT_IMAGE=/vmlinuz-4.19.0-13-amd64 root=/dev/mapper/userpc-lv_root ro iommu=1 amd_iommu=on iommu=pt rd.driver.pre=vfio-pci vfio-pci.ids=1002:687F,1002:aaf8 quiet loglevel=3 fsck.mode=auto
[    0.503827] pci 0000:0b:00.0: [1002:687f] type 00 class 0x030000
[    0.504008] pci 0000:0b:00.1: [1002:aaf8] type 00 class 0x040300
[    1.847255] [drm] initializing kernel modesetting (VEGA10 0x1002:0x687F 0x1002:0x0B36 0xC3).
[    2.433841] Topology: Add dGPU node [0x687f:0x1002]
[    2.433903] kfd kfd: added device 1002:687f
user@userpc:~$ sudo dmesg | grep 0000:0b:00
[    0.503827] pci 0000:0b:00.0: [1002:687f] type 00 class 0x030000
[    0.503850] pci 0000:0b:00.0: reg 0x10: [mem 0xd0000000-0xdfffffff 64bit pref]
[    0.503859] pci 0000:0b:00.0: reg 0x18: [mem 0xe0000000-0xe01fffff 64bit pref]
[    0.503865] pci 0000:0b:00.0: reg 0x20: [io  0xf000-0xf0ff]
[    0.503871] pci 0000:0b:00.0: reg 0x24: [mem 0xfcc00000-0xfcc7ffff]
[    0.503878] pci 0000:0b:00.0: reg 0x30: [mem 0xfcc80000-0xfcc9ffff pref]
[    0.503890] pci 0000:0b:00.0: BAR 0: assigned to efifb
[    0.503935] pci 0000:0b:00.0: PME# supported from D1 D2 D3hot D3cold
[    0.504008] pci 0000:0b:00.1: [1002:aaf8] type 00 class 0x040300
[    0.504024] pci 0000:0b:00.1: reg 0x10: [mem 0xfcca0000-0xfcca3fff]
[    0.504099] pci 0000:0b:00.1: PME# supported from D1 D2 D3hot D3cold
[    0.508847] pci 0000:0b:00.0: vgaarb: setting as boot VGA device
[    0.508847] pci 0000:0b:00.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
[    0.508847] pci 0000:0b:00.0: vgaarb: bridge control possible
[    0.535543] pci 0000:0b:00.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff]
[    0.535550] pci 0000:0b:00.1: Linked as a consumer to 0000:0b:00.0
[    0.535551] pci 0000:0b:00.1: D0 power state depends on 0000:0b:00.0
[    1.152550] iommu: Adding device 0000:0b:00.0 to group 24
[    1.152610] iommu: Using direct mapping for device 0000:0b:00.0
[    1.152695] iommu: Adding device 0000:0b:00.1 to group 25
[    1.152712] iommu: Using direct mapping for device 0000:0b:00.1
[    1.847293] amdgpu 0000:0b:00.0: firmware: direct-loading firmware amdgpu/vega10_gpu_info.bin
[    1.847317] amdgpu 0000:0b:00.0: No more image in the PCI ROM
[    1.847353] amdgpu 0000:0b:00.0: VRAM: 8176M 0x000000F400000000 - 0x000000F5FEFFFFFF (8176M used)
[    1.847354] amdgpu 0000:0b:00.0: GART: 512M 0x000000F600000000 - 0x000000F61FFFFFFF
[    1.847657] amdgpu 0000:0b:00.0: firmware: direct-loading firmware amdgpu/vega10_sos.bin
[    1.847671] amdgpu 0000:0b:00.0: firmware: direct-loading firmware amdgpu/vega10_asd.bin
[    1.847706] amdgpu 0000:0b:00.0: firmware: direct-loading firmware amdgpu/vega10_acg_smc.bin
[    1.847726] amdgpu 0000:0b:00.0: firmware: direct-loading firmware amdgpu/vega10_pfp.bin
[    1.847736] amdgpu 0000:0b:00.0: firmware: direct-loading firmware amdgpu/vega10_me.bin
[    1.847746] amdgpu 0000:0b:00.0: firmware: direct-loading firmware amdgpu/vega10_ce.bin
[    1.847755] amdgpu 0000:0b:00.0: firmware: direct-loading firmware amdgpu/vega10_rlc.bin
[    1.847793] amdgpu 0000:0b:00.0: firmware: direct-loading firmware amdgpu/vega10_mec.bin
[    1.847830] amdgpu 0000:0b:00.0: firmware: direct-loading firmware amdgpu/vega10_mec2.bin
[    1.848455] amdgpu 0000:0b:00.0: firmware: direct-loading firmware amdgpu/vega10_sdma.bin
[    1.848467] amdgpu 0000:0b:00.0: firmware: direct-loading firmware amdgpu/vega10_sdma1.bin
[    1.848578] amdgpu 0000:0b:00.0: firmware: direct-loading firmware amdgpu/vega10_uvd.bin
[    1.849121] amdgpu 0000:0b:00.0: firmware: direct-loading firmware amdgpu/vega10_vce.bin
[    2.528033] amdgpu 0000:0b:00.0: fb0: amdgpudrmfb frame buffer device
[    2.540371] amdgpu 0000:0b:00.0: ring 0(gfx) uses VM inv eng 4 on hub 0
[    2.540372] amdgpu 0000:0b:00.0: ring 1(comp_1.0.0) uses VM inv eng 5 on hub 0
[    2.540373] amdgpu 0000:0b:00.0: ring 2(comp_1.1.0) uses VM inv eng 6 on hub 0
[    2.540374] amdgpu 0000:0b:00.0: ring 3(comp_1.2.0) uses VM inv eng 7 on hub 0
[    2.540374] amdgpu 0000:0b:00.0: ring 4(comp_1.3.0) uses VM inv eng 8 on hub 0
[    2.540375] amdgpu 0000:0b:00.0: ring 5(comp_1.0.1) uses VM inv eng 9 on hub 0
[    2.540376] amdgpu 0000:0b:00.0: ring 6(comp_1.1.1) uses VM inv eng 10 on hub 0
[    2.540377] amdgpu 0000:0b:00.0: ring 7(comp_1.2.1) uses VM inv eng 11 on hub 0
[    2.540377] amdgpu 0000:0b:00.0: ring 8(comp_1.3.1) uses VM inv eng 12 on hub 0
[    2.540378] amdgpu 0000:0b:00.0: ring 9(kiq_2.1.0) uses VM inv eng 13 on hub 0
[    2.540379] amdgpu 0000:0b:00.0: ring 10(sdma0) uses VM inv eng 4 on hub 1
[    2.540380] amdgpu 0000:0b:00.0: ring 11(sdma1) uses VM inv eng 5 on hub 1
[    2.540380] amdgpu 0000:0b:00.0: ring 12(uvd<0>) uses VM inv eng 6 on hub 1
[    2.540381] amdgpu 0000:0b:00.0: ring 13(uvd_enc0<0>) uses VM inv eng 7 on hub 1
[    2.540382] amdgpu 0000:0b:00.0: ring 14(uvd_enc1<0>) uses VM inv eng 8 on hub 1
[    2.540382] amdgpu 0000:0b:00.0: ring 15(vce0) uses VM inv eng 9 on hub 1
[    2.540383] amdgpu 0000:0b:00.0: ring 16(vce1) uses VM inv eng 10 on hub 1
[    2.540384] amdgpu 0000:0b:00.0: ring 17(vce2) uses VM inv eng 11 on hub 1
[    2.541241] [drm] Initialized amdgpu 3.27.0 20150101 for 0000:0b:00.0 on minor 0
[   15.243902] snd_hda_intel 0000:0b:00.1: Handle vga_switcheroo audio client
[   15.283308] input: HD-Audio Generic HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:03.1/0000:09:00.0/0000:0a:00.0/0000:0b:00.1/sound/card0/input12
[   15.283391] input: HD-Audio Generic HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:03.1/0000:09:00.0/0000:0a:00.0/0000:0b:00.1/sound/card0/input13
[   15.283483] input: HD-Audio Generic HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:03.1/0000:09:00.0/0000:0a:00.0/0000:0b:00.1/sound/card0/input14
[   15.283538] input: HD-Audio Generic HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:03.1/0000:09:00.0/0000:0a:00.0/0000:0b:00.1/sound/card0/input15
[   15.283643] input: HD-Audio Generic HDMI/DP,pcm=10 as /devices/pci0000:00/0000:00:03.1/0000:09:00.0/0000:0a:00.0/0000:0b:00.1/sound/card0/input16
[   15.283746] input: HD-Audio Generic HDMI/DP,pcm=11 as /devices/pci0000:00/0000:00:03.1/0000:09:00.0/0000:0a:00.0/0000:0b:00.1/sound/card0/input17

You might try turning efifb off in the GRUB command line?

video=efifb:off


Nope. It’s still starting as normal.

Grub config:

GRUB_CMDLINE_LINUX="iommu=1 amd_iommu=on iommu=pt rd.driver.pre=vfio-pci vfio-pci.ids=1002:687F,1002:aaf8 video=efifb:off"

Here’s the dmesg logs again:

user@userpc:~$ sudo dmesg | grep 1002
[sudo] password for user: 
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.19.0-13-amd64 root=/dev/mapper/userpc-lv_root ro iommu=1 amd_iommu=on iommu=pt rd.driver.pre=vfio-pci vfio-pci.ids=1002:687F,1002:aaf8 video=efifb:off quiet loglevel=3 fsck.mode=auto
[    0.169958] Kernel command line: BOOT_IMAGE=/vmlinuz-4.19.0-13-amd64 root=/dev/mapper/userpc-lv_root ro iommu=1 amd_iommu=on iommu=pt rd.driver.pre=vfio-pci vfio-pci.ids=1002:687F,1002:aaf8 video=efifb:off quiet loglevel=3 fsck.mode=auto
[    0.505647] pci 0000:0b:00.0: [1002:687f] type 00 class 0x030000
[    0.505829] pci 0000:0b:00.1: [1002:aaf8] type 00 class 0x040300
[    1.851449] [drm] initializing kernel modesetting (VEGA10 0x1002:0x687F 0x1002:0x0B36 0xC3).
[    2.439143] Topology: Add dGPU node [0x687f:0x1002]
[    2.439206] kfd kfd: added device 1002:687f
user@userpc:~$ sudo dmesg | grep 0000:0b:00
[    0.505647] pci 0000:0b:00.0: [1002:687f] type 00 class 0x030000
[    0.505671] pci 0000:0b:00.0: reg 0x10: [mem 0xd0000000-0xdfffffff 64bit pref]
[    0.505680] pci 0000:0b:00.0: reg 0x18: [mem 0xe0000000-0xe01fffff 64bit pref]
[    0.505686] pci 0000:0b:00.0: reg 0x20: [io  0xf000-0xf0ff]
[    0.505692] pci 0000:0b:00.0: reg 0x24: [mem 0xfcc00000-0xfcc7ffff]
[    0.505698] pci 0000:0b:00.0: reg 0x30: [mem 0xfcc80000-0xfcc9ffff pref]
[    0.505711] pci 0000:0b:00.0: BAR 0: assigned to efifb
[    0.505757] pci 0000:0b:00.0: PME# supported from D1 D2 D3hot D3cold
[    0.505829] pci 0000:0b:00.1: [1002:aaf8] type 00 class 0x040300
[    0.505845] pci 0000:0b:00.1: reg 0x10: [mem 0xfcca0000-0xfcca3fff]
[    0.505920] pci 0000:0b:00.1: PME# supported from D1 D2 D3hot D3cold
[    0.510211] pci 0000:0b:00.0: vgaarb: setting as boot VGA device
[    0.510211] pci 0000:0b:00.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
[    0.510211] pci 0000:0b:00.0: vgaarb: bridge control possible
[    0.536690] pci 0000:0b:00.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff]
[    0.536697] pci 0000:0b:00.1: Linked as a consumer to 0000:0b:00.0
[    0.536697] pci 0000:0b:00.1: D0 power state depends on 0000:0b:00.0
[    1.149254] iommu: Adding device 0000:0b:00.0 to group 24
[    1.149315] iommu: Using direct mapping for device 0000:0b:00.0
[    1.149398] iommu: Adding device 0000:0b:00.1 to group 25
[    1.149415] iommu: Using direct mapping for device 0000:0b:00.1
[    1.851489] amdgpu 0000:0b:00.0: firmware: direct-loading firmware amdgpu/vega10_gpu_info.bin
[    1.851514] amdgpu 0000:0b:00.0: No more image in the PCI ROM
[    1.851549] amdgpu 0000:0b:00.0: VRAM: 8176M 0x000000F400000000 - 0x000000F5FEFFFFFF (8176M used)
[    1.851550] amdgpu 0000:0b:00.0: GART: 512M 0x000000F600000000 - 0x000000F61FFFFFFF
[    1.851838] amdgpu 0000:0b:00.0: firmware: direct-loading firmware amdgpu/vega10_sos.bin
[    1.851852] amdgpu 0000:0b:00.0: firmware: direct-loading firmware amdgpu/vega10_asd.bin
[    1.851888] amdgpu 0000:0b:00.0: firmware: direct-loading firmware amdgpu/vega10_acg_smc.bin
[    1.851907] amdgpu 0000:0b:00.0: firmware: direct-loading firmware amdgpu/vega10_pfp.bin
[    1.851918] amdgpu 0000:0b:00.0: firmware: direct-loading firmware amdgpu/vega10_me.bin
[    1.851926] amdgpu 0000:0b:00.0: firmware: direct-loading firmware amdgpu/vega10_ce.bin
[    1.851935] amdgpu 0000:0b:00.0: firmware: direct-loading firmware amdgpu/vega10_rlc.bin
[    1.851971] amdgpu 0000:0b:00.0: firmware: direct-loading firmware amdgpu/vega10_mec.bin
[    1.852006] amdgpu 0000:0b:00.0: firmware: direct-loading firmware amdgpu/vega10_mec2.bin
[    1.852606] amdgpu 0000:0b:00.0: firmware: direct-loading firmware amdgpu/vega10_sdma.bin
[    1.852617] amdgpu 0000:0b:00.0: firmware: direct-loading firmware amdgpu/vega10_sdma1.bin
[    1.852724] amdgpu 0000:0b:00.0: firmware: direct-loading firmware amdgpu/vega10_uvd.bin
[    1.853260] amdgpu 0000:0b:00.0: firmware: direct-loading firmware amdgpu/vega10_vce.bin
[    2.533308] amdgpu 0000:0b:00.0: fb0: amdgpudrmfb frame buffer device
[    2.545680] amdgpu 0000:0b:00.0: ring 0(gfx) uses VM inv eng 4 on hub 0
[    2.545681] amdgpu 0000:0b:00.0: ring 1(comp_1.0.0) uses VM inv eng 5 on hub 0
[    2.545682] amdgpu 0000:0b:00.0: ring 2(comp_1.1.0) uses VM inv eng 6 on hub 0
[    2.545683] amdgpu 0000:0b:00.0: ring 3(comp_1.2.0) uses VM inv eng 7 on hub 0
[    2.545683] amdgpu 0000:0b:00.0: ring 4(comp_1.3.0) uses VM inv eng 8 on hub 0
[    2.545684] amdgpu 0000:0b:00.0: ring 5(comp_1.0.1) uses VM inv eng 9 on hub 0
[    2.545685] amdgpu 0000:0b:00.0: ring 6(comp_1.1.1) uses VM inv eng 10 on hub 0
[    2.545685] amdgpu 0000:0b:00.0: ring 7(comp_1.2.1) uses VM inv eng 11 on hub 0
[    2.545686] amdgpu 0000:0b:00.0: ring 8(comp_1.3.1) uses VM inv eng 12 on hub 0
[    2.545687] amdgpu 0000:0b:00.0: ring 9(kiq_2.1.0) uses VM inv eng 13 on hub 0
[    2.545688] amdgpu 0000:0b:00.0: ring 10(sdma0) uses VM inv eng 4 on hub 1
[    2.545688] amdgpu 0000:0b:00.0: ring 11(sdma1) uses VM inv eng 5 on hub 1
[    2.545689] amdgpu 0000:0b:00.0: ring 12(uvd<0>) uses VM inv eng 6 on hub 1
[    2.545690] amdgpu 0000:0b:00.0: ring 13(uvd_enc0<0>) uses VM inv eng 7 on hub 1
[    2.545690] amdgpu 0000:0b:00.0: ring 14(uvd_enc1<0>) uses VM inv eng 8 on hub 1
[    2.545691] amdgpu 0000:0b:00.0: ring 15(vce0) uses VM inv eng 9 on hub 1
[    2.545692] amdgpu 0000:0b:00.0: ring 16(vce1) uses VM inv eng 10 on hub 1
[    2.545693] amdgpu 0000:0b:00.0: ring 17(vce2) uses VM inv eng 11 on hub 1
[    2.546554] [drm] Initialized amdgpu 3.27.0 20150101 for 0000:0b:00.0 on minor 0
[   14.013683] snd_hda_intel 0000:0b:00.1: Handle vga_switcheroo audio client
[   14.026813] input: HD-Audio Generic HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:03.1/0000:09:00.0/0000:0a:00.0/0000:0b:00.1/sound/card0/input12
[   14.026871] input: HD-Audio Generic HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:03.1/0000:09:00.0/0000:0a:00.0/0000:0b:00.1/sound/card0/input13
[   14.026953] input: HD-Audio Generic HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:03.1/0000:09:00.0/0000:0a:00.0/0000:0b:00.1/sound/card0/input14
[   14.027029] input: HD-Audio Generic HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:03.1/0000:09:00.0/0000:0a:00.0/0000:0b:00.1/sound/card0/input15
[   14.027111] input: HD-Audio Generic HDMI/DP,pcm=10 as /devices/pci0000:00/0000:00:03.1/0000:09:00.0/0000:0a:00.0/0000:0b:00.1/sound/card0/input16
[   14.027167] input: HD-Audio Generic HDMI/DP,pcm=11 as /devices/pci0000:00/0000:00:03.1/0000:09:00.0/0000:0a:00.0/0000:0b:00.1/sound/card0/input17

I haven’t had time to read through the whole post and I have to take off soon, so hopefully this isn’t pointing you in the wrong direction, but the last echo command for both the VGA and audio devices is unloading the vfio-pci driver; I would expect that either the normal driver (e.g. nouveau or nvidia or whatever) then gets loaded, or the device ends up with no driver at all. I would remove the last echo line for both devices, the one writing to the /sys/bus/pci/drivers/vfio-pci/remove_id file, and see if that works. If it doesn’t, I’d also remove the /sys/bus/pci/drivers/vfio-pci/new_id command. I’m new to this, but from my research I think you only need to use the PCI bus address OR the device ID, not both. So:

echo '0000:0b:00.0' > /sys/bus/pci/devices/0000:0b:00.0/driver/unbind  # unbinds the currently loaded driver, e.g. nvidia
echo '0000:0b:00.0' > /sys/bus/pci/devices/0000:0b:00.0/driver/bind    # loads the vfio-pci driver

That is all I think is needed.

One other thing of note in general (i.e. it doesn’t apply to this specific case): you can also use virsh commands to accomplish the same, but I’ve been having problems where the first time I use the virsh command it cannot find the device and fails; for some reason the second or third time it is able to locate the device and works. So if you use virsh commands in a script, do some checking to see if it succeeded and, if not, try again. A loop that tries up to 5 times is my suggestion.
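
That retry loop might look something like this. A sketch of the suggestion above; `retry` is a made-up helper, and the commented virsh line uses the device name from this thread:

```shell
#!/bin/bash
# Retry a flaky command up to N times, sleeping between attempts (sketch).
retry() {
  local attempts="$1"; shift
  local i
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
    sleep 1
  done
  return 1
}

# e.g. in a hook script:
# retry 5 virsh nodedev-detach pci_0000_0b_00_0
```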