Trouble getting second Vega 56 to load VFIO driver

Hello, I don’t often post on forums for help, but I’ll try my best not to make this a nonsensical mess.

I’m having trouble getting my second Vega 56 card to load the vfio driver on my 2700 + X470 system. Following the Arch Wiki, I’ve enabled IOMMU successfully:

dmesg | grep -i -e DMAR -e IOMMU
[ 0.000000] Command line: initrd=\amd-ucode.img initrd=\initramfs-linux.img root=UUID=4a16fc36-a250-431b-86c0-2cbee3564f67 amd_iommu=on iommu=pt rw
[ 0.000000] Kernel command line: initrd=\amd-ucode.img initrd=\initramfs-linux.img root=UUID=4a16fc36-a250-431b-86c0-2cbee3564f67 amd_iommu=on iommu=pt rw
[ 0.826673] iommu: Default domain type: Passthrough (set via kernel command line)
[ 0.963506] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[ 0.963579] pci 0000:00:01.0: Adding to iommu group 0
[ 0.963598] pci 0000:00:01.1: Adding to iommu group 1
[ 0.963618] pci 0000:00:01.3: Adding to iommu group 2
[ 0.963635] pci 0000:00:02.0: Adding to iommu group 3
[ 0.963659] pci 0000:00:03.0: Adding to iommu group 4
[ 0.963675] pci 0000:00:03.1: Adding to iommu group 5
[ 0.963694] pci 0000:00:03.2: Adding to iommu group 6
[ 0.963711] pci 0000:00:04.0: Adding to iommu group 7
[ 0.963731] pci 0000:00:07.0: Adding to iommu group 8
[ 0.963746] pci 0000:00:07.1: Adding to iommu group 9
[ 0.963768] pci 0000:00:08.0: Adding to iommu group 10
[ 0.963785] pci 0000:00:08.1: Adding to iommu group 11
[ 0.963807] pci 0000:00:14.0: Adding to iommu group 12
[ 0.963820] pci 0000:00:14.3: Adding to iommu group 12
[ 0.963869] pci 0000:00:18.0: Adding to iommu group 13
[ 0.963884] pci 0000:00:18.1: Adding to iommu group 13
[ 0.963896] pci 0000:00:18.2: Adding to iommu group 13
[ 0.963909] pci 0000:00:18.3: Adding to iommu group 13
[ 0.963923] pci 0000:00:18.4: Adding to iommu group 13
[ 0.963937] pci 0000:00:18.5: Adding to iommu group 13
[ 0.963950] pci 0000:00:18.6: Adding to iommu group 13
[ 0.963962] pci 0000:00:18.7: Adding to iommu group 13
[ 0.963990] pci 0000:01:00.0: Adding to iommu group 14
[ 0.964026] pci 0000:02:00.0: Adding to iommu group 15
[ 0.964046] pci 0000:02:00.1: Adding to iommu group 15
[ 0.964067] pci 0000:02:00.2: Adding to iommu group 15
[ 0.964077] pci 0000:03:00.0: Adding to iommu group 15
[ 0.964087] pci 0000:03:04.0: Adding to iommu group 15
[ 0.964097] pci 0000:03:06.0: Adding to iommu group 15
[ 0.964108] pci 0000:03:07.0: Adding to iommu group 15
[ 0.964118] pci 0000:03:09.0: Adding to iommu group 15
[ 0.964132] pci 0000:05:00.0: Adding to iommu group 15
[ 0.964148] pci 0000:07:00.0: Adding to iommu group 15
[ 0.964172] pci 0000:08:00.0: Adding to iommu group 15
[ 0.964190] pci 0000:09:00.0: Adding to iommu group 16
[ 0.964208] pci 0000:0a:00.0: Adding to iommu group 17
[ 0.964273] pci 0000:0b:00.0: Adding to iommu group 18
[ 0.964297] pci 0000:0b:00.1: Adding to iommu group 19
[ 0.964316] pci 0000:0c:00.0: Adding to iommu group 20
[ 0.964335] pci 0000:0d:00.0: Adding to iommu group 21
[ 0.964401] pci 0000:0e:00.0: Adding to iommu group 22
[ 0.964425] pci 0000:0e:00.1: Adding to iommu group 23
[ 0.964443] pci 0000:0f:00.0: Adding to iommu group 24
[ 0.964463] pci 0000:0f:00.2: Adding to iommu group 25
[ 0.964480] pci 0000:0f:00.3: Adding to iommu group 26
[ 0.964499] pci 0000:10:00.0: Adding to iommu group 27
[ 0.964522] pci 0000:10:00.2: Adding to iommu group 28
[ 0.964542] pci 0000:10:00.3: Adding to iommu group 29
[ 0.964791] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
[ 0.965218] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
[ 1.003413] AMD-Vi: AMD IOMMUv2 driver by Joerg Roedel [email protected]
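The dmesg lines above can be cross-checked against sysfs, which groups devices the same way. What follows is a minimal sketch, not a definitive tool: the function name `list_iommu_groups` is made up for illustration, and the optional argument exists only so the function can be pointed at a different directory.

```shell
#!/bin/sh
# Print every IOMMU group and the PCI addresses it contains.
# list_iommu_groups is a hypothetical helper name; the optional first
# argument overrides the default sysfs location.
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for g in "$root"/*; do
        # Skip the unexpanded glob when the directory is missing or empty.
        [ -d "$g" ] || continue
        echo "IOMMU group ${g##*/}:"
        for d in "$g"/devices/*; do
            [ -e "$d" ] || continue
            echo "    ${d##*/}"
        done
    done
}

list_iommu_groups
```

For passthrough, the GPU and its audio function should not share a group with devices the host still needs. In the log above, 0e:00.0 and 0e:00.1 sit alone in groups 22 and 23.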

Here are the cards:

lspci -nnk -d 1002:687f
0b:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XL/XT [Radeon RX Vega 56/64] [1002:687f] (rev c3)
Subsystem: Micro-Star International Co., Ltd. [MSI] Vega 10 XL/XT [Radeon RX Vega 56/64] [1462:3681]
Kernel driver in use: amdgpu
Kernel modules: amdgpu
0e:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XL/XT [Radeon RX Vega 56/64] [1002:687f] (rev c3)
Subsystem: XFX Pine Group Inc. Vega 10 XL/XT [Radeon RX Vega 56/64] [1682:9c30]
Kernel driver in use: amdgpu
Kernel modules: amdgpu

And audio devices:

lspci -nnk -d 1002:aaf8
0b:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 HDMI Audio [Radeon Vega 56/64] [1002:aaf8]
Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 HDMI Audio [Radeon Vega 56/64] [1002:aaf8]
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel
0e:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 HDMI Audio [Radeon Vega 56/64] [1002:aaf8]
Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 HDMI Audio [Radeon Vega 56/64] [1002:aaf8]
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel

I then tried both scripts for identical GPUs, regenerating the initramfs after every change, to pass through the card at 0e:00.0. Both fail, leaving the same lspci output as above:

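#!/bin/sh
# Set driver_override to vfio-pci for every GPU that is not the boot VGA device.
for i in /sys/bus/pci/devices/*/boot_vga; do
    if [ "$(cat "$i")" -eq 0 ]; then
        GPU="${i%/boot_vga}"
        # Function 0 is the GPU; its HDMI audio is function 1 of the same device.
        AUDIO="$(echo "$GPU" | sed -e "s/0$/1/")"
        echo "vfio-pci" > "$GPU/driver_override"
        if [ -d "$AUDIO" ]; then
            echo "vfio-pci" > "$AUDIO/driver_override"
        fi
    fi
done
modprobe -i vfio-pci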

#!/bin/sh
# Set driver_override to vfio-pci for the listed PCI addresses.
DEVS="0000:0e:00.0 0000:0e:00.1"
if [ -n "$(ls -A /sys/class/iommu)" ]; then
    for DEV in $DEVS; do
        echo "vfio-pci" > "/sys/bus/pci/devices/$DEV/driver_override"
    done
fi
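Both scripts only write `driver_override`, and that setting affects later driver probes, so it does nothing for a device amdgpu has already claimed. A quick way to see whether the override actually landed is to read it back together with the bound driver. This is a rough sketch: `report_binding` and the `SYSFS_PCI` variable are names invented here, not part of any existing tool.

```shell
#!/bin/sh
# Show driver_override and the currently bound driver for each PCI address.
# report_binding and SYSFS_PCI are hypothetical names used for illustration.
report_binding() {
    sysfs="${SYSFS_PCI:-/sys/bus/pci/devices}"
    for dev in "$@"; do
        path="$sysfs/$dev"
        if [ ! -d "$path" ]; then
            echo "$dev: not found"
            continue
        fi
        # On a real system an unset override reads back as "(null)".
        override="$(cat "$path/driver_override" 2>/dev/null)"
        # The driver symlink only exists while a driver is bound.
        if [ -L "$path/driver" ]; then
            driver="$(basename "$(readlink "$path/driver")")"
        else
            driver="none"
        fi
        echo "$dev: override=${override:-unset} driver=$driver"
    done
}

report_binding 0000:0e:00.0 0000:0e:00.1
```

If the override still reads `(null)` after boot, the script most likely never ran inside the initramfs. If it reads `vfio-pci` but the driver is `amdgpu`, the script probably ran after amdgpu had already bound the card.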

I was able to stop the second card from loading the amdgpu driver by copying THIS POST’S settings, but it still didn’t load the vfio driver. It looked like this:

lspci -nnk -d 1002:687f
0b:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XL/XT [Radeon RX Vega 56/64] [1002:687f] (rev c3)
Subsystem: Micro-Star International Co., Ltd. [MSI] Vega 10 XL/XT [Radeon RX Vega 56/64] [1462:3681]
Kernel driver in use: amdgpu
Kernel modules: amdgpu
0e:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XL/XT [Radeon RX Vega 56/64] [1002:687f] (rev c3)
Subsystem: XFX Pine Group Inc. Vega 10 XL/XT [Radeon RX Vega 56/64] [1682:9c30]
Kernel modules: amdgpu
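With no "Kernel driver in use" line, the card is unclaimed, which could mean that vfio-pci was never loaded or never asked to probe the device. One way to test that theory at runtime is to set the override and trigger a probe by hand. This is only a sketch under those assumptions: `bind_vfio` and `PCI_SYSFS` are made-up names, and unbinding a GPU that amdgpu is actively driving may not be safe.

```shell
#!/bin/sh
# Hypothetical helper: bind one PCI device to vfio-pci at runtime.
# PCI_SYSFS defaults to the real sysfs tree; override it only for dry runs.
bind_vfio() {
    dev="$1"
    pci="${PCI_SYSFS:-/sys/bus/pci}"
    node="$pci/devices/$dev"
    [ -d "$node" ] || { echo "$dev: not found" >&2; return 1; }
    # Tell the PCI core that only vfio-pci may claim this device.
    echo "vfio-pci" > "$node/driver_override"
    # Release the device from its current driver, if it has one.
    if [ -e "$node/driver/unbind" ]; then
        echo "$dev" > "$node/driver/unbind"
    fi
    # Ask the PCI core to re-probe; vfio-pci must already be loaded.
    echo "$dev" > "$pci/drivers_probe"
}

# Usage, as root:
#   modprobe vfio-pci
#   bind_vfio 0000:0e:00.1   # HDMI audio function
#   bind_vfio 0000:0e:00.0   # GPU function
```

If lspci shows vfio-pci in use after this, the remaining problem is boot-time ordering and not the hardware.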

While rebooting, this comes up during POST, which I suspect might be the problem, but I’m not sure:

Any and all help is appreciated; sorry if it’s hard to follow. Returning the card is also still an option: I got it off eBay, it arrived today and seems to work fine, but I would feel like a dick returning it.

Also, a random question: does the amdgpu driver take advantage of multiple cards? It seems like there’s a performance boost in games, and the second card heats up…

Did you run:

sudo chmod 755 /usr/sbin/vfio-pci-override.sh
sudo chown root:root /usr/sbin/vfio-pci-override.sh

Though you might need to change the path to wherever your script is located. (Can someone confirm?)

Also, can you post the output of:

sudo lsinitrd | grep vfio
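Note that `lsinitrd` ships with dracut. The kernel command line earlier in the thread references initramfs-linux.img, which suggests a mkinitcpio image, and there the listing tool is `lsinitcpio`. A small sketch of the same check follows; `vfio_entries` is a hypothetical helper name and the image path in the usage comment is an assumption.

```shell
#!/bin/sh
# Filter an initramfs file listing (read from stdin) for vfio-related entries.
# vfio_entries is a hypothetical helper name used for illustration.
vfio_entries() {
    grep -i vfio || echo "no vfio entries found"
}

# Usage on a mkinitcpio system (the image path is an assumption):
#   lsinitcpio /boot/initramfs-linux.img | vfio_entries
```

The listing should include both the vfio kernel modules and the override script. If the script is missing, it never runs during early boot.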