[SOLVED] Unable to isolate GPU for VFIO (workaround)

Maybe it's a Manjaro story again?
This distro breaks so many things. I have only been using it for around 6 months, and my history of things breaking is already longer than with Windows.


Also, for Manjaro there is a tutorial with a script, which I used on my first AM5 install:

Maybe just use another distro and/or version.

I promise you it works.
As you can see in my screenshot, it works even with Manjaro etc. (just not a fresh install, of course).

Don't give up (:
Try again in a few days, maybe on Ubuntu or even Fedora?

PS: Maybe loading vfio early might help?
And depending on your GPU (if it resets fine), nowadays you don't need to bind vfio at boot.
You can use the virsh tools to rebind drivers while the system is running (which is what I will do with the new system).
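A minimal sketch of that runtime approach, assuming libvirt is installed and the card resets cleanly. The script name (gpu-rebind.sh) and the GPU_DEVS list are hypothetical; the addresses are the 6900 XT's 03:00.0/03:00.1 from later in this thread. virsh nodedev-detach and nodedev-reattach expect libvirt node device names such as pci_0000_03_00_0 rather than raw PCI addresses, so the sketch converts them first.

```sh
#!/bin/sh
# Sketch: detach/reattach a passthrough GPU at runtime with virsh instead of
# binding it to vfio-pci at boot. Assumes the GPU resets cleanly.
# GPU_DEVS is a hypothetical example list; adjust it to your system.
GPU_DEVS="0000:03:00.0 0000:03:00.1"

# Convert a PCI address (0000:03:00.0) to a libvirt node device name (pci_0000_03_00_0).
pci_to_nodedev() {
    printf 'pci_%s\n' "$1" | tr ':.' '__'
}

# Take each device away from its host driver (current libvirt hands it to vfio by default).
detach_gpu() {
    for dev in $GPU_DEVS; do
        virsh nodedev-detach "$(pci_to_nodedev "$dev")"
    done
}

# Give the devices back to their host drivers (e.g. amdgpu).
reattach_gpu() {
    for dev in $GPU_DEVS; do
        virsh nodedev-reattach "$(pci_to_nodedev "$dev")"
    done
}

# Only act when called with an explicit argument, e.g. "./gpu-rebind.sh detach".
case "${1:-}" in
    detach)   detach_gpu ;;
    reattach) reattach_gpu ;;
esac
```

Run `./gpu-rebind.sh detach` before starting the VM and `./gpu-rebind.sh reattach` after it shuts down. Note that hostdev entries declared with managed='yes' (as in the libvirt XML posted later in this thread) make libvirt perform this detach/reattach itself when the VM starts and stops.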


This script is outdated as well, or the person who published it does not know what they are doing. Can you people please stop posting stuff you don't understand? This Manjaro helper script you posted uses the amd_iommu parameter as well, even though it is not needed (on AMD the IOMMU is enabled by default when it is enabled in the BIOS). It also uses the rd.* kernel parameters, which are read by a dracut-generated initramfs, but Manjaro builds its initramfs with mkinitcpio. Dracut is not even installed on a standard Manjaro installation, so those parameters simply have no effect. Even more so, this entire script is meant for passing through a GPU of the same model as the one the host uses. It does a lot of unneeded, possibly harmful stuff. Manjaro is Arch with slight variations, and I am still convinced you only need to follow and understand the Arch Wiki, nothing more.


It worked when I first used it ^^ (that's why I suggested trying an old image).
On my current install I just followed the tutorial from Heiko's blog.
It's not for Manjaro, nor is it up to date.

But setting GRUB options, binding the PCI devices to vfio, and setting some BIOS options is not that much work.

When I ran into trouble, the tutorial was in-depth enough to figure it out.
My conclusion: just do it step by step and don't start from the back :smiley:

Although I am 99% positive this is a motherboard/BIOS bug, I will give it another try, as I already returned one motherboard and don't want to return another.


Based on the Arch Wiki:

1. Enabled IOMMU and SVM in BIOS
2.1 Adding kernel parameters:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash resume=UUID=70ee4e13-e50a-4dc8-9c7a-465e0f472998 hugepages=16384 iommu=pt"

update-grub (log)

Generating grub configuration file ...
Found theme: /usr/share/grub/themes/manjaro/theme.txt
Found linux image: /boot/vmlinuz-6.2-x86_64
Found initrd image: /boot/amd-ucode.img /boot/initramfs-6.2-x86_64.img
Found initrd fallback image: /boot/initramfs-6.2-x86_64-fallback.img
Found linux image: /boot/vmlinuz-6.1-x86_64
Found initrd image: /boot/amd-ucode.img /boot/initramfs-6.1-x86_64.img
Found initrd fallback image: /boot/initramfs-6.1-x86_64-fallback.img
Warning: os-prober will not be executed to detect other bootable partitions.
Systems on them will not be added to the GRUB boot configuration.
Check GRUB_DISABLE_OS_PROBER documentation entry.
Adding boot menu entry for UEFI Firmware Settings ...
Root filesystem isn't btrfs
If you think an error has occurred, please file a bug report at "https://github.com/Antynea/grub-btrfs"
Found memtest86+ image: /boot/memtest86+/memtest.bin
/usr/bin/grub-probe: warning: unknown device type nvme2n1.
Found memtest86+ EFI image: /boot/memtest86+/memtest.efi
/usr/bin/grub-probe: warning: unknown device type nvme2n1.
done

Reboot. (At this point the splash screen is on the dGPU, and so is the login screen; since I don't have a monitor connected there, I have to log in blind.)
Check:

dmesg | grep -i -e DMAR -e IOMMU

[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-6.2-x86_64 root=UUID=e3624de6-54ff-4ca9-b2da-fd258cd2f759 rw quiet splash resume=UUID=70ee4e13-e50a-4dc8-9c7a-465e0f472998 hugepages=16384 iommu=pt
[    0.047185] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.2-x86_64 root=UUID=e3624de6-54ff-4ca9-b2da-fd258cd2f759 rw quiet splash resume=UUID=70ee4e13-e50a-4dc8-9c7a-465e0f472998 hugepages=16384 iommu=pt
[    1.668553] iommu: Default domain type: Passthrough (set via kernel command line)
[    1.691504] pci 0000:00:00.2: AMD-Vi: IOMMU performance counters supported
[    1.691520] pci 0000:00:01.0: Adding to iommu group 0
[    1.691524] pci 0000:00:01.1: Adding to iommu group 1
[    1.691529] pci 0000:00:01.2: Adding to iommu group 2
[    1.691535] pci 0000:00:02.0: Adding to iommu group 3
[    1.691539] pci 0000:00:02.1: Adding to iommu group 4
[    1.691543] pci 0000:00:02.2: Adding to iommu group 5
[    1.691550] pci 0000:00:03.0: Adding to iommu group 6
[    1.691554] pci 0000:00:04.0: Adding to iommu group 7
[    1.691560] pci 0000:00:08.0: Adding to iommu group 8
[    1.691564] pci 0000:00:08.1: Adding to iommu group 9
[    1.691567] pci 0000:00:08.3: Adding to iommu group 10
[    1.691573] pci 0000:00:14.0: Adding to iommu group 11
[    1.691576] pci 0000:00:14.3: Adding to iommu group 11
[    1.691591] pci 0000:00:18.0: Adding to iommu group 12
[    1.691595] pci 0000:00:18.1: Adding to iommu group 12
[    1.691598] pci 0000:00:18.2: Adding to iommu group 12
[    1.691601] pci 0000:00:18.3: Adding to iommu group 12
[    1.691603] pci 0000:00:18.4: Adding to iommu group 12
[    1.691606] pci 0000:00:18.5: Adding to iommu group 12
[    1.691609] pci 0000:00:18.6: Adding to iommu group 12
[    1.691613] pci 0000:00:18.7: Adding to iommu group 12
[    1.691617] pci 0000:01:00.0: Adding to iommu group 13
[    1.691621] pci 0000:02:00.0: Adding to iommu group 14
[    1.691628] pci 0000:03:00.0: Adding to iommu group 15
[    1.691636] pci 0000:03:00.1: Adding to iommu group 16
[    1.691642] pci 0000:03:00.2: Adding to iommu group 17
[    1.691648] pci 0000:03:00.3: Adding to iommu group 18
[    1.691652] pci 0000:04:00.0: Adding to iommu group 19
[    1.691656] pci 0000:05:00.0: Adding to iommu group 20
[    1.691660] pci 0000:06:00.0: Adding to iommu group 21
[    1.691664] pci 0000:06:04.0: Adding to iommu group 22
[    1.691668] pci 0000:06:05.0: Adding to iommu group 23
[    1.691673] pci 0000:06:06.0: Adding to iommu group 24
[    1.691677] pci 0000:06:08.0: Adding to iommu group 25
[    1.691680] pci 0000:06:0c.0: Adding to iommu group 26
[    1.691684] pci 0000:06:0d.0: Adding to iommu group 27
[    1.691685] pci 0000:0b:00.0: Adding to iommu group 25
[    1.691687] pci 0000:0c:00.0: Adding to iommu group 25
[    1.691688] pci 0000:0c:04.0: Adding to iommu group 25
[    1.691690] pci 0000:0c:06.0: Adding to iommu group 25
[    1.691691] pci 0000:0c:07.0: Adding to iommu group 25
[    1.691693] pci 0000:0c:08.0: Adding to iommu group 25
[    1.691694] pci 0000:0c:0c.0: Adding to iommu group 25
[    1.691695] pci 0000:0c:0d.0: Adding to iommu group 25
[    1.691696] pci 0000:0e:00.0: Adding to iommu group 25
[    1.691698] pci 0000:0f:00.0: Adding to iommu group 25
[    1.691700] pci 0000:10:00.0: Adding to iommu group 25
[    1.691701] pci 0000:11:00.0: Adding to iommu group 25
[    1.691702] pci 0000:12:00.0: Adding to iommu group 25
[    1.691704] pci 0000:13:00.0: Adding to iommu group 25
[    1.691705] pci 0000:14:00.0: Adding to iommu group 26
[    1.691706] pci 0000:15:00.0: Adding to iommu group 27
[    1.691711] pci 0000:16:00.0: Adding to iommu group 28
[    1.691722] pci 0000:17:00.0: Adding to iommu group 29
[    1.691727] pci 0000:17:00.1: Adding to iommu group 30
[    1.691731] pci 0000:17:00.2: Adding to iommu group 31
[    1.691735] pci 0000:17:00.3: Adding to iommu group 32
[    1.691740] pci 0000:17:00.4: Adding to iommu group 33
[    1.691744] pci 0000:17:00.6: Adding to iommu group 34
[    1.691747] pci 0000:18:00.0: Adding to iommu group 35
[    1.691980] pci 0000:00:00.2: AMD-Vi: Found IOMMU cap 0x40
[    1.784163] perf/amd_iommu: Detected AMD IOMMU #0 (2 banks, 4 counters/bank).
[    1.794607] AMD-Vi: AMD IOMMUv2 loaded and initialized
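Besides grepping dmesg, it can help to confirm programmatically that the intended parameters reached the kernel. A small sketch, assuming a POSIX shell; check_cmdline is a hypothetical helper that matches whole space-separated tokens in /proc/cmdline (or in a saved copy passed as the second argument). It does exact string matching, so unlike the kernel it does not treat hyphens and underscores in parameter names as equivalent.

```sh
#!/bin/sh
# Hypothetical helper: report which expected kernel parameters are missing.
# Arg 1: space-separated list of exact tokens (e.g. "iommu=pt hugepages=16384").
# Arg 2 (optional): file holding a command line; defaults to /proc/cmdline.
# Prints "missing: <param>" per absent token; returns 1 if any were missing.
check_cmdline() {
    cmdline=$(cat "${2:-/proc/cmdline}")
    status=0
    for param in $1; do
        # Pad with spaces so only whole tokens match (iommu=pt, not amd_iommu=pt).
        case " $cmdline " in
            *" $param "*) ;;
            *) echo "missing: $param"; status=1 ;;
        esac
    done
    return "$status"
}
```

For example, `check_cmdline "iommu=pt hugepages=16384" && echo "all present"` after the reboot.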

2.2 Ensuring that the groups are valid

IOMMU Group 0:
        00:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14da]
IOMMU Group 1:
        00:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:14db]
IOMMU Group 10:
        00:08.3 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:14dd]
IOMMU Group 11:
        00:14.0 SMBus [0c05]: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller [1022:790b] (rev 71)
        00:14.3 ISA bridge [0601]: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge [1022:790e] (rev 51)
IOMMU Group 12:
        00:18.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14e0]
        00:18.1 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14e1]
        00:18.2 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14e2]
        00:18.3 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14e3]
        00:18.4 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14e4]
        00:18.5 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14e5]
        00:18.6 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14e6]
        00:18.7 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14e7]
IOMMU Group 13:
        01:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch [1002:1478] (rev c0)
IOMMU Group 14:
        02:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch [1002:1479]
IOMMU Group 15:
        03:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] [1002:73bf] (rev c0)
IOMMU Group 16:
        03:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 HDMI/DP Audio Controller [1002:ab28]
IOMMU Group 17:
        03:00.2 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:73a6]
IOMMU Group 18:
        03:00.3 Serial bus controller [0c80]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 USB [1002:73a4]
IOMMU Group 19:
        04:00.0 Non-Volatile memory controller [0108]: ADATA Technology Co., Ltd. XPG GAMMIX S50 NVMe SSD [1cc1:5350] (rev 03)
IOMMU Group 2:
        00:01.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:14db]
IOMMU Group 20:
        05:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f4] (rev 01)
IOMMU Group 21:
        06:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f5] (rev 01)
IOMMU Group 22:
        06:04.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f5] (rev 01)
IOMMU Group 23:
        06:05.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f5] (rev 01)
IOMMU Group 24:
        06:06.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f5] (rev 01)
IOMMU Group 25:
        06:08.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f5] (rev 01)
        0b:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f4] (rev 01)
        0c:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f5] (rev 01)
        0c:04.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f5] (rev 01)
        0c:06.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f5] (rev 01)
        0c:07.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f5] (rev 01)
        0c:08.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f5] (rev 01)
        0c:0c.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f5] (rev 01)
        0c:0d.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f5] (rev 01)
        0e:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8161] (rev 15)
        0f:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller I225-V [8086:15f3] (rev 01)
        10:00.0 Network controller [0280]: Intel Corporation Wi-Fi 6 AX210/AX211/AX411 160MHz [8086:2725] (rev 1a)
        11:00.0 Non-Volatile memory controller [0108]: Kingston Technology Company, Inc. Device [2646:5013] (rev 01)
        12:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f7] (rev 01)
        13:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f6] (rev 01)
IOMMU Group 26:
        06:0c.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f5] (rev 01)
        14:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f7] (rev 01)
IOMMU Group 27:
        06:0d.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f5] (rev 01)
        15:00.0 SATA controller [0106]: Advanced Micro Devices, Inc. [AMD] Device [1022:43f6] (rev 01)
IOMMU Group 28:
        16:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808]
IOMMU Group 29:
        17:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Raphael [1002:164e] (rev c9)
IOMMU Group 3:
        00:02.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14da]
IOMMU Group 30:
        17:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Rembrandt Radeon High Definition Audio Controller [1002:1640]
IOMMU Group 31:
        17:00.2 Encryption controller [1080]: Advanced Micro Devices, Inc. [AMD] VanGogh PSP/CCP [1022:1649]
IOMMU Group 32:
        17:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:15b6]
IOMMU Group 33:
        17:00.4 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:15b7]
IOMMU Group 34:
        17:00.6 Audio device [0403]: Advanced Micro Devices, Inc. [AMD] Family 17h/19h HD Audio Controller [1022:15e3]
IOMMU Group 35:
        18:00.0 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] Device [1022:15b8]
IOMMU Group 4:
        00:02.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:14db]
IOMMU Group 5:
        00:02.2 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:14db]
IOMMU Group 6:
        00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14da]
IOMMU Group 7:
        00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14da]
IOMMU Group 8:
        00:08.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Device [1022:14da]
IOMMU Group 9:
        00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Device [1022:14dd]
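The listing above appears to come from the Arch Wiki group-listing loop, sorted as plain text (Group 10 is printed before Group 2). A variant sketch that sorts group numbers with sort -V and takes the sysfs directory as an optional argument; describe_device is a hypothetical hook wrapping lspci -nns, so it can be swapped out.

```sh
#!/bin/sh
# Variant of the Arch Wiki IOMMU group listing (sketch).
# describe_device is a hypothetical hook; by default it asks lspci for the
# [class] and [vendor:device] details of one PCI address.
describe_device() {
    lspci -nns "$1"
}

# Arg 1 (optional): IOMMU groups directory, defaults to the real sysfs location.
list_iommu_groups() {
    groups_dir=${1:-/sys/kernel/iommu_groups}
    # sort -V orders numerically, so group 2 is printed before group 10.
    for g in $(find "$groups_dir" -mindepth 1 -maxdepth 1 -type d | sort -V); do
        echo "IOMMU Group ${g##*/}:"
        for d in "$g"/devices/*; do
            [ -e "$d" ] || continue   # skip the literal glob if the group is empty
            printf '\t%s\n' "$(describe_device "${d##*/}")"
        done
    done
}
```

Calling `list_iommu_groups` with no argument prints the same kind of listing as above, in numeric group order.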
3.1 Binding vfio-pci via device ID
dVGA groups:
IOMMU Group 13:
        01:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch [1002:1478] (rev c0)
IOMMU Group 14:
        02:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch [1002:1479]
IOMMU Group 15:
        03:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] [1002:73bf] (rev c0)
IOMMU Group 16:
        03:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 HDMI/DP Audio Controller [1002:ab28]
IOMMU Group 17:
        03:00.2 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:73a6]
IOMMU Group 18:
        03:00.3 Serial bus controller [0c80]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 USB [1002:73a4]

dVGA IDs:

1002:73bf,1002:ab28,1002:73a6,1002:73a4
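Collecting these IDs by hand is error-prone. A sketch that reads `lspci -nn` output on standard input, keeps only lines whose address starts with one of the given prefixes, and joins the [vendor:device] pairs with commas. vfio_ids is a hypothetical name; bracketed class codes such as [0300] are skipped because they contain no colon.

```sh
#!/bin/sh
# Hypothetical helper: turn `lspci -nn` lines (on stdin) into a vfio-pci.ids value.
# Arguments: one or more address prefixes, e.g. "03:00" or "03:00.0 03:00.1".
# Step 1 (awk): keep lines whose first field, the PCI address, starts with a prefix.
# Step 2 (grep): extract [vendor:device] pairs; class codes like [0300] have no colon.
# Step 3 (tr, paste): strip the brackets and join everything with commas.
vfio_ids() {
    awk -v prefixes="$*" '
        BEGIN { n = split(prefixes, p, " ") }
        { for (i = 1; i <= n; i++) if (index($1, p[i]) == 1) { print; next } }' |
        grep -oE '\[[0-9a-f]{4}:[0-9a-f]{4}\]' |
        tr -d '[]' |
        paste -sd, -
}
```

`lspci -nn | vfio_ids 03:00` yields all four functions, as listed above; `lspci -nn | vfio_ids 03:00.0 03:00.1` yields only the GPU and its HDMI audio.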

Adding the IDs to the kernel parameters:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash resume=UUID=70ee4e13-e50a-4dc8-9c7a-465e0f472998 hugepages=16384 iommu=pt vfio-pci.ids=1002:73bf,1002:ab28,1002:73a6,1002:73a4"
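When iterating on which IDs to bind, a sketch that rewrites the GRUB_CMDLINE_LINUX_DEFAULT line so it carries exactly one vfio-pci.ids= entry. set_vfio_ids is hypothetical; it works as a stdin-to-stdout filter so /etc/default/grub is never edited in place by accident, and it assumes the line ends with a closing double quote, as in the config above.

```sh
#!/bin/sh
# Hypothetical helper: stdin -> stdout filter over a GRUB defaults file.
# On the GRUB_CMDLINE_LINUX_DEFAULT line it drops any existing vfio-pci.ids=
# token and appends "vfio-pci.ids=<ids>" just before the closing quote.
# Assumes that line ends with a double quote; other lines pass through unchanged.
set_vfio_ids() {
    ids=$1
    sed -E \
        -e '/^GRUB_CMDLINE_LINUX_DEFAULT=/s/ ?vfio-pci\.ids=[^" ]*//g' \
        -e "/^GRUB_CMDLINE_LINUX_DEFAULT=/s/\"\$/ vfio-pci.ids=${ids}\"/"
}
```

For example, `set_vfio_ids 1002:73bf,1002:ab28 < /etc/default/grub > /tmp/grub.new`, review the difference, copy it back as root, then run update-grub.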

update-grub

(Same output as the first update-grub run above.)

3.2.1 Loading vfio-pci early - mkinitcpio
before:

MODULES="crc32c"
HOOKS="base udev autodetect modconf block keyboard keymap consolefont plymouth resume filesystems"

after:

MODULES="vfio_pci vfio vfio_iommu_type1 crc32c"
HOOKS="base udev autodetect modconf block keyboard keymap consolefont plymouth resume filesystems"
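The MODULES edit above can also be scripted. A sketch assuming the file uses either the quoted-string form shown here or the newer array form (MODULES=(...)); add_vfio_modules is hypothetical, only touches the MODULES= line, and does nothing if vfio_pci is already listed. The Arch Wiki also expects the modconf hook in HOOKS, which the line above already has.

```sh
#!/bin/sh
# Hypothetical helper: stdin -> stdout filter over mkinitcpio.conf.
# Prepends the vfio modules to MODULES (quoted-string or array form) unless
# vfio_pci is already listed, then trims a trailing space before the closing
# quote or parenthesis (left behind when MODULES was empty).
add_vfio_modules() {
    sed -E '/^MODULES=/{
/vfio_pci/!s/^MODULES=(["(])/MODULES=\1vfio_pci vfio vfio_iommu_type1 /
s/ ([")])$/\1/
}'
}
```

Write the result to a new file, compare it with the original, copy it back as root, and rebuild the images with mkinitcpio -P.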

mkinitcpio -P

==> Building image from preset: /etc/mkinitcpio.d/linux61.preset: 'default'
  -> -k /boot/vmlinuz-6.1-x86_64 -c /etc/mkinitcpio.conf -g /boot/initramfs-6.1-x86_64.img
==> Starting build: '6.1.25-1-MANJARO'
  -> Running build hook: [base]
  -> Running build hook: [udev]
  -> Running build hook: [autodetect]
  -> Running build hook: [modconf]
  -> Running build hook: [block]
==> WARNING: Possibly missing firmware for module: 'xhci_pci'
  -> Running build hook: [keyboard]
  -> Running build hook: [keymap]
  -> Running build hook: [consolefont]
==> WARNING: consolefont: no font found in configuration
  -> Running build hook: [plymouth]
  -> Running build hook: [resume]
  -> Running build hook: [filesystems]
==> Generating module dependencies
==> Creating gzip-compressed initcpio image: '/boot/initramfs-6.1-x86_64.img'
==> Image generation successful
==> Building image from preset: /etc/mkinitcpio.d/linux61.preset: 'fallback'
  -> -k /boot/vmlinuz-6.1-x86_64 -c /etc/mkinitcpio.conf -g /boot/initramfs-6.1-x86_64-fallback.img -S autodetect
==> Starting build: '6.1.25-1-MANJARO'
  -> Running build hook: [base]
  -> Running build hook: [udev]
  -> Running build hook: [modconf]
  -> Running build hook: [block]
==> WARNING: Possibly missing firmware for module: 'bfa'
==> WARNING: Possibly missing firmware for module: 'qla1280'
==> WARNING: Possibly missing firmware for module: 'qed'
==> WARNING: Possibly missing firmware for module: 'qla2xxx'
==> WARNING: Possibly missing firmware for module: 'aic94xx'
==> WARNING: Possibly missing firmware for module: 'wd719x'
==> WARNING: Possibly missing firmware for module: 'xhci_pci'
  -> Running build hook: [keyboard]
  -> Running build hook: [keymap]
  -> Running build hook: [consolefont]
==> WARNING: consolefont: no font found in configuration
  -> Running build hook: [plymouth]
  -> Running build hook: [resume]
  -> Running build hook: [filesystems]
==> Generating module dependencies
==> Creating gzip-compressed initcpio image: '/boot/initramfs-6.1-x86_64-fallback.img'
==> Image generation successful
==> Building image from preset: /etc/mkinitcpio.d/linux62.preset: 'default'
  -> -k /boot/vmlinuz-6.2-x86_64 -c /etc/mkinitcpio.conf -g /boot/initramfs-6.2-x86_64.img --microcode /boot/amd-ucode.img
==> Starting build: '6.2.12-1-MANJARO'
  -> Running build hook: [base]
  -> Running build hook: [udev]
  -> Running build hook: [autodetect]
  -> Running build hook: [modconf]
  -> Running build hook: [block]
==> WARNING: Possibly missing firmware for module: 'xhci_pci'
  -> Running build hook: [keyboard]
  -> Running build hook: [keymap]
  -> Running build hook: [consolefont]
==> WARNING: consolefont: no font found in configuration
  -> Running build hook: [plymouth]
  -> Running build hook: [resume]
  -> Running build hook: [filesystems]
==> Generating module dependencies
==> Creating gzip-compressed initcpio image: '/boot/initramfs-6.2-x86_64.img'
==> Image generation successful
==> Building image from preset: /etc/mkinitcpio.d/linux62.preset: 'fallback'
  -> -k /boot/vmlinuz-6.2-x86_64 -c /etc/mkinitcpio.conf -g /boot/initramfs-6.2-x86_64-fallback.img -S autodetect --microcode /boot/amd-ucode.img
==> Starting build: '6.2.12-1-MANJARO'
  -> Running build hook: [base]
  -> Running build hook: [udev]
  -> Running build hook: [modconf]
  -> Running build hook: [block]
==> WARNING: Possibly missing firmware for module: 'bfa'
==> WARNING: Possibly missing firmware for module: 'qla1280'
==> WARNING: Possibly missing firmware for module: 'qed'
==> WARNING: Possibly missing firmware for module: 'qla2xxx'
==> WARNING: Possibly missing firmware for module: 'aic94xx'
==> WARNING: Possibly missing firmware for module: 'wd719x'
==> WARNING: Possibly missing firmware for module: 'xhci_pci'
  -> Running build hook: [keyboard]
  -> Running build hook: [keymap]
  -> Running build hook: [consolefont]
==> WARNING: consolefont: no font found in configuration
  -> Running build hook: [plymouth]
  -> Running build hook: [resume]
  -> Running build hook: [filesystems]
==> Generating module dependencies
==> Creating gzip-compressed initcpio image: '/boot/initramfs-6.2-x86_64-fallback.img'
==> Image generation successful

reboot

At this point it gets stuck on the screen below:

If I remove "quiet" from the kernel parameters, I see this:

If I press Ctrl+Alt+F2 and try to run "startx", I see this:

Please bind only groups 15 and 16. Ideally, attach a monitor to the iGPU. Then boot, and in GRUB press the e key to edit the entry. An editor opens where you can change the kernel parameters temporarily. You should only bind the following IDs: 1002:73bf and 1002:ab28.
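After booting with only those two IDs, a quick way to see which driver actually claimed each function. A sketch: bound_driver is a hypothetical helper that reads the sysfs driver symlink (the same information lspci -nnk prints as "Kernel driver in use"); the optional second argument overrides the devices directory.

```sh
#!/bin/sh
# Hypothetical helper: print the driver bound to a PCI device, or "none".
# Arg 1: full PCI address (e.g. 0000:03:00.0).
# Arg 2 (optional): devices directory, defaults to /sys/bus/pci/devices.
bound_driver() {
    dev_dir=${2:-/sys/bus/pci/devices}/$1
    # sysfs exposes the bound driver as a symlink named "driver".
    if [ -L "$dev_dir/driver" ]; then
        basename "$(readlink "$dev_dir/driver")"
    else
        echo none
    fi
}
```

`for d in 0000:03:00.0 0000:03:00.1; do echo "$d: $(bound_driver "$d")"; done` should print vfio-pci for both functions if the binding worked.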

In the picture you posted without the quiet parameter, all those ACPI errors are uninteresting; it seems people have them without issues. You can even pass loglevel=3 as a parameter and they should go away. The bottom four lines are the interesting part. It seems that after those ACPI errors are thrown, the boot continues and gets stuck at this hub-related message. This is where I think (and I am honestly just guessing) that your GPU's USB devices might not play nice.

Also, can you please remove the amd_iommu=force_enable and rd.driver.pre=vfio-pci kernel parameters? Your startx-related screenshot shows that they are still present, so it is not a screenshot from a fresh install.

Additionally, please remove the systemd.unified_cgroup_hierarchy kernel parameter; it is a possible culprit.

I am 100% certain it's a Manjaro or configuration bug.

According to your initial post, I have the same motherboard and BIOS, and I even posted proof of it working :wink:
Since UEFI updates get checksum-verified before flashing, I am sure it's not a "BIOS bug".

FYI, I also had a few 6000 series cards in reference design and posted about them here, because the USB-C device was really tricky:

Though I never got the USB-C port to work, my 6800, 6800 XT and 6900 XT cards worked fine with only the GPU and audio functions bound to vfio.

PS: If you only need to bind your GPU to vfio-pci for a Looking Glass VM, don't bind it the old way; do it as I suggested, on the fly within the running system.
You can even incorporate the rebinds in your VM start script :slight_smile:
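A sketch of such an on-the-fly rebind through sysfs, assuming it runs as root, the vfio-pci module is already loaded (modprobe vfio-pci), and the card resets cleanly. bind_to_vfio and release_to_host are hypothetical names; they use the kernel's driver_override, unbind and drivers_probe interfaces, and SYSFS_PCI is overridable only so the functions can be exercised against a scratch directory.

```sh
#!/bin/sh
# Sketch of a runtime rebind via sysfs (run as root, vfio-pci already loaded).
# SYSFS_PCI can be overridden only for testing against a scratch directory.
SYSFS_PCI=${SYSFS_PCI:-/sys/bus/pci}

# Hand one device (e.g. 0000:03:00.0) to vfio-pci.
bind_to_vfio() {
    dev_dir=$SYSFS_PCI/devices/$1
    # Restrict matching to vfio-pci before the device is released.
    echo vfio-pci > "$dev_dir/driver_override"
    # Detach it from its current driver (e.g. amdgpu), if one is bound.
    if [ -e "$dev_dir/driver" ]; then
        echo "$1" > "$dev_dir/driver/unbind"
    fi
    # Re-probe: with the override in place, only vfio-pci can claim it.
    echo "$1" > "$SYSFS_PCI/drivers_probe"
}

# Give the device back to its normal host driver.
release_to_host() {
    dev_dir=$SYSFS_PCI/devices/$1
    # Writing an empty line clears driver_override.
    echo > "$dev_dir/driver_override"
    if [ -e "$dev_dir/driver" ]; then
        echo "$1" > "$dev_dir/driver/unbind"
    fi
    echo "$1" > "$SYSFS_PCI/drivers_probe"
}
```

In a VM start script you would call bind_to_vfio for the GPU and audio functions (for example 0000:03:00.0 and 0000:03:00.1) before virsh start, and release_to_host for each after the guest shuts down.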


I did a little testing.
As said, vfio-pci is not loaded for some reason with kernel 6.2. In contrast, it is fine with kernel 5.19.
What I found is that kernel 6.2 does not work with the vfio-pci module. I can force it to load with modprobe vfio-pci, but it grabs the primary (iGPU) framebuffer and the system loses its screen output afterwards. I guess that is why the vfio-pci module is not loaded by default. The good news is that the dGPU is properly claimed by vfio-pci. The system is still accessible over SSH, and I can run "virsh start" to start the VM. Yes, dGPU passthrough is working.

PS: I am on Ubuntu Server (no desktop). "rd.driver.pre=vfio-pci" does not work for me. Putting "vfio-pci" in /etc/modules works for me.

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation AD104 [GeForce RTX 4070 Ti] [10de:2782] (rev a1)
        Subsystem: ZOTAC International (MCO) Ltd. Device [19da:1696]
        Kernel driver in use: vfio-pci
        Kernel modules: nvidiafb, nouveau
01:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:22bc] (rev a1)
        Subsystem: ZOTAC International (MCO) Ltd. Device [19da:1696]
        Kernel driver in use: vfio-pci
        Kernel modules: snd_hda_intel
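The /etc/modules step mentioned above can be made idempotent. A sketch assuming Debian/Ubuntu; ensure_module_listed is hypothetical and appends the module name only when no identical line exists yet. If the module has to be in the initramfs (loaded before the GPU driver), Ubuntu's initramfs-tools reads a separate list, /etc/initramfs-tools/modules, which takes effect after update-initramfs -u.

```sh
#!/bin/sh
# Hypothetical helper: list a module in a modules file exactly once.
# Arg 1: module name; Arg 2 (optional): file, defaults to /etc/modules.
ensure_module_listed() {
    file=${2:-/etc/modules}
    # -x whole-line, -F fixed-string match; append only if no such line exists.
    if ! grep -qxF "$1" "$file" 2>/dev/null; then
        printf '%s\n' "$1" >> "$file"
    fi
}
```

Running `ensure_module_listed vfio-pci` twice (as root) still leaves a single vfio-pci line in /etc/modules.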

I have also noticed that kernel 6.2 doesn't work correctly when loading the vfio-pci driver. I just updated to Ubuntu 23.04, and my PCIe devices refuse to bind at startup. Downgrading to kernel 6.1 resolved the issue.

@DS_DV @H-i-v-e it just got worse

I removed all changes and went back to step 3.1 of the Arch Wiki, and now it gets stuck just by adding the 2 IDs (1002:73bf,1002:ab28) to the kernel parameters, without even adding vfio-pci to mkinitcpio.conf.

Tried kernel 6.1 too; same behavior.

@DS_DV @H-i-v-e I tried both methods (adding the IDs to the kernel parameters, or to vfio.conf), each with all 3 kernels (6.1, 6.2, 6.3), and nothing changes.

As I said before, I don't think it is Manjaro-related: I have my initial VGA set to the iGPU, and the bootloader still loads on my dGPU.
The behavior is the same if I try to install Windows.

So the problem is BIOS settings, or something else.

I think I followed this one: "Kernel 6.X: Prevent loading any other driver than vfio-pci in order to passthrough graphic card." | Szymon Niedźwiedź

Just tried BIOS F9, which strangely has the same date as F10a; it didn't change anything.

OK, let's assume for a moment that, despite checksum verification and centralized downloads, you get different UEFI images than everyone else.

AFAIK, you can check whether the primary GPU switch works by checking which GPU the system initializes on. The bootloader comes after initialization, so it is no proof of a UEFI fault.

So I would check which screen gets a signal first, or at least where the UEFI setup screen shows up.
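From a running Linux system, one hint about which GPU the firmware initialized is the boot_vga attribute the kernel exposes in sysfs for VGA-class PCI devices; as far as I know, it reads 1 for the device the kernel treats as the firmware's default display. A sketch, with find_boot_vga as a hypothetical helper and the devices directory overridable for testing:

```sh
#!/bin/sh
# Hypothetical helper: print the PCI address(es) whose boot_vga attribute is 1.
# Arg 1 (optional): devices directory, defaults to /sys/bus/pci/devices.
find_boot_vga() {
    for f in "${1:-/sys/bus/pci/devices}"/*/boot_vga; do
        # Only VGA-class devices have this attribute; skip everything else.
        [ -r "$f" ] || continue
        if [ "$(cat "$f")" = 1 ]; then
            basename "$(dirname "$f")"
        fi
    done
}
```

On the hardware in this thread, 0000:17:00.0 (the Raphael iGPU) would be the expected answer if the initial display setting took effect, and 0000:03:00.0 if the firmware still initialized the 6900 XT.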

Windows behaves the same as Manjaro. (It might even be that someone thought it was a good idea to show things like the bootloader on every device for troubleshooting reasons =)

Before you keep blaming hardware or firmware, I'll repeat myself again: try Fedora or Ubuntu. Apparently Arch-based is not working for you (no offense).
But you seem really frustrated. I get that. But running down the same hill again and again does not look like a solution. Maybe try a different approach.

They are different AGESAs.

Different AGESA, same release date, though.

I will. I packed up the Aorus Master, am sending it back, and ordered an MSI X670E Carbon. This will definitely remove the "hardware/firmware" option from the list (unless it is CPU-related).

If this won’t work, I will try Fedora and then back to Windows. I am really fed up.


Good luck; if you need assistance, @ me.


Here is my config. It may be somewhat outdated, but it works without any issues.
I have a Radeon WX 2100 as the host GPU and a 6800 XT for guests, and
I use the GPU's USB-C port with a USB-C hub to connect everything I need for my guest.

[manja01 ~]# uname -a
Linux manja01 6.2.12-1-MANJARO #1 SMP PREEMPT_DYNAMIC Thu Apr 20 14:17:37 UTC 2023 x86_64 GNU/Linux

[manja01 ~]# cat /etc/modprobe.d/kvm.conf
options kvm ignore_msrs=1 report_ignored_msrs=0
options snd-hda-intel enable_msi=1
options kvm_amd avic=1

[manja01 ~]# cat /etc/modprobe.d/vfio.conf
install vfio-pci /usr/local/bin/vfio-pci-override.sh
softdep amdgpu pre: vfio vfio_pci vfio-pci
softdep xhci_pci pre: vfio vfio_pci vfio-pci
softdep pcieport pre: vfio vfio_pci vfio-pci

[manja01 ~]# cat /usr/local/bin/vfio-pci-override.sh
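#!/bin/sh
# Run via the "install vfio-pci ..." line in vfio.conf whenever vfio-pci is requested.
PREREQS=""
# All four functions of the 6800 XT at 0d:00 (same addresses as in the libvirt config below).
DEVS="0000:0d:00.0 0000:0d:00.1 0000:0d:00.2 0000:0d:00.3"
for DEV in $DEVS; do
    # From now on only vfio-pci may bind this function.
    echo "vfio-pci" > "/sys/bus/pci/devices/$DEV/driver_override"
done

# -i ignores the install command, so this loads the real module instead of re-running this script.
modprobe -i vfio-pci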

[manja01 ~]# cat /etc/default/grub

GRUB_CMDLINE_LINUX_DEFAULT="iommu=pt amd_iommu=on amd_cpufreq=enable pcie_aspm=off mitigations=off default_hugepagesz=1G hugepagesz=1G udev.log_priority=3"

[manja01 ~]# cat /etc/mkinitcpio.conf

MODULES="vfio_pci vfio vfio_iommu_type1 vfio_virqfd"

FILES="/usr/local/bin/vfio-pci-override.sh"

HOOKS="base udev autodetect modconf block keyboard keymap consolefont filesystems fsck"

Libvirt GPU hostdev entries:

<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x0d' slot='0x00' function='0x0'/>
  </source>
  <address type='pci' domain='0x0000' bus='0x17' slot='0x00' function='0x0' multifunction='on'/>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x0d' slot='0x00' function='0x1'/>
  </source>
  <address type='pci' domain='0x0000' bus='0x17' slot='0x00' function='0x1'/>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x0d' slot='0x00' function='0x2'/>
  </source>
  <address type='pci' domain='0x0000' bus='0x17' slot='0x00' function='0x2'/>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x0d' slot='0x00' function='0x3'/>
  </source>
  <address type='pci' domain='0x0000' bus='0x17' slot='0x00' function='0x3'/>
</hostdev>
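The four hostdev blocks above differ only in the function number. A sketch that generates one from a host PCI address; hostdev_xml is hypothetical, and it deliberately omits the guest-side address element, in which case libvirt assigns a guest PCI slot itself. The config above instead pins guest bus 0x17 and sets multifunction='on' on function 0 to keep the functions together, which this sketch does not reproduce.

```sh
#!/bin/sh
# Hypothetical helper: emit a libvirt <hostdev> element for one host PCI address.
# Arg 1: full address such as 0000:0d:00.1 (domain:bus:slot.function).
hostdev_xml() {
    domain=${1%%:*}        # 0000
    rest=${1#*:}           # 0d:00.1
    bus=${rest%%:*}        # 0d
    slot_fn=${rest#*:}     # 00.1
    slot=${slot_fn%%.*}    # 00
    fn=${slot_fn#*.}       # 1
    cat <<EOF
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x$domain' bus='0x$bus' slot='0x$slot' function='0x$fn'/>
  </source>
</hostdev>
EOF
}
```

For example, `hostdev_xml 0000:0d:00.1 > hostdev.xml && virsh attach-device VMNAME hostdev.xml --config`, where VMNAME is a placeholder for your domain name.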

Today I received the third motherboard, an MSI MPG X670E Carbon.

Long story short, it has the exact same behavior.
I even tried EndeavourOS, but it doesn't even install with the iGPU enabled.
I tried a Radeon 5500 in the second PCIe slot; there is no setting to assign it as the initial display, and the behavior is the same.

The only very strange thing that happened is that originally I just tried to boot my old installation, assigned the 6900 XT IDs via the kernel parameter, and when checking with "lspci -nvv", it was using the vfio-pci driver instead of amdgpu. I was excited, and because I had made several tests with that installation, I thought it would be a good idea to do a fresh install.
It has never worked since.


So, a fresh start with the Gigabyte board again, as the MSI had several booting issues; after many retries I couldn't even install Manjaro/Kubuntu/EndeavourOS.


System setup
24" monitor connected to iGPU on DP
27" monitor connected to iGPU on HDMI
and to 6900XT on DP

  1. BIOS update to F10c (not iGPU-related, but for the AGESA and burnt-CPU fixes), CMOS cleared

  2. Modified BIOS settings as below

  3. Fresh install of Manjaro, with only the 27" monitor connected to the iGPU HDMI port

  4. lspci -nvv: the 6900 XT loads the amdgpu driver

  5. Updated kernel parameters as follows:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash resume=UUID=6a287481-0fcd-42dd-8b92-1b1b11c8a9b4 udev.log_priority=3 amd_iommu=force_enable iommu=pt vfio-pci.ids=1002:73bf,1002:ab28 kvm.ignore_msrs=1 hugepages=16384 systemd.unified_cgroup_hierarchy=1"
  6. Update GRUB, reboot
  7. lspci -nvv: the 6900 XT loads the amdgpu driver! It completely ignored the vfio-pci.ids parameter!
  8. Updated kernel parameters as follows (this was working on my AM4 system with a Radeon 5500 as the primary/initial GPU on the PCIe 3 slot):
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash resume=UUID=6a287481-0fcd-42dd-8b92-1b1b11c8a9b4 udev.log_priority=3 amd_iommu=force_enable iommu=pt vfio-pci.ids=1002:73bf,1002:ab28 kvm.ignore_msrs=1 hugepages=16384 systemd.unified_cgroup_hierarchy=1 video=efifb:off gfxpayload=console gfxpayload=auto"
  9. Update GRUB, reboot
  10. lspci -nvv: the 6900 XT loads the amdgpu driver! It completely ignored the vfio-pci.ids parameter!

Running kernel 6.1
Fresh install, no apps, no updates

Please advise.

@DS_DV, can you please compare my BIOS settings with yours?
@H-i-v-e, what do you think?