[SOLVED] Unable to isolate GPU for VFIO (workaround)

Here is my config. It may be somewhat outdated, but it works without any issues.
I have a Radeon WX 2100 as the host GPU and a 6800XT for guests, and
I use the GPU's USB-C port with a USB-C hub to connect everything I need for my guest.

[manja01 ~]# uname -a
Linux manja01 6.2.12-1-MANJARO #1 SMP PREEMPT_DYNAMIC Thu Apr 20 14:17:37 UTC 2023 x86_64 GNU/Linux

[manja01 ~]# cat /etc/modprobe.d/kvm.conf
options kvm ignore_msrs=1 report_ignored_msrs=0
options snd-hda-intel enable_msi=1
options kvm_amd avic=1

[manja01 ~]# cat /etc/modprobe.d/vfio.conf
install vfio-pci /usr/local/bin/vfio-pci-override.sh
softdep amdgpu pre: vfio vfio_pci vfio-pci
softdep xhci_pci pre: vfio vfio_pci vfio-pci
softdep pcieport pre: vfio vfio_pci vfio-pci

[manja01 ~]# cat /usr/local/bin/vfio-pci-override.sh
#!/bin/sh
PREREQS=""
DEVS="0000:0d:00.0 0000:0d:00.1 0000:0d:00.2 0000:0d:00.3"
for DEV in $DEVS; do
    echo "vfio-pci" > /sys/bus/pci/devices/$DEV/driver_override
done

modprobe -i vfio-pci
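
To confirm after a reboot that the override took effect, the bound driver can be read back from sysfs. This is only a minimal sketch, assuming the same four device addresses as in my script; `bound_driver` and `SYSFS_PCI` are names made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the driver currently bound to a PCI device.
# SYSFS_PCI is an assumption for easier testing; the default is the real sysfs path.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci/devices}"

bound_driver() {
    link="$SYSFS_PCI/$1/driver"
    if [ -L "$link" ]; then
        # The symlink target ends in the driver name, e.g. .../drivers/vfio-pci
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}

# Addresses taken from the override script; adjust to your own card.
for DEV in 0000:0d:00.0 0000:0d:00.1 0000:0d:00.2 0000:0d:00.3; do
    echo "$DEV: $(bound_driver "$DEV")"
done
```

If every function reports vfio-pci, the card is isolated; amdgpu on any of them means the override ran too late or not at all.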

[manja01 ~]# cat /etc/default/grub

GRUB_CMDLINE_LINUX_DEFAULT="iommu=pt amd_iommu=on amd_cpufreq=enable pcie_aspm=off mitigations=off default_hugepagesz=1G hugepagesz=1G udev.log_priority=3"

[manja01 ~]# cat /etc/mkinitcpio.conf

MODULES="vfio_pci vfio vfio_iommu_type1 vfio_virqfd"

FILES="/usr/local/bin/vfio-pci-override.sh"

HOOKS="base udev autodetect modconf block keyboard keymap consolefont filesystems fsck"

Libvirt GPU

<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x0d' slot='0x00' function='0x0'/>
  </source>
  <address type='pci' domain='0x0000' bus='0x17' slot='0x00' function='0x0' multifunction='on'/>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x0d' slot='0x00' function='0x1'/>
  </source>
  <address type='pci' domain='0x0000' bus='0x17' slot='0x00' function='0x1'/>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x0d' slot='0x00' function='0x2'/>
  </source>
  <address type='pci' domain='0x0000' bus='0x17' slot='0x00' function='0x2'/>
</hostdev>
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x0d' slot='0x00' function='0x3'/>
  </source>
  <address type='pci' domain='0x0000' bus='0x17' slot='0x00' function='0x3'/>
</hostdev>

Today I received the 3rd motherboard, an MSI MPG X670e Carbon.

Long story short, it has the exact same behavior.
I even tried EndeavourOS, but it doesn’t even install with the iGPU enabled.
I tried with a Radeon 5500 in the 2nd PCIe slot. There is no setting to assign it as the initial display, and the behavior is the same.

The only very strange thing that happened is that originally I just booted my old installation, assigned the 6900XT IDs via kernel parameter, and “lspci -nvv” showed the card using the vfio-pci driver instead of amdgpu. I was excited, and because I had run several tests with that installation, I thought it would be a good idea to do a fresh install.
It has never worked since.


So, fresh start with the Gigabyte (GB) board again, as the MSI had several issues with booting, and after many retries I couldn’t even install Manjaro/Kubuntu/EndeavourOS…

System setup
24" monitor connected to iGPU on DP
27" monitor connected to iGPU on HDMI
and to 6900XT on DP

  1. BIOS updated to F10c (not iGPU related; it covers AGESA and the burnt-CPU issue), CMOS cleared

  2. Modified BIOS settings as below



  3. Fresh install of Manjaro, connected only 27" on iGPU HDMI port

  4. lspci -nvv, 6900 loads amdgpu driver

  5. Update kernel parameters as follows:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash resume=UUID=6a287481-0fcd-42dd-8b92-1b1b11c8a9b4 udev.log_priority=3 amd_iommu=force_enable iommu=pt vfio-pci.ids=1002:73bf,1002:ab28 kvm.ignore_msrs=1 hugepages=16384 systemd.unified_cgroup_hierarchy=1"
  6. Update grub, then reboot
  7. lspci -nvv, 6900 loads amdgpu driver !!! Completely ignored the vfio-pci.ids parameter !!!
  8. Update kernel parameters as follows (this was working on my AM4 system with a Radeon 5500 as primary/initial in the PCIe 3 slot):
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash resume=UUID=6a287481-0fcd-42dd-8b92-1b1b11c8a9b4 udev.log_priority=3 amd_iommu=force_enable iommu=pt vfio-pci.ids=1002:73bf,1002:ab28 kvm.ignore_msrs=1 hugepages=16384 systemd.unified_cgroup_hierarchy=1 video=efifb:off gfxpayload=console gfxpayload=auto"
  9. Update grub, then reboot
  10. lspci -nvv, 6900 loads amdgpu driver !!! Completely ignored the vfio-pci.ids parameter !!!

Running kernel 6.1 on a fresh install, no apps, no updates.

Please advise

@DS_DV can you please compare my BIOS settings with yours?
@H-i-v-e what do you think…?

You only need IOMMU, SVM and the iGPU as primary.
Disable SR-IOV; your 6900 doesn’t support it.
Forget the vendor IDs and use the PCIe addresses.

Look at
https://wiki.archlinux.org/title/PCI_passthrough_via_OVMF

6.1.1.2 Passthrough selected GPU

And at least on my system, this was important from kernel 5.15 onwards

softdep amdgpu pre: vfio vfio_pci vfio-pci
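
Once IOMMU and SVM are enabled in the BIOS, you can check whether the kernel actually created IOMMU groups, and which group your card landed in, by walking /sys/kernel/iommu_groups. A rough sketch, not the only way to do it; `list_iommu_groups` and `IOMMU_ROOT` are names invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: print "group device" pairs for every device in an IOMMU group.
# IOMMU_ROOT is an assumption for easier testing; the default is the real sysfs path.
IOMMU_ROOT="${IOMMU_ROOT:-/sys/kernel/iommu_groups}"

list_iommu_groups() {
    for dev in "$IOMMU_ROOT"/*/devices/*; do
        [ -e "$dev" ] || continue        # glob did not match: no groups exist
        group=${dev#"$IOMMU_ROOT"/}      # strip the root prefix
        group=${group%%/*}               # keep only the group number
        echo "$group ${dev##*/}"
    done
}

list_iommu_groups | sort -n
```

Empty output suggests the IOMMU is not active. Otherwise, look for the addresses of the card you want to pass through and see what else shares their group.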

So, in the BIOS settings, I only set my iGPU as the initial display?
Not Forced?
All other UMA settings on AUTO?

No wonder: the kernel sees the parameter but does not know what to do with it. You need to make sure the module is loaded during boot by adding the following to /etc/mkinitcpio.conf and then regenerating the initramfs with mkinitcpio -P; otherwise nothing will happen.

MODULES=(... vfio_pci vfio vfio_iommu_type1 ...)
[...]
HOOKS=(... modconf ...)
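
A quick way to see whether the MODULES line already lists the vfio modules is to scan the config file for them. This is only a sketch: `missing_vfio_modules` is a hypothetical helper, and it assumes MODULES is written on a single line.

```shell
#!/bin/sh
# Hypothetical helper: print each vfio module that is NOT listed in MODULES.
# Assumes a single-line MODULES=... entry; multi-line arrays are not handled.
missing_vfio_modules() {
    conf=${1:-/etc/mkinitcpio.conf}
    [ -r "$conf" ] || { echo "cannot read $conf" >&2; return 2; }
    # Take the last uncommented MODULES= line and turn punctuation into spaces.
    words=$(grep -E '^[[:space:]]*MODULES=' "$conf" | tail -n 1 \
            | sed 's/[^A-Za-z0-9_-]/ /g')
    for m in vfio_pci vfio vfio_iommu_type1; do
        case " $words " in
            *" $m "*) ;;            # listed as a whole word
            *) echo "$m" ;;         # not listed
        esac
    done
}

if [ -r /etc/mkinitcpio.conf ]; then
    missing_vfio_modules /etc/mkinitcpio.conf
fi
```

No output means all three modules are listed. Remember that the file only takes effect after the initramfs has been regenerated.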

I told you to follow the Arch wiki, but you need to follow all the steps necessary.

The vfio-pci in vfio-pci.ids is the module the parameter belongs to, so if you have not included the module in the initramfs, nothing is there to pick up vfio-pci.ids early in boot. The initramfs is the first image your bootloader loads alongside the kernel, and when you want to apply a parameter for a module that is not built into the kernel, you need to make sure the module is included there. Parameters without a dot, like iommu=pt, are handled directly by the kernel, which is always present.
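
To tell these two cases apart on a running system, you can check separately whether the parameter reached the kernel and whether the module got loaded. A minimal sketch; `cmdline_value` is a hypothetical helper written for this post.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one parameter from a kernel command line.
# $1 = parameter name, $2 = the full command line as a single string.
cmdline_value() {
    want=$1
    for arg in $2; do                  # unquoted on purpose: split on whitespace
        case "$arg" in
            "$want"=*) echo "${arg#*=}"; return 0 ;;
        esac
    done
    return 1
}

# Did the parameter reach the kernel at all?
if ids=$(cmdline_value vfio-pci.ids "$(cat /proc/cmdline 2>/dev/null)"); then
    echo "kernel received vfio-pci.ids=$ids"
else
    echo "vfio-pci.ids is not on the kernel command line"
fi

# Is the module that owns the parameter present?
if [ -d /sys/module/vfio_pci ]; then
    echo "vfio_pci is loaded"
else
    echo "vfio_pci is not loaded"
fi
```

If the parameter is on the command line but vfio_pci is not loaded, the module was most likely missing from the initramfs.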


Three weeks ago I configured a system for a friend, an AM4 with a 5700G and a 5700XT. I only set SVM, IOMMU and the iGPU as primary in the BIOS, and it worked immediately, apart from the fact that his 5700XT has the reset bug and needs special treatment.

UMA means Unified Memory Architecture, so why would I have to set this on a UMA system?
Leave this on default, give your iGPU 256MB of memory, and you are done with it.

Do not connect the 6900XT until you have finished the configuration.

I am a software engineer; when I debug, I make one change at a time and test :nerd_face:

So, now it looks OK.

What is the next step?
Updates?
Connect the monitor?


Now I would connect a monitor to the AMD card and try to pass it through. The vfio-pci stub driver is loaded now. Pass through only the card itself and its sound card.

…to login… Nope.
After I booted up with the 27" connected to the 6900’s DP port: black screen, no login screen on either input.

Do not just add the card via VM Manager; you will get a different virtual bus ID for each function of your GPU, which is not correct.
Use virsh and change the configuration as shown above.

softdep amdgpu pre: vfio vfio_pci vfio-pci

The problem here is probably that the mainboard initializes the 6900 first, and then you bind vfio-pci to it, stealing the output device. Your best test to see whether you did something wrong is to place a spare GPU in the uppermost GPU slot and the 6900 somewhere below it, in one of the lower PCIe slots. Consumer mainboards usually initialize the GPUs from the topmost PCIe slot downwards. My guess is that the option to use the iGPU as the primary output device does not work correctly. I have had this experience with a Gigabyte board myself.
If this setup works, you know you did nothing wrong and the mainboard simply does not initialize the iGPU first.

[quote="H-i-v-e, post:95, topic:196250, full:true"]
The problem here is probably that the mainboard initializes the 6900 first, and then you bind vfio-pci to it, stealing the output device.[/quote]
3 different boards with the same issue? Sounds like a CPU implementation bug, or a UEFI one: as soon as it sees the monitor connected, it overrides the setting.

[quote="H-i-v-e, post:95, topic:196250, full:true"]
Your best test to see whether you did something wrong is to place a spare GPU in the uppermost GPU slot and the 6900 somewhere below it, in one of the lower PCIe slots. Consumer mainboards usually initialize the GPUs from the topmost PCIe slot downwards. My guess is that the option to use the iGPU as the primary output device does not work correctly. I have had this experience with a Gigabyte board myself.[/quote]
Unfortunately I cannot do that. I tried with the 5500 (iGPU disabled) in the PCIe 3 slot, and it does the same. I cannot move the 6900 for now; it is connected to my custom water cooling loop.

This is what I have been saying from the beginning, but all 3 motherboards have the same issue?

Can you elaborate?

I already have, just read my posts.

Yeah, this obviously won’t work. It initializes the uppermost slot with the 6900 first, and then you once again steal the output device by binding vfio-pci. As I said, I suppose the GPU selection in the UEFI does nothing and the mainboard follows the usual initialization order. I suspect this because I have had Gigabyte boards myself and always needed to follow some weird combination of enabling and disabling settings to even make it work. Most of the time it did not work, though. This is why I finally resorted to placing the passthrough GPU in a lower slot.

But as I said, placing the 5500 in slot 3 and choosing slot 3 in the UEFI is probably not working, because this is the same as choosing the iGPU in the UEFI and exactly what we want to avoid with this test in the first place.

But why should he do that? vfio-pci obviously already loads before the amdgpu driver, because otherwise he would not be able to successfully bind the card to vfio-pci when no monitor is connected. This option won’t do anything; it does not change the initialization order.

Funny though, I had this issue with the MSI X570 Tomahawk, and I got the Aorus Master because it worked out of the box…

Quick question, what desktop environment do you use? And do you use X11 or Wayland?