[SOLVED] Unable to isolate GPU for VFIO (workaround)

My AM4 system was working with kernel 6.2 with no issues; however, I gave it a try.

cat /etc/mkinitcpio.conf

MODULES="crc32c vfio_pci vfio vfio_iommu_type1"

BINARIES=()

FILES="/usr/local/bin/vfio-pci-override.sh"

HOOKS="base udev autodetect modconf block keyboard keymap consolefont plymouth resume filesystems"
...

cat /etc/modprobe.d/vfio.conf

install vfio-pci /usr/local/bin/vfio-pci-override.sh
softdep amdgpu pre: vfio vfio_pci vfio-pci
softdep xhci_pci pre: vfio vfio_pci vfio-pci
softdep pcieport pre: vfio vfio_pci vfio-pci

cat /usr/local/bin/vfio-pci-override.sh

#!/bin/sh

DEVS="0000:03:00.0 0000:03:00.1"

if [ ! -z "$(ls -A /sys/class/iommu)" ]; then
    for DEV in $DEVS; do
        echo "vfio-pci" > /sys/bus/pci/devices/$DEV/driver_override
    done
fi

modprobe -i vfio-pci

ls -l /usr/local/bin/vfio-pci-override.sh

-rwxr-xr-x 1 root root 220 Apr 28 13:10 /usr/local/bin/vfio-pci-override.sh
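One thing worth checking after boot is which driver actually ended up on each function. The sketch below is a hypothetical helper (`show_bound_drivers` is not an existing tool) that only reads the `driver` symlink under /sys/bus/pci/devices/; `SYSFS_ROOT` is an assumed test hook that defaults to /sys. If the override worked, the GPU functions should report `vfio-pci` rather than `amdgpu`.

```sh
#!/bin/sh
# Hypothetical helper: print the driver bound to each given PCI device.
# SYSFS_ROOT is only a test hook; on a real host it defaults to /sys.
show_bound_drivers() {
    root="${SYSFS_ROOT:-/sys}"
    for dev in "$@"; do
        link="$root/bus/pci/devices/$dev/driver"
        if [ -L "$link" ]; then
            # The symlink points at .../drivers/<name>; keep only <name>.
            echo "$dev $(basename "$(readlink "$link")")"
        else
            echo "$dev (no driver)"
        fi
    done
}

# Example: show_bound_drivers 0000:03:00.0 0000:03:00.1
```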

Same issue.

For now, I will use the workaround of leaving my 27" monitor disconnected from the 6900 XT's DisplayPort output and connecting it only when I need to run the VM.

It's a workaround, but it works.

I am still waiting for Gigabyte (GB) to respond to my ticket, though; it was logged as a bug, and nobody ever replied.

Why only two devices? My GPU has four.
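For reference, a multi-function card exposes each function as a separate PCI address (0000:03:00.0, .1, .2 and so on), and generally every device in the GPU's IOMMU group has to be handed to vfio-pci (or left without a driver) before the group can be passed through. A minimal sketch, assuming the card sits at slot 0000:03:00 as in the script above, that lists every function of that slot so all of them can go into DEVS; `list_slot_functions` and `SYSFS_ROOT` are hypothetical names. It only looks at the slot, so it is still worth checking /sys/kernel/iommu_groups/ for anything else sharing the group.

```sh
#!/bin/sh
# Hypothetical helper: list every PCI function of one slot,
# e.g. 0000:03:00 -> 0000:03:00.0 0000:03:00.1 ...
# SYSFS_ROOT is only a test hook; on a real host it defaults to /sys.
list_slot_functions() {
    root="${SYSFS_ROOT:-/sys}"
    slot="$1"   # domain:bus:device, without the .function suffix
    for path in "$root/bus/pci/devices/$slot".*; do
        # An unmatched glob stays literal, so skip paths that do not exist.
        [ -e "$path" ] || continue
        basename "$path"
    done
}

# Example: build DEVS for vfio-pci-override.sh from all functions of the GPU.
# DEVS="$(list_slot_functions 0000:03:00 | tr '\n' ' ')"
```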


What about error 43 in Windows?
My “old” settings don’t work…

As I mentioned in previous posts, Linux kernel 6.x is broken.
Try running modprobe -i vfio-pci directly on your console. For me, it freezes my monitor every single time.
If I boot with Linux 5.19, there is no issue at all.


Sorry, I've only had bad experiences with MSI.
That's why I only use ASUS and Gigabyte.
MSI UEFIs are horrible; I would even rather go with ASRock XD
Hence, Gigabyte and MSI are not comparable.

I advise you to use the Aorus Master with Ubuntu or Fedora (Nobara).

The Nobara W38 release is apparently around the corner :slight_smile:
It even has several gaming-performance kernel patches built in.

I'll take a look at whether I can dump my UEFI settings somewhere and get you a copy :slight_smile:

EDIT1:
Maybe this success story helps in the meantime:

EDIT2: I found the save-profile function, but unlike ASUS, the file is not plain text, and I have no clue how to open or read it :confused:


I am trying to copy his XML, at least most parts, but it doesn’t work for me.
Either I get the infamous error 43, or it gets stuck during boot at the “display not initialized (yet)” message.

[SOLVED]
I had to disable "Resizable BAR support" in the BIOS for it to work.
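To see whether ReBAR is actually in effect after toggling it, `lspci -vv` lists the Resizable BAR capability with a "current size" per BAR. The sketch below assumes the pciutils output format `BAR 0: current size: 256MB, supported: ...` (check yours, it may differ between versions) and just extracts those sizes; `rebar_current_sizes` is a hypothetical helper name. Run lspci as root, otherwise the extended capabilities usually show as access denied. Roughly, a BAR 0 stuck at 256MB means the CPU only sees a 256MB window of VRAM, while one matching the VRAM size means ReBAR is active.

```sh
#!/bin/sh
# Hypothetical helper: read `lspci -vv` output on stdin and print
# "BAR <n> <current size>" for each resizable BAR it lists.
# Assumes lines like "BAR 0: current size: 256MB, supported: 256MB 512MB ...".
rebar_current_sizes() {
    sed -n 's/^[[:space:]]*BAR \([0-9]*\): current size: \([0-9]*[KMGT]B\),.*/BAR \1 \2/p'
}

# Example (as root): lspci -vv -s 03:00.0 | rebar_current_sizes
```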

Thank you all for your time and efforts, especially @H-i-v-e , @DS_DV and @Janos !


Oh fudge, I could have told you that. It did not cross my mind, unfortunately. Glad it works now!


Doesn’t disabling rebar come with performance degradation?

Once I do my benchmarks, I will be able to answer that.
The Arch Wiki (sections 10.4 and 10.24) describes some settings to define the BAR size, but they didn’t work for me.

Glad to see that you got it working now :smiley:
New platforms are always tricky. I wonder why you had to disable ReBAR.
That one I only had to disable on AM4 o.o


Maybe it is a bug in the RX 6900 XT.
I can pass through an RTX 4070 Ti with Resizable BAR enabled on my MSI X670E Carbon just fine.

I read up on it, and it seems that from kernel 6.1 onward you can use ReBAR without enabling it in the UEFI; instead, you set it through sysfs. There is a Reddit thread explaining the new feature. The article mentions that UEFI support is preferred, but given that your setup outright refuses to work with it, I would leave it disabled. For example, I ran the checks from the article on my computer, and my host already uses the biggest available BAR sizes, so there is no need for me to touch anything. Everyone would need to check their own setup, though.
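The sysfs interface in question is, as far as I know, the `resourceN_resize` file that newer kernels create under /sys/bus/pci/devices/<address>/ for resizable BARs. Per the kernel's sysfs ABI documentation, reading it returns a hex bitmask of supported sizes where bit 0 = 1MB, bit 1 = 2MB and so on, and writing a bit number resizes the BAR (with the device unbound from its driver). Below is a small sketch that decodes the mask into the largest supported size; `largest_rebar_size_mb` is a hypothetical helper, and the bit-to-size mapping is the assumption to double-check against your kernel's docs.

```sh
#!/bin/sh
# Hypothetical helper: decode the hex mask read from a resourceN_resize file
# into the largest supported BAR size. Assumes bit k means 2^k MB
# (bit 0 = 1MB), as described in the kernel's sysfs ABI docs.
largest_rebar_size_mb() {
    hex="${1#0x}"            # accept the value with or without a 0x prefix
    mask=$(( 0x$hex ))
    bit=-1
    i=0
    while [ "$i" -lt 48 ]; do
        # Remember the highest set bit seen so far.
        if [ $(( (mask >> i) & 1 )) -eq 1 ]; then
            bit=$i
        fi
        i=$(( i + 1 ))
    done
    [ "$bit" -ge 0 ] || return 1   # empty mask: no resizable sizes reported
    echo "bit=$bit size=$(( 1 << bit ))MB"
}

# Example:
# largest_rebar_size_mb "$(cat /sys/bus/pci/devices/0000:03:00.0/resource0_resize)"
```

For instance, a mask of 0000000000007f00 has bits 8 to 14 set (256MB up to 16GB), so the helper prints `bit=14 size=16384MB`.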

P.S. I just checked, and my graphics card’s driver reports ReBAR working without me having done anything. ReBAR is not activated in the UEFI, and I have not touched the sysfs configuration. I figure you need to leave it disabled in the UEFI if the setting hinders the boot process, but you could check whether it is already working and, if not, try to set the option manually via sysfs as explained in the article.
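On an AMD card using amdgpu on the host, one place the driver reports this is the kernel log: amdgpu prints a line of the form `Detected VRAM RAM=<n>M, BAR=<m>M` when it loads. The sketch below assumes that exact wording (verify it against your own dmesg) and compares the two numbers; `amdgpu_bar_status` is a hypothetical name. A BAR at least as large as the VRAM means the whole VRAM is CPU-visible, i.e. ReBAR is working.

```sh
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin, find amdgpu's
# "Detected VRAM RAM=<n>M, BAR=<m>M" line and compare the two sizes.
# Only the first matching line is used (first GPU on multi-GPU hosts).
amdgpu_bar_status() {
    sed -n 's/.*Detected VRAM RAM=\([0-9]*\)M, BAR=\([0-9]*\)M.*/\1 \2/p' |
    head -n 1 |
    while read -r ram bar; do
        if [ "$bar" -ge "$ram" ]; then
            echo "full VRAM visible via BAR (${bar}M >= ${ram}M)"
        else
            echo "small BAR: ${bar}M of ${ram}M VRAM"
        fi
    done
}

# Example (may need root): dmesg | amdgpu_bar_status
```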


Just a heads up:
Gigabyte BIOS F10d not only didn’t fix the issue, it also destroyed my bootloader and made my boot partition act as read-only, so I have to reinstall everything.

Finally, after many delays, I got a reply from GB to the ticket I opened.

The reason why the screen display can be output through the graphics card is that during the boot menu loading, the display device is determined by the software program, not controlled by the BIOS.

It can be seen from the video that after the dual monitors are connected, the BIOS screen can be displayed on the iGPU normally, which means that the BIOS settings are normal.

The F12 boot menu will also be displayed on the iGPU normally, but if the external program is loaded, it has nothing to do with the BIOS.

This happens with Windows too, but they seem to blame Manjaro, or the OS in general, based on the video of the problem I sent them.

So, if Manjaro is to blame, they won’t make any BIOS changes. How, then, can I fix it permanently?

I had so many problems with Manjaro that I have now switched to Nobara (a Fedora fork by Glorious Eggroll). I have not yet got my VFIO setup up and running again due to time constraints.

But I can say that I upgraded my BIOS from F8 > F9 > F10a > F10b > F10c > F10d > F10 and now to F11a without any of the problems you describe.

The only thing I noticed is that after an upgrade, the UEFI defaults to Windows regardless of the SSD boot priorities, so I always have to make sure it does not boot the SSD with the Windows VM.

Other than that, setting Primary Display Out works fine for GRUB, the UEFI, etc.

Fedora does seem to initialize all GPUs, and if the dGPU is connected to a display, it will be preferred. But there is a GitHub project for a udev rule that you can use, once it’s done, to set the primary display output in Wayland.

PS: Nobara came with its own set of setup challenges, but other than that it is much more stable and bug-free than Manjaro (which was the worst OS I have ever used, including Windows Vista).

Thank you for taking time to reply.
Although GB says it is a software issue, I am not buying that; they just think it is not a priority, so they won’t spend time fixing it.

Now, as I wrote before, I did test it with Windows too. As soon as I boot from the USB stick, Windows Setup uses my dGPU to display the setup process. I also tried Kubuntu (Debian-based, not Arch-based), and it did the same, with even more challenges (like failing to begin the installation, etc.).
Fedora too: I didn’t install it, but I started the installation process, and it went straight to the dGPU.

What baffles me is that I am excluding the dGPU IDs via the kernel boot parameters, and it still loads them… If that worked, as it did before with my 3900X and 5900X, it would be OK.
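A quick way to confirm the IDs actually reached the kernel is to read /proc/cmdline. This is a sketch only: `vfio_ids_from_cmdline` is a hypothetical helper, and it assumes the IDs are passed as `vfio-pci.ids=` (the kernel treats `-` and `_` in parameter names the same, so `vfio_pci.ids=` is matched too). Keep in mind that kernel parameters only take effect once Linux is running; they cannot stop the firmware from initializing the card earlier in the boot.

```sh
#!/bin/sh
# Hypothetical helper: print each vendor:device ID given via vfio-pci.ids=
# (or vfio_pci.ids=) on a kernel command line, one per line.
# Reads the command line from $1, or from /proc/cmdline when no argument is given.
# The body runs in a subshell so `set -f` does not leak to the caller.
vfio_ids_from_cmdline() (
    set -f                      # no globbing while splitting the command line
    cmdline="${1:-$(cat /proc/cmdline)}"
    for word in $cmdline; do    # split on whitespace
        case "$word" in
            vfio-pci.ids=*|vfio_pci.ids=*)
                # Drop everything up to the first "=", then one ID per line.
                echo "${word#*=}" | tr ',' '\n'
                ;;
        esac
    done
)

# Example: vfio_ids_from_cmdline   # prints nothing if the parameter never arrived
```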

The 3900X and 5900X don’t have iGPUs… so they have to work differently.

Try booting with displays connected only to the iGPU via the motherboard, and no display connected to the dGPU.

That way, it should only initialize the iGPU.

If you are lucky, you can connect displays to the dGPU after boot and they will work well :smiley:

With the 3900X and 5900X, I was using an XT5500 in PCIe-3, which was set in the BIOS as the initial GPU, just as I do now with the iGPU, and again, it worked perfectly.

Also, I did try booting without the cables connected, but as soon as I connect them, the VM doesn’t work.
I also tried setting up Manjaro without the 6900 XT installed, setting the IDs in the kernel parameters, and then connecting the card to PCIe. That didn’t work either.

The only way it works for me is to boot with the cable disconnected from the dGPU (I tried disconnecting it at the monitor end too; that didn’t work), start the VM, and once it is at the desktop, connect the cable.

Even if I only reboot the VM, it fails the next time. To reboot the VM, I have to either never connect the cable or, if I did, restart my host with the cable disconnected as well.