VGA arbitration on X570? WTF?

While trying to prepare my system for a third GPU I ran into a bit of a conundrum.

The current setup consists of an Aorus Master with an RTX 3000 card in PCIe slot 1 (the x16 one, next to the CPU) and an older AMD something from a used Dell for the host.

Now I have an AMD W7100 incoming and thought, how neat: now I can use the x4 slot (no. 3) for an NVMe adapter (the SSD is x4 only, anyway) and use the x8 slot for the second (host) GPU.

That way I may even be able to follow Wendell’s guide on how to mod the W7100 for SR-IOV and have a little more bandwidth on the x8 slot.

But it didn’t work: after booting with the AMD card (it’s actually an ATI, I think) in slot 2 and the 3090 in slot 1, I couldn’t log into Ubuntu. It would always yank me back to the login screen, in a loop.

After digging around, I found these:

Jan 05 13:11:17 zenmasterl /usr/lib/gdm3/gdm-x-session[2776]: (WW) VGA arbiter: cannot open kernel arbiter, no multi-card support
Jan 05 13:11:26 zenmasterl /usr/lib/gdm3/gdm-x-session[2809]: (WW) VGA arbiter: cannot open kernel arbiter, no multi-card support
Jan 05 13:11:53 zenmasterl /usr/lib/gdm3/gdm-x-session[3034]: (WW) VGA arbiter: cannot open kernel arbiter, no multi-card support
Jan 05 13:18:05 zenmasterl /usr/lib/gdm3/gdm-x-session[2701]: (WW) VGA arbiter: cannot open kernel arbiter, no multi-card support
Jan 05 13:26:16 zenmasterl /usr/lib/gdm3/gdm-x-session[2758]: (WW) VGA arbiter: cannot open kernel arbiter, no multi-card support
Jan 05 13:26:44 zenmasterl /usr/lib/gdm3/gdm-x-session[2968]: (WW) VGA arbiter: cannot open kernel arbiter, no multi-card support
Jan 05 13:29:13 zenmasterl /usr/lib/gdm3/gdm-x-session[3768]: (WW) VGA arbiter: cannot open kernel arbiter, no multi-card support
Jan 05 13:36:24 zenmasterl /usr/lib/gdm3/gdm-x-session[3035]: (WW) VGA arbiter: cannot open kernel arbiter, no multi-card support
Jan 05 13:40:43 zenmasterl /usr/lib/gdm3/gdm-x-session[2721]: (WW) VGA arbiter: cannot open kernel arbiter, no multi-card support
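
For anyone who wants to check the same thing: a minimal Python sketch, assuming the warning means the X server could not open the kernel's arbiter device node. As far as I know that node is /dev/vga_arbiter, but treat the path as an assumption. The function name arbiter_status is made up for illustration. It only reports whether the node exists and whether the current user may open it read/write.

```python
import os

ARBITER = "/dev/vga_arbiter"  # assumed path of the kernel VGA arbiter node

def arbiter_status(path=ARBITER):
    """Return 'missing', 'no-permission', 'error', or 'ok' for the device node."""
    if not os.path.exists(path):
        return "missing"
    try:
        fd = os.open(path, os.O_RDWR)
    except PermissionError:
        return "no-permission"
    except OSError:
        return "error"
    os.close(fd)
    return "ok"

if __name__ == "__main__":
    print(ARBITER, "->", arbiter_status())
```

Running it as the desktop user and again as root should show whether this is a permission issue of the X session or whether the node is missing altogether.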

Now why are these there? Shouldn’t the x8 slot hang off the PCH, so that the two cards don’t interfere with each other? What am I missing?

Sure enough, putting the AMD card back in slot 3 works.

w/ best regards
Gerd

Furthermore, the 3090 is marked for vfio-pci on the kernel command line, so Linux never uses it:

Kernel command line: BOOT_IMAGE=/vmlinuz-5.4.0-59-generic root=/dev/mapper/vgubuntu-root ro quiet splash amd_iommu=on iommu=pt kvm.ignore_msrs=1 vfio-pci.ids=10de:2204,10de:1aef,1022:1487 vt.handoff=7

So why is Xorg bothering with it?
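
To verify that claim rather than assume it, here is a small Python sketch. It pulls the vfio-pci.ids list out of a kernel command line and looks up which driver sysfs reports for a PCI address. The helper names vfio_ids and bound_driver are hypothetical, and the 0000: domain prefix on the addresses is an assumption. The bus addresses themselves are the ones from my lspci output.

```python
import os

def vfio_ids(cmdline):
    """Extract the vendor:device IDs passed via vfio-pci.ids= on a kernel command line."""
    for token in cmdline.split():
        if token.startswith("vfio-pci.ids="):
            return token.split("=", 1)[1].split(",")
    return []

def bound_driver(address, sysfs="/sys/bus/pci/devices"):
    """Return the name of the driver bound to a PCI address, or None if unbound."""
    link = os.path.join(sysfs, address, "driver")
    if not os.path.islink(link):
        return None
    return os.path.basename(os.readlink(link))

if __name__ == "__main__":
    if os.path.exists("/proc/cmdline"):
        with open("/proc/cmdline") as f:
            print("vfio-pci.ids:", vfio_ids(f.read()))
    # 3090 address in the non-working (0c) and working (0d) layouts
    for addr in ("0000:0c:00.0", "0000:0d:00.0"):
        print(addr, "->", bound_driver(addr))
```

If the 3090 shows vfio-pci here in both slot layouts, the binding itself is fine and the login loop has to come from somewhere else.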

I also saw the IOMMU groups change when shuffling the GPUs around. Is this expected?

Working config, GPUs in slots 1 and 3, primary on 3 (selected via BIOS):

IOMMU Group 27 06:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Turks PRO [Radeon HD 7570] [1002:675d]
IOMMU Group 27 06:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Turks HDMI Audio [Radeon HD 6500/6600 / 6700M Series] [1002:aa90]

IOMMU Group 31 0d:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:2204] (rev a1)
IOMMU Group 31 0d:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:1aef] (rev a1)

Non-working config, GPUs in slots 1 and 2:

IOMMU Group 30 0c:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:2204] (rev a1)
IOMMU Group 30 0c:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:1aef] (rev a1)
IOMMU Group 31 0d:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Turks PRO [Radeon HD 7570] [1002:675d]
IOMMU Group 31 0d:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Turks HDMI Audio [Radeon HD 6500/6600 / 6700M Series] [1002:aa90]

I get that this is different, but the cards are still in separate IOMMU groups and the device IDs on the kernel command line still match the 3090 …
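
For comparing the two layouts, a minimal Python sketch that reads the groups straight from sysfs, assuming the usual /sys/kernel/iommu_groups layout (one numbered directory per group with a devices subdirectory). The function name iommu_groups is made up for illustration. It prints bare PCI addresses only, without the lspci descriptions shown above.

```python
import os

def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted PCI addresses it contains."""
    groups = {}
    if not os.path.isdir(root):
        return groups
    for name in os.listdir(root):
        devdir = os.path.join(root, name, "devices")
        if name.isdigit() and os.path.isdir(devdir):
            groups[int(name)] = sorted(os.listdir(devdir))
    return groups

if __name__ == "__main__":
    for group, devices in sorted(iommu_groups().items()):
        for dev in devices:
            print(f"IOMMU Group {group} {dev}")
```

As far as I understand it, bus addresses and group numbers are handed out during enumeration, so they can shift whenever a card moves to a different slot. What should matter for passthrough is that the 3090 and its audio function stay in a group of their own.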

I am in the dark :confused:

Can we see your gdm and xorg config files? I suspect the problem lies there.

Sure, although xorg.conf doesn’t exist anymore, at least on Ubuntu?

Here is the gdm config:

# GDM configuration storage
#
# See /usr/share/gdm/gdm.schemas for a list of available options.

[daemon]
# Uncomment the line below to force the login screen to use Xorg
#WaylandEnable=false

# Enabling automatic login
#  AutomaticLoginEnable = true
#  AutomaticLogin = user1

# Enabling timed login
#  TimedLoginEnable = true
#  TimedLogin = user1
#  TimedLoginDelay = 10

[security]

[xdmcp]

[chooser]

[debug]
# Uncomment the line below to turn on debugging
# More verbose logs
# Additionally lets the X server dump core if it crashes
#Enable=true

Pretty clean, as I didn’t touch it or any other config file.

But, after a little research, I think the first two PCIe slots actually connect to the CPU. Only the third hangs off the PCH. Then the need for VGA arbitration would make sense.
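
One way to check the slot wiring without the manual is a rough Python sketch like the one below. It assumes that resolving a device's entry under /sys/bus/pci/devices yields a path whose components list every bridge between the root complex and the device. The helper name bridge_chain is hypothetical and the 0000: domain prefix is an assumption.

```python
import os
import re

# Matches a full PCI address such as 0000:0d:00.0 (domain:bus:device.function)
PCI_ADDR = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")

def bridge_chain(resolved_path):
    """Return the PCI addresses found in a resolved sysfs device path, root port first."""
    return [part for part in resolved_path.split("/") if PCI_ADDR.match(part)]

if __name__ == "__main__":
    for addr in ("0000:0c:00.0", "0000:0d:00.0"):
        real = os.path.realpath(os.path.join("/sys/bus/pci/devices", addr))
        print(addr, ":", " -> ".join(bridge_chain(real)))
```

A card behind the chipset would typically show a longer chain of bridges than one sitting directly on a CPU root port, so comparing the output for slots 2 and 3 should confirm or refute the guess.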