2 GPUs have the same ID

Hi,

I am trying to identify one of my GPUs so I can prevent the Linux driver from binding to it (I want to dedicate it to my Windows KVM). I am following this tutorial: Ubuntu 17.04 -- VFIO PCIe Passthrough & Kernel Update (4.14-rc1)

When I use their ls-iommu script, it only shows one GPU. I tried unplugging one of my GPUs and running on the other, but it still has the same ID as the first one.

I am using two GTX 1050 Ti.

Greetings

Flowstar

Edit: Interestingly, they do show up. With both GPUs plugged in I get the same “VGA compatible controller …” line twice; if I unplug one of them, I only get that line once. But always with the same ID.

Having GPUs of the same model is not ideal for passthrough. There is a way to bind vfio-pci to only one of the GPUs; see the page linked below: https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF#Using_identical_guest_and_host_GPUs

The instructions should work with some modifications due to Ubuntu and Arch using different initramfs systems.

3 Likes

On Nvidia, you can just unbind nouveau or nvidia from the GPU you want to pass through, using the sysfs module unbind feature.
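A minimal sketch of that, assuming the GPU to detach sits at 0000:02:00.0 (substitute your own address, run as root, and ideally from a console with X stopped):

```shell
#!/bin/sh
# Sketch: detach one GPU from the nouveau driver via sysfs.
# ADDR is an assumption -- substitute the address of the GPU you want
# to pass through.
ADDR="0000:02:00.0"
UNBIND="/sys/bus/pci/drivers/nouveau/unbind"

if [ -w "$UNBIND" ]; then
	# Writing the address here detaches the device from nouveau.
	echo "$ADDR" > "$UNBIND"
	MSG="unbound $ADDR from nouveau"
else
	# Not root, or nouveau isn't loaded -- just show what would happen.
	MSG="would write $ADDR to $UNBIND"
fi
echo "$MSG"
```

The same trick works for the proprietary nvidia driver by swapping the driver name in the path.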


Can you post the output of ls-iommu and the rest of your system specs? That shouldn’t be happening.
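For reference, in case the tutorial's ls-iommu script isn't at hand, here is a common equivalent sketch (not necessarily the exact script from that tutorial):

```shell
#!/bin/sh
# List every PCI device together with its IOMMU group, by reading the
# group number out of each device's sysfs path.
for d in /sys/kernel/iommu_groups/*/devices/*; do
	[ -e "$d" ] || continue          # skip if no IOMMU groups exist
	n=${d#*/iommu_groups/}           # strip everything before the group number
	n=${n%%/*}                       # keep only the group number itself
	printf 'IOMMU Group %s %s\n' "$n" "$(lspci -nns "${d##*/}")"
done
```

If this prints nothing at all, IOMMU support is probably not enabled in firmware or on the kernel command line.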


So there is the device ID (think of it as the device's model number) and the device address, which is how you find the device on the PCIe tree.

The address is the 06:00.1-style segment. Those should be different, so you can identify the devices that way. How to tell which physical GPU is which is a different story; I think you need trial and error, because they're numbered sequentially. For example:

you've got two GPUs, one at 12:00 and one at 11:00. If you unplug the 11:00 one, the one at 12:00 then gets addressed as 11:00.
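To make the distinction concrete, here is a sample lspci -nn style line (the same format ls-iommu prints): the first field is the address, and the last bracketed hex pair is the vendor:device ID.

```shell
#!/bin/sh
# Sample lspci -nn style line for a GTX 1050 Ti.
line='01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] [10de:1c82] (rev a1)'

# Address: the first whitespace-separated field.
addr=$(echo "$line" | awk '{print $1}')

# Vendor:device ID: the last [xxxx:xxxx] hex pair on the line.
id=$(echo "$line" | grep -oE '\[[0-9a-f]{4}:[0-9a-f]{4}\]' | tail -n1 | tr -d '[]')

echo "address=$addr id=$id"   # prints: address=01:00.0 id=10de:1c82
```

Two identical cards share the ID (10de:1c82) but never the address.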

2 Likes

It's a new computer:

Board: ASRock Z370 Extreme4
CPU: Intel Core i7-8700K
RAM: 32GB DDR4-3000
GPU: 2x GTX 1050 Ti (PCIe slots 2 & 4)

And the VGA IOMMU stuff:

IOMMU Group 1 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] [10de:1c82] (rev a1)
IOMMU Group 1 02:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] [10de:1c82] (rev a1)
IOMMU Group 2 00:02.0 VGA compatible controller [0300]: Intel Corporation Device [8086:3e92]

This leads me to another question. Both GPUs are in the same IOMMU Group. Is that another problem?

Yes, but there’s a hacky workaround.

What slots are they in? You could try shuffling the PCI slots around to get them separated.

1 Like

They are in slots 2 & 4. The only other slots they would fit in are x1 slots. (A GPU works in an x1 slot, but I don't know how well.)

Never mind then. We'll just use the ACS patch.

You can get debs for it here:

https://queuecumber.gitlab.io/linux-acs-override/

Just choose whatever kernel version you'd like to use; I recommend the same version your system is currently on.

2 Likes

Got that done :slight_smile:

1 Like

Sweet, okay, now. Kernel options for splitting your IOMMU groups look like this:

pcie_acs_override [PCIE] Override missing PCIe ACS support for:
	downstream
		All downstream ports - full ACS capabilities
	multifunction
		Add multifunction devices - multifunction ACS subset
	id:nnnn:nnnn
		Specific device - full ACS capabilities
		Specified as vid:did (vendor/device ID) in hex

So, for you, you’ll want to try the following first.

pcie_acs_override=id:10de:1c82

If that doesn't split them out, try the following (a bit more scorched earth, though):

pcie_acs_override=downstream

I am new to this scene :sweat_smile: where do I set those options? (OS: Ubuntu 18.04)

Those are kernel command line options. On most distros they can be found in /etc/default/grub, but I forget if Ubuntu follows that standard. If you edit that file, it should look something like this:

GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="rd.driver.blacklist=nouveau resume=UUID=5c31b833-d7b7-4a70-bea8-951ce90778c9 rhgb quiet"
GRUB_DISABLE_RECOVERY="true"

What you're interested in is the GRUB_CMDLINE_LINUX field. Just append the snippet from my previous post to it, save the file, and run sudo update-grub.

You’ll need root for all that.
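For the downstream variant, the edited line would end up looking something like this (the quiet splash options are placeholders; keep whatever your file already contains and just append the override):

```shell
# /etc/default/grub (excerpt). The existing options shown here are
# placeholders -- keep your own and append the ACS override at the end.
# Note: on Ubuntu the options sometimes live in GRUB_CMDLINE_LINUX_DEFAULT.
GRUB_CMDLINE_LINUX="quiet splash pcie_acs_override=downstream"
```

Then sudo update-grub, reboot, and check the groups again with ls-iommu.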

I need to start focusing on my work, so if I’m a bit slow to respond, I apologize.

1 Like

The latter works for me :smiley:

IOMMU Group 12 00:1f.3 Audio device [0403]: Intel Corporation 200 Series PCH HD Audio [8086:a2f0]
IOMMU Group 14 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] [10de:1c82] (rev a1)
IOMMU Group 14 01:00.1 Audio device [0403]: NVIDIA Corporation GP107GL High Definition Audio Controller [10de:0fb9] (rev a1)
IOMMU Group 15 02:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] [10de:1c82] (rev a1)
IOMMU Group 15 02:00.1 Audio device [0403]: NVIDIA Corporation GP107GL High Definition Audio Controller [10de:0fb9] (rev a1)
IOMMU Group 3 00:02.0 VGA compatible controller [0300]: Intel Corporation Device [8086:3e92]

Perfect, now you should be able to use the archwiki guide linked above to get the GPU you’d like to pass through bound with vfio-pci.

I tried, and I am now encountering a problem:

tree /sys/bus/pci/drivers/nouveau/
/sys/bus/pci/drivers/nouveau/
├── 0000:01:00.0 -> ../../../../devices/pci0000:00/0000:00:01.0/0000:01:00.0
├── 0000:02:00.0 -> ../../../../devices/pci0000:00/0000:00:01.1/0000:02:00.0
├── bind
├── module -> ../../../../module/nouveau
├── new_id
├── remove_id
├── uevent
└── unbind

3 directories, 5 files

I tried to unbind the first GPU (0000:01:00.0), but since my monitors are hooked up to that GPU, everything blacked out immediately. I tried rebooting the PC blind; after 5 minutes I did a hard reset and tried again with the second GPU (0000:02:00.0). Then I noticed that the echo command never finishes. Even if I kill the terminal, the command keeps running and prevents the system from rebooting.

You did this from a root prompt, right?

It might have to be done while X is not running.


Have you tried this script:

#!/bin/sh
# For every GPU that is NOT the boot GPU, set driver_override to vfio-pci
# (and the same for its HDMI audio function), then load vfio-pci.

for i in /sys/devices/pci*/*/boot_vga; do
	if [ "$(cat "$i")" -eq 0 ]; then
		GPU="${i%/boot_vga}"
		AUDIO="$(echo "$GPU" | sed -e "s/0$/1/")"
		echo "vfio-pci" > "$GPU/driver_override"
		if [ -d "$AUDIO" ]; then
			echo "vfio-pci" > "$AUDIO/driver_override"
		fi
	fi
done

modprobe -i vfio-pci

You can hook this script into modprobe with an install rule in modprobe.d:

install vfio-pci /usr/bin/vfio-pci-override.sh

modprobe will then run the script, which marks your non-boot GPU (and its audio function) for the vfio-pci driver before loading it.
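The sed in that script derives the audio function's sysfs path from the GPU's path by flipping the trailing function number from 0 to 1. For example, using the GPU path from the tree output earlier in the thread:

```shell
#!/bin/sh
# Path transformation used by the override script; the example GPU path
# comes from the tree output earlier in this thread. Function .0 is the
# GPU itself, function .1 is its HDMI audio device.
GPU="/sys/devices/pci0000:00/0000:00:01.1/0000:02:00.0"
AUDIO="$(echo "$GPU" | sed -e "s/0$/1/")"
echo "$AUDIO"   # same path, but ending in 0000:02:00.1
```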

1 Like

I am currently at work. Will try this when I am at home later. Thanks in advance :slight_smile:

1 Like

Do I have to replace something in the script?

Nope, that should work as-is.

I don't think the script is getting executed at boot. If I add some debug messages, they don't show up in the boot log. And both GPUs are still claimed by nouveau.

$ cat /etc/modprobe.d/nouveau.conf 
softdep nouveau pre: vfio vfio_pci
install vfio-pci /usr/bin/vfio-pci-override.sh

I did run update-initramfs -k all -u
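One thing to check on Ubuntu: the install rule points at /usr/bin/vfio-pci-override.sh, but that script also has to exist inside the initramfs, or the rule silently does nothing during early boot. A sketch of an initramfs-tools hook that copies it in (the hook filename is an assumption):

```shell
#!/bin/sh
# Sketch of /etc/initramfs-tools/hooks/vfio-pci-override (make it executable).
# The hook name is an assumption; the point is that the script called by
# the modprobe.d install rule must be copied into the initramfs.
PREREQ=""
prereqs() { echo "$PREREQ"; }
case "$1" in
	prereqs) prereqs; exit 0 ;;
esac

# Guarded so this sketch is harmless outside an actual initramfs build.
if [ -r /usr/share/initramfs-tools/hook-functions ]; then
	. /usr/share/initramfs-tools/hook-functions
	copy_exec /usr/bin/vfio-pci-override.sh /usr/bin
fi
```

After adding the hook, rerun update-initramfs -u -k all and see whether your debug messages appear in the boot log.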

It's running now, but cat can't find the path /sys/devices/pci*/*/boot_vga