Conversion of lspci ID to /etc/modprobe.d format?

I’m having an issue getting an Nvidia card to pass through on an X570 Aorus Master board running Debian 11 (testing) on the host.

The PCIe card has two devices, a graphics controller and an audio controller, both in the same IOMMU group. (They are the only things in that group).
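To double-check which devices share an IOMMU group, the group membership can be read straight from sysfs. A minimal sketch, assuming the usual `/sys/bus/pci/devices/<address>/iommu_group` layout; the function name and the `SYSFS_ROOT` override are made up for illustration (the override only exists so it can be tried against a fake tree):

```shell
#!/bin/sh
# Print the PCI address of every device in the same IOMMU group as $1.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
iommu_group_members() (
    addr="$1"
    root="${SYSFS_ROOT:-/sys}"
    group_dir="$root/bus/pci/devices/$addr/iommu_group/devices"
    if [ ! -d "$group_dir" ]; then
        echo "no IOMMU group found for $addr" >&2
        exit 1
    fi
    for dev in "$group_dir"/*; do
        [ -e "$dev" ] || continue
        basename "$dev"
    done
)

# Example: iommu_group_members 0000:0b:00.0
```

If anything other than the GPU and its audio function shows up, the whole group would normally have to be passed through together.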

I’m not sure how to get the audio controller to bind to VFIO. Here’s where I’m at:

0b:00.0 VGA compatible controller: NVIDIA Corporation GK104 [GeForce GTX 760] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: eVga.com. Corp. GK104 [GeForce GTX 760]
	Flags: fast devsel, IRQ 5, IOMMU group 28
	Memory at fa000000 (32-bit, non-prefetchable) [disabled] [size=16M]
	Memory at e0000000 (64-bit, prefetchable) [disabled] [size=128M]
	Memory at e8000000 (64-bit, prefetchable) [disabled] [size=32M]
	I/O ports at f000 [disabled] [size=128]
	Expansion ROM at fb000000 [disabled] [size=512K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
--
0b:00.1 Audio device: NVIDIA Corporation GK104 HDMI Audio Controller (rev a1)
	Subsystem: eVga.com. Corp. GK104 HDMI Audio Controller
	Flags: bus master, fast devsel, latency 0, IRQ 181, IOMMU group 28
	Memory at fb080000 (32-bit, non-prefetchable) [size=16K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel

So the Nvidia card has no driver bound (not sure if it should be vfio?), but the audio controller still has snd_hda_intel.

/etc/modprobe.d/vfio.conf

options vfio-pci ids=0b:00.0,0b:00.1
softdep nouveau pre: vfio-pci
softdep nvidia pre: vfio-pci
softdep nvidia-drm pre: vfio-pci
softdep nvidia pre: vfio-pci
softdep nvidia-modeset pre: vfio-pci
softdep nvidia_current_drm pre: vfio-pci
softdep nvidia_current pre: vfio-pci
softdep nvidia-current-uvm pre: vfio-pci
softdep nvidia-current-modeset pre: vfio-pci
softdep nvidia-uvm pre: vfio-pci
#snd_hda_intel is the driver for the audio controller: 
softdep snd_hda_intel pre: vfio-pci

I did run update-initramfs -u and reboot.

On another machine with working graphics passthrough, both the audio and graphics devices show:

Kernel driver in use: vfio-pci

That’s a Radeon card on completely different hardware, though. On that machine, the IDs in /etc/modprobe.d/vfio.conf don’t match what comes out of lspci: vfio.conf says ids=1002:6758,1002:aa90, while the IDs from lspci are 81:00.0 and 81:00.1. So either some kind of conversion is required, or /etc/modprobe.d/vfio.conf isn’t doing anything.

Is there some kind of conversion required? Hex to decimal or something? And how?

The modprobe method is outdated. Use the newer and, IMHO, more reliable and easier sysfs method.

Edit: lspci -Dk will give you the PCI address information needed

That gives:

0000:0b:00.0 VGA compatible controller: NVIDIA Corporation GK104 [GeForce GTX 760] (rev a1)
	Subsystem: eVga.com. Corp. GK104 [GeForce GTX 760]
	Kernel modules: nvidia

So 0000:0b:00.0 is what should go in /etc/modprobe.d/vfio.conf? I.e.:

options vfio-pci ids=0000:0b:00.0,0000:0b:00.

That doesn’t seem to work. On my machine that does work, the ID format looks like:

ids=1002:6758,1002:aa90

The IDs with a colon and no period are the vendor and device IDs: effectively the internal model numbers of the graphics and audio parts of the card. They stay the same even if you move the card to another slot, or even to another computer.

The four digits in front of the colon are the vendor ID, so you’ll find that other models in Nvidia’s 700 series share them.

The 0000:0b:00.0 form is the address on the PCI bus where the card is currently connected, and the audio function usually sits at the same address with .1 instead of .0 at the end. The shorter style, like 81:00.1, is the same address with the leading domain (0000) left off.
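So there is no hex-to-decimal conversion between the two; they are different kinds of identifier. A rough sketch that tells them apart by shape (the function name is made up for illustration, and the patterns only cover the common `vvvv:dddd` and `[dddd:]bb:ss.f` forms):

```shell
#!/bin/sh
# Classify a string as a vendor:device ID, a PCI address, or neither.
classify_pci_string() (
    h='[0-9a-fA-F]'
    if printf '%s\n' "$1" | grep -Eq "^$h{4}:$h{4}\$"; then
        # e.g. 10de:1187 -> identifies the model, same in any slot
        echo "vendor:device"
    elif printf '%s\n' "$1" | grep -Eq "^($h{4}:)?$h{2}:$h{2}\.[0-7]\$"; then
        # e.g. 0b:00.0 or 0000:0b:00.0 -> identifies the slot/function
        echo "address"
    else
        echo "unknown"
    fi
)

# Example: classify_pci_string 1002:6758
```

As far as I know, the ids= option of vfio-pci only understands the vendor:device kind, which would explain why the addresses had no effect.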

If you run dmesg and grep for pci, you might see the same devices listed several times in different sections, sometimes as 0000:0b:00.0 and sometimes referred to in other ways.

For vfio.conf, the ids= option takes the model IDs (vendor:device), not the PCI address. The address is what you use elsewhere, for example when attaching the card to a VM, so double-check it if you move the card to a different slot.

Binding by model ID would clash if you had another card of the same model in the system, but it does allow the card to be moved between PCIe slots.


And:

lspci -Dknn | grep NVIDIA

outputs both the address, and the model for me:

0000:09:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM200 [GeForce GTX 980 Ti] [10de:17c8] (rev a1)
0000:09:00.1 Audio device [0403]: NVIDIA Corporation GM200 High Definition Audio [10de:0fb0] (rev a1)
0000:44:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204 [GeForce GTX 980] [10de:13c0] (rev a1)
0000:44:00.1 Audio device [0403]: NVIDIA Corporation GM204 High Definition Audio Controller [10de:0fbb] (rev a1)

I personally isolate by model in the vfio conf file, because I have two cards, a Ti that I pass through and a regular 980 that I use for the host, and I do move them around sometimes:

options vfio_pci ids=10de:17c8,10de:0fb0
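To avoid copying the IDs by hand, the bracketed `[vvvv:dddd]` field can be pulled out of `lspci -nn` output. A minimal sketch (the function name is made up; it just takes the last bracketed vendor:device pair on each line it is given):

```shell
#!/bin/sh
# Read `lspci -nn` style lines on stdin and print the vendor:device IDs
# as one comma-separated list, ready to paste after "ids=".
vfio_ids_from_lspci() {
    sed -n 's/.*\[\([0-9a-fA-F]\{4\}:[0-9a-fA-F]\{4\}\)\].*/\1/p' |
        awk 'NR > 1 { printf "," } { printf "%s", $0 } END { if (NR) print "" }'
}

# Example (all functions of the card in slot 09:00):
#   lspci -nn -s 09:00 | vfio_ids_from_lspci
```

Only feed it the lines for the card you want to isolate, otherwise every device in the list ends up in ids=.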

and then use the address to bind the card to a virtual machine with vmm/qemu (graphics first, then audio):

<hostdev mode="subsystem" type="pci" managed="yes">
  <source>
    <address domain="0x0000" bus="0x09" slot="0x00" function="0x0"/>
  </source>
  <address type="pci" domain="0x0000" bus="0x05" slot="0x00" function="0x0"/>
</hostdev>
<hostdev mode="subsystem" type="pci" managed="yes">
  <source>
    <address domain="0x0000" bus="0x09" slot="0x00" function="0x1"/>
  </source>
  <address type="pci" domain="0x0000" bus="0x07" slot="0x00" function="0x0"/>
</hostdev>

I just have to make sure the address is updated when I move cards. It’s long-winded, because that is the way vmm formatted it…
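If retyping the source address after every move gets tedious, the pieces can be split out of the lspci -D address mechanically. A small sketch (the function name is hypothetical, and it expects the full domain:bus:slot.function form):

```shell
#!/bin/sh
# Turn a full PCI address such as 0000:09:00.1 into the <address .../>
# element used inside <source> of a hostdev entry.
hostdev_source_address() (
    addr="$1"
    domain=${addr%%:*}      # 0000
    rest=${addr#*:}         # 09:00.1
    bus=${rest%%:*}         # 09
    rest=${rest#*:}         # 00.1
    slot=${rest%%.*}        # 00
    func=${rest#*.}         # 1
    printf '<address domain="0x%s" bus="0x%s" slot="0x%s" function="0x%s"/>\n' \
        "$domain" "$bus" "$slot" "$func"
)

# Example: hostdev_source_address 0000:09:00.1
```

The second address element (the one with type="pci") is the guest-side slot, so that part would stay as vmm wrote it.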

I can’t seem to get either method to work.

options vfio-pci ids=10de:1187,10de:0e0a

It still says:

Kernel driver in use: snd_hda_intel

…for the audio, and does not list a kernel driver for the video card.
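The same information lspci prints can also be read from sysfs, which is handy for checking both functions quickly after each change. A minimal sketch, assuming the standard `driver` symlink under each device directory; the function name and the `SYSFS_ROOT` override are made up for illustration:

```shell
#!/bin/sh
# Print the name of the driver bound to the PCI device $1, or "none".
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
driver_in_use() (
    addr="$1"
    root="${SYSFS_ROOT:-/sys}"
    link="$root/bus/pci/devices/$addr/driver"
    if [ -L "$link" ]; then
        # The symlink points at the driver directory; its last component is the name.
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
)

# Example:
#   for f in 0000:0b:00.0 0000:0b:00.1; do echo "$f: $(driver_in_use "$f")"; done
```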

I’m saying you don’t need to use vfio.conf at all.


Huh, I did not see your post about this till today, but I’m convinced!

I’m now updating my notes for next build. Thanks, man.

@lightnb how did you get on with binding / unbinding the driver in use on the cards?

I mean, if you elevate to root (sudo -i) and then run

echo 0000:0b:00.0 > /sys/bus/pci/devices/0000\:0b\:00.0/driver/unbind
echo 0000:0b:00.0 > /sys/bus/pci/drivers/vfio-pci/bind

and

echo 0000:0b:00.1 > /sys/bus/pci/devices/0000\:0b\:00.1/driver/unbind
echo 0000:0b:00.1 > /sys/bus/pci/drivers/vfio-pci/bind

does it bind them to vfio?

echo 0000:0b:00.0 > /sys/bus/pci/devices/0000:0b:00.0/driver/unbind

just hangs the terminal…

ah, dang, it took out backslashes, one sec

Sorry man

echo 0000:0b:00.0 > /sys/bus/pci/devices/0000\:0b\:00.0/driver/unbind
echo 0000:0b:00.0 > /sys/bus/pci/drivers/vfio-pci/bind

and

echo 0000:0b:00.1 > /sys/bus/pci/devices/0000\:0b\:00.1/driver/unbind
echo 0000:0b:00.1 > /sys/bus/pci/drivers/vfio-pci/bind

Under /sys/bus/pci/devices/0000\:0b\:00.0/ I just have driver_override and drm/. There’s nothing under driver_override.

#!/bin/sh

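# Walk every PCI device that exposes a boot_vga flag (i.e. every VGA device).
for i in /sys/bus/pci/devices/*/boot_vga; do
    # boot_vga is 1 for the boot GPU; change the 1 to 0 depending on which card you want to isolate.
    if [ "$(cat "$i")" -eq 1 ]; then
        GPU="${i%/boot_vga}"
        # The HDMI audio function normally sits at the same address with .1 instead of .0.
        AUDIO="$(echo "$GPU" | sed -e "s/0$/1/")"
        echo "vfio-pci" > "$GPU/driver_override"
        if [ -d "$AUDIO" ]; then
            echo "vfio-pci" > "$AUDIO/driver_override"
        fi
    fi
done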

modprobe -i vfio-pci

I have this run before the kernel binds to the card. Note that you may need to change the 1 to a 0 depending on which card you’re trying to pass through.

On Debian it’s probably going to need to be in /etc/initramfs-tools/scripts/init-top/ and you’ll need to rebuild the initramfs.

Anything under its driver/ folder?

I get:

$ ls -la /sys/bus/pci/devices/0000:09:00.0/driver/
total 0
drwxr-xr-x 2 root root 0 Jan 6 20:40 .
drwxr-xr-x 33 root root 0 Jan 6 20:40 ..
lrwxrwxrwx 1 root root 0 Jan 6 21:52 0000:09:00.0 -> ../../../../devices/pci0000:00/0000:00:03.1/0000:09:00.0
lrwxrwxrwx 1 root root 0 Jan 6 21:52 0000:09:00.1 -> ../../../../devices/pci0000:00/0000:00:03.1/0000:09:00.1
--w------- 1 root root 4096 Jan 6 21:41 bind
lrwxrwxrwx 1 root root 0 Jan 6 21:52 module -> ../../../../module/vfio_pci
--w------- 1 root root 4096 Jan 6 21:52 new_id
--w------- 1 root root 4096 Jan 6 21:52 remove_id
--w------- 1 root root 4096 Jan 6 20:40 uevent
--w------- 1 root root 4096 Jan 6 21:32 unbind

and when I switch to root and echo the device’s address into its driver/unbind, it removes the driver folder altogether.

And then I realise you don’t have the driver folder because the device is not currently claimed by any driver, hence no driver in use, as shown way back in your lspci output.
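For a device that may or may not have a driver attached, one sequence that should cover both cases is: set driver_override, unbind only if a driver is present, then ask the kernel to re-probe. This is only a sketch under assumptions: it needs root, it assumes vfio-pci is already loaded (modprobe vfio-pci), and the function name and `SYSFS_ROOT` override are made up for illustration:

```shell
#!/bin/sh
# Hand the PCI device $1 (full address, e.g. 0000:0b:00.1) to vfio-pci.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
bind_to_vfio() (
    addr="$1"
    root="${SYSFS_ROOT:-/sys}"
    dev="$root/bus/pci/devices/$addr"
    if [ ! -d "$dev" ]; then
        echo "no such device: $addr" >&2
        exit 1
    fi
    # Tell the kernel that only vfio-pci may claim this device from now on.
    echo "vfio-pci" > "$dev/driver_override"
    # Release the current driver, if there is one (the driver folder is absent otherwise).
    if [ -e "$dev/driver/unbind" ]; then
        echo "$addr" > "$dev/driver/unbind"
    fi
    # Ask the PCI core to look for a driver again; the override steers it to vfio-pci.
    echo "$addr" > "$root/bus/pci/drivers_probe"
)

# Example (as root): bind_to_vfio 0000:0b:00.0 && bind_to_vfio 0000:0b:00.1
```

If one of the writes blocks, that usually points at something still holding the device rather than at the commands themselves.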

I also tried:

echo "vfio-pci" > /sys/bus/pci/devices/0000\:0b\:00.0/driver_override

But that just hangs too.

I guess if

echo 0000:0b:00.0 > /sys/bus/pci/drivers/vfio-pci/bind

gave an error, and that is the right address, then perhaps you could post some version info in case an expert happens by?

like kernel, distro, etc?

IIRC, I also have my card’s IDs in my grub command line, but I was pretty sure that was redundant, and the bind / unbind of the card works either way for me, as root.

Not sure why the commands hang, especially with the correct number of back\\slashes

oh, and sorry for presenting all the silly hoops to jump through for no benefit

Bind unbind has been working for me since f28, and I’m on f33 now, along with all the kernels along the way.

I’d recommend triple checking that nothing is using the card, like GDM or some other greeter running in the background.

Your standard console might also be bound to it.
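One way to see whether a framebuffer console is attached is to look under /sys/class/vtconsole. A minimal sketch that just lists each virtual console driver with its bind state; the function name and the `SYSFS_ROOT` override are made up for illustration:

```shell
#!/bin/sh
# List each vtconsole entry with its description and whether it is bound (1) or not (0).
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
list_vtconsoles() (
    root="${SYSFS_ROOT:-/sys}"
    for con in "$root"/class/vtconsole/vtcon*; do
        [ -d "$con" ] || continue
        printf '%s: %s (bound=%s)\n' \
            "$(basename "$con")" "$(cat "$con/name")" "$(cat "$con/bind")"
    done
)

# Example: list_vtconsoles
# A bound frame buffer console can be released as root with, e.g.:
#   echo 0 > /sys/class/vtconsole/vtcon1/bind
# (vtcon1 is only an example; use whichever entry the listing shows.)
```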

I think that is why one would isolate the device early, so nothing else gets to bind to it and rely on access to it.

But the old stub was superseded by vfio, and I guess neither is needed now?

I tried this in addition to everything else above, but I’m still not getting the Kernel driver in use: vfio-pci on any of this.

I did run sudo update-initramfs -u. Do I need to do anything else with GRUB?

I have the advantage I guess, of working on the Threadripper platform, with three vga cards. So the console has someplace to go during boot, and I can disable it after I’m at runlevel 3 and grab that last card if needed.

The sysfs method does not use the initramfs or need a reboot per se.

Rather than running that big script, try the basic commands so you know what you’re working with.

When you said earlier that your console “hung”, did you mean the actual console, or something like an ssh session?

Open two ssh sessions, and do a tail of dmesg in one, and run commands in the other. See what the kernel says after you try to unbind your first pci device. Post the results here or in a paste bin somewhere.
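For the dmesg side, something like the following keeps the noise down. It is only a sketch: the function name is made up, and it simply keeps lines that mention vfio or the device address you pass in:

```shell
#!/bin/sh
# Filter kernel log lines on stdin, keeping those that mention vfio or the PCI address $1.
pci_log_filter() {
    # Escape the dot in the address so it matches literally.
    pattern=$(printf '%s' "$1" | sed 's/\./\\./g')
    grep -iE "vfio|$pattern"
}

# Example, in the first ssh session (as root):
#   dmesg --follow | pci_log_filter 0000:0b:00.0
```

Then run the unbind / bind commands in the second session and watch what shows up.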