Conversion of lspci ID to /etc/modprobe.d format?

I’m getting an error:

# echo "0000:0b:00.0" > /sys/bus/pci/drivers/vfio-pci/bind
-bash: echo: write error: No such device

But:

# file /sys/bus/pci/drivers/vfio-pci/bind
/sys/bus/pci/drivers/vfio-pci/bind: writable, regular file, no read permission

It appears to be the correct PCI ID:

0b:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK104 [GeForce GTX 760] [10de:1187] (rev a1)

Assuming your GRUB is loading the initramfs you put that script into, it should be OK, but you can always pass the IDs in via GRUB as well.

Something like

GRUB_CMDLINE_LINUX="iommu=pt amd_iommu=on vfio-pci.ids=1002:73bf,1002:ab28 video=efifb:off"

Replace those with your PCI IDs.
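
For reference, the IDs that vfio-pci.ids (and the options vfio-pci ids=... format the thread title asks about) want are the vendor:device pairs lspci -nn prints in square brackets, not the bus address. Roughly like this — the audio line is reconstructed from the 10de:0e0a ID used later in the thread:

lspci -nn -s 0b:00
# 0b:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK104 [GeForce GTX 760] [10de:1187] (rev a1)
# 0b:00.1 Audio device [0403]: NVIDIA Corporation GK104 HDMI Audio Controller [10de:0e0a] (rev a1)

# so, in /etc/modprobe.d terms:
# options vfio-pci ids=10de:1187,10de:0e0a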

There are some things to consider, like which GPU slot you’re booting from, because you don’t want the card to be used before that driver gets loaded. I have the same motherboard as you, so you can definitely change the boot PCIe slot if needed, depending on whether the card you’re trying to pass through is in the first, second, or third slot.

The video=efifb:off is sometimes necessary to make sure the GPU isn’t being used early.
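
If you want to check whether efifb actually grabbed the card, something like this should show it (a generic check, nothing board-specific):

cat /proc/fb
dmesg | grep -i efifb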

What does dmesg say the instant after you run the echo command?
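
Something like this in a second terminal should catch whatever the kernel logs at that moment, assuming your dmesg supports --follow:

sudo dmesg --follow
# then re-run the bind echo (as root) in the other terminal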

Nothing new appears.

When I was trying to unbind an NVIDIA graphics card through the terminal as root, it also made my terminal hang. I had to wait a solid one to two minutes before I could use the terminal again (that was on Ubuntu 20.04). After those few minutes passed, I could start the VM without any problem.

(edit: binding/unbinding would cause it to hang)

You’re running the command as root, not through sudo, right? I’ve had “sudo echo” fail a lot, is all.
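
The reason “sudo echo” fails, for what it’s worth, is that the redirection happens in your unprivileged shell. If you do want to stay with sudo, the usual pattern is to let tee do the privileged write — a rough equivalent of the command above:

echo "0000:0b:00.0" | sudo tee /sys/bus/pci/drivers/vfio-pci/bind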

I added to /etc/default/grub:

GRUB_CMDLINE_LINUX_DEFAULT="quiet iommu=pt amd_iommu=on vfio-pci.ids=10de:1187,10de:0e0a video=efifb:off"

Then update-grub and update-initramfs -u.

It’s still not binding the vfio-pci driver.

I have the second slot card set as default and that’s the one the BIOS, GRUB, and Debian show up on at boot. Nothing is coming out of the Nvidia card at all at this point.

Yes, I’m using sudo -i to get a root prompt.

Try sudo su - instead.

Same issue:

# echo "0000:0b:00.0" > /sys/bus/pci/drivers/vfio-pci/bind
-bash: echo: write error: No such device

Also the same with just su - and the root password.

Correct me if I’m wrong, but with sudo lspci -Dk:

0000:0b:00.0 would be the graphics card right? And not the audio device?

Because you said you just needed to get the audio device to unbind / bind to vfio?

For my graphics card’s audio device it would be:
0000:0b:00.1 (if it were at the same address as on your motherboard)

So isn’t the graphics card already bound to vfio? Just not the audio driver?

So, as root:

echo "0000:0b:00.1" > /sys/bus/pci/devices/0000:0b:00.1/driver/unbind
echo "0000:0b:00.1" > /sys/bus/pci/drivers/vfio-pci/bind

(Sorry if I misread something, lots of replies)
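
(Also, for what it’s worth: “No such device” on the bind write usually means vfio-pci doesn’t yet know it should accept that device ID. Two generic sysfs ways to tell it, as root — these are standard kernel interfaces, not something I’ve tested on your exact setup:)

# register the vendor:device pair with vfio-pci; it will then try to claim
# any matching device that isn't already bound to another driver
echo "10de 0e0a" > /sys/bus/pci/drivers/vfio-pci/new_id

# or: force which driver may claim this one device, then reprobe it
echo vfio-pci > /sys/bus/pci/devices/0000:0b:00.1/driver_override
echo 0000:0b:00.1 > /sys/bus/pci/drivers_probe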

Right now, it looks like the graphics card is not binding to anything at boot, while the audio controller is binding to snd_hda_intel

0b:00.0 VGA compatible controller: NVIDIA Corporation GK104 [GeForce GTX 760] (rev a1) (prog-if 00 [VGA controller])
	Subsystem: eVga.com. Corp. GK104 [GeForce GTX 760]
	Flags: bus master, fast devsel, latency 0, IRQ 187, IOMMU group 28
	Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
	Memory at e0000000 (64-bit, prefetchable) [size=128M]
	Memory at e8000000 (64-bit, prefetchable) [size=32M]
	I/O ports at f000 [size=128]
	Expansion ROM at fb000000 [virtual] [disabled] [size=512K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
--
0b:00.1 Audio device: NVIDIA Corporation GK104 HDMI Audio Controller (rev a1)
	Subsystem: eVga.com. Corp. GK104 HDMI Audio Controller
	Flags: bus master, fast devsel, latency 0, IRQ 182, IOMMU group 28
	Memory at fb080000 (32-bit, non-prefetchable) [size=16K]
	Capabilities: [60] Power Management version 3
	Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
	Capabilities: [78] Express Endpoint, MSI 00
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel

I know when I was setting up VFIO the old way, I had to add the options in a few locations:

First:

/etc/initramfs-tools/modules

At the bottom I put:

vfio
vfio_iommu_type1
vfio_virqfd
options vfio_pci ids=10de:1187,10de:0e0a
vfio_pci ids=10de:1187,10de:0e0a
vfio_pci

Second, in /etc/modules:

vfio
vfio_iommu_type1
vfio_pci ids=10de:1187,10de:0e0a

I don’t know if this is required, but I also put it in /etc/modprobe.d/vfio_pci.conf:

options vfio_pci ids=10de:1187,10de:0e0a

Note: I replaced those IDs with your graphics card’s IDs (assuming they’re correct).
Then issue the command:
sudo update-initramfs -u
And finally reboot.

I believe I had problems where it would unbind, like what’s happening to you, but wouldn’t bind until I remembered to add it to /etc/modules.

Sorry if that didn’t help; that’s just how I’ve been doing it on Ubuntu Server, Debian server, and Ubuntu Desktop.

OK, I added it in all those places and now it looks to be loading the vfio-pci module. (The card also shows up in a Windows guest, but now I have to deal with the Code 43 issue.)

I’m wondering which places are really necessary to set, though, so I can create repeatable instructions with the minimal amount of required setup.
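
A quick way to check what each function ended up bound to after each change — it just reads the driver symlink, nothing fancy:

for dev in 0000:0b:00.0 0000:0b:00.1; do
    # prints an empty driver name if nothing is bound
    printf '%s -> %s\n' "$dev" "$(basename "$(readlink "/sys/bus/pci/devices/$dev/driver")")"
done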

Yay, it bound. And yeah, I honestly don’t know which is required. I suppose you could remove each one, one at a time, and see if it breaks.

And yes, the annoying Code 43 error. I think I tried a bunch of different solutions until it actually worked for me. Seems a lot of people have to do different things to make it work depending on the card.

Tried this:

<features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor_id state='on' value='123456789ab'/>
    </hyperv>
    <kvm>
      <hidden state='on'/>
    </kvm>
    <vmport state='off'/>
    <ioapic driver='kvm'/>
  </features>

per a guide online. I still get Code 43. The card shows up in Device Manager, it just won’t load.

Tried a driver update through Windows; it fails with Code 43. Someone spent a lot of time breaking their own hardware to accomplish nothing. Done with computers for today, I think. Thanks for all the help!

P.S. Found this in dmesg on VM startup:

vfio-pci 0000:0b:00.0: vfio_ecap_init: hiding ecap 0x19@0x900

not sure if it’s an issue.

Still haven’t beaten the Nvidia 43 error. Probably not going to buy any more of their cards, since this is something they broke on purpose and are wasting my time.

Does anyone know how to set the primary display adapter for a Linux guest? It seems to pick the Virtual Machine Viewer (in virt-manager) as the only/primary graphics card. The other is there in lspci from the guest, but it’s not primary, so it’s unused. What is the qemu version of the BIOS setting for “primary display adapter”?
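
One thing I might try (untested on my side, just an idea): give the guest no emulated video device at all, so the passed-through GPU is the only display adapter libvirt exposes:

  <!-- guest XML: no emulated video, the passthrough GPU becomes the only adapter -->
  <video>
    <model type='none'/>
  </video>

Presumably the Spice graphics device would have to go too, otherwise virt-manager still opens its own viewer.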

I removed all the changes to /etc/default/grub. It’s back to just:

GRUB_CMDLINE_LINUX_DEFAULT="quiet"

On update-grub, update-initramfs, reboot, I get:

Kernel driver in use: vfio-pci

On both the graphics card and the audio controller. So we’ve learned that nothing needs to be set in GRUB to get vfio-pci to bind.

When starting the VM, I get, in dmesg:

[  498.206956] vfio-pci 0000:0b:00.0: enabling device (0000 -> 0003)
[  498.207276] vfio-pci 0000:0b:00.0: vfio_ecap_init: hiding ecap 0x19@0x900

Now, it does look like there is some conflict stuff going on during host boot:

# dmesg | grep 'NVRM\|nvidia'
[    7.370277] nvidia: loading out-of-tree module taints kernel.
[    7.370283] nvidia: module license 'NVIDIA' taints kernel.
[    7.376613] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[    7.387381] nvidia-nvlink: Nvlink Core is being initialized, major device number 241
[    7.387608] NVRM: The NVIDIA probe routine was not called for 1 device(s).
[    7.387609] NVRM: This can occur when a driver such as: 
               NVRM: nouveau, rivafb, nvidiafb or rivatv 
               NVRM: was loaded and obtained ownership of the NVIDIA device(s).
[    7.387610] NVRM: Try unloading the conflicting kernel module (and/or
               NVRM: reconfigure your kernel without the conflicting
               NVRM: driver(s)), then try loading the NVIDIA kernel module
               NVRM: again.
[    7.387610] NVRM: No NVIDIA devices probed.
[    7.387668] nvidia-nvlink: Unregistered the Nvlink Core, major device number 241
[    7.521436] nvidia-nvlink: Nvlink Core is being initialized, major device number 240
[    7.521688] NVRM: The NVIDIA probe routine was not called for 1 device(s).
[    7.521689] NVRM: This can occur when a driver such as: 
               NVRM: nouveau, rivafb, nvidiafb or rivatv 
               NVRM: was loaded and obtained ownership of the NVIDIA device(s).
[    7.521689] NVRM: Try unloading the conflicting kernel module (and/or
               NVRM: reconfigure your kernel without the conflicting
               NVRM: driver(s)), then try loading the NVIDIA kernel module
               NVRM: again.
[    7.521690] NVRM: No NVIDIA devices probed.
[    7.521754] nvidia-nvlink: Unregistered the Nvlink Core, major device number 240
[    7.535373] audit: type=1400 audit(1610081286.030:5): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=795 comm="apparmor_parser"
[    7.535375] audit: type=1400 audit(1610081286.030:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=795 comm="apparmor_parser"
[    7.712666] nvidia-nvlink: Nvlink Core is being initialized, major device number 240
[    7.713058] NVRM: The NVIDIA probe routine was not called for 1 device(s).
[    7.713058] NVRM: This can occur when a driver such as: 
               NVRM: nouveau, rivafb, nvidiafb or rivatv 
               NVRM: was loaded and obtained ownership of the NVIDIA device(s).
[    7.713060] NVRM: Try unloading the conflicting kernel module (and/or
               NVRM: reconfigure your kernel without the conflicting
               NVRM: driver(s)), then try loading the NVIDIA kernel module
               NVRM: again.
[    7.713060] NVRM: No NVIDIA devices probed.
[    7.713165] nvidia-nvlink: Unregistered the Nvlink Core, major device number 240
[    7.851464] nvidia-nvlink: Nvlink Core is being initialized, major device number 240
[    7.851677] NVRM: The NVIDIA probe routine was not called for 1 device(s).
[    7.851678] NVRM: This can occur when a driver such as: 
               NVRM: nouveau, rivafb, nvidiafb or rivatv 
               NVRM: was loaded and obtained ownership of the NVIDIA device(s).
[    7.851679] NVRM: Try unloading the conflicting kernel module (and/or
               NVRM: reconfigure your kernel without the conflicting
               NVRM: driver(s)), then try loading the NVIDIA kernel module
               NVRM: again.
[    7.851679] NVRM: No NVIDIA devices probed.
[    7.851736] nvidia-nvlink: Unregistered the Nvlink Core, major device number 240
[    8.055696] nvidia-nvlink: Nvlink Core is being initialized, major device number 240
[    8.055952] NVRM: The NVIDIA probe routine was not called for 1 device(s).
[    8.055953] NVRM: This can occur when a driver such as: 
               NVRM: nouveau, rivafb, nvidiafb or rivatv 
               NVRM: was loaded and obtained ownership of the NVIDIA device(s).
[    8.055953] NVRM: Try unloading the conflicting kernel module (and/or
               NVRM: reconfigure your kernel without the conflicting
               NVRM: driver(s)), then try loading the NVIDIA kernel module
               NVRM: again.
[    8.055953] NVRM: No NVIDIA devices probed.
[    8.056020] nvidia-nvlink: Unregistered the Nvlink Core, major device number 240

Looks like the same thing repeats five times. I wonder if specifying it in multiple places is causing issues. Or maybe it’s something else?
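
One thing I might try next — just the standard modprobe.d softdep mechanism, untested here — is making vfio-pci a prerequisite of the conflicting modules so it always wins the race, and keeping the ids option in a single file:

# /etc/modprobe.d/vfio_pci.conf
options vfio-pci ids=10de:1187,10de:0e0a
softdep nouveau pre: vfio-pci
softdep nvidia pre: vfio-pci
softdep snd_hda_intel pre: vfio-pci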

You can keep the vfio stuff out of the grub default file.

But for a VM to work properly you should have:

GRUB_CMDLINE_LINUX_DEFAULT="quiet iommu=pt intel_iommu=on"

or the AMD version (amd_iommu=on) if you have an AMD processor.
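
Either way, it’s worth confirming after a reboot that the IOMMU actually came up — a generic check:

dmesg | grep -i -e dmar -e iommu
ls /sys/kernel/iommu_groups/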

For the Code 43 error, I would recommend getting the NVIDIA drivers from NVIDIA itself, and not using Windows Update or other options. I’ve had problems getting it to start without that.

How I got past the Code 43: I had this at the very bottom of my XML file:

  <qemu:commandline>
    <qemu:arg value='-device'/>
    <qemu:arg value='ioh3420,bus=pci.0,addr=1b.0,multifunction=on,p>
    <qemu:arg value='-cpu'/>
    <qemu:arg value='host,hv_time,kvm=off,hv_vendor_id=whatever,-hy>
    <qemu:env name='QEMU_AUDIO_DRV' value='pa'/>
  </qemu:commandline>

And I had this in the features section:

  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
    </hyperv>
    <vmport state='off'/>
  </features>

Having that doesn’t mean you’ll have no problems passing the card through, but I do again highly recommend using the NVIDIA drivers for it.

Some cards have problems and you have to extract the VBIOS and add a romfile for the VM, all of which is frustrating to deal with. I had very few problems getting my 1660 Super to work in the VM myself, though.
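
For reference, if a romfile does turn out to be needed, it just gets attached to the passed-through device in the guest XML — a rough sketch, with a made-up path for the dumped VBIOS:

  <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
      <address domain='0x0000' bus='0x0b' slot='0x00' function='0x0'/>
    </source>
    <!-- hypothetical path; point it at wherever you dumped the VBIOS -->
    <rom file='/var/lib/libvirt/vbios/gtx760.rom'/>
  </hostdev>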

As well as Emma’s suggestions, maybe make sure other graphics/SPICE video devices are removed from the list in virt-manager?