Linux NVIDIA drivers not grabbing all cards

Hello,

I have a workstation running Pop!_OS 20.04 with four different NVIDIA GPUs, along with onboard VGA graphics from an ASPEED AST2500. I have been booting Linux into the GUI on the onboard VGA and binding all four NVIDIA GPUs to vfio-pci for passthrough, and that is working fine - I can pass through whichever card I like to a Windows guest.
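For reference, the binding is done with a vfio-pci.ids kernel parameter. Roughly like the sketch below (not my exact command: the PCI IDs are examples, and I believe the flag is --add-options since Pop!_OS boots with systemd-boot and kernelstub):

    # Hand the listed vendor:device IDs to vfio-pci at boot.
    # The IDs here are examples; "lspci -nn" shows the real pairs for each card
    # (and their HDMI audio functions).
    sudo kernelstub --add-options "vfio-pci.ids=10de:100c,10de:0e1a,10de:1c03,10de:10f1"
    # then reboot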

Now I would like to use one of my GPUs (the Titan Black) for the Linux host GUI. When I removed that card from the vfio-pci.ids list, the NVIDIA driver would not bind to it - it was left with no driver bound at all. As an experiment, I then removed all of my GPUs from the vfio-pci.ids list. Two of my cards (a GT 730 and a Quadro P4000) were bound to nvidia, while the other two (a GTX 1060 and the Titan Black) were left unbound.
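Would it be reasonable to try binding the card by hand through sysfs, something like the sketch below? (0000:65:00.0 is the Titan Black's address on my system, taken from the lspci output further down.)

    # Force the next probe of this device to use the nvidia driver, then re-probe it
    echo nvidia | sudo tee /sys/bus/pci/devices/0000:65:00.0/driver_override
    echo 0000:65:00.0 | sudo tee /sys/bus/pci/drivers_probe

    # Check whether it took
    lspci -nnk -s 65:00.0 | grep "Kernel driver in use"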

Has anyone encountered a similar issue? I'm pretty new to Linux, so I'm not sure which logs I should post to provide more information, but I want to learn and will follow suggestions.
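In case it helps, these are the commands I could run and post output from; just tell me which ones you want (a sketch, the grep filters are guesses on my part):

    # Kernel messages from this boot that mention the NVIDIA driver or vfio
    sudo dmesg | grep -iE "nvidia|nvrm|vfio"

    # Which kernel driver each NVIDIA device is currently using
    lspci -nnk -d 10de:

    # What the proprietary driver reports for the cards it did bind
    nvidia-smi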

Thanks in advance. Here are some key parameters of my workstation:

OS: Pop!_OS 20.04 LTS, kernel 5.11.0-7614-generic
MB: Supermicro X11SRA-F
CPU: Intel Xeon W-2175
Graphics:

  1. AST2500
  2. GTX 1060 6GB
  3. GT 730
  4. GTX Titan Black
  5. Quadro P4000

Nvidia driver: 460.73.01

lspci output for the unbound GPU (note that there is no "Kernel driver in use" entry):
65:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK110B [GeForce GTX TITAN Black] [10de:100c] (rev a1) (prog-if 00 [VGA controller])
    Subsystem: NVIDIA Corporation GK110B [GeForce GTX TITAN Black] [10de:1066]
    Flags: fast devsel, IRQ 11, NUMA node 0
    Memory at d2000000 (32-bit, non-prefetchable) [disabled] [size=16M]
    Memory at c8000000 (64-bit, prefetchable) [disabled] [size=128M]
    Memory at d0000000 (64-bit, prefetchable) [disabled] [size=32M]
    I/O ports at b000 [disabled] [size=128]
    Expansion ROM at d3000000 [disabled] [size=512K]
    Capabilities: [60] Power Management version 3
    Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
    Capabilities: [78] Express Endpoint, MSI 00
    Capabilities: [100] Virtual Channel
    Capabilities: [128] Power Budgeting <?>
    Capabilities: [420] Advanced Error Reporting
    Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
    Capabilities: [900] Secondary PCI Express
    Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

Do you have the proprietary driver installed? If not, you will need it because Nouveau will not work with the Titan.

**Edit**

Ah, never mind me. I am not an NVIDIA user. Wertigon is probably right on this one.

This use case might be too specific for most of us Linux veterans. I think you might need to bite the bullet and ask Nvidia about this one, to be honest. :man_shrugging:

Thanks for the replies. I changed my approach and came up with a workaround. I left the Linux host as it is, using the integrated graphics, and passed the Titan Black through to a Linux guest (also running Pop!_OS). The NVIDIA driver worked there, although I had to hide the hypervisor from the Linux NVIDIA driver, just as one has to do for the Windows driver to avoid Code 43.
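In case it helps anyone else, hiding the hypervisor for the Linux guest came down to the same libvirt tweak commonly used for Windows guests. A sketch of what I mean (I manage the guest with virt-manager/libvirt, and the guest name below is just an example):

    # Edit the guest's libvirt XML and add, under <features>:
    #   <kvm>
    #     <hidden state='on'/>
    #   </kvm>
    sudo virsh edit pop-os-guest

    # Inside the guest afterwards, the card should be picked up by the proprietary driver:
    lspci -nnk | grep -A 3 -i nvidia
    nvidia-smi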

I agree the next step would have been to reach out to NVIDIA support, but since my use case was so unique, I was not sure it would have been resolved easily.

Really? Shame. So the Code 43 fix is exclusive to the Windows drivers for now. No Linux love.