Need help with dynamically binding and un-binding Nvidia GPU

Tree_McGee · March 2, 2020, 7:20am

So what you’re telling me to do is follow your guide until I get to the point where I create bind_vfio.sh. Should bind_vfio.sh look like this?

#!/bin/bash

## Load vfio
modprobe vfio
modprobe vfio_iommu_type1
modprobe vfio_pci

## VGA Controller: unbind nvidia and bind vfio-pci
echo '0000:0f:00.0' > /sys/bus/pci/devices/0000:0f:00.0/driver/unbind
echo '10de 100a'    > /sys/bus/pci/drivers/vfio-pci/new_id
echo '0000:0f:00.0' > /sys/bus/pci/devices/0000:0f:00.0/driver/bind
echo '10de 100a'    > /sys/bus/pci/drivers/vfio-pci/remove_id

## Audio Controller: unbind snd_hda_intel and bind vfio-pci 
echo '0000:0f:00.1' > /sys/bus/pci/devices/0000:0f:00.1/driver/unbind
echo '10de 0e1a'    > /sys/bus/pci/drivers/vfio-pci/new_id
echo '0000:0f:00.1' > /sys/bus/pci/devices/0000:0f:00.1/driver/bind
echo '10de 0e1a'    > /sys/bus/pci/drivers/vfio-pci/remove_id

Do I need to do something like this
echo '10de 100a' > /sys/bus/pci/drivers/nvidia/remove_id
before binding to VFIO, or does Linux do something smart where a device ID can only be bound to one driver at a time, so that the GPU is removed from the nvidia driver’s ID list when I add it to VFIO?
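One way to check this empirically, rather than guessing, is to look at which driver the device is bound to before and after each step. This is only a minimal sketch, assuming the usual sysfs layout in which a bound device has a `driver` symlink; `current_driver` and `SYSFS_ROOT` are hypothetical names made up for illustration.

```shell
#!/bin/bash
# SYSFS_ROOT is a hypothetical override (handy for dry runs); defaults to the real /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print the name of the driver currently bound to a PCI address,
# or "none" when the device has no driver symlink (nothing is bound).
current_driver() {
    local addr="$1"
    local link="$SYSFS_ROOT/bus/pci/devices/$addr/driver"
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}
```

Calling `current_driver 0000:0f:00.0` after each echo line in the script should show whether the card ended up on nvidia, vfio-pci, or no driver at all.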

And for unbind_vfio.sh?

#!/bin/bash

## VGA Controller: unbind vfio-pci and bind nvidia
echo '0000:0f:00.0' > /sys/bus/pci/devices/0000:0f:00.0/driver/unbind
echo '10de 100a'    > /sys/bus/pci/drivers/nvidia/new_id
echo '0000:0f:00.0' > /sys/bus/pci/devices/0000:0f:00.0/driver/bind
echo '10de 100a'    > /sys/bus/pci/drivers/nvidia/remove_id

## Audio Controller: unbind vfio-pci and bind snd_hda_intel 
echo '0000:0f:00.1' > /sys/bus/pci/devices/0000:0f:00.1/driver/unbind
echo '10de 0e1a'    > /sys/bus/pci/drivers/snd_hda_intel/new_id
echo '0000:0f:00.1' > /sys/bus/pci/devices/0000:0f:00.1/driver/bind
echo '10de 0e1a'    > /sys/bus/pci/drivers/snd_hda_intel/remove_id

## Unload vfio
modprobe -r vfio_pci
modprobe -r vfio_iommu_type1
modprobe -r vfio

Also, from the looks of it, this setup wouldn’t work with identical GPU models or GPUs with identical controllers (e.g. audio, serial, etc.) because it relies on vendor:device identifiers like “10de 100a” and “10de 0e1a”, which every matching device shares.

In that case, wouldn’t it be better to run something like this for bind_vfio.sh, as suggested by gordanthree? (I’m concerned about this because I believe Nvidia may not change their Virtual Link controllers with next-gen RTX, and since I may be upgrading to an RTX 2060 Super and an RTX ??70, I really want to make sure my method accounts for this.)

#!/bin/bash

## Load vfio
modprobe vfio
modprobe vfio_iommu_type1
modprobe vfio_pci

## Unbind VGA and audio from "standard" drivers
echo 0000:0f:00.0 > /sys/bus/pci/devices/0000:0f:00.0/driver/unbind
echo 0000:0f:00.1 > /sys/bus/pci/devices/0000:0f:00.1/driver/unbind

## Bind VGA and audio to vfio-pci
echo 0000:0f:00.0 > /sys/bus/pci/drivers/vfio-pci/bind
echo 0000:0f:00.1 > /sys/bus/pci/drivers/vfio-pci/bind

Then this for unbind_vfio.sh?

#!/bin/bash

## Unbind VGA and audio from vfio-pci
echo 0000:0f:00.0 > /sys/bus/pci/devices/0000:0f:00.0/driver/unbind
echo 0000:0f:00.1 > /sys/bus/pci/devices/0000:0f:00.1/driver/unbind

## Bind VGA and audio to "standard" drivers
echo 0000:0f:00.0 > /sys/bus/pci/drivers/nvidia/bind
echo 0000:0f:00.1 > /sys/bus/pci/drivers/snd_hda_intel/bind

## Unload vfio
modprobe -r vfio_pci
modprobe -r vfio_iommu_type1
modprobe -r vfio
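For the identical-ID concern, another per-address option might be the kernel’s `driver_override` sysfs attribute, which pins a single PCI function to a named driver without touching new_id/remove_id. The following is only a sketch under that assumption and is untested on real hardware; `rebind_device` and `SYSFS_ROOT` are hypothetical names made up for illustration.

```shell
#!/bin/bash
# SYSFS_ROOT is a hypothetical override (handy for dry runs); defaults to the real /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Rebind one PCI function, identified by address only, to the named driver.
rebind_device() {
    local addr="$1" driver="$2"
    local dev="$SYSFS_ROOT/bus/pci/devices/$addr"

    # Unbind from whatever driver currently holds the device, if any.
    if [ -e "$dev/driver/unbind" ]; then
        echo "$addr" > "$dev/driver/unbind"
    fi

    # Restrict this one device to the requested driver.
    echo "$driver" > "$dev/driver_override"

    # Ask the PCI core to probe the device again so the override takes effect.
    echo "$addr" > "$SYSFS_ROOT/bus/pci/drivers_probe"
}
```

With this, bind_vfio.sh would call `rebind_device 0000:0f:00.0 vfio-pci` and `rebind_device 0000:0f:00.1 vfio-pci`, and unbind_vfio.sh would call it with nvidia and snd_hda_intel instead. Because only the address is used, a second card with the same vendor:device ID should be left alone.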

Sorry. Just trying to wrap my head around all of this. It’s kind of hard for me to picture how all this works when I’m not well versed in VFIO and when I don’t have the hardware on hand to test with.