Help with GPU passthrough, linux noob

I’m trying to pass through my V100 32GB, but when I run update-grub it gives an error.
My /etc/default/grub:


GRUB_DEFAULT=0

GRUB_TIMEOUT_STYLE=hidden
GRUB_TIMEOUT=0
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash" vfio-pci.ids=10de:1db6
GRUB_CMDLINE_LINUX=""


sudo update-grub
Sourcing file `/etc/default/grub'
/usr/sbin/grub-mkconfig: 11: /etc/default/grub: vfio-pci.ids=10de:1db6: not found


iommu group
10:00.0 3D controller [0302]: NVIDIA Corporation GV100GL [Tesla V100 PCIe 32GB] [10de:1db6] (rev a1)
Subsystem: NVIDIA Corporation GV100GL [Tesla V100 PCIe 32GB] [10de:124a]
Flags: bus master, fast devsel, latency 0, IRQ 78, IOMMU group 10
Memory at fb000000 (32-bit, non-prefetchable) [size=16M]
Memory at 7000000000 (64-bit, prefetchable) [size=32G]
Memory at 7800000000 (64-bit, prefetchable) [size=32M]
Capabilities:
Kernel driver in use: nvidia
Kernel modules: nvidiafb, nouveau

The fix was to move the closing " from right after quiet splash to the very end of the string.
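
For anyone hitting the same error, here is a rough sketch of why the first version failed. /etc/default/grub is sourced as a shell script, so anything after the closing quote is treated as a command to run, not as part of the value. The ID is the one from the lspci output above.

```shell
# Broken: the shell sets the variable to "quiet splash" and then tries to run
# a command literally named vfio-pci.ids=10de:1db6, hence the "not found".
# GRUB_CMDLINE_LINUX_DEFAULT="quiet splash" vfio-pci.ids=10de:1db6

# Fixed: every kernel argument sits inside one pair of straight quotes.
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash vfio-pci.ids=10de:1db6"
echo "$GRUB_CMDLINE_LINUX_DEFAULT"
```

After editing the real file, sudo update-grub should run without the "not found" line.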

I am now on this part of the guide.
It has a spot for devices. Since my Tesla doesn’t have an audio controller, should I just delete the extra entry and list only the one device?


#!/bin/sh

PREREQ=""

prereqs()
{
   echo "$PREREQ"
}

case $1 in
prereqs)
   prereqs
   exit 0
   ;;
esac

for dev in 0000:0c:00.0 0000:0c:00.1 
do 
 echo "vfio-pci" > /sys/bus/pci/devices/$dev/driver_override 
 echo "$dev" > /sys/bus/pci/drivers/vfio-pci/bind 
done

exit 0

In my case, should I put 1db6:10.00.0,
or something else?
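
One way to read the script, as a hedged sketch: the loop wants the device's PCI address, not its vendor:device ID. lspci prints the address as bus:device.function (10:00.0 here) and sysfs prefixes it with the PCI domain, which is assumed to be 0000 on a single-domain machine. The helper name sysfs_addr is made up for illustration, and the loop only prints what it would do.

```shell
# Turn an lspci-style address (10:00.0) into the sysfs form (0000:10:00.0).
# Assumes PCI domain 0000 unless the address already carries a domain.
sysfs_addr() {
    case "$1" in
        *:*:*) printf '%s\n' "$1" ;;       # already has a domain prefix
        *)     printf '0000:%s\n' "$1" ;;  # prepend the assumed domain
    esac
}

# The lspci output shows a single function for the V100, so one entry.
for dev in $(sysfs_addr 10:00.0)
do
    echo "would bind $dev via /sys/bus/pci/devices/$dev/driver_override"
done
```

With that, the for line in the guide's script would list only 0000:10:00.0.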

It still shows up when I do lspci -nnv, so I guess that’s not right.
I already rebooted.

did
sudo chmod +x /etc/initramfs-tools/scripts/init-top/vfio.sh
and then
sudo update-initramfs -u -k all
rebooted and still not isolated

lspci -vv

10:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100 PCIe 32GB] (rev a1)
Subsystem: NVIDIA Corporation GV100GL [Tesla V100 PCIe 32GB]
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- SERR- <PERR- INTx-
Interrupt: pin A routed to IRQ 255
IOMMU group: 10
Region 0: Memory at fb000000 (32-bit, non-prefetchable) [disabled] [size=16M]
Region 1: Memory at 7000000000 (64-bit, prefetchable) [disabled] [size=32G]
Region 3: Memory at 7800000000 (64-bit, prefetchable) [disabled] [size=32M]
Capabilities: [60] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [78] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 75.000W
DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM not supported
ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 8GT/s (ok), Width x16 (ok)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range AB, TimeoutDis+ NROPrPrP- LTR+
10BitTagComp- 10BitTagReq- OBFF Via message, ExtFmt- EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
AtomicOpsCtl: ReqEn-
LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+ EqualizationPhase1+
EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [100 v1] Virtual Channel
Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
Arb: Fixed- WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
Status: NegoPending- InProgress-
Capabilities: [250 v1] Latency Tolerance Reporting
Max snoop latency: 0ns
Max no snoop latency: 0ns
Capabilities: [258 v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
PortCommonModeRestoreTime=255us PortTPowerOnTime=10us
L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
T_CommonMode=0us LTR1.2_Threshold=0ns
L1SubCtl2: T_PwrOn=10us
Capabilities: [128 v1] Power Budgeting <?>
Capabilities: [420 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Capabilities: [900 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn- PerformEqu-
LaneErrStat: 0
Capabilities: [ac0 v1] Designated Vendor-Specific: Vendor=10de ID=0001 Rev=1 Len=12 <?>
Kernel modules: nvidiafb, nouveau

Since it doesn’t have “Kernel driver in use:”, that means it’s not being held by nouveau, but it’s also not bound to the VFIO driver.

any pointers?

My Arch guide has driver soft-dep listings here: Arch Passthrough End to End | The Complete Field Manual

Basically, some drivers can’t be unbound, so the kernel will bind a driver, but then it’ll stick. What you want to do is set vfio to be a pre-load requirement for the GPU drivers, so vfio loads and grabs your device before nvidia/nouveau even has a chance to initialize.
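
A minimal sketch of that idea in modprobe.d syntax, assuming the module names seen in the lspci output earlier in the thread (nvidia, nouveau, nvidiafb) and a hypothetical file name, vfio-softdep.conf. It stages the file in a temporary directory so it can be reviewed before being copied into /etc/modprobe.d/ as root.

```shell
# Stage the file in a scratch directory instead of writing to /etc directly.
stage="$(mktemp -d)"
out="$stage/vfio-softdep.conf"   # hypothetical file name

cat > "$out" <<'EOF'
# Load vfio-pci before any driver that could claim the GPU.
softdep nvidia pre: vfio-pci
softdep nouveau pre: vfio-pci
softdep nvidiafb pre: vfio-pci
EOF

cat "$out"
# After review, as root:
#   cp "$out" /etc/modprobe.d/
#   update-initramfs -u -k all
```

The softdep lines only control load order. vfio-pci still needs to be told which IDs to claim, for example through the vfio-pci.ids= kernel argument already on the GRUB command line.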

I wouldn’t do it the way you’re trying. I assume you are on Debian 12 (based on your GRUB config).

  • Enable the virtualization features in the BIOS
  • Enable the IOMMU by adding intel_iommu=on or amd_iommu=on to your GRUB config
  • Update GRUB
  • Create a config file with the PCI IDs of the GPU in /etc/modprobe.d/
  • Add the necessary vfio modules to your /etc/modules file
  • Regenerate your initramfs
  • Reboot
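
The two config-file steps in that list might look roughly like this on Debian. It is a sketch, not a tested recipe. The file name vfio.conf is an assumption, the ID is the V100's from the lspci output earlier in the thread, and everything is staged in a temporary directory instead of being written to /etc directly.

```shell
stage="$(mktemp -d)"

# Goes in /etc/modprobe.d/: tell vfio-pci which vendor:device IDs to claim.
printf 'options vfio-pci ids=%s\n' "10de:1db6" > "$stage/vfio.conf"

# Lines to append to /etc/modules so the vfio modules load at boot.
# Older kernels may also need a separate vfio_virqfd entry.
printf '%s\n' vfio vfio_iommu_type1 vfio_pci > "$stage/modules.add"

cat "$stage/vfio.conf" "$stage/modules.add"
# After review, as root:
#   cp "$stage/vfio.conf" /etc/modprobe.d/
#   cat "$stage/modules.add" >> /etc/modules
#   update-initramfs -u -k all
```

After the reboot, lspci -nnk should report "Kernel driver in use: vfio-pci" for the card if the binding worked.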

You get the error in GRUB because all your arguments have to be inside the quotation marks.
A modern GPU usually exposes more than one device: the GPU itself, an audio device, and a USB hub.
If you run “lspci -nnk” and search for your GPU in the output, you should see something like this:

25:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 23 [Radeon RX 6600/6600 XT/6600M] [1002:73ff] (rev c1)
Subsystem: Micro-Star International Co., Ltd. [MSI] MSI RX 6600XT MECH 2X [1462:5021]
Kernel driver in use: vfio-pci
Kernel modules: amdgpu
25:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 HDMI/DP Audio Controller [1002:ab28]
Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 HDMI/DP Audio Controller [1002:ab28]
Kernel driver in use: vfio-pci
Kernel modules: snd_hda_intel

My RX 6600 XT has 3 PCI IDs, and all three need to be in the config file.
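
To collect every ID for a multi-function card, something like the following could work. It is only a sketch: pci_ids_for_slot is a hypothetical helper, it reads lspci -nn style text on stdin, and the sample is trimmed from the output above, so it carries just the two functions quoted there. The result is the comma-separated form that the vfio-pci ids= option takes.

```shell
# Print the vendor:device IDs of every function in a slot (e.g. 25:00),
# joined with commas. Reads `lspci -nn` output on stdin.
pci_ids_for_slot() {
    grep "^$1\." |
        grep -o '\[[0-9a-f]\{4\}:[0-9a-f]\{4\}\]' |
        tr -d '[]' |
        paste -sd, -
}

sample='25:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 23 [Radeon RX 6600/6600 XT/6600M] [1002:73ff] (rev c1)
25:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21/23 HDMI/DP Audio Controller [1002:ab28]'

printf '%s\n' "$sample" | pci_ids_for_slot 25:00
# prints 1002:73ff,1002:ab28
```

On a real machine the input would come from lspci -nn itself, e.g. lspci -nn | pci_ids_for_slot 10:00 for the V100 in this thread.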