Hi all,
I'm running the latest Proxmox with all non-paid updates on a new Gigabyte 2U box with dual Milan CPUs and NVIDIA GPUs (RTX A40).
This is my first experience with AMD and Proxmox. I run my other dual-CPU/GPU systems with Unraid and Xeon CPUs, and passthrough has always worked fine there.
On Proxmox I have done the usual “tutorial” steps:
PROXMOX:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt pcie_acs_override=downstream,multifunction nofb nomodeset video=vesafb:off,efifb:off"
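A small sanity check that may help here, as a sketch only: a change to GRUB_CMDLINE_LINUX_DEFAULT only applies after update-grub and a reboot, and Proxmox hosts that boot through systemd-boot (for example ZFS on UEFI) take their parameters from /etc/kernel/cmdline instead. The helper name missing_params is made up for illustration; it only compares a list of wanted parameters with the command line the running kernel reports.

```shell
#!/bin/sh
# missing_params CMDLINE PARAM...
# Hypothetical helper: prints every PARAM that does not appear
# as a whole word in CMDLINE.
missing_params() {
    cmdline=$1
    shift
    for want in "$@"; do
        case " $cmdline " in
            *" $want "*) ;;              # present, nothing to report
            *) printf '%s\n' "$want" ;;  # absent from this command line
        esac
    done
}

# Usage on the host (reads what the kernel actually booted with):
# missing_params "$(cat /proc/cmdline)" amd_iommu=on iommu=pt
```

If this prints nothing, the listed parameters reached the kernel; anything it prints was set in the config file but not picked up at boot.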
01:00.0 3D controller: NVIDIA Corporation GA102GL [A40] (rev a1)
Subsystem: NVIDIA Corporation Device 145a
Flags: fast devsel, IRQ 542, NUMA node 3
Memory at f8000000 (32-bit, non-prefetchable) [disabled] [size=16M]
Memory at 46000000000 (64-bit, prefetchable) [disabled] [size=64G]
Memory at 48040000000 (64-bit, prefetchable) [disabled] [size=32M]
Capabilities: [60] Power Management version 3
Capabilities: [68] #00 [0080]
Capabilities: [78] Express Legacy Endpoint, MSI 00
Capabilities: [b4] Vendor Specific Information: Len=14 <?>
Capabilities: [c8] MSI-X: Enable- Count=6 Masked-
Capabilities: [100] Virtual Channel
Capabilities: [250] Latency Tolerance Reporting
Capabilities: [258] L1 PM Substates
Capabilities: [128] Power Budgeting <?>
Capabilities: [420] Advanced Error Reporting
Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Capabilities: [900] #19
Capabilities: [bb0] #15
Capabilities: [bcc] Single Root I/O Virtualization (SR-IOV)
Capabilities: [c14] Alternative Routing-ID Interpretation (ARI)
Capabilities: [c1c] #26
Capabilities: [d00] #27
Capabilities: [e00] #25
Kernel driver in use: vfio-pci
Kernel modules: nvidiafb, nouveau
echo "options vfio_iommu_type1 allow_unsafe_interrupts=1" > /etc/modprobe.d/iommu_unsafe_interrupts.conf
echo "options kvm ignore_msrs=1" > /etc/modprobe.d/kvm.conf
echo "blacklist radeon" >> /etc/modprobe.d/blacklist.conf
echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf
echo "blacklist nvidia" >> /etc/modprobe.d/blacklist.conf
echo "options vfio-pci ids=10de:1b81,10de:10f0 disable_vga=1" > /etc/modprobe.d/vfio.conf
update-initramfs -u
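One thing that can be checked mechanically, as a rough sketch: whether the ids= list given to vfio-pci contains the vendor:device ID that lspci -n reports for the card. The function name ids_cover_device is made up for illustration, and the two values are copied from this post (the ids= list above and the 10de:2235 shown by lspci -n further down).

```shell
#!/bin/sh
# ids_cover_device IDS DEVICE
# Hypothetical helper. IDS is the comma-separated list from
# "options vfio-pci ids=...", DEVICE is a vendor:device pair as
# printed by "lspci -n". Returns 0 if DEVICE is in the list, else 1.
ids_cover_device() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Values taken from this post.
if ids_cover_device "10de:1b81,10de:10f0" "10de:2235"; then
    echo "vfio.conf lists the GPU"
else
    echo "vfio.conf does not list the GPU"
fi
```

With these values the check reports that the list does not include the A40. The host lspci output above still shows vfio-pci as the driver in use, so the mismatch may be harmless in this setup; that is an assumption, not something confirmed here.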
I built VMs with several Ubuntu versions (20.04 LTS and 21.04, plus the server editions of both), but nvidia-smi never works.
Any ideas?
I have tried many “posts” that apt purge and install things, though maybe not the correct one.
Does Proxmox keep some elements of the GPU from the VM?
I like Proxmox a lot and I would like to make it work!
k@u2104serv01:~$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
k@u2104serv01:~$ sudo lspci -s 01:00 -v
01:00.0 3D controller: NVIDIA Corporation GA102GL [RTX A40] (rev a1)
Subsystem: NVIDIA Corporation Device 145a
Physical Slot: 0
Flags: fast devsel, IRQ 16
Memory at (32-bit, non-prefetchable) [disabled]
Memory at (64-bit, prefetchable) [disabled]
Memory at (64-bit, prefetchable) [disabled]
Capabilities: [60] Power Management version 3
Capabilities: [78] Express Legacy Endpoint, MSI 00
Capabilities: [b4] Vendor Specific Information: Len=14 <?>
Capabilities: [c8] MSI-X: Enable- Count=6 Masked-
Capabilities: [100] Virtual Channel
Capabilities: [250] Latency Tolerance Reporting
Capabilities: [128] Power Budgeting <?>
Capabilities: [420] Advanced Error Reporting
Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
k@u2104serv01:~$ lspci -n -s 01:00
01:00.0 0302: 10de:2235 (rev a1)
k@u2104serv01:~$ grep nvidia /etc/modprobe.d/* /lib/modprobe.d/*
/etc/modprobe.d/blacklist-framebuffer.conf:blacklist nvidiafb
k@u2104serv01:~$ grep nouv /etc/modprobe.d/* /lib/modprobe.d/*
/lib/modprobe.d/nvidia-graphics-drivers.conf:blacklist nouveau
/lib/modprobe.d/nvidia-graphics-drivers.conf:blacklist lbm-nouveau
/lib/modprobe.d/nvidia-graphics-drivers.conf:alias nouveau off
/lib/modprobe.d/nvidia-graphics-drivers.conf:alias lbm-nouveau off
k@u2104serv01:~$ sudo modprobe nvidia
modprobe: ERROR: could not insert 'nvidia': No such device
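For anyone comparing notes: in the guest lspci output above, all three memory regions of the GPU are flagged [disabled] and show no address. Whether that is what makes modprobe report "No such device" is only a guess, but it is easy to count from inside the VM. A minimal sketch, with count_disabled_bars as a made-up helper name:

```shell
#!/bin/sh
# count_disabled_bars
# Hypothetical helper: reads "lspci -v" output on stdin and prints
# how many "Memory at ..." regions are flagged [disabled].
count_disabled_bars() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -c '^[[:space:]]*Memory at .*\[disabled\]' || true
}

# Usage inside the guest:
# sudo lspci -s 01:00 -v | count_disabled_bars
```

Running the same pipeline before and after changing VM settings gives a quick way to see whether the guest has started assigning the regions.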