Hi, I have a VM running Ubuntu Server for Stable Diffusion, created with virt-manager. I use an NVIDIA RTX A4000 with the 535 server driver.
It worked great for months, but today the SSD ran out of space and the storage file couldn’t grow. I moved the storage file to another SSD and created the VM again. I changed the settings to use EFI and added the two PCI host devices (GPU and GPU audio). The VM started without a problem, but Torch can’t use the GPU, and nvidia-smi says that it can’t communicate with the NVIDIA driver. I reinstalled the driver, but nothing changed.
lspci -k:
05:00.0 VGA compatible controller: NVIDIA Corporation GA104GL [RTX A4000] (rev a1)
	Subsystem: NVIDIA Corporation GA104GL [RTX A4000]
	Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
06:00.0 Audio device: NVIDIA Corporation GA104 High Definition Audio Controller (rev a1)
	Subsystem: NVIDIA Corporation GA104 High Definition Audio Controller
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel
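
One detail in the lspci -k output above: the audio device has a "Kernel driver in use" line, but the GPU entry only lists candidate kernel modules and has no such line, which suggests no driver is currently bound to the GPU inside the VM. The following is a minimal Python sketch for spotting that automatically. The function name devices_without_driver and the SAMPLE constant are made up for illustration, and the parsing assumes the usual lspci -k layout (one header line per device, followed by detail lines).

```python
import re

# Start of a device entry, e.g. "05:00.0 VGA compatible controller: ..."
# An optional PCI domain prefix such as "0000:" is also accepted.
DEVICE_RE = re.compile(r"^(?:[0-9a-f]{4}:)?[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]\s")


def devices_without_driver(lspci_text):
    """Return header lines of devices that have no 'Kernel driver in use' line."""
    devices = []  # each item is [header_line, has_driver]
    for raw in lspci_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if DEVICE_RE.match(line):
            # A new device entry begins here.
            devices.append([line, False])
        elif line.startswith("Kernel driver in use:") and devices:
            # Mark the most recent device as having a bound driver.
            devices[-1][1] = True
    return [header for header, has_driver in devices if not has_driver]


# The output from this post, copied verbatim.
SAMPLE = """\
05:00.0 VGA compatible controller: NVIDIA Corporation GA104GL [RTX A4000] (rev a1)
Subsystem: NVIDIA Corporation GA104GL [RTX A4000]
Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
06:00.0 Audio device: NVIDIA Corporation GA104 High Definition Audio Controller (rev a1)
Subsystem: NVIDIA Corporation GA104 High Definition Audio Controller
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel
"""

if __name__ == "__main__":
    for header in devices_without_driver(SAMPLE):
        print("no driver bound:", header)
```

On the VM itself, the same function could be fed live output, for example from subprocess.run(["lspci", "-k"], capture_output=True, text=True).stdout. This only reports which devices lack a bound driver; it does not say why the nvidia module failed to bind.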