I watched this recent video and was very confused by the ‘hot swapping that just works under Linux’ claim.
From reading about VFIO and Looking Glass, which I really want to set up for the few games and programs that need a GPU and don’t run under Wine/Proton, I’ve learned a lot.
I have a dual-GPU setup: the main one is AMD, plus a 3080 Ti I planned to give to the VM.
But as far as I know, there wasn’t a way to hot-unload an NVIDIA GPU; you had to blacklist the PCI device before boot.
This is very tedious, because most of the time I use my NVIDIA GPU under Linux for AI tests and to offload some work, and I really didn’t want to reboot the computer and select a special GRUB entry just to do VFIO … at that rate it’s simpler to reboot into Windows.
So given this video, what changed? What new magic allows you to hot-swap an NVIDIA GPU under Linux?
I would love to have this now relatively “old” setup serve its intended purpose.
Sadly, echo "0000:0d:00.0" > /sys/bus/pci/devices/0000:0d:00.0/driver/unbind hangs forever, with the usual NVRM: Attempting to remove device 0000:0d:00.0 with non-zero usage count! message in the kernel log, and the driver folder vanishes …
I have this working with an AMD iGPU + 3060. The trick is to bind the card to vfio at boot, then rebind it to NVIDIA after starting the DE or WM. Unless the DE supports GPU hot-swap, that’s the only way; otherwise the DE latches onto the GPU and won’t let go. You can still use PRIME offload, and obviously compute, without the DE using the GPU.
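For anyone wondering what “rebind after starting the DE” can look like in practice, here is a rough sketch using the generic PCI sysfs files (driver/unbind, driver_override, drivers_probe). It is not taken from the solution linked in this thread. The function name rebind_pci and the SYSFS variable are made up for illustration, and it assumes the target module (nvidia here) is already loaded and that you run it as root.

```sh
#!/bin/sh
# Sketch only: move a PCI device from its current driver to another one.
# rebind_pci and SYSFS are hypothetical names, not part of any existing tool.
# SYSFS defaults to /sys; it is overridable only so the logic can be dry-tested.

# rebind_pci <pci-address> <target-driver>
rebind_pci() (
    addr="$1"
    target="$2"
    dev="${SYSFS:-/sys}/bus/pci/devices/$addr"

    # Detach from whatever driver currently owns the device (e.g. vfio-pci).
    if [ -e "$dev/driver" ]; then
        printf '%s\n' "$addr" > "$dev/driver/unbind"
    fi

    # Pin the next probe to the requested driver ...
    printf '%s\n' "$target" > "$dev/driver_override"
    # ... and ask the PCI core to probe the device again.
    printf '%s\n' "$addr" > "${SYSFS:-/sys}/bus/pci/drivers_probe"
)
```

Once the desktop is up, the call would be something like rebind_pci 0000:0d:00.0 nvidia, and rebind_pci 0000:0d:00.0 vfio-pci before starting the VM. The second direction is exactly the step that can hang if something still holds the GPU.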
Also see this solution, which I took some parts from. It is written for KDE, and I’ve had to adapt it a bit for GNOME and Hyprland:
With Fedora GNOME I have it working as long as I only bind to NVIDIA after starting GNOME… it’s a bit annoying but it works.
This part is important though (if I don’t do this, GNOME WILL spontaneously bind to the GPU and then won’t let go anymore):
If you don’t want stuff binding to /dev/nvidia0, you can set the __EGL_VENDOR_LIBRARY_FILENAMES=/usr/share/glvnd/egl_vendor.d/50_mesa.json envvar in /etc/environment as well, to exclude the NVIDIA EGL files, since it bypasses KWin. Just don’t forget to set it to the NVIDIA file when launching games.
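The “set it to the NVIDIA file for games” part can be done per process instead of globally. A minimal sketch, assuming the NVIDIA vendor file is installed as 10_nvidia.json next to 50_mesa.json (check /usr/share/glvnd/egl_vendor.d/ on your system). The wrapper name run_on_nvidia is made up, and the two PRIME variables are the ones NVIDIA documents for render offload:

```sh
#!/bin/sh
# Sketch only: launch one program on the NVIDIA GPU while the rest of the
# session keeps the Mesa-only EGL setting from /etc/environment.
# ASSUMPTION: the NVIDIA EGL vendor file lives at this path on your distro.
NVIDIA_EGL_JSON="${NVIDIA_EGL_JSON:-/usr/share/glvnd/egl_vendor.d/10_nvidia.json}"

# run_on_nvidia <command> [args...]   (hypothetical helper name)
run_on_nvidia() {
    # env sets the variables for the child process only, so the calling
    # shell and the desktop session are left untouched.
    env \
        __EGL_VENDOR_LIBRARY_FILENAMES="$NVIDIA_EGL_JSON" \
        __NV_PRIME_RENDER_OFFLOAD=1 \
        __GLX_VENDOR_LIBRARY_NAME=nvidia \
        "$@"
}
```

Usage would be run_on_nvidia followed by the game's own command line (or the same three variables in a launcher's environment settings).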
Coming back to this because I had 2 free hours to spend on it … and sadly I’m still stuck at the same point.
I’ve followed @quilt’s pointers, but maybe this isn’t fully compatible with GNOME?
My GPU is correctly bound to vfio at boot (I get a “missing nvidia driver, falling back to nouveau” on Plymouth) and my display is “off”.
If I disable all the vfio modules and load the nvidia one, the GPU is immediately picked up and added to the GNOME desktop.
Nothing is bound to /dev/nvidia0:
lsof /dev/nvidia0
lsof: status error on /dev/nvidia0: No such file or directory
Yet I still can’t unload nvidia_drm.
echo -n "remove" > /sys/bus/pci/devices/0000:0d:00.0/drm/card0/uevent does turn off the display, but that seems to be a KDE trick and it doesn’t do enough for GNOME.
echo "0000:0d:00.0" > /sys/bus/pci/devices/0000:0d:00.0/driver/unbind still hangs with NVRM: Attempting to remove device 0000:0d:00.0 with non-zero usage count!
~# rmmod nvidia_drm
rmmod: ERROR: Module nvidia_drm is in use
~# lsmod | grep nvidia
nvidia_drm 135168 3
nvidia_modeset 1650688 1 nvidia_drm
nvidia_uvm 6844416 0
nvidia 72577024 2 nvidia_uvm,nvidia_modeset
video 81920 3 amdgpu,nouveau,nvidia_modeset
And nvidia-smi doesn’t report any processes … but GNOME’s display settings still show the monitor on the NVIDIA GPU.
I think I just don’t have enough free time for something that is basically unsupported by GNOME: GPU hot-unplug …
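One thing that might be worth checking here (a guess, nothing in this thread confirms it): lsof on /dev/nvidia0 and nvidia-smi only cover the compute side, while a compositor driving a display would hold the DRM nodes under /dev/dri instead, which could explain the use count of 3 on nvidia_drm. A small sketch to list those nodes for a given PCI address and see who has them open. The function names are made up, SYSFS is only there so the lookup can be tested without a GPU, and it should be run as root so lsof can see other users’ processes:

```sh
#!/bin/sh
# Sketch only: find the /dev/dri nodes that belong to one PCI device and
# list the processes holding them. Function names are hypothetical.

# drm_nodes_for <pci-address>: print /dev/dri/<node> for each DRM node
# (cardN, renderDN) that the device exposes in sysfs.
drm_nodes_for() (
    addr="$1"
    for node in "${SYSFS:-/sys}/bus/pci/devices/$addr/drm"/*; do
        [ -e "$node" ] || continue   # glob did not match: no DRM nodes
        printf '/dev/dri/%s\n' "${node##*/}"
    done
)

# gpu_holders <pci-address>: run lsof on every node found above.
gpu_holders() (
    for dev in $(drm_nodes_for "$1"); do
        if [ -e "$dev" ]; then
            echo "== $dev"
            lsof "$dev" || true      # lsof exits non-zero when nothing is open
        fi
    done
)
```

Running gpu_holders 0000:0d:00.0 should show whether gnome-shell (or anything else) still has card0 or the render node open before the unbind is attempted.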
I wish Wendell had a magic bullet in that video.