Single GPU Passthrough - Nvidia Driver REFUSES to Unload [module is in use]

I had a single-GPU passthrough setup with my 3090 back when Apex Legends didn’t work on Linux, and I used it for about a year. Once Apex enabled EAC Proton compatibility I stopped using it, and the config apparently rotted over a year-plus of QEMU/libvirt updates: when I went to boot it again, I got the classic single-GPU-passthrough failure mode where the GUI session dies, the VM never starts, and there are no error messages or logs to go on.

So, instead of trying to figure out what the hell might have broken, I decided to just start over, since it’s not that difficult (or at least it didn’t use to be).

I created the VM in virt-manager, added the installation iso and virtio-win drivers, and I installed the Win10 OS into the VM, then rebooted the VM and installed the virtio-win drivers.

So then it came time to actually attach the GPU to the VM in virt-manager and add the hook scripts in /etc/libvirt/hooks/qemu.d and configure my kvm.conf and all that.
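For reference, this is the layout I’m assuming for the hooks (the common VFIO hook-helper convention; the VM name win10 is just a placeholder for your own domain name):

```
/etc/libvirt/hooks/
├── qemu                     # dispatcher script
└── qemu.d/
    └── win10/
        ├── prepare/
        │   └── begin/
        │       └── start.sh # runs before the VM starts
        └── release/
            └── end/
                └── stop.sh  # runs after the VM shuts down
```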

And I kept getting the same freeze, and I’d have to SSH in and reboot (or press Ctrl+Alt+Del a bunch of times to force one). SSH-ing in, or checking the previous boot’s logs in journalctl after rebooting, the only libvirt error I could find was an Input/output error when trying to start the VM.

So, I went into my start.sh hook script and added lines to each command to log to a text file so I would know at which step I was getting hung up.
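The logging I added looked roughly like this (a sketch; the log path, format, and the `log` helper are my own choices, and the real hook commands are shown as comments here). After a freeze, the last line in the log tells you which command hung:

```shell
#!/bin/bash
# Append a timestamped line before/after each step of start.sh so the
# last entry in the log pinpoints where the script got stuck.
LOG=/tmp/vm-hook.log
log() { echo "$(date '+%H:%M:%S') $*" >> "$LOG"; }

log "hook start"
# real hook commands go between the log lines, e.g.:
#   systemctl stop sddm.service
log "display manager stopped"
#   virsh nodedev-detach pci_0000_0f_00_0
log "gpu detached"
log "hook end"
```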

And it turns out it’s the virsh nodedev-detach lines. So I rebooted, switched to a separate TTY, and ran everything manually, and sure enough those commands just hang. The root cause is that the nvidia module is never removed. I tried to FORCE remove it, and it still refuses to unload, claiming it’s in use. But it’s not in use.

I have also noticed over the past couple of months that even when I SSH into my machine, my zsh prompt prints a stupid "the control display is undefined; please run 'nvidia-settings --help' for usage information" message when it first starts up, so even the shell seems to be calling nvidia-settings, but I can’t find ANY other reason why the nvidia module refuses to unload.

It refuses to unload even after sddm has been completely killed. I have no Xorg servers running (and definitely no Wayland ones); I can boot, switch to a new TTY, run sudo systemctl stop sddm.service, and then unload the various Nvidia modules one by one, and when I finally get to the last one, nvidia, it refuses to unload and says it’s in use.

lsmod | grep nvidia shows that no other modules depend on it, and yet I get this in journalctl:

Jun 12 16:07:04 matt-archlinux kernel: NVRM: Attempting to remove device 0000:0f:00.0 with non-zero usage count!
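For anyone hitting the same wall: lsmod only shows module-to-module dependencies, but a non-zero usage count can also come from any *process* that still has /dev/nvidia* open (nvidia-persistenced, nvidia-powerd, a stray nvidia-smi or nvidia-settings). A sketch of the kind of check I mean, assuming fuser and systemd are available (the output path is my own choice):

```shell
#!/bin/bash
# List processes holding the NVIDIA device nodes open, then check whether
# the persistence daemon is pinning the module even with no GUI running.
OUT=/tmp/nvidia-holders.txt
: > "$OUT"
for dev in /dev/nvidia*; do
    [ -e "$dev" ] || continue                  # no NVIDIA nodes on this box
    echo "== $dev ==" >> "$OUT"
    command -v fuser >/dev/null && fuser -v "$dev" >> "$OUT" 2>&1 || true
done
command -v systemctl >/dev/null && \
    systemctl is-active nvidia-persistenced.service >> "$OUT" 2>&1 || true
echo "holder check complete" >> "$OUT"
cat "$OUT"
```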

What the hell is going on?

I’ve edited my mkinitcpio.conf to remove nvidia from the modules array, and regenerated the initramfs.
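Concretely, the change was along these lines (a sketch; your MODULES array may list other modules, which stay):

```shell
# /etc/mkinitcpio.conf -- nvidia modules no longer loaded from the initramfs
MODULES=()   # was: MODULES=(nvidia nvidia_modeset nvidia_uvm nvidia_drm)
```

followed by mkinitcpio -P to regenerate.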

Can anyone point me in the right direction?

Arch Linux, KDE Plasma is my default DE if that matters (though it shouldn’t because this happens even when I haven’t logged into a GUI at all yet).

Hey, I also had this issue. I fixed it by unloading the nvidia modules (nvidia_drm, nvidia_uvm, etc.) before detaching. Here’s my script:

#!/bin/bash
# Helpful to read output when debugging
set -x

# Stop the display manager and kill leftover session processes
systemctl stop sddm.service
killall sddm-x11-session
killall plasmashell
killall Xwayland
killall kwin_wayland
killall startplasma-wayland
killall startplasma-waylandsession


# Unbind VTconsoles
echo 0 > /sys/class/vtconsole/vtcon0/bind
echo 0 > /sys/class/vtconsole/vtcon1/bind

# Unbind EFI-Framebuffer
echo efi-framebuffer.0 > /sys/bus/platform/drivers/efi-framebuffer/unbind

# Avoid a race condition by waiting 2 seconds; calibrate shorter or longer as needed for your system
sleep 2

# Unload the NVIDIA stack in dependency order -- nvidia itself must go last,
# otherwise the detach hangs with the module still "in use"
modprobe -r nvidia_drm
modprobe -r nvidia_modeset
modprobe -r nvidia_uvm
modprobe -r nvidia
modprobe -r snd_hda_intel
modprobe -r i2c_nvidia_gpu

sleep 2

# Unbind the GPU from the display driver
# (these PCI addresses are for MY system -- check yours with lspci)
virsh nodedev-detach pci_0000_10_00_0
virsh nodedev-detach pci_0000_10_00_1
virsh nodedev-detach pci_0000_10_00_2
virsh nodedev-detach pci_0000_10_00_3


# Load VFIO Kernel Module  
modprobe vfio-pci