I had a single-GPU passthrough setup with my 3090 back when Apex Legends didn’t work on Linux, and I used it for about a year. Once Apex enabled EAC support for Proton, I stopped using it, and I guess the config rotted: after more than a year of QEMU/libvirt updates, when I went to boot it again, it wouldn’t start. I just got the stupid single-GPU passthrough failure where the GUI session dies and the VM never starts, with no error messages or logs to look through.
So instead of trying to figure out what the hell had broken, I decided to just start over, since it’s not that difficult (at least it didn’t use to be).
I created the VM in virt-manager, attached the installation ISO and the virtio-win driver ISO, installed Windows 10, then rebooted the VM and installed the virtio-win drivers.
So then it came time to actually attach the GPU to the VM in virt-manager, add the hook scripts in /etc/libvirt/hooks/qemu.d, configure my kvm.conf, and all that.
And I kept getting the same freeze, and I’d have to SSH in and reboot (or press Ctrl+Alt+Del a bunch of times to force a reboot). When I SSH-ed in, or checked the previous boot’s logs in journalctl after rebooting, the only libvirt error I found was an Input/output error when trying to start the VM.
So I went into my start.sh hook script and made each command log to a text file, so I would know which step I was getting hung up on.
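For anyone wanting to do the same kind of step logging, here is a minimal Python sketch of the idea, with a per-step timeout added so a hung step leaves a trace instead of silently blocking. It is not the actual start.sh (which is a shell script and isn't shown here); the log path, the 30-second timeout, the helper names, and the step list are all hypothetical. The one node name is derived from the PCI address in the journal line quoted further down, using virsh’s pci_DOMAIN_BUS_SLOT_FUNCTION naming.

```python
import datetime
import subprocess

LOG_PATH = "/tmp/vfio-hook-debug.log"  # hypothetical log location
STEP_TIMEOUT_S = 30  # assumed threshold: a step slower than this is treated as hung


def log(msg, path=LOG_PATH):
    """Append a timestamped line to the debug log."""
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(f"{stamp} {msg}\n")


def run_step(argv, timeout=STEP_TIMEOUT_S, path=LOG_PATH):
    """Run one command; return "ok", "failed (rc=N)", "timed out" or "could not start"."""
    cmd = " ".join(argv)
    log(f"START {cmd}", path)  # written before running, so a hang still leaves this line
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising the timeout.
        log(f"TIMEOUT after {timeout}s: {cmd}", path)
        return "timed out"
    except OSError as exc:  # e.g. the executable does not exist
        log(f"ERROR {cmd}: {exc}", path)
        return "could not start"
    if proc.returncode != 0:
        log(f"FAIL rc={proc.returncode}: {cmd}: {proc.stderr.strip()}", path)
        return f"failed (rc={proc.returncode})"
    log(f"DONE {cmd}", path)
    return "ok"


def run_steps(steps, timeout=STEP_TIMEOUT_S, path=LOG_PATH):
    """Run steps in order; return (argv, status) for the first step that isn't ok, else None."""
    for argv in steps:
        status = run_step(argv, timeout, path)
        if status != "ok":
            return argv, status
    return None


# Hypothetical step list, not taken from the post's start.sh.
HOOK_STEPS = [
    ["systemctl", "stop", "sddm.service"],
    ["virsh", "nodedev-detach", "pci_0000_0f_00_0"],
]
```

Calling run_steps(HOOK_STEPS) as root would run the steps in order and stop at the first failure. If a step is stuck inside the kernel (the process in uninterruptible sleep), even the kill after the timeout may not return control, so the last START line without a matching DONE is the useful clue.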
And it turns out it’s the virsh nodedev-detach lines. So I rebooted, switched to a separate TTY, and ran everything manually, and sure enough, those commands just hang. I found out that the nvidia module is not being unloaded. So I tried to FORCE remove it, and it refuses to unload, saying it’s in use. But it’s not in use.
For the past couple of months I’ve also noticed that even when I SSH into my machine, my zsh prompt shows a stupid "the control display is undefined; please run 'nvidia-settings --help' for usage information" message when it first starts up, so something in my shell startup seems to be calling nvidia-settings. Other than that, I can’t find ANY reason why the nvidia module refuses to unload.
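One way to track down where that nvidia-settings call comes from is to search the shell startup files for it. Below is a minimal Python sketch that does a literal text search; the file list is an assumption (the usual per-user zsh dotfiles, the /etc/zsh copies, /etc/profile and /etc/profile.d/*.sh), and it will miss a call that lives in a plugin or a file sourced from somewhere else, or dotfiles relocated via ZDOTDIR, unless those paths are added.

```python
from pathlib import Path

# Assumed list of startup files; extend it with any plugin or sourced files you use.
ZSH_STARTUP_FILES = [
    "~/.zshenv", "~/.zprofile", "~/.zshrc", "~/.zlogin",
    "/etc/zsh/zshenv", "/etc/zsh/zprofile", "/etc/zsh/zshrc", "/etc/zsh/zlogin",
    "/etc/profile",
]


def candidate_files():
    """Expand the assumed startup files, plus any /etc/profile.d/*.sh scripts."""
    files = [Path(p).expanduser() for p in ZSH_STARTUP_FILES]
    files += sorted(Path("/etc/profile.d").glob("*.sh"))
    return files


def find_mentions(needle, paths):
    """Return (path, line_number, line) for every line in paths that contains needle."""
    hits = []
    for raw in paths:
        path = Path(raw).expanduser()
        if not path.is_file():
            continue  # missing files are normal; not every system has all of them
        try:
            with path.open(encoding="utf-8", errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if needle in line:
                        hits.append((str(path), lineno, line.rstrip("\n")))
        except OSError:
            continue  # unreadable file; skip it rather than abort the search
    return hits
```

For example, `for path, lineno, line in find_mentions("nvidia-settings", candidate_files()): print(f"{path}:{lineno}: {line}")` would list every matching line with its location.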
It refuses to unload even after sddm has been completely killed. I have no Xorg servers running (and definitely no Wayland ones). I can boot up, switch to a new TTY, run sudo systemctl stop sddm.service, and then try to unload the different NVIDIA modules, and when I finally get to the last one, nvidia, it refuses to unload and says it’s in use.
Running lsmod | grep nvidia shows that no other modules depend on that module, and yet I get this in journalctl:
Jun 12 16:07:04 matt-archlinux kernel: NVRM: Attempting to remove device 0000:0f:00.0 with non-zero usage count!
What the hell is going on?
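A possibly relevant detail: the "Used by" number in lsmod is the module’s reference count, and open file handles or memory mappings on /dev/nvidia* device nodes hold references on the nvidia module without showing up as dependent module names. So a non-zero count with an empty dependents column often means some process still has the device open. Below is a minimal Python sketch, roughly what fuser -v /dev/nvidia* reports, that reads the count from /proc/modules and scans /proc/<pid>/fd and /proc/<pid>/maps for /dev/nvidia* paths. The function names are hypothetical, it needs to run as root to see every process, and whether this is what triggers the NVRM warning above is an assumption, not something the log confirms.

```python
import os


def module_usage(name, proc_modules="/proc/modules"):
    """Return (refcount, dependent_modules) for a loaded module, or None if it isn't loaded.

    /proc/modules lines look like: name size refcount deps state address [taint],
    where deps is "-" or a comma-separated list with a trailing comma.
    """
    with open(proc_modules, encoding="utf-8") as fh:
        for line in fh:
            fields = line.split()
            if len(fields) >= 4 and fields[0] == name:
                deps = [d for d in fields[3].split(",") if d and d != "-"]
                return int(fields[2]), deps
    return None


def open_device_handles(prefix="/dev/nvidia", proc_root="/proc"):
    """Map pid -> (command name, sorted paths) for processes holding a path under prefix.

    Checks open file descriptors and memory mappings; needs root to see every process.
    """
    holders = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "sys"
        pid_dir = os.path.join(proc_root, entry)
        paths = set()
        fd_dir = os.path.join(pid_dir, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:  # process exited, or no permission
            fds = []
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue
            if target.startswith(prefix):
                paths.add(target)
        try:
            with open(os.path.join(pid_dir, "maps"), encoding="utf-8", errors="replace") as fh:
                for line in fh:
                    parts = line.split(maxsplit=5)  # address perms offset dev inode path
                    if len(parts) == 6 and parts[5].startswith(prefix):
                        paths.add(parts[5].rstrip("\n"))
        except OSError:
            pass
        if paths:
            try:
                with open(os.path.join(pid_dir, "comm"), encoding="utf-8") as fh:
                    comm = fh.read().strip()
            except OSError:
                comm = "?"
            holders[int(entry)] = (comm, sorted(paths))
    return holders
```

Printing module_usage("nvidia") next to open_device_handles() after stopping sddm would show whether the remaining references line up with processes that still have the device open.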
I’ve edited my mkinitcpio.conf to remove nvidia from the MODULES array and regenerated the initramfs.
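To double-check that edit, here is a small Python sketch that pulls the last uncommented MODULES=(...) assignment out of /etc/mkinitcpio.conf and reports any NVIDIA module names still listed. The set of module names and the regex-based parsing are assumptions; it won’t handle every bash construct (a ")" inside a comment in the array would confuse it). To inspect the generated image itself rather than the config, lsinitcpio (shipped with mkinitcpio) lists an image’s contents.

```python
import re
import shlex

# Assumed set of module names provided by the NVIDIA driver.
NVIDIA_MODULES = {"nvidia", "nvidia_modeset", "nvidia_uvm", "nvidia_drm"}

# Matches MODULES=(...) at the start of a line (so "#MODULES=(...)" is ignored),
# allowing the array to span multiple lines.
_MODULES_RE = re.compile(r"^[ \t]*MODULES=\((.*?)\)", re.MULTILINE | re.DOTALL)


def mkinitcpio_modules(text):
    """Return the entries of the last uncommented MODULES=(...) array in text."""
    matches = _MODULES_RE.findall(text)
    if not matches:
        return []
    # shlex handles quoting and drops "# ..." comments inside a multi-line array.
    return shlex.split(matches[-1], comments=True)


def nvidia_modules_listed(path="/etc/mkinitcpio.conf"):
    """Return the NVIDIA module names still listed in the config's MODULES array."""
    with open(path, encoding="utf-8") as fh:
        listed = mkinitcpio_modules(fh.read())
    return sorted(NVIDIA_MODULES.intersection(listed))
```

An empty list from nvidia_modules_listed() means the config no longer asks for any of those modules in the initramfs.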
Can anyone point me in the right direction?
I’m on Arch Linux, with KDE Plasma as my default DE, if that matters (though it shouldn’t, because this happens even before I’ve logged into a GUI at all).