Hello.
I have a newly built PC and have been working on setting up GPU passthrough for the first time - so I am a beginner. I’ve managed to get a basic passthrough setup working, by binding the guest GPU to vfio-pci on boot. Using the suspend-to-RAM trick, I am able to work around the AMD reset bug and launch the VM multiple times. So far, so good.
However, I don’t really want to have my guest GPU dedicated exclusively to the Windows guest. I want to be able to hot-swap it between the host and guest, making use of it from the host via DRI_PRIME=1. This means I need to be able to bind/unbind it to/from amdgpu. And I have been having a nightmare of a time trying to accomplish this. I can bind and unbind from vfio-pci just fine with no problems. I can bind to amdgpu just fine with no problems. But attempting to unbind from amdgpu always hangs with the command never returning, and I either get a kernel panic or just become unable to properly shut down the PC.
My system info:
- Distro: Fedora Silverblue
- Kernel: 5.3.16-300.fc31.x86_64
- DE: GNOME with Wayland
- Motherboard: MSI X570 Gaming Pro Carbon
- Host GPU: AMD Radeon RX 550
- Guest GPU: AMD Radeon RX 5700 XT
- IOMMU groups: Both GPUs, and their HDMI audio components, are all in separate IOMMU groups (i.e. four separate groups).
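For reference, a minimal sketch of how the group layout can be listed from sysfs. The function name and the SYSFS_ROOT override are assumptions added for illustration (the override only exists so the function can be pointed at a test tree); on a real system it reads /sys.

```shell
#!/bin/sh
# Print one line per PCI device, prefixed with its IOMMU group number.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
list_iommu_groups() {
    root="${SYSFS_ROOT:-/sys}"
    for dev in "$root"/kernel/iommu_groups/*/devices/*; do
        # Skip the unexpanded glob when no groups exist (IOMMU disabled).
        [ -e "$dev" ] || continue
        group="${dev%/devices/*}"   # strip the trailing /devices/<address>
        group="${group##*/}"        # keep only the group number
        printf 'group %s: %s\n' "$group" "${dev##*/}"
    done
}
```

Calling list_iommu_groups with no arguments prints lines of the form "group N: PCI-address"; each GPU and each HDMI audio function showing up under a different N corresponds to the four separate groups described above.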
My current setup:
- I am using libvirt with virsh and Virtual Machine Manager.
- I’ve followed the basic steps to set up a VM, pass through an NVMe drive, route audio to PulseAudio, use evdev passthrough for the keyboard and mouse, and am passing through my guest GPU and its HDMI audio component.
- The guest GPU and its HDMI audio component are both using vfio-pci from boot, accomplished via kernel arguments.
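To confirm which driver actually holds a device at any point, the driver symlink in sysfs can be read. This is only a sketch: the function name, the SYSFS_ROOT override, and the example address 0000:0b:00.0 are assumptions for illustration, not values from my system.

```shell
#!/bin/sh
# Print the name of the driver currently bound to a PCI device, or "none".
# $1 is the full PCI address, e.g. 0000:0b:00.0 (hypothetical example).
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
current_driver() {
    link="${SYSFS_ROOT:-/sys}/bus/pci/devices/$1/driver"
    if [ -L "$link" ]; then
        # The symlink points at the driver directory, e.g. .../drivers/vfio-pci
        basename "$(readlink "$link")"
    else
        echo none
    fi
}
```

With the setup described above, running this against the guest GPU and its HDMI audio function right after boot would be expected to print vfio-pci for both.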
What I’ve tried to get hot-swap working:
- I’ve tried allowing the system to boot with amdgpu, and then simply launch the VM to have it automatically switch the driver over. This causes a kernel panic immediately.
- I’ve tried booting the system with vfio-pci, and then manually switching over to amdgpu. Upon doing so, the system detects the new graphics card and it becomes usable with DRI_PRIME=1. Using lsof I can see that both GPUs are in use by systemd, systemd-logind, and gnome-shell. However, attempting to unbind from amdgpu by echoing 1 to /sys/bus/pci/devices/ID/driver/unbind causes the command to hang. I am not even able to cancel/kill it. It just hangs forever and prevents system shutdown.
- I’ve tried switching to a tty, killing GNOME Shell and GDM entirely, and then manually swapping from vfio-pci to amdgpu. This time, lsof shows that nothing is using the guest GPU. Even so, attempting to unbind from amdgpu still causes the command to hang forever, and also prevents system shutdown.
- I’ve tried switching between both Xorg and Wayland. It works even worse in Xorg, immediately freezing the display when attempting to unbind the guest GPU. At least in Wayland the display will continue to work, even if I can no longer shut down the PC.
- I’ve tried using /remove instead of /unbind, and there’s no difference there - I assume it tries to unbind first anyway so this is probably to be expected.
- I’ve tried suspending to RAM after the unbind attempt. No difference.
- I’ve probably tried other things that I’m forgetting to mention here.
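To make the manual driver swap concrete, here is a rough sketch of the usual sysfs sequence (unbind, set driver_override, re-probe). It is not a fix for the hang: the unbind write is the step that never returns when the current driver is amdgpu. The function name, the SYSFS_ROOT override, and the example address are assumptions for illustration.

```shell
#!/bin/sh
# Move a PCI device from its current driver to another one via sysfs.
# $1 = full PCI address (e.g. 0000:0b:00.0, hypothetical), $2 = target driver.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
# On a real system this must run as root.
rebind_pci_device() {
    addr="$1"
    new_driver="$2"
    root="${SYSFS_ROOT:-/sys}"
    dev="$root/bus/pci/devices/$addr"

    # Release the device from whatever driver holds it now.
    # The unbind file takes the device's PCI address as its input.
    if [ -e "$dev/driver" ]; then
        echo "$addr" > "$dev/driver/unbind"
    fi

    # Restrict the next probe to the requested driver only.
    echo "$new_driver" > "$dev/driver_override"

    # Ask the PCI core to probe the device again.
    echo "$addr" > "$root/bus/pci/drivers_probe"
}
```

For example, rebind_pci_device 0000:0b:00.0 amdgpu would hand the card to the host, and the same call with vfio-pci would hand it back before starting the VM. The HDMI audio function would need the same treatment with its own address.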
I’d greatly appreciate any help you guys could offer. Thanks.