Single GPU passthrough on Fedora 34

BrainSweetiesss · May 11, 2021, 8:07pm

Hey guys,

I’m new here, just registered. I’ve been following the L1Techs channel on YouTube for years and finally decided to create an account now that I have a reason to.

I’ve been trying to do my first single GPU passthrough on Fedora 34 (GNOME) but I just can’t get it working. Unfortunately, the guides that are available are either for a different distro (usually Ubuntu-based or Arch) or for a traditional passthrough with 2 GPUs.

I found an actual guide on this forum for Fedora 33/34 but it’s for a dual GPU setup too. There are some Youtube tutorials but they also weren’t any good sooooo…

I was wondering if anyone managed to achieve this? If so, can you give me any pointers? After spending about 10 hours troubleshooting this little project of mine, I’d dare to say I understand the idea behind it and what needs to happen but there are some things that are quite unclear.

I can leave some of my specs here and hopefully someone will be able to share some guide or information with me

Specs: Ryzen 3700X, 5700XT Red Dragon, MSI b450i, 32GB RAM, Scarlett 2i2 3rd Gen (maybe worth mentioning, as at some point you have to hook some devices to the VM).

Looking forward to your responses! Feel free to ask away too.

HaaStyleCat · May 11, 2021, 8:31pm

I just want to clarify… Are you doing a passthrough to a Virtual Machine (VM) such as Windows for gaming? What are you passing through to?

Usually you see dual GPUs because the host OS (in your case Fedora 34) needs a GPU of its own so you can access it. Once you pass the GPU through to a VM, the host OS won’t be able to display anything unless you have an APU, a graphics chip built into the motherboard, or another GPU.

I may be missing something… I’m not a master of this, but I have made that mistake in the past.

BrainSweetiesss · May 11, 2021, 8:48pm

Hi HaaStyleCat.

My idea is to have Fedora 34 as my host and Windows 10 as my guest, for gaming mostly as you mentioned. I play Hunt Showdown and unfortunately it’s pretty much the only thing keeping me from deleting my Windows 10 partition.

I understand what you are saying. Ideally I’d have 2 GPUs (either 2 dedicated or 1 dedicated 1 iGPU) , then I’d blacklist 1 from loading in the kernel parameters in the host and pass it with KVM hooks to the guest VM later on.

What I’m trying to achieve now is running both host and guest (not at the same time) with 1 GPU. In other words, the moment you start your Windows 10 VM with virt-manager, the amdgpu drivers etc. are unloaded on the Linux host and the GPU is passed to the guest (Win 10).

There are plenty of guides for this around but unfortunately none of them seem to work with my current setup (AMD GPU and CPU + Fedora 34). When searching for single GPU passes I can find info dating back to 2018 so I guess this has been a thing for a while now. I love Fedora so I wouldn’t want to have to switch to Ubuntu/Arch just for this, hence me trying to find someone with some experience at it

HaaStyleCat · May 11, 2021, 11:26pm

Dang, yeah I can’t help with that. I’m sure someone can help though.

BrainSweetiesss · May 19, 2021, 4:15pm

Well, here I am again. This time with way more details. I made some progress but still not there.

I’ll copy what I posted on reddit, hopefully someone has some insights

Specs:

AMD Ryzen 3800X
AMD Red Dragon 5700XT
Fedora 34 with kernel 5.11.20-300.fc34.x86_64
Guest: Win 10 Pro VM

Libvirt directory tree:

/etc/libvirt/hooks  tree .
.
├── qemu
└── qemu.d
    └── win10
        ├── prepare
        │   └── begin
        │       └── start.sh
        └── release
            └── end
                └── stop.sh

6 directories, 3 files

/etc/libvirt/hooks/qemu:

#!/bin/bash

GUEST_NAME="$1"
HOOK_NAME="$2"
STATE_NAME="$3"
MISC="${@:4}"

BASEDIR="$(dirname "$0")"

HOOKPATH="$BASEDIR/qemu.d/$GUEST_NAME/$HOOK_NAME/$STATE_NAME"

set -e # If a script exits with an error, we should as well.

# check if it's a non-empty executable file
if [ -f "$HOOKPATH" ] && [ -s "$HOOKPATH" ] && [ -x "$HOOKPATH" ]; then
    eval \"$HOOKPATH\" "$@"
elif [ -d "$HOOKPATH" ]; then
    while read file; do
        # check for null string
        if [ ! -z "$file" ]; then
          eval \"$file\" "$@"
        fi
    done <<< "$(find -L "$HOOKPATH" -maxdepth 1 -type f -executable -print;)"
fi
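
To see what the dispatcher above would run for a given event without actually starting the VM, something like the following dry run can help. It is only a sketch that mirrors the path logic of the script above; list_hook_scripts is a hypothetical helper name, not part of libvirt, and the output is sorted purely to make it stable (the dispatcher itself does not sort).

```shell
#!/bin/bash
# Dry run: print the hook scripts the dispatcher would execute for one event.
# list_hook_scripts is a hypothetical helper, not something libvirt ships.
list_hook_scripts() {
    local basedir="$1" guest="$2" hook="$3" state="$4"
    local hookpath="$basedir/qemu.d/$guest/$hook/$state"

    if [ -f "$hookpath" ] && [ -s "$hookpath" ] && [ -x "$hookpath" ]; then
        # A single non-empty executable file is run directly.
        printf '%s\n' "$hookpath"
    elif [ -d "$hookpath" ]; then
        # A directory: every executable file directly inside it is run.
        # (sort is added here only for stable output.)
        find -L "$hookpath" -maxdepth 1 -type f -executable -print | sort
    fi
}
```

With the directory tree shown earlier, `list_hook_scripts /etc/libvirt/hooks win10 prepare begin` should list start.sh, provided the file is marked executable. If it prints nothing, the hook would be skipped silently.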

Start.sh hook:

#!/bin/bash
# Helpful to read output when debugging
set -x

# Stop your display manager. On KDE it will be sddm.service. GNOME users should use killall gdm-x-session instead
systemctl stop display-manager.service
pipewire_pid=$(pgrep -u me pipewire)
gdm_pid=$(pgrep -u me gdm)
kill $pipewire_pid
kill $gdm_pid

# Unbind VTconsoles
echo 0 > /sys/class/vtconsole/vtcon0/bind
echo 0 > /sys/class/vtconsole/vtcon1/bind


# Avoid a race condition by waiting a couple of seconds. This can be calibrated to be shorter or longer if required for your system
sleep 4

# Unload all Radeon drivers
modprobe -r amdgpu

# Unbind the GPU from display driver
virsh nodedev-detach pci_0000_2b_00_0   # gpu
virsh nodedev-detach pci_0000_2b_00_1   # gpu audio

# Load VFIO kernel module
modprobe vfio
modprobe vfio_pci
modprobe vfio_iommu_type1
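
One way to check whether the start hook did its job is to look at which kernel driver owns the GPU afterwards. This is a minimal sketch, assuming the usual sysfs layout where /sys/bus/pci/devices/<address>/driver is a symlink to the bound driver. pci_driver is a hypothetical helper name, and its second argument only exists so it can be pointed at a fake sysfs tree for testing.

```shell
#!/bin/bash
# Print the name of the kernel driver bound to a PCI device, or "none".
# pci_driver is a hypothetical helper; the sysfs root defaults to /sys.
pci_driver() {
    local addr="$1" sysfs="${2:-/sys}"
    local link="$sysfs/bus/pci/devices/$addr/driver"

    if [ -L "$link" ]; then
        # The symlink target ends in the driver name, e.g. .../drivers/vfio-pci
        basename "$(readlink "$link")"
    else
        echo "none"
    fi
}
```

After start.sh has run, `pci_driver 0000:2b:00.0` would be expected to print vfio-pci rather than amdgpu. If it still prints amdgpu, something on the host is still holding the card.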

Stop.sh hook:

#!/bin/bash
# Helpful to read output when debugging
set -x

# Unload all the vfio modules
modprobe -r vfio_pci
modprobe -r vfio_iommu_type1
modprobe -r vfio

# Reattach the gpu
virsh nodedev-reattach pci_0000_2b_00_0   # gpu
virsh nodedev-reattach pci_0000_2b_00_1   # gpu audio

# Load all Radeon drivers
modprobe amdgpu
modprobe snd_hda_intel
modprobe ttm
modprobe drm_kms_helper
modprobe i2c_algo_bit
modprobe drm

# Start your display manager
systemctl start display-manager.service

Kernel params:

GRUB_CMDLINE_LINUX="rhgb quiet iommu=1 amd_iommu=on rd.driver.pre=vfio-pci"
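
After rebooting with these parameters, it can be worth confirming that the IOMMU is actually active and seeing which group the GPU landed in. The sketch below assumes the usual /sys/kernel/iommu_groups layout (one directory per group, with a devices subdirectory holding the PCI addresses). list_iommu_groups is a hypothetical helper name, and the optional argument is only there so it can be run against a test directory.

```shell
#!/bin/bash
# Print one line per device: "group <n>: <pci address>".
# list_iommu_groups is a hypothetical helper; the root defaults to sysfs.
list_iommu_groups() {
    local root="${1:-/sys/kernel/iommu_groups}"
    local group dev

    for group in "$root"/*; do
        # Skip anything that is not a group directory (also covers an
        # unmatched glob when the IOMMU is disabled and no groups exist).
        [ -d "$group/devices" ] || continue
        for dev in "$group"/devices/*; do
            [ -e "$dev" ] || continue
            printf 'group %s: %s\n' "${group##*/}" "${dev##*/}"
        done
    done
}
```

Running `list_iommu_groups | grep 2b:00` should show both GPU functions (video and audio). No output at all would suggest the IOMMU is not enabled in the firmware or on the kernel command line.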

After doing this, creating a Win10 VM in virt-manager, passing the GPU as a device and changing the CPU mode to host-passthrough, I actually managed to install Windows 10… Unfortunately, when I shut down the machine everything crashed and I had to reboot my PC manually.

After rebooting I was dropped into emergency/rescue mode due to something with initramfs, and my btrfs partition was REALLY messed up. I tried to solve the issue but to no avail, so I ended up formatting my PC, which was really annoying.

Now… this morning, fresh install again. I did exactly the same changes to my setup and on top of that I added this:

/etc/dracut.conf.d/vfio.conf:

add_drivers+=" vfio-pci "

Then I ran ‘dracut -f’ to apply the changes to my current kernel version. I re-deployed a Win 10 VM and it boots fine, but once the Windows 10 installation has finished and the guest is about to restart, the VM (and my host) crash and I’m forced to do a hard reboot. Thankfully this time the host is not entirely broken, so I can continue testing.

Any ideas what’s happening? I feel I’m SO close to getting this to work but I’m failing somewhere. I’m not really knowledgeable with dracut and I’m keen on removing the vfio-pci from being automatically loaded at boot time, but I’m afraid this will get me through the installation (as it did last night) but then break my system again, forcing me to re-install and spend 2 more hours.

Win 10 VM XML config example (ignore the amount of RAM and vCPUs) :

<domain type="kvm">
  <name>win10</name>
  <uuid>64fac0d7-32d7-424b-84ad-7ed5e9177a78</uuid>
  <metadata>
    <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
      <libosinfo:os id="http://microsoft.com/win/10"/>
    </libosinfo:libosinfo>
  </metadata>
  <memory>12288000</memory>
  <currentMemory>12288000</currentMemory>
  <vcpu>8</vcpu>
  <os>
    <type arch="x86_64" machine="q35">hvm</type>
    <loader readonly="yes" type="pflash">/usr/share/edk2/ovmf/OVMF_CODE.fd</loader>
    <bootmenu enable="yes"/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state="on"/>
      <vapic state="on"/>
      <spinlocks state="on" retries="8191"/>
    </hyperv>
    <vmport state="off"/>
  </features>
  <cpu mode="host-passthrough"/>
  <clock offset="localtime">
    <timer name="rtc" tickpolicy="catchup"/>
    <timer name="pit" tickpolicy="delay"/>
    <timer name="hpet" present="no"/>
    <timer name="hypervclock" present="yes"/>
  </clock>
  <pm>
    <suspend-to-mem enabled="no"/>
    <suspend-to-disk enabled="no"/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type="file" device="disk">
      <driver name="qemu" type="qcow2"/>
      <source file="/var/lib/libvirt/images/win10.qcow2"/>
      <target dev="sda" bus="sata"/>
      <boot order="4"/>
    </disk>
    <disk type="file" device="cdrom">
      <driver name="qemu" type="raw"/>
      <source file="/home/me/Downloads/Win10_21H1_EnglishInternational_x64.iso"/>
      <target dev="sdb" bus="sata"/>
      <readonly/>
      <boot order="3"/>
    </disk>
    <controller type="usb" model="qemu-xhci" ports="15"/>
    <interface type="network">
      <source network="default"/>
      <mac address="xx:xx:xx:xx"/>
      <model type="e1000e"/>
    </interface>
    <sound model="ich9"/>
    <hostdev mode="subsystem" type="pci" managed="yes">
      <source>
        <address domain="0" bus="43" slot="0" function="0"/>
      </source>
    </hostdev>
    <hostdev mode="subsystem" type="pci" managed="yes">
      <source>
        <address domain="0" bus="43" slot="0" function="1"/>
      </source>
    </hostdev>
    <hostdev mode="subsystem" type="usb" managed="yes">
      <source>
        <vendor id="0x05ac"/>
        <product id="0x024f"/>
      </source>
      <boot order="1"/>
    </hostdev>
    <hostdev mode="subsystem" type="usb" managed="yes">
      <source>
        <vendor id="0x1532"/>
        <product id="0x007b"/>
      </source>
      <boot order="2"/>
    </hostdev>
  </devices>
</domain>
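
Note that the hostdev addresses in this XML are decimal (bus="43"), while the hook scripts use the hexadecimal node device name (pci_0000_2b_00_0). The two refer to the same device, since 43 decimal is 2b in hex. A small sketch of the conversion follows; nodedev_name is a hypothetical helper name, and it assumes all four values are passed in decimal.

```shell
#!/bin/bash
# Build the virsh node device name from decimal domain/bus/slot/function.
# nodedev_name is a hypothetical helper used only for illustration.
nodedev_name() {
    local domain="$1" bus="$2" slot="$3" func="$4"
    # Format: pci_<4 hex digits>_<2 hex digits>_<2 hex digits>_<1 hex digit>
    printf 'pci_%04x_%02x_%02x_%x\n' "$domain" "$bus" "$slot" "$func"
}
```

For example, `nodedev_name 0 43 0 0` prints pci_0000_2b_00_0, and `nodedev_name 0 43 0 1` prints pci_0000_2b_00_1, matching the two devices detached in start.sh.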

Thanks in advance!

The_Poot · June 25, 2021, 3:35am

Hi, I have been doing this for about a year now. I was using a Threadripper 1950X with primary GPU passthrough.
I’m switching to a 5950X with Fedora 34, and it seems like either my system doesn’t like it or Fedora 34 has changed the way this is done. I’m going to say that it is my system, because I was able to start the VM and the display was taken from the host, but it just stayed at a blank screen. I reboot my system and then nothing happens. I’d call this a setup error because my 1950X seemed to deal with this much better. I want the 5950X because of the perf-per-clock increase: I heard it was 30% from the 1950X to the 3950X and then another 25% from the 3950X to the 5950X, so perf per clock is well above 50% from the 1950X to the 5950X. It seems these consumer boards just don’t handle it the same way as Threadripper. I will try to post updates if I make more progress.

BrainSweetiesss · June 25, 2021, 1:37pm

Hey @The_Poot , I actually got the single passthrough working. I wrote a guide which I’ll share in a separate thread. Maybe this helps.

If a mod sees this, please feel free to close the thread. Thanks!

The_Poot · February 2, 2022, 8:10am

I’m way too late to reply, but that sounds nice. I eventually found out why I was getting errors when I upgraded to the 30 series cards. It seems the AMD 5700 XT was more lenient when it came to single GPU passthrough, while with the Nvidia Ampere cards you seem to have to do it by the book.
This is also confirmed because I used the 5700 XT with the same setup and, like I said, it would grab the display but black out, while the Nvidia card refused to do anything.
VFIO was enabled on both cards and both worked fine in the 2nd slot.

The solution:
If you want single GPU passthrough to work, whether it’s on a laptop or a consumer board, you must unbind the EFI framebuffer and the console. These two things had a hold on the GPU even though vfio was enabled.
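
A rough sketch of what that unbinding step can look like in a start hook is below. The vtconsole paths are the ones already used in start.sh above. The efi-framebuffer path and the device name efi-framebuffer.0 are assumptions based on the usual sysfs layout, so check that they exist on your system first. unbind_console_and_efifb is a hypothetical helper name, and the optional argument only lets it run against a fake sysfs tree.

```shell
#!/bin/bash
# Release the virtual consoles and the EFI framebuffer so they no longer
# hold the GPU. Hypothetical helper; must run as root on a real system.
unbind_console_and_efifb() {
    local sysfs="${1:-/sys}"
    local vtcon

    # Unbind every virtual console that exists (vtcon0, vtcon1, ...).
    for vtcon in "$sysfs"/class/vtconsole/vtcon*/bind; do
        [ -e "$vtcon" ] || continue
        echo 0 > "$vtcon"
    done

    # Unbind the EFI framebuffer, if this platform driver is present.
    # Assumption: the device is named efi-framebuffer.0.
    local efifb="$sysfs/bus/platform/drivers/efi-framebuffer/unbind"
    if [ -e "$efifb" ]; then
        echo efi-framebuffer.0 > "$efifb"
    fi
}
```

In a hook like start.sh this would be called after stopping the display manager and before unloading the GPU driver. The stop hook would then need the reverse (writing 1 to the vtconsole bind files) to get the host console back.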