VFIO in 2019 – Fedora Workstation (General Guide though) [Branch, Draft]

In case it helps anyone - I’ll just note down here some differences in the required commands for Fedora Silverblue users, compared to the guide’s Fedora Workstation commands.

Workstation:

sudo dnf install @virtualization

Silverblue:

rpm-ostree install virt-install libvirt-daemon-config-network libvirt-daemon-kvm qemu-kvm virt-manager virt-viewer

Workstation:

sudo lsinitrd | grep vfio

Silverblue:

That command just didn’t work, as it can’t find the initramfs. I’m not sure what the equivalent command would be. I just skipped that.
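
If you still want to inspect the image on Silverblue, pointing lsinitrd at the initramfs directly may work; this is a guess based on where ostree-based installs typically keep it, so the path may differ on your system:

lsinitrd /usr/lib/modules/$(uname -r)/initramfs.img | grep vfio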

Workstation:

sudo dracut --add-drivers "vfio vfio-pci vfio_iommu_type1" --force
sudo dracut --force

Silverblue:

rpm-ostree initramfs --enable --arg=--add-drivers --arg="vfio vfio-pci vfio_iommu_type1"
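
To double-check that the override stuck, rpm-ostree status lists your deployments and marks the ones with local initramfs regeneration enabled (exact wording varies by version); the change only takes effect after the next reboot:

rpm-ostree status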

Workstation:

Add intel_iommu=on or amd_iommu=on to GRUB_CMDLINE_LINUX:

sudo vim /etc/sysconfig/grub
sudo dnf reinstall kernel
grub2-editenv list

Silverblue:

rpm-ostree kargs --editor
And then add amd_iommu=on iommu=pt using the editor.
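
If you prefer not to use the interactive editor, the same change can be made non-interactively (this assumes an AMD system, per the line above):

rpm-ostree kargs --append=amd_iommu=on --append=iommu=pt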

Finally, instead of the /usr/sbin/vfio-pci-override.sh script, I added vfio-pci.ids=vvvv:dddd,vvvv:dddd using rpm-ostree kargs --editor, as above.
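
For illustration, the vendor:device pairs to substitute for vvvv:dddd come from lspci -nn; the grep pattern and the IDs below are only placeholders for whatever GPU (and its audio function) you are passing through:

lspci -nn | grep -iE "vga|audio"
# e.g. 0c:00.0 VGA compatible controller [0300]: NVIDIA ... [10de:1e07]
# would give a karg like: vfio-pci.ids=10de:1e07,10de:xxxx  (xxxx = the card's audio device ID)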

Workstation:

lstopo is a utility you can use to view your CPU, cache, and PCI topology. The guide installs it with apt install hwloc; on Fedora Workstation the equivalent is:

sudo dnf install hwloc hwloc-gui

Silverblue:

toolbox create
toolbox enter
sudo dnf install hwloc hwloc-gui
lstopo

I’m a bit confused with the instructions on checking topology with hwloc. I’ve installed it and then run it via lstopo. This is the output it gives me:

As you can see, it just shows me one single MemoryModule/node. My guest GPU is PCI 2f:00.0 near the bottom right corner. What am I supposed to learn from this?

  • What cores would be best for the GPU then - or are they just all equivalent on this CPU (3900X)?
  • What do the numbers on the lines mean (e.g. 32 on the line coming off the guest GPU)?
  • Would it be better to pass through 9 cores instead of my intended 8 cores, to fully share the L3 cache?

From what I understand, this is less about your GPU and more about maximizing CPU performance. What you’re supposed to learn here is how your cores are numbered so that you can pass through threads which share a core, cores which share cache, etc. The idea is that the less jumping around the die, the better.

  • I think that anything would be fine for your GPU.
  • I'm not sure, but I would guess it's IOMMU groups.
  • You’re fine not saturating the L3 cache.

I would use cores 0-5, 12-17. This will neatly split your guest and host with 12 threads each, and no overlap on cache.
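
If you would rather read the core/thread pairings as text instead of from the lstopo picture, lscpu and sysfs expose the same information; a quick sketch (core numbering will differ per machine):

# which logical CPUs share a physical core
lscpu -e=CPU,CORE,SOCKET
# or, per logical CPU:
for c in /sys/devices/system/cpu/cpu[0-9]*; do
    echo "$(basename $c): $(cat $c/topology/thread_siblings_list)"
done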

Hopefully this all makes sense, it can be a lot to take in!

Thanks guys, I finally got mine working on Fedora 31 with a Ryzen 3900X.

Hey there, wondering if I have a couple of things set right with regards to CPU pinning and isolation and perhaps those hyperv options too.
My lstopo:

Here’s a portion of my XML:

  <memory unit="KiB">16777216</memory>
  <currentMemory unit="KiB">16777216</currentMemory>
  <memoryBacking>
    <hugepages/>
  </memoryBacking>
  <vcpu placement="static">10</vcpu>
  <cputune>
    <vcpupin vcpu="0" cpuset="1"/>
    <vcpupin vcpu="1" cpuset="7"/>
    <vcpupin vcpu="2" cpuset="2"/>
    <vcpupin vcpu="3" cpuset="8"/>
    <vcpupin vcpu="4" cpuset="3"/>
    <vcpupin vcpu="5" cpuset="9"/>
    <vcpupin vcpu="6" cpuset="4"/>
    <vcpupin vcpu="7" cpuset="10"/>
    <vcpupin vcpu="8" cpuset="5"/>
    <vcpupin vcpu="9" cpuset="11"/>
  </cputune>
  <os>
    <type arch="x86_64" machine="pc-q35-4.1">hvm</type>
    <loader readonly="yes" type="pflash">/usr/share/edk2/ovmf/OVMF_CODE.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/win10_VARS.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state="on"/>
      <vapic state="on"/>
      <spinlocks state="on" retries="8191"/>
      <vpindex state="on"/>
      <runtime state="on"/>
      <synic state="on"/>
      <stimer state="on"/>
      <reset state="on"/>
      <vendor_id state="on" value="whatever"/>
      <frequencies state="on"/>
      <reenlightenment state="on"/>
      <tlbflush state="on"/>
      <ipi state="off"/>
      <evmcs state="off"/>
    </hyperv>
    <kvm>
      <hidden state="on"/>
    </kvm>

and the portion of my grub file:

intel_pstate=passive pcie_aspm=off intel_iommu=on isolcpus=1-5,7-11 nohz_full=1-5,7-11 rcu_nocbs=1-5,7-11

What I have attempted is to keep core 0 and its sibling thread for the host while the guest gets the rest of the chip.
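
In case it helps, a quick way to sanity-check that the isolation parameters actually took effect after a reboot (the expected output assumes the karg list above):

grep -o "isolcpus=[^ ]*" /proc/cmdline
cat /sys/devices/system/cpu/isolated   # should print 1-5,7-11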

These instructions are incomplete and will not work without an additional step.

There is nothing in here causing the script /usr/sbin/vfio-pci-override.sh to execute. To do that, you need to create a file like the following (you see it in the lsinitrd dump in the post):

Open an editor using: sudo vi /etc/modprobe.d/vfio.conf, and paste the following into the file:

install vfio-pci /usr/sbin/vfio-pci-override.sh; /sbin/modprobe --ignore-install vfio-pci

options vfio-pci disable_vga=1

Create that file and include it in your initial ram disk (initramfs), using dracut, as described in the post.
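
For reference, scripts of this kind usually look roughly like the following; this is only a sketch, not the guide's exact script, and the PCI addresses are placeholders you would replace with your own GPU and its audio function:

#!/bin/sh
# Tell the kernel these devices may only be bound by vfio-pci, then load it.
DEVS="0000:0c:00.0 0000:0c:00.1"
for DEV in $DEVS; do
    echo "vfio-pci" > /sys/bus/pci/devices/$DEV/driver_override
done
modprobe -i vfio-pci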

Also note, this step from the beginning of the post is for Intel CPUs:
dmesg | grep -i -e IOMMU | grep enabled
which should show a line of text with the word “enabled” in it.
For AMD CPUs, this is the command:
dmesg | grep -i -e IOMMU | grep "counters supported"
and you should see some text like “AMD-Vi: IOMMU performance counters supported”

I also used this line in my /etc/dracut.conf.d/vfio.conf file:

force_drivers+="vfio vfio-pci vfio_virqfd vfio_iommu_type1"

I don’t know if the added vfio_virqfd is helpful or not.

In the variable GRUB_CMDLINE_LINUX in /etc/default/grub, I appended these parameters:

amd_iommu=on rd.driver.pre=vfio-pci

but other guides also recommend adding iommu=pt to the list once everything is working, which can provide better performance.
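
For illustration, after those additions the whole variable might look something like this (the rhgb quiet part is just Fedora's default; keep whatever options your install already had):

GRUB_CMDLINE_LINUX="rhgb quiet amd_iommu=on rd.driver.pre=vfio-pci iommu=pt"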

I hope this helps someone! FWIW I’m using Fedora 31.

See my reply on Feb 11.

See my reply on Feb 11. There are things missing in the original post.

I’ve been working through this guide and multiple others. I was really banging my head on getting the PulseAudio config set up correctly; I kept seeing: pulseaudio: Reason: Connection refused

After googling around for a while, I found a post suggesting that SELinux might be the culprit, and it turned out to be the case. For the time being I'm setting SELinux to permissive until I figure out how to configure it correctly for this.
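
In case it's useful to anyone in the same spot, the audit log shows exactly what SELinux is denying while it runs permissive, which makes it easier to build a proper policy later (a quick sketch using the standard audit tools):

sudo setenforce 0                              # permissive until the next boot
sudo ausearch -m AVC -ts recent                # list recent denials
sudo ausearch -m AVC -ts recent | audit2allow  # suggest allow rules for them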

On to the next problem!

I would recommend a tool called setroubleshoot. This’ll notify you about any SELinux issues and help you solve them.

Just saw your post now. Thanks man, I will try these things as soon as I have the time!

Can confirm this is needed. Thanks for sharing, JDawg and BansheeHero.

Big thanks to everyone on this thread. It was a huge help; I finally got my dual-boot Fedora 31 / Win10 setup converted over to VFIO and LookingGlass.

Just a couple of outstanding items that I would love to get fixed up; hoping someone has some ideas:

  1. Audio: Has anyone gotten audio to pass from the VM to Pulse on the host? If so, how?
    • Have been reading and trying things from the PCI passthrough guide for Arch, the LookingGlass wiki, and these forums (mainly this thread's section on QEMU audio), but nothing has worked so far. Can't get any audio from the win10 VM.
    • Was already considering getting a USB DAC; given how well the USB switches over, I am tempted to get one and plug it into the USB card I was initially using for keyboard and mouse on the VM. Has anyone tried this?
  2. Automating the creation of the shared memory file:
    • Anyone found a way to reliably create the shm file?
    • I was trying to use /etc/tmpfiles.d as described here to create the file at boot. It creates the file and it appears to have all the right permissions, but the LookingGlass client gives a permission error when launching. If I manually recreate the shm file as root and set the ownership/perms to "user:kvm" and "660", then LookingGlass launches fine.

Appreciate any help or advice on these last couple issues.

I could have sworn I had that in my guide. I’ll have to update it…

Okay, so you need ich6 or ich9. Last I checked, it wasn't working properly on the latest Windows 10 builds due to driver issues, and it causes major instability. So stay on older builds (pre-1903) and don't update, unless you want to move away from it.

My recommendation is this:

Spring for the USB DAC. You can get a cheap one, but it’s a much better solution, all in.

Systemd unit file: set it up to run on boot, as root, and just have it execute a script in /usr/local/bin/ that sets things up.

Yeah, using ich9; that is what virt-manager set up. My win10 VM is 1909; I didn't know I needed to lock it to 1903, and it doesn't look like Windows will let me roll that back. Might make a fresh VM just to try it out. The long-term solution will be a USB DAC.

Systemd worked; here is what I did:

/etc/systemd/system/looking-glass.service

[Unit]
Description=Setup SHM for LookingGlass

[Service]
Type=oneshot
ExecStart=/usr/local/bin/looking-glass-shm.sh
ExecReload=/usr/local/bin/looking-glass-shm.sh
RemainAfterExit=true

/usr/local/bin/looking-glass-shm.sh

#!/usr/bin/bash
# Create the Looking Glass shared-memory file with the right owner and permissions
shm_file=/dev/shm/looking-glass
user=USER
group=kvm
perms=660
touch $shm_file && chown $user:$group $shm_file && chmod $perms $shm_file

On CLI

chmod +x /usr/local/bin/looking-glass-shm.sh
chmod +x /etc/systemd/system/looking-glass.service
systemctl daemon-reload
systemctl start looking-glass.service
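
One caveat worth flagging as an assumption on my part: as written, the unit only runs when started manually, since it has no [Install] section and was never enabled. Something like this should make it run at every boot:

# add to /etc/systemd/system/looking-glass.service:
#   [Install]
#   WantedBy=multi-user.target
sudo systemctl enable --now looking-glass.service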

Thanks for the suggestions. To the internet to research USB DACs.

After rebooting I am getting the permission issue again. Now removing the shm file and recreating it with "systemctl restart looking-glass.service" is not working:

Traceback
Error starting domain: internal error: process exited while connecting to monitor: 2020-04-22T21:50:02.570798Z qemu-system-x86_64: -object memory-backend-file,id=shmmem-shmem0,mem-path=/dev/shm/looking-glass,size=33554432,share=yes: can't open backing store /dev/shm/looking-glass for guest RAM: Permission denied

Traceback (most recent call last):
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 75, in cb_wrapper
    callback(asyncjob, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/asyncjob.py", line 111, in tmpcb
    callback(*args, **kwargs)
  File "/usr/share/virt-manager/virtManager/object/libvirtobject.py", line 66, in newfn
    ret = fn(self, *args, **kwargs)
  File "/usr/share/virt-manager/virtManager/object/domain.py", line 1279, in startup
    self._backend.create()
  File "/usr/lib64/python3.7/site-packages/libvirt.py", line 1136, in create
    if ret == -1: raise libvirtError ('virDomainCreate() failed', dom=self)
libvirt.libvirtError: internal error: process exited while connecting to monitor: 2020-04-22T21:50:02.570798Z qemu-system-x86_64: -object memory-backend-file,id=shmmem-shmem0,mem-path=/dev/shm/looking-glass,size=33554432,share=yes: can't open backing store /dev/shm/looking-glass for guest RAM: Permission denied

If I delete the shm file and then run the same touch-chown-chmod commands, the VM will launch. Whether it is created manually, via systemctl, or via tmpfiles.d, the permissions look the same:

-rw-rw----. 1 USER kvm 0 Apr 22 12:00 /dev/shm/looking-glass

Wondering if this is something with SELinux?

UPDATE: Yup, it was SELinux. Newer install and I didn't have setroubleshoot installed, so I wasn't being notified. Resolved with:

ausearch -c 'qemu-system-x86' --raw | audit2allow -M my-qemusystemx86
semodule -X 300 -i my-qemusystemx86.pp
setsebool -P domain_can_mmap_files 1

After adding these files and config, I'm unable to get to the Fedora login screen. Every time I boot, the primary (hopeful guest) GPU's output shows the initial load screen and boot loader; then, after selecting the standard Fedora install, my secondary (hopeful host) GPU's output shows the loading screen that is typically shown before the Fedora login screen, but after it finishes loading it just shows a black screen instead of the login screen. Any ideas?

Update: Fixed the issue. In my case I’m binding the graphics card in my primary PCIE slot to the vfio drivers, so I needed to create an xorg configuration file (/etc/X11/xorg.conf.d/secondary-gpu.conf) telling Fedora to use the secondary GPU for the OS.

Section "Device"
    Identifier "Device0"
    Driver "amdgpu"
    BusId "PCI:46:0:0"
EndSection 

Worth noting here that even though my lspci -nnv output showed the driver in use for my card was amdgpu, I still needed the separate equivalent Xorg driver package for that card. You will probably need a different Xorg driver if you're using Nvidia, but here's the command I used to get the necessary driver for my Radeon card:

sudo dnf install xorg-x11-drv-amdgpu

Finally, it's also worth noting that the BusId expects a number in decimal format, but the ID shown in the lspci or ls-iommu output is in hex format. My card was at 00:2e:00.0, so the 2e became a 46 to get the BusId shown above.
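
If you want to avoid doing the hex-to-decimal conversion in your head, the shell can do it (using the bus number from the example above):

echo $((0x2e))                          # 46
printf "PCI:%d:%d:%d\n" 0x2e 0x00 0x0   # PCI:46:0:0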

Thanks for this detailed tutorial. I'm trying to make it work on Fedora 32, but no luck so far. Maybe a first question.

After installing a fresh F32, I followed a tutorial to install my Nvidia drivers.

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 440.82       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208…  Off    | 00000000:0C:00.0  On |                  N/A |
| 35%   36C    P8     2W / 260W |    296MiB / 11011MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2030      G   /usr/libexec/Xorg                             35MiB |
|    0      2459      G   /usr/libexec/Xorg                             83MiB |
|    0      2591      G   /usr/bin/gnome-shell                         158MiB |
+-----------------------------------------------------------------------------+

0c:00.0 VGA compatible controller [0300]: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti Rev. A] [10de:1e07] (rev a1) (prog-if 00 [VGA controller])
Subsystem: ZOTAC International (MCO) Ltd. Device [19da:1503]
Flags: bus master, fast devsel, latency 0, IRQ 132
Memory at f6000000 (32-bit, non-prefetchable) [size=16M]
Memory at e0000000 (64-bit, prefetchable) [size=256M]
Memory at f0000000 (64-bit, prefetchable) [size=32M]
I/O ports at e000 [size=128]
Expansion ROM at f7000000 [virtual] [disabled] [size=512K]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Legacy Endpoint, MSI 00
Capabilities: [100] Virtual Channel
Capabilities: [250] Latency Tolerance Reporting
Capabilities: [258] L1 PM Substates
Capabilities: [128] Power Budgeting <?>
Capabilities: [420] Advanced Error Reporting
Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Capabilities: [900] Secondary PCI Express
Capabilities: [bb0] Resizable BAR <?>
Kernel driver in use: nvidia
Kernel modules: nouveau, nvidia_drm, nvidia

Working well.

Now, following this tutorial, I rebuilt the initramfs and checked it with lsinitrd; all the required modules and the grub config seem OK.

After reboot, the kernel hangs. Before going into more detail, a quick question: is it possible on a single host to run the Nvidia driver for Linux (to keep GPU performance) AND use the passthrough option to also get guest performance in the VM, with only 1 GPU (a 2080 Ti in this case)? Or must I have 2 GPU cards inside my rig, 1 for my native Linux and 1 for my VM?

Hope I'm clear. :)

Thanks

No. The host OS needs a GPU and will load drivers and take control of the IOMMU group. To pass a GPU to a guest OS, you need to skip loading drivers for it when the host OS boots, so the guest OS can load drivers and take control of it when it boots.

There are other methods of accelerating graphics. With virgil3d/virtio you might have some luck, but it is unlikely to be anything like passthrough performance.

This has been incredibly useful! Thank you.
