
Fedora 33: Ultimate VFIO Guide for 2020/2021 [WIP]

What’s the state of VFIO things in 2020?

I haven’t been this excited in ages, let me tell you.

First, there is the PCIe quirks fix for the Vega and Navi PCIe reset bug. Mind you, AMD should have caught this in hardware development, but we now have a reliable PCIe reset that works with surgical precision and no side effects.

No More Patched Kernels and Hackery!

Second, a big thank you to the hard-working users here. A random selection – thank you to @BansheeHero , @SgtAwesomesauce , @gnif , @belfrypossum and many more. @belfrypossum and Gnif are heroes right now because of the reset fix for Navi/Vega and Looking Glass. More on that in a sec.

Check out these threads for historical context:

I have been writing guides for Fedora and Ubuntu for some years now – running a VFIO setup myself for… about 10 years now? Yeesh! Time flies.

What GPUs are best for this?

Well, with the new Navi/Vega reset-via-quirks approach, AMD is a good choice. Nvidia gives you “Code 43” when you try to run GeForce cards in a virtual environment, but the work-around is trivial.

See also:
https://forum.level1techs.com/t/amd-polaris-vega-navi-reset-project-vendor-reset/163801/7

Getting Started

Fresh install of Fedora 33. First things first.

Install cpufreq, because it’s nice:
https://extensions.gnome.org/extension/1082/cpufreq/

The modern Fedora GNOME installer is good to go out of the box! Download the browser extension as prompted by the above website, and toggle it on. Next, click on it; it will ask if you want to install some optional stuff. Read about that, and then I recommend you do it. Finally, make sure you have the Performance governor set (OnDemand is almost as good, except for workloads that are quasi-busy – the CPU sleeping and waking can give you less-than-100% performance) and make sure Turbo is toggled on. This is not an overclock! It is just the normal boost behavior of the CPU. You for sure want that.
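If you’d rather check the governor from a terminal than trust the extension, the current setting is exposed in sysfs. A minimal sketch (read-only, so no root needed; the path is standard cpufreq sysfs, but it may be absent on some VMs or kernels without cpufreq support):

```shell
# Read the active frequency governor for cpu0 (other cores are usually the same)
f=/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
if [ -r "$f" ]; then
  cat "$f"
else
  echo "cpufreq sysfs interface not available on this system"
fi
```

You should see `performance` here once the toggle has been applied.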

Our Hardware

Setups will differ for this guide, and I will try to keep a list here.
For now I am going to mention a few notes:

  • Identical GPUs can work with this method. We override by PCIe address rather than by device ID, as some other guides do.
  • Get at least a 256GB SSD for Win 10, even if you have to use an iSCSI disk or NAS for additional space.
  • Have at least 16GB of RAM and 6 threads before bothering with the VM. The CPU penalty is
    still decently heavy for fast GPUs.
  • Make sure that all your displays can be run from both of your GPUs, just in case.
  • Plan ahead for the advantages of VM infrastructure:
    • A NAS/drive to handle snapshots, and take them regularly.
    • Clone systems rather than going wide with a single installation. You can run one for each GPU, and switching takes just a “reboot” of the guest.

Secure a “Plan B”

Since we’re going to be mucking about with the video drivers, I’d recommend making sure you can SSH into your machine from another machine on the network, just in case things go wonky. You can use this access to fix any problems that might otherwise be a pain to fix.

1: Remote Access

Chances are an SSH server is up and running for you already. Here is the basic setup:

sudo dnf install openssh-server
sudo firewall-cmd --add-service=ssh --permanent
sudo firewall-cmd --reload
sudo systemctl start sshd
sudo systemctl enable sshd

and make sure you can ssh in

ssh [email protected]

before going farther in the guide.

You’ll also want to make sure your IOMMU groups are appropriate using the ls-iommu.sh script which has been posted here and elsewhere:

#!/bin/bash
for d in /sys/kernel/iommu_groups/*/devices/*; do
  n=${d#*/iommu_groups/*}; n=${n%%/*}
  printf 'IOMMU Group %s ' "$n"
  lspci -nns "${d##*/}"
done
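To zero in on a single card instead of dumping everything, you can also walk the sysfs link for that one device. A sketch, assuming a hypothetical address of 0000:03:00.0 (substitute your own from lspci):

```shell
#!/bin/bash
# Every device listed here shares an IOMMU group with $dev and must be
# bound to vfio-pci together for passthrough.
dev="0000:03:00.0"   # hypothetical example address -- change to yours
group="/sys/bus/pci/devices/$dev/iommu_group/devices"
if [ -d "$group" ]; then
  ls "$group"
else
  echo "no IOMMU group for $dev (IOMMU off, or device absent)"
fi
```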

If you don’t have any IOMMU groups, make sure that you’ve enabled IOMMU (“Enabled” and not “Auto” is important on some motherboards) as well as SVM or VT-d.

We also need to enable IOMMU on the Linux side of things.

Checking for IOMMU enabled during system start:

dmesg | grep -i -e IOMMU | grep enabled

I find no difficulty setting this up in Fedora; the process is pretty automated. Let me know if you run into issues.

sudo dnf reinstall kernel
grub2-editenv list

(I trust you know whether you have an Intel or AMD system? :wink: )

While we’re here, go ahead and install the @virtualization meta-package to get all the virtualization stuff we’ll need for this guide, if you don’t already have it.

Installing Packages

It is entirely possible to do almost all of this through the GUI. You need not be afraid of cracking open a terminal, though, and running commands. I’ll try to explain what’s going on with each command so you understand what is happening to your system.

First, we need the virtualization packages, and we will use the package group from Fedora’s own maintainers. Personally, I like to offload this work for smaller projects, but if any of you want to curate a specific list of packages, I will add it here.

# sudo dnf install @virtualization

User settings are not adjusted: if your user is not in wheel, it will need to be added to further groups (such as libvirt) to operate KVM and the other aspects of this project.

A reboot after installing is usually recommended – basically to force you to log out and in again, ensuring these changes are realized. But it is not time for that yet; our work is not done.

Configure Grub on Fedora for VFIO.

We need to add two boot-time parameters – one to enable the IOMMU and one to tell the kernel to pre-load the vfio-pci kernel module (some users reported this fixed cases where the Nvidia proprietary driver grabbed the device very early in the boot process).

Add intel_iommu=on or amd_iommu=on to GRUB_CMDLINE_LINUX:

sudo vim /etc/sysconfig/grub

Add the option here. Mine looks like

GRUB_CMDLINE_LINUX="rhgb quiet amd_iommu=on rd.driver.pre=vfio-pci "

… because I have a Threadripper system.

Before we rebuild the initial ramdisk, we have yet more work to do.

We are going to create a custom dracut module. This will be responsible for binding our GPU (and any other PCIe devices we want to pass through) early in the boot process.

The Initial Ramdisk and You

“I know what some of these words mean?” Yeah, it’ll be fine. As part of the boot process, drivers are needed for your hardware. They come from the initial ramdisk, along with some configuration.

Normally, you configure the VFIO modules to tag the hardware you want to pass through. At boot time the VFIO drivers bind to that hardware and prevent the ‘normal’ drivers from loading. This is usually done by PCIe vendor and device ID, but that doesn’t work for this System76 system because it has two identical GPUs.

It’s really not a big deal, though, we just need to handle the situation differently.

Early in the boot process we’ll bind vfio-pci to one of the GPUs (and its audio device, and optionally a USB device) via a shell script. Nvidia RTX (2000 series) cards also have USB/serial devices, as do the new RX 6000 series cards from AMD; these will need to be bound too, since they are in the same IOMMU group.

This script will have to be modified to suit your system. You can run

 # lspci -vnn 

to find the PCIe device(s) associated with your cards. Normally there is a “VGA compatible controller” and an audio controller, but with RTX cards and AMD RX 6000 cards there are typically up to four devices:

My setup:

03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 73bf (rev c1)
03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device ab28
03:00.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 73a6
03:00.3 Serial bus controller [0c80]: Advanced Micro Devices, Inc. [AMD/ATI] Device 73a4
21:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] (rev e7)
21:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590]

Note that my devices showed up at 0000:03:00.0 – 0000:03:00.3, and 0000:03:00.0 was the primary card (meaning we want to pass through all four devices under 0000:03:00).

We will want to be sure that we bind vfio-pci to all of these – they are likely in the same IOMMU group anyway (which forces all of the devices to be bound to VFIO drivers for passthrough).

The script will help us make sure the vfio driver has been properly attached to the device in question.

This is the real “special sauce” for when you have two identical GPUs and only want to use one for VFIO. It’s even easier if your GPUs are different models, but this method works fine for either scenario.

Modify the DEVS line in the script (prefix the addresses with 0000, or check /sys/bus/pci/devices to confirm if you like) and then save it as /usr/sbin/vfio-pci-override.sh:

#!/bin/sh
PREREQS=""
# PCIe addresses to claim for vfio-pci -- edit these to match your system
DEVS="0000:03:00.0 0000:03:00.1 0000:03:00.2 0000:03:00.3"

for DEV in $DEVS; do
        # Tell the kernel that only vfio-pci may bind this device
        echo "vfio-pci" > /sys/bus/pci/devices/$DEV/driver_override
done

modprobe -i vfio-pci

Note: Xeon, Threadripper or multi-socket systems may very well have a PCIe device prefix of 0001 or 000a… so double check at /sys/bus/pci/devices if you want to be absolutely sure.

With the script created, you need to make it executable and add it to the initial ramdisk so that it can do its work before any other driver is loaded. The Nvidia driver especially – if you’re following this guide with Nvidia 2000 or 3000 series GPUs – comes lumbering through to claim everything it can. (It’s basically the Spanish Inquisition as far as device drivers go.)

Since Fedora uses dracut to manage the initramfs, the cleanest approach is to create a custom dracut module.

Steps:

mkdir /usr/lib/dracut/modules.d/20vfio
# Note: the "20" prefix helps things run in the right order. If you ls /usr/lib/dracut/modules.d you'll get the idea

Create /usr/lib/dracut/modules.d/20vfio/module-setup.sh with the following contents:

#!/usr/bin/bash
check() {
  return 0
}
depends() {
  return 0
}
install() {
  declare moddir=${moddir}
  inst_hook pre-udev 00 "$moddir/vfio-pci-override.sh"
}

Create a symbolic link in your custom vfio folder:

ln -s /usr/sbin/vfio-pci-override.sh /usr/lib/dracut/modules.d/20vfio/vfio-pci-override.sh

Configure dracut in /etc to look for this new module by name.

Create /etc/dracut.conf.d/vfio.conf with these contents:

add_dracutmodules+=" vfio "
force_drivers+=" vfio vfio-pci vfio_iommu_type1 "
install_items="/usr/sbin/vfio-pci-override.sh /usr/bin/find /usr/bin/dirname"

TODO: I don’t think the install items is needed anymore. The symlink gets included automagically. I hope. Or else it’s a dangling symlink…
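If you want to settle the dangling-symlink question on your own system rather than hope, readlink will tell you. A quick sketch (the path assumes the 20vfio module directory used earlier in this guide):

```shell
# [ -e ] fails for dangling symlinks, so this distinguishes the cases
link=/usr/lib/dracut/modules.d/20vfio/vfio-pci-override.sh
if [ -e "$link" ]; then
  echo "symlink resolves to: $(readlink -f "$link")"
elif [ -L "$link" ]; then
  echo "DANGLING symlink: $link"
else
  echo "not created yet: $link"
fi
```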

Finally, salvation.

Err, finally, time to run dracut -fv; you should get some successful output. If you get a complaint that the vfio module is missing, check that you got the filenames and paths exactly right. If you changed the run-priority number (20), make sure you changed it consistently everywhere.

Finally, a sanity check before rebooting, because I hate rebooting:

*Note: If you want to learn more about dracut custom modules, the man pages are actually pretty good. Whoever wrote those, I appreciate you <3*

sudo lsinitrd | grep vfio 

etc/modprobe.d/vfio.conf
usr/lib/modules/5.2.9-200.fc30.x86_64/kernel/drivers/vfio
usr/lib/modules/5.2.9-200.fc30.x86_64/kernel/drivers/vfio/pci
usr/lib/modules/5.2.9-200.fc30.x86_64/kernel/drivers/vfio/pci/vfio-pci.ko.xz
usr/lib/modules/5.2.9-200.fc30.x86_64/kernel/drivers/vfio/vfio_iommu_type1.ko.xz
usr/lib/modules/5.2.9-200.fc30.x86_64/kernel/drivers/vfio/vfio.ko.xz
usr/lib/modules/5.2.9-200.fc30.x86_64/kernel/drivers/vfio/vfio_virqfd.ko.xz
usr/sbin/vfio-pci-override.sh

Comments from Wendell:

I always like to verify that the initial ramdisk actually contains everything we need. This might be an unneeded step, but on my system I ran the lsinitrd check above.

This is the end of the first chapter: after rebooting, the PC should be ready for VFIO and your GPU free for VM use.

Reboot at this point, and use

# lspci -nnv 

to verify that the vfio-pci driver has been loaded:


TODO
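A script-friendly alternative to scanning lspci output is to read the driver symlinks straight from sysfs. A sketch, using the same hypothetical addresses as the override script (substitute your own):

```shell
#!/bin/sh
# Print which kernel driver (if any) is bound to each device we claimed.
# After a successful early bind you should see "vfio-pci" for each one.
for DEV in 0000:03:00.0 0000:03:00.1 0000:03:00.2 0000:03:00.3; do
  drv="/sys/bus/pci/devices/$DEV/driver"
  if [ -L "$drv" ]; then
    printf '%s -> %s\n' "$DEV" "$(basename "$(readlink "$drv")")"
  else
    printf '%s -> no driver bound (or device absent)\n' "$DEV"
  fi
done
```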

… The earlier Fedora 31/32 guides from here are pretty much the same, but I wanted to document the proper dracut custom module procedure for posterity.


Thanks, but I think you meant @belfrypossum :slight_smile:


Since virt-manager is deprecated, any tips on how to use a different GUI based VM manager?

Skylake-X derivatives also have weird PCI IDs starting in 64 or 65.

Who said virt-manager is deprecated?

Red Hat:

https://www.phoronix.com/scan.php?page=news_item&px=RHEL-Virt-Manager-Deprecation

Hrmm, interesting, I wonder what the motivation behind that decision was.

Pushing Cockpit, a heavy webapp-based VM manager.


Sigh… what a sad thing to hear and such a huge step backwards.


For now virt-manager still works fine, and it sounds like a problem for the Fedora 34 guide-writer wendell


They want to standardise their administration UI on one system (i.e. Cockpit) for RHEL, presumably so that in future releases they have one go-to place for UI-based administration rather than a myriad of separate tools.

virt-manager itself isn’t deprecated; it’s only deprecated in RHEL builds from 8 onwards.

Still available in Fedora 34


It should be noted that virt-manager isn’t going anywhere. It’s still actively developed and available to use. All that’s ending is Red Hat’s official support and integration. There has been much confusion about this.

https://blog.wikichoon.com/2020/06/virt-manager-deprecated-in-rhel.html


Why are you such a hater :slight_smile: J/K

I still use virt-manager and virsh. There won’t be a GTK/Qt alternative :slight_smile:

Thanks @wendell I will update my original thread, and my next system is going to be Silverblue. I like the concept enough for newbies that I will endure the extra challenge :slight_smile:

Note that this probably just means that virt-manager will move to Fedora EPEL for RHEL/CentOS 9. If anything, this might be a good thing as it can be more aggressively updated.


Thanks for this update. I’ll be moving to VFIO as soon as I can source a 5000-series CPU and a 6000-series GPU. Can you put up a link to any recommended motherboards? I know for AMD I’ll want to go with an X570, but I don’t know if any of them have better IOMMU groups or whatever.

I think this is more true than you meant. Later you must have updated grub.cfg either manually or unintentionally.

Notes:

Use any of these if you want.

Checking if you have booted with correct options:

cat /proc/cmdline

My output:

BOOT_IMAGE=(hd0,gpt2)/vmlinuz-5.9.11-200.fc33.x86_64 root=UUID=aa16dfce-be02-4f48-a32d-ee26ae99a48d ro rd.driver.blacklist=nouveau modprobe.blacklist=nouveau resume=UUID=ee7a7237-a530-4c37-8c62-79b33ddef287 rhgb intel_iommu=on vfio-pci.ids=8086:a2af,10de:1b80,10de:10f0

Making sure your config is correct before you reboot

sudo grubby --info=DEFAULT | egrep --color '^|intel_iommu=on|amd_iommu=on|rd.driver.pre=vfio-pci'

My output:

index=0
kernel="/boot/vmlinuz-5.9.11-200.fc33.x86_64"
args="ro rd.driver.blacklist=nouveau modprobe.blacklist=nouveau resume=UUID=ee7a7237-a530-4c37-8c62-79b33ddef287 rhgb intel_iommu=on vfio-pci.ids=8086:a2af,10de:1b80,10de:10f0 isolcpus=1,2,3,5,6,7"
root="UUID=aa16dfce-be02-4f48-a32d-ee26ae99a48d"
initrd="/boot/initramfs-5.9.11-200.fc33.x86_64.img"
title="Fedora (5.9.11-200.fc33.x86_64) 33 (Workstation Edition)"
id="ece237a53a024f1da19a8444d8979947-5.9.11-200.fc33.x86_64"

As you can see:

  • I have intel_iommu=on because I am on Intel
  • I am missing the forced preload of the VFIO driver, rd.driver.pre=vfio-pci

Adding arguments the old-fashioned way

Adjust the line in /etc/default/grub and save it.
This alone does not update the generated config; it still needs to be applied:

[ -e /boot/grub2/grub.cfg ] && sudo grub2-mkconfig -o /boot/grub2/grub.cfg
[ -e /boot/efi/EFI/fedora/grub.cfg ] && sudo grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg

This generates the config for hard-coded paths; you can already guess this is clumsy. You could use find instead.
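A find-based version might look like the following. This is only a sketch, and it just prints the command rather than running it, since regenerating the wrong grub.cfg makes for a bad day:

```shell
# Locate whichever grub.cfg actually exists (BIOS vs EFI layouts differ),
# then show the grub2-mkconfig invocation that would regenerate it.
cfg="$(find /boot -name grub.cfg 2>/dev/null | head -n 1)"
if [ -n "$cfg" ]; then
  echo "run: sudo grub2-mkconfig -o $cfg"
else
  echo "no grub.cfg found under /boot"
fi
```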

My suggestion - use grubby

Grubby lets you check and update individual kernel options, as well as the defaults.

Setting up IOMMU kernel parameters

If you are on AMD, run the following (Intel users: substitute intel_iommu=on):

sudo grubby --update-kernel=ALL --args="amd_iommu=on rd.driver.pre=vfio-pci"

To confirm we are on the right track:

sudo grubby --info=DEFAULT | egrep --color '^|intel_iommu=on|amd_iommu=on|rd.driver.pre=vfio-pci'

You will see a line with all kernel parameters, and your additions should be highlighted:


args=" … rhgb amd_iommu=on rd.driver.pre=vfio-pc"

Your /etc/default/grub has been updated automatically.

If you have made a mistake, you can remove the offending argument. (In this example, only pre=vfio-pci was added instead of the full rd.driver.pre=vfio-pci.)

sudo grubby --update-kernel=ALL --remove-args="pre=vfio-pci"

Upgraded my setup to Fedora 33: no problems with VFIO, and one small package conflict, resolved inside DNF with --best --allowerase.

That said, I also have some performance numbers for my 7700K, as this CPU is just barely good enough for VR. Regular games do not need to meet 11 ms to render a frame for a smooth experience; most games would just run slowly.

The default setup with Looking Glass, a Spice channel and VirtIO serial scored below 4000 in the VRMark demo.
This is unplayable in demanding games and in games running through SteamVR on an Oculus headset.

Turning off the VirtIO and Spice channels netted me 700 more points and changed the experience from unusable to bad.

Further, after closing all Looking Glass sessions, I achieved over 5000 points, reaching the level of a bad gaming laptop :slight_smile:

To give you perspective, a bare-metal score would be between 11000 and 14000.

Next I used grubby to isolate cores on the Linux host, leaving only the first core to the host.

sudo grubby --update-kernel=ALL --args="isolcpus=1,2,3,5,6,7"

Note that my XML was pinned the whole time like this:

    <vcpupin vcpu='0' cpuset='1'/>
    <vcpupin vcpu='1' cpuset='5'/>
    <vcpupin vcpu='2' cpuset='2'/>
    <vcpupin vcpu='3' cpuset='6'/>
    <vcpupin vcpu='4' cpuset='3'/>
    <vcpupin vcpu='5' cpuset='7'/>
    <emulatorpin cpuset='0'/>
  </cputune>

After this I was able to reach scores up to 9000, making the experience fine.
So far, this is evidence of the following:

  • Always check your VFIO and overclocking results against previous runs. Do not expect linear gains or losses.
  • The Linux scheduler + QEMU are not fast enough to adjust in time for VR:
    • The Linux scheduler is not fast enough to free cores for the VR workload.
      It takes minutes in the game for performance to improve.
    • The Linux scheduler is fast enough to notice a dip in the VR workload and move host tasks back onto the cores used for VR.
      Even once performance stabilizes, you will eventually get the stutter again, and it will take minutes to resolve again.

Hope this helps someone when diagnosing CPU limitations and losses in VFIO.


Interesting results on the scheduler slowness, thanks for sharing! One thing I noticed is that isolcpus is now deprecated in what looks like the 5.x series kernels.

isolcpus=       [KNL,SMP,ISOL] Isolate a given set of CPUs from disturbance.
                        [Deprecated - use cpusets instead]
                        Format: [flag-list,]<cpu-list>

So this solution may be yanked out from under you eventually.

Cset shield seems like a possible solution: https://www.codeblueprint.co.uk/2019/10/08/isolcpus-is-deprecated-kinda.html

And this is an example of how to use taskset in proxmox (which is what I use for a host): https://forum.proxmox.com/threads/cpu-pinning.67805/#post-304715

While I’m here, anyone have a good way of benchmarking this GPU/latency stuff in a linux guest?

I think it should not be necessary going forward. Besides benchmarks, where I do expect to find differences, most games do not really care (VR games are scaled down to hit 90Hz VSync).
The only game that is unplayable at the moment is Half-Life: Alyx. The game behaves like a power virus and loops on itself, eventually leading to a crash.

I believe most synthetic benchmarks work fine on Linux through support layers like Wine.
Not sure about third-party measuring tools like RivaTuner.
The value you are looking for is frame time: the target is 8 ms, and the hard limit is 11.11 ms (1000 ms / 90 Hz ≈ 11.11 ms). The runtime can smooth out even 13 ms.

Maybe the easiest way to test my problem on Linux is to just run a power-virus cycle in the guest and see how quickly other tasks move away from that thread (Prime, rendering, …).

The behavior I expect is that if 6 threads are pinned at 100%, tasks are moved to the other 2 and spawned there, in general. Maybe it is just my ignorance :slight_smile:


Now the legend continues… well, kind of fizzles. I am done with that game, except maybe as a test tool :smiley:

I got my score over 10000, compared to 12000 on bare metal, and I beat that score on the stock CPU multiplier. All the other games in my library work flawlessly with VFIO.

Switched to bare metal: Alyx runs fine, and I get down to high settings at a 5-6 ms average.

Before I finished testing, I decided to play a little. After getting stuck inside objects 17 times and having to run noclip just to get out, I am done. It was more fun trying to get VFIO to achieve better results than playing the game. Now playing Lone Echo and Subnautica without leaving my trusty workstation setup.


Here you say to use 20 for the folder priority, but then the rest uses 30, if I am reading it right? I am probably going to redo the folder as 30, but wanted to mention it in case it is a problem.