What’s the state of VFIO things in 2020?
I haven’t been this excited in ages let me tell you.
First, there is the PCIe quirks fix for the Vega and Navi PCIe reset. Mind you, AMD should have caught this in hardware development, but we now have a reliable PCIe reset that works with surgical precision and no side-effects.
No More Patched Kernels and Hackery!
Second, a big thank you to the hard-working users here. A random selection – thank you to @BansheeHero, @SgtAwesomesauce, @gnif, @belfrypossum and many more. @belfrypossum and @gnif are heroes right now because of the reset fix for Navi/Vega and Looking Glass. More on that in a sec.
Check out these threads for historical context:
I have been writing guides for Fedora and Ubuntu for some years now – running a VFIO setup myself for… about 10 years now? Yeesh! Time flies.
What GPUs are best for this?
Well, with the new quirks-based Navi/Vega reset, AMD is a good choice. Nvidia gives you “Code 43” when you try to run GeForce cards in a virtual environment, but the work-around is trivial.
See also:
https://forum.level1techs.com/t/amd-polaris-vega-navi-reset-project-vendor-reset/163801/7
Getting Started
Fresh install of Fedora 33. First things first.
Install cpufreq, because it’s nice:
https://extensions.gnome.org/extension/1082/cpufreq/
The modern Fedora GNOME installer is good to go out of the box! Download the browser extension as prompted by the above website and toggle it on. Next, click on it; it will ask if you want to install some optional components. Read about those, then I recommend you install them. Finally, make sure you have the Performance governor set (OnDemand is almost as good, except for quasi-busy workloads where the CPU sleeping and waking can give you less-than-100% performance) and make sure Turbo is toggled on. This is not an overclock! It is just the normal boost behavior of the CPU, and you definitely want it.
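If you would rather set the governor from a terminal than through the extension, one way is the cpupower tool (a quick sketch – cpupower ships in Fedora’s kernel-tools package, and the setting does not persist across reboots):
sudo dnf install kernel-tools
sudo cpupower frequency-set -g performance
cpupower frequency-info | grep -i governor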
Our Hardware
Setups will differ for this guide, and I will try to keep a list here.
For now, I am going to mention a few notes:
- Identical GPUs can work with this method. We override by PCIe address and not by Device ID as some other guides use.
- Get at least 256GB SSD for Win 10, even if you have to use an iSCSI disk or NAS for additional space.
- At least 16GB of RAM and 6 CPU threads to even bother with the VM. The CPU penalty is still decently heavy for fast GPUs.
- Make sure that all your displays can be run from both of your GPUs, just in case.
- Plan ahead for the advantages of VM infrastructure.
- NAS/Drive to handle snapshots and take them regularly (see the sketch after this list).
- Clone systems rather than go wide with a single installation. You can run one for each GPU, and switching takes just a “reboot” of the guest.
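On the snapshot point: here is a minimal sketch of the habit, assuming a libvirt guest named win10 on qcow2 storage (substitute your own domain name) and with the guest shut down, since a running VM with a passed-through GPU cannot be snapshotted live:
sudo virsh snapshot-create-as win10 pre-gpu-driver-update
sudo virsh snapshot-list win10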
Secure a “Plan B”
Since we’re going to be mucking about with the video drivers, I’d recommend making sure you can SSH into your machine from another machine on the network, just in case things go wonky. You can use this access to fix any problems that might otherwise be a pain to fix.
1: Remote Access
Chances are the SSH server is up and running for you already. Here is the basic setup:
sudo dnf install openssh-server
sudo firewall-cmd --add-service=ssh --permanent
sudo firewall-cmd --reload
sudo systemctl start sshd
sudo systemctl enable sshd
and make sure you can ssh in
ssh [email protected]
before going farther in the guide.
You’ll also want to make sure your IOMMU groups are appropriate using the ls-iommu.sh script which has been posted here and elsewhere:
#!/bin/bash
for d in /sys/kernel/iommu_groups/*/devices/*; do
n=${d#*/iommu_groups/*}; n=${n%%/*}
printf 'IOMMU Group %s ' "$n"
lspci -nns "${d##*/}"
done
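Save it as something like ls-iommu.sh (the filename is up to you), make it executable, and run it. Filtering for VGA and audio devices makes it easy to spot your GPUs and their audio functions; drop the grep to see everything else that shares their groups:
chmod +x ls-iommu.sh
./ls-iommu.sh | grep -i -e vga -e audio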
If you don’t have any IOMMU groups, make sure that you’ve enabled IOMMU (“Enabled” and not “Auto” is important on some motherboards) as well as SVM or VT-d.
We also need to enable IOMMU on the Linux side of things.
Check whether IOMMU was enabled during system start:
dmesg | grep -i -e IOMMU | grep enabled
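If that grep comes back empty but you believe IOMMU is on, another quick sanity check is whether the kernel has populated any IOMMU groups at all:
# if IOMMU is active, this directory contains one numbered directory per group
ls /sys/kernel/iommu_groups/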
I find no difficulty setting this up in Fedora; the process is pretty automated. Let me know if you run into issues.
sudo dnf reinstall kernel
grub2-editenv list
(I trust you know whether you have an Intel or AMD system.)
While we’re here, go ahead and install the @virtualization meta-package to get all the virtualization stuff we’ll need for this guide, if you don’t already have it.
Installing Packages
It is entirely possible to do almost all of this through the GUI. You need not be afraid of cracking open a terminal, though, and running commands. I’ll try to explain what’s going on with each command so you understand what is happening to your system.
First, we need the virtualization packages, and we will use Fedora’s own package group for that. Personally, I like to offload this work to the distribution maintainers for smaller projects, but if any of you want to create a specific list of packages, I will add it here.
# sudo dnf install @virtualization
User settings are not adjusted automatically; if your user is not in the wheel group, it will need to be added to further groups to operate KVM and other aspects of this project.
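As one common example (adjust to your own setup), adding your user to the libvirt group lets you manage VMs without sudo once you log out and back in:
sudo usermod -aG libvirt "$USER"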
Rebooting after installing is usually recommended – basically to force you to log out and in again so these changes are realized. But it is not time for that yet; our work is not done.
Configure Grub on Fedora for VFIO.
We need to add two boot-time parameters – one to enable IOMMU and one to tell the kernel to pre-load the vfio kernel module (some users reported this fixed cases where the Nvidia proprietary driver grabbed the device really early in the boot process!)
Add intel_iommu=on (Intel) or amd_iommu=on (AMD) to GRUB_CMDLINE_LINUX:
sudo vim /etc/sysconfig/grub
Add the option here. Mine looks like
GRUB_CMDLINE_LINUX="rhgb quiet amd_iommu=on rd.driver.pre=vfio-pci "
… because I have a Threadripper system.
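One common way to make the new GRUB_CMDLINE_LINUX take effect is to regenerate the GRUB configuration. A quick sketch – the output path below is for a BIOS install; EFI installs of this vintage may use /boot/efi/EFI/fedora/grub.cfg instead, so check which one your system actually has:
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
# after the eventual reboot, confirm the flag is live on the running kernel:
grep -o -e amd_iommu=on -e intel_iommu=on /proc/cmdline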
Before we rebuild the initial ramdisk, we have yet more work to do.
We are going to create a custom dracut module. This will be responsible for binding our GPU (and any other PCIe devices we want to pass through) early in the boot process.
The Initial Ramdisk and You
I know what some of these words mean? Yeah, it’ll be fine. So as part of the boot process drivers are needed for your hardware. They come from the initial ram disk, along with some configuration.
Normally, you do configuration of the VFIO modules to tag the hardware you want to pass through. At boot time the VFIO drivers bind to that hardware and prevent the ‘normal’ drivers from loading. This is normally done by PCIe vendor and device ID, but doesn’t work for this System 76 system because it’s got two identical GPUs.
It’s really not a big deal, though, we just need to handle the situation differently.
Early in the boot process we’ll bind vfio to one of the GPUs (and the audio device and, optionally, a USB device) via a shell script. Nvidia RTX (2000 series) cards also have USB/serial devices, like the new RX 6000 series cards from AMD, and these will need to be bound as well, since they are in the same IOMMU group.
This script will have to be modified to suit your system. You can run
# lspci -vnn
to find the PCIe device(s) associated with your cards. Normally there is a “VGA compatible controller” and an audio controller, but with RTX cards and AMD 6000 cards, there are up to 4 devices typically:
My setup:
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 73bf (rev c1)
03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device ab28
03:00.2 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] Device 73a6
03:00.3 Serial bus controller [0c80]: Advanced Micro Devices, Inc. [AMD/ATI] Device 73a4
21:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] (rev e7)
21:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590]
Note that my devices showed up at 0000:03:00.0 - 0000:03:00.3, and 03:00.0 was the primary card (meaning we want to pass through all four devices under 0000:03:00).
We will want to be sure that we bind vfio to all of these – they are likely to be grouped in the same IOMMU group anyway (forcing all the devices to be bound to VFIO drivers for passthrough).
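If you want to double-check exactly what shares a group with your card (the 0000:03:00.0 address here matches my example above – substitute your own):
# lists every device in the same IOMMU group as the GPU at 03:00.0
ls /sys/bus/pci/devices/0000:03:00.0/iommu_group/devices/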
The script will help us make sure the vfio driver has been properly attached to the device in question.
This is the real “special sauce” for when you have two like GPUs and only want to use one for VFIO. It’s even easier if your GPUs are unalike, but this method works fine for either scenario.
Modify the DEVS line in the script (prefix the addresses with 0000, or check /sys/bus/pci/devices to confirm, if you like) and then save it to /usr/sbin/vfio-pci-override.sh
#!/bin/sh
PREREQS=""
# PCIe addresses (domain:bus:device.function) of the devices to hand over to vfio-pci
DEVS="0000:03:00.0 0000:03:00.1 0000:03:00.2 0000:03:00.3"
for DEV in $DEVS; do
    # driver_override ensures only vfio-pci will bind to this device
    echo "vfio-pci" > /sys/bus/pci/devices/$DEV/driver_override
done
modprobe -i vfio-pci
Note: Xeon, Threadripper or multi-socket systems may very well have a PCIe device prefix of 0001 or 000a… so double check at /sys/bus/pci/devices if you want to be absolutely sure.
With the script created, you need to make it executable and add it to the initial ram disk so that it can do its work before any other driver is loaded. With the Nvidia driver especially – if you’re following this guide with Nvidia 2000 or 3000 series GPUs – it comes lumbering through to claim everything it can. (It’s basically the Spanish Inquisition as far as device drivers go.)
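Making the script executable is the easy part:
sudo chmod +x /usr/sbin/vfio-pci-override.sh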
Since Fedora uses dracut to manage the initramfs, the cleanest approach is to create a custom dracut module.
Steps:
sudo mkdir /usr/lib/dracut/modules.d/20vfio
# Note: the "20" prefix helps things run in the right order. If you ls /usr/lib/dracut/modules.d you'll get the idea
Create /usr/lib/dracut/modules.d/20vfio/module-setup.sh
with the following contents:
#!/usr/bin/bash

check() {
    return 0
}

depends() {
    return 0
}

install() {
    declare moddir=${moddir}
    # Run the override script at the pre-udev stage, before any graphics driver can bind
    inst_hook pre-udev 00 "$moddir/vfio-pci-override.sh"
}
Create a symbolic link in your custom vfio folder:
ln -s /usr/sbin/vfio-pci-override.sh /usr/lib/dracut/modules.d/20vfio/vfio-pci-override.sh
Configure dracut in /etc to look for this new module by name.
Create /etc/dracut.conf.d/vfio.conf
with the following contents:
add_dracutmodules+=" vfio "
force_drivers+=" vfio vfio-pci vfio_iommu_type1 "
install_items="/usr/sbin/vfio-pci-override.sh /usr/bin/find /usr/bin/dirname"
TODO: I don’t think the install items is needed anymore. The symlink gets included automagically. I hope. Or else it’s a dangling symlink…
Finally, salvation.
Err, finally, time to run:
sudo dracut -fv
You should get some successful output. If you get a complaint that the vfio module is missing, check that you got the filenames and paths exactly right. If you changed the 20 run priority, make sure you changed it consistently everywhere.
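If dracut complains that it cannot find the vfio module at all, a quick way to see whether your new module directory is even being picked up:
# the custom module should appear in this list under the name "vfio"
dracut --list-modules | grep vfio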
Finally, a sanity check before rebooting, because I hate rebooting.
*Note: If you want to learn more about dracut custom modules, the man pages are actually pretty good. Whoever wrote those, I appreciate you <3*
sudo lsinitrd | grep vfio
etc/modprobe.d/vfio.conf
usr/lib/modules/5.2.9-200.fc30.x86_64/kernel/drivers/vfio
usr/lib/modules/5.2.9-200.fc30.x86_64/kernel/drivers/vfio/pci
usr/lib/modules/5.2.9-200.fc30.x86_64/kernel/drivers/vfio/pci/vfio-pci.ko.xz
usr/lib/modules/5.2.9-200.fc30.x86_64/kernel/drivers/vfio/vfio_iommu_type1.ko.xz
usr/lib/modules/5.2.9-200.fc30.x86_64/kernel/drivers/vfio/vfio.ko.xz
usr/lib/modules/5.2.9-200.fc30.x86_64/kernel/drivers/vfio/vfio_virqfd.ko.xz
usr/sbin/vfio-pci-override.sh
Comments from Wendell:
I always like to verify that the initial ramdisk actually contains everything we need. This might be an unneeded step, but that is what the lsinitrd check above is for.
This is the end of the first chapter; after rebooting, the PC should be ready for VFIO and your GPU free for VM use.
Reboot at this point, and use
# lspci -nnv
to verify that the vfio-pci driver has been loaded:
TODO
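What you are looking for is the “Kernel driver in use” line for each passthrough function reporting vfio-pci. A narrower check, using the 03:00.0 address from my example above (substitute your own):
lspci -nnk -s 03:00.0
# expect: Kernel driver in use: vfio-pci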
… The earlier Fedora 31/32 guides from here are pretty much the same, but I wanted to document the proper dracut custom module procedure for posterity.