Ubuntu 17.04 -- VFIO PCIe Passthrough & Kernel Update (4.14-rc1)

Quick and maybe stupid question, but where does virt-manager store the XML or shell script containing all the configuration details for the VM? I figured it out for the disk image and have now moved that to my preferred location, but I can't find the configuration files anywhere.
In Ubuntu 16.04, with the version of qemu that came with it, I could write the scripts myself, store them where I wanted, and run them manually without virt-manager. With the newer version of qemu I have struggled to get my old scripts to work, so I have moved back to virt-manager. However, virt-manager seems to be a black box regarding the VM config details.
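
For context, the kind of standalone script meant here looks roughly like this -- a sketch only, with placeholder PCI addresses and disk path rather than my actual setup:

#!/bin/sh
# minimal VFIO launch script; adjust the addresses and paths to your hardware
qemu-system-x86_64 \
	-enable-kvm \
	-machine q35,accel=kvm \
	-cpu host,kvm=off \
	-m 8192 \
	-device vfio-pci,host=08:00.0,multifunction=on \
	-device vfio-pci,host=08:00.1 \
	-drive file=/path/to/win10.img,format=raw,if=virtio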

Thanks

See the libvirt FAQ:
https://wiki.libvirt.org/page/FAQ#Where_are_VM_config_files_stored.3F_How_do_I_edit_a_VM.27s_XML_config.3F

You are not supposed to edit the XML in the directory tree directly. Use virsh edit <vm-name> to edit the VM config XML. If you want to back it up, use virsh dumpxml <vm-name>, and restore a config with virsh define <path to xml>.
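
For example, a quick sketch of the round trip (the VM name and path are hypothetical):

virsh dumpxml win10 > ~/vm-backups/win10.xml   # back up the config
virsh define ~/vm-backups/win10.xml            # re-import it later
virsh edit win10                               # edit in place via $EDITOR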

Thanks for the commands @TheCakeIsNaOH, I’ll have a look through the FAQ as well.
I am still a complete newbie to all this GPU passthrough stuff even though I’ve been using it for a couple of years now.

Btw, the virsh edit command has not worked for me on Ubuntu 16.04: it does not save any of my edits. I had to switch to Ubuntu 18.04 to get virsh edit working.

Where's your post for elementary OS 5?
I use elementary OS as well. I was actually thinking of putting something together too.

I wrote and ran scripts manually, pieced together from multiple sites, when I was on elementary OS Freya/Ubuntu 16.04. I could never get virt-manager to work properly there (though I may not have known enough about how it worked back then) the way it did when I was on Apricity OS/Arch, before updates would break things; that instability is the main reason I went back to elementary OS/Ubuntu.
Anyway, things were surprisingly easier to do in elementary OS 5/Ubuntu 18.04, though there are a couple of quirks.

Not sure if anyone else has had an issue with CPU allocation to the VM, but just in case: the CPU topology may need to be set manually.
I was so used to having a limited number of cores on my other Windows (or should I say Star Citizen and Tomb Raider, really) gaming VM with an i7-4790K + Vega 64 that it was easy to just let KVM do its own thing with the CPU. I am guessing that with only 4 cores and 8 logical host CPUs @ 4 GHz on the i7-4790K, the most I can pass to the VM is 6 logical host CPUs (3 cores with 2 threads each), so there may be no noticeable advantage over the default.
Now I'm running a new workstation with an Intel Xeon E5-2683 v3 (14 cores @ 2 GHz) + GTX 980 Ti, and I found that when I passed the VM 12 of the 28 threads, expecting to see 12 CPUs in the VM, it only registered 2 CPUs at 2 GHz each, and they'd almost always be near 100% usage.
I then noticed that even though I had the "Copy host CPU configuration" option checked in virt-manager, this would always change to Haswell-noTSX once the VM started running.
It turns out that with my setup I have to set the CPU topology manually to get the Xeon CPU to register properly. The default topology was assigning the entire CPU allocation to sockets rather than cores and threads.
Once I manually set the topology to
Sockets = 1
Cores = 6
Threads = 2
the VM started behaving like it should and is much smoother as a result.
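
For reference, that topology ends up in the XML (via virsh edit) looking roughly like this -- a sketch for my 12-thread case, with host-passthrough as one common mode choice:

<vcpu placement='static'>12</vcpu>
<cpu mode='host-passthrough'>
  <topology sockets='1' cores='6' threads='2'/>
</cpu>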

Will do my best to help. There are so many guides out there with partially complete information that it is difficult to know what to do when you run into snags or things don’t seem to work like you expect.
I just found out about the GPU BIOS thing from a completely different guide on the Vega reset bug, which seems to be hit and miss and a bit of a hardware lottery.

Does anyone know of a way to do this with identical cards? My two 580s show up in separate IOMMU groups (14 and 15) but have the same ID:

IOMMU Group 14 08:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480] [1002:67df] (rev e7)
IOMMU Group 14 08:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 580] [1002:aaf0]
IOMMU Group 15 09:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480] [1002:67df] (rev e7)
IOMMU Group 15 09:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 580] [1002:aaf0]

I’m assuming that if I tell it to bind the device ID for vfio, it’ll bind both and I will have no display card?

EDIT: I found this script

Not sure if it’ll work though, or if it can be integrated with this method.

The concept should work, but that example may need a bit of tweaking for a current distro.

Here are a couple more examples:
https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF#Using_identical_guest_and_host_GPUs


OK, this may be a wait for the weekend project. Thanks for the link.

Taking from

Arch PCI passthrough via OVMF

Using identical guest and host GPUs

Reason: A number of users have been having issues with this, it should probably be addressed by the article.

Due to how vfio-pci uses your vendor and device ID pair to identify which device it needs to bind to at boot, if you have two GPUs sharing such an ID pair you will not be able to get your passthrough driver to bind to just one of them. This sort of setup makes it necessary to use a script, so that whichever driver you are using is instead assigned by PCI bus address using the driver_override mechanism.
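
The mechanism can also be exercised by hand, which helps when debugging the scripts below; the bus address here is just an example:

# driver_override only affects future probes, so unbind first if a driver
# has already claimed the device:
# echo 0000:09:00.0 > /sys/bus/pci/devices/0000:09:00.0/driver/unbind
echo vfio-pci > /sys/bus/pci/devices/0000:09:00.0/driver_override
echo 0000:09:00.0 > /sys/bus/pci/drivers_probe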

Script variants
Passthrough all GPUs but the boot GPU
Here, we will make a script to bind vfio-pci to all GPUs except the boot GPU. Create the script /usr/bin/vfio-pci-override.sh:

#!/bin/sh
# bind vfio-pci to every GPU except the one the firmware booted with

for i in /sys/bus/pci/devices/*/boot_vga; do
	if [ $(cat "$i") -eq 0 ]; then                     # boot_vga is 1 only for the boot GPU
		GPU="${i%/boot_vga}"
		AUDIO="$(echo "$GPU" | sed -e "s/0$/1/")"  # the card's HDMI audio function
		echo "vfio-pci" > "$GPU/driver_override"
		if [ -d "$AUDIO" ]; then
			echo "vfio-pci" > "$AUDIO/driver_override"
		fi
	fi
done

modprobe -i vfio-pci   # -i ignores the install directive, so the module actually loads

Passthrough selected GPU
In this case we manually specify the GPU to bind.

#!/bin/sh
# bind a specific GPU (video + audio function) by PCI bus address;
# GROUP is the upstream bridge the devices sit behind

GROUP="0000:00:03.0"
DEVS="0000:03:00.0 0000:03:00.1"

if [ ! -z "$(ls -A /sys/class/iommu)" ]; then   # only act when the IOMMU is enabled
	for DEV in $DEVS; do
		echo "vfio-pci" > /sys/bus/pci/devices/$GROUP/$DEV/driver_override
	done
fi

modprobe -i vfio-pci
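
For either variant to take effect, the script has to be executable and hooked into module loading via the same install line that comes up later in this thread:

sudo chmod +x /usr/bin/vfio-pci-override.sh
echo 'install vfio-pci /usr/bin/vfio-pci-override.sh' | sudo tee /etc/modprobe.d/vfio.conf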

Oops I duplicated your reply

Should I avoid modifying the initramfs-tools/modules file, if I go this route?

EDIT: So when I run

sudo modprobe -i vfio-pci

I get:

libkmod: ERROR …/libkmod/libkmod-config.c:656 kmod_config_parse: /etc/modprobe.d/vfio_pci.conf line 1: ignoring bad line starting with 'options'

I'm assuming something is wrong with my /etc/modprobe.d/vfio_pci.conf file. Was there supposed to be a preexisting configuration there? Mine was blank, so I just pasted what Wendell had minus the ids part (so the contents of the file are just "options vfio_pci").

2nd EDIT:

So I got that error message to go away by changing the file to read:

install vfio-pci /usr/bin/vfio-pci-override.sh

but it's still not binding the graphics card. Both graphics cards still show amdgpu as the kernel driver in use and in kernel modules.
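
(For reference, this is how I'm checking which driver is bound, using my bus addresses:)

lspci -nnk -s 08:00.0   # "Kernel driver in use:" shows what actually claimed the card
lspci -nnk -s 09:00.0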

I have grub set up to use iommu:

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash iommu=1 amd_iommu=on"

Here’s the contents of my /etc/initramfs-tools/modules file:

softdep amdgpu pre: vfio vfio_pci
vfio
vfio_iommu_type1
vfio_virqfd
vfio_pci
amdgpu

here’s my /etc/modules file:

8812au
8812au
vfio
vfio_iommu_type1
vfio_pci

Could it possibly be that I need to include/call the script in initramfs? I’m not sure how to go about doing that.

Sorry, just saw your reply and realized you are using 2 AMD GPUs; for some reason I was thinking 2 Nvidia cards even after reading RX 580. I think part of the issue may be the load timing of the amdgpu/radeon driver. When I was initially passing through an R9 390, I literally had to blacklist the radeon and amdgpu drivers for it to work.

That said, your /etc/initramfs-tools/modules file should also have the lines

options vfio_pci ids=10de:1b80,10de:10f0
vfio_pci ids=10de:1b80,10de:10f0

in it, where the ids are those of your passthrough GPU. You may want to split the options vfio_pci entry into 2 lines, one for the video and one for the audio part of the GPU. It may be overkill, but I literally have it 3 times in my file:

options vfio_pci ids=10de:1b80
options vfio_pci ids=10de:10f0
options vfio_pci ids=10de:1b80,10de:10f0

Add these lines before the last 2 lines of your current file.

Your /etc/modules file should also have this line in it:

vfio_pci ids=10de:1b80,10de:10f0

You almost need to force vfio to take hold of the GPU.

If you are on a kernel from 4.18.16 forward, you may also have to try this (I am copying and pasting from the Arch Wiki):

Create /etc/modprobe.d/vfio.conf with the following:

install vfio-pci /usr/bin/vfio-pci-override.sh

Edit /etc/mkinitcpio.conf (Arch's initramfs config; a sketch of the Ubuntu initramfs-tools equivalent is below)

Remove any video drivers from MODULES, and add vfio-pci and vfio_iommu_type1:

MODULES=(ext4 vfat vfio-pci vfio_iommu_type1)

Add /etc/modprobe.d/vfio.conf and /usr/bin/vfio-pci-override.sh to FILES:

FILES=(/etc/modprobe.d/vfio.conf /usr/bin/vfio-pci-override.sh)

Regenerate the initramfs and reboot:

update-initramfs -u
reboot
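
Since Ubuntu has no mkinitcpio.conf, the rough equivalent there (my assumption, not from the Arch Wiki) is an initramfs-tools hook that copies the override script into the initramfs -- e.g. a hypothetical /etc/initramfs-tools/hooks/vfio-override, made executable:

#!/bin/sh
# initramfs-tools hook: ship the override script inside the generated initramfs
PREREQ=""
prereqs() { echo "$PREREQ"; }
case "$1" in
	prereqs) prereqs; exit 0 ;;
esac
. /usr/share/initramfs-tools/hook-functions
copy_exec /usr/bin/vfio-pci-override.sh

followed by update-initramfs -u as above.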

I have avoided identical GPUs just because it seemed a complicated process, and I have not been confident using vfio, but I am forcing myself to learn what I can right now. Hopefully these work for you.

I've actually hoped that I could do this trick on my laptop. It's a Dell 9570, so I have both a dedicated and an integrated GPU. My daily driver is the integrated Intel, so my wish was to use the 1050 Ti for Looking Glass.

After multiple hours in Linux Mint, I'm almost giving up. I tried to write down every step so that I can post it here if I make it to the very end, but at the moment I find it very hard.

Here are a few details about what is under control:

  • Virt-manager is installed, including a lot of different packages

  • IOMMU is under control (I guess so?) and 10de:1c8c is now shared:

softdep nvidia pre: vfio vfio_pci

vfio
vfio_iommu_type1
vfio_virqfd
options vfio_pci ids=10de:1c8c
vfio_pci ids=10de:1c8c
vfio_pci
nvidia

It's also listed correctly with lspci -nnv | less:

01:00.0 3D controller [0302]: NVIDIA Corporation GP107M [GeForce GTX 1050 Ti Mobile][10de:1c8c] (rev a1)
        Subsystem: Dell GP107M [GeForce GTX 1050 Ti Mobile][1028:087c]
        Flags: bus master, fast devsel, latency 0, IRQ 11
        Memory at ec000000 (32-bit, non-prefetchable) [size=16M]
        Memory at c0000000 (64-bit, prefetchable) [size=256M]
        Memory at d0000000 (64-bit, prefetchable) [size=32M]
        I/O ports at 3000 [disabled][size=128]
        Expansion ROM at ed000000 [disabled][size=512K]
        Capabilities: <access denied>
        Kernel driver in use: vfio-pci
        Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
  • Windows 10 is installed in a VM, and the gfx PCI device is added. I've kept the default video output to debug and install Windows the first time.

  • <kvm><hidden state='on'/></kvm> is added after </hyperv> (with virsh edit <nameofvm>), and a few other things, mentioned here (see the snippet after this list)

  • In Device Manager I'm able to find a few devices
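
(For reference, the hidden-state snippet from the list above sits in the <features> block of the VM XML, roughly like this:)

<features>
  <hyperv>
    <!-- existing hyperv settings stay here -->
  </hyperv>
  <kvm>
    <hidden state='on'/>
  </kvm>
</features>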

Currently my main problems/questions are:

  1. The Nvidia driver can't be installed; it keeps failing. I saw a post here about patching the drivers, but is that actually my problem?

  2. Will I be able to use Looking Glass on my Dell XPS 9570, or have I just wasted a damn long amount of time? :-)

See this GitHub gist:
https://gist.github.com/Misairu-G/616f7b2756c488148b7309addc940b28

  1. You do not need to patch the drivers as in that post. Adding the KVM hidden state to the VM via virsh edit and whatnot takes care of that issue without the need for patched Windows drivers.

  2. Your lspci shows that the 1050 Ti is a 3D controller and not a VGA controller. That means it is a mux-less graphics setup; see the link above for an explanation.

  3. I actually do not know if Looking Glass would work, but it might. You need to first get the Nvidia driver successfully installed, with the card not reporting problems.

  4. To have any hope of it working, you will need to give the VM a copy of the 1050 Ti's vROM (otherwise known as a vBIOS). With normal video cards, this entails dumping it to a .rom file, then adding the path to it in your VM XML (see the snippet after the next paragraph). In your case, the 1050 Ti's vROM is probably embedded in your motherboard UEFI, which complicates things.

In the case that the vROM is baked into your laptop's UEFI, you will need to first get a complete copy of your motherboard's UEFI; updates are often delta-only rather than complete images, so a complete copy might be hard to get. Next, you will need to rip it apart to get the GPU vROM; then you can take the vROM and compile it, with a couple of not-really-tested patches, into custom OVMF ACPI tables. Then you have a hope of getting passthrough working.
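
For the normal-video-card case in point 4, the dumped .rom gets referenced from the GPU's hostdev entry in the VM XML, roughly like this (the path and PCI address are placeholders):

<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
  </source>
  <rom file='/var/lib/libvirt/vbios/gtx1050ti.rom'/>
</hostdev>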

Thanks so much for the help. Question though, here’s my lspci -nnv output:

08:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480] [1002:67df] (rev e7) (prog-if 00 [VGA controller])
Subsystem: Sapphire Technology Limited Ellesmere [Radeon RX 470/480/570/580] (Nitro+ Radeon RX 580 4GB) [1da2:e366]

and

09:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480] [1002:67df] (rev e7) (prog-if 00 [VGA controller])
Subsystem: Sapphire Technology Limited Ellesmere [Radeon RX 470/480/570/580] (Nitro+ Radeon RX 580 4GB) [1da2:e366]

What is the ID number? Is it the first one (1002:67df) or the second one (1da2:e366)?

EDIT: Tried both, and no luck; they're still both using amdgpu. And I don't have an /etc/mkinitcpio.conf file at all, but reading the Arch Wiki makes it seem like this problem was corrected in 4.19, which I'm running, so I shouldn't need that step regardless.

Sorry, went to take a short nap earlier and woke up at almost 2am; guess I needed to catch up on some sleep.

Yeah, I think you are facing the same issue I was having with my R9 390, and possibly the Vega 64 too, related to the timing of the GPU driver binding, which meant I simply had no choice but to blacklist the amdgpu and radeon drivers.

Obviously, blacklisting is not an option for you, so try manually binding the GPU. You can download the vfio-pci-bind script from Andre Richter's GitHub; make sure it is executable and copy it to /usr/local/sbin/:

sudo cp -i /home/####/Downloads/vfio-pci-bind.sh /usr/local/sbin/

then run the following in a terminal to see if it works (changing the 04 to the bus number of the GPU you want to pass through, from lspci -nnv | less):

sudo vfio-pci-bind.sh 0000:04:00.0 0000:04:00.1

I remember I used to have this line at the top of my old and very first GPU passthrough KVM/QEMU execution script, but I think when I switched from Arch to elementary OS/Ubuntu I no longer needed it, and I lost my version of the script.

Hopefully this will work and force the GPU and everything in its IOMMU group to bind to vfio. I am not sure if you can use it in the virt-manager XML, so you may have to run it in a terminal each time before you start your VM. When I first did this I wasn't using virt-manager, because it kept breaking my VM on almost every other Arch update.
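
One way around running it by hand every time -- an assumption on my part, I have not tested it with this script -- is a libvirt qemu hook. libvirtd calls /etc/libvirt/hooks/qemu with the VM name and operation, so something like this (hypothetical VM name), made executable, should bind the GPU right before that VM starts:

#!/bin/sh
# /etc/libvirt/hooks/qemu -- args: <vm name> <operation> <sub-operation> <extra>
if [ "$1" = "win10" ] && [ "$2" = "prepare" ]; then
	/usr/local/sbin/vfio-pci-bind.sh 0000:04:00.0 0000:04:00.1
fi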
