Ubuntu 17.04 -- VFIO PCIe Passthrough & Kernel Update (4.14-rc1)

Mobile CPUs typically don’t have the necessary Virtualisation extensions. Check that first because chances are that will end your endeavour prematurely.
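
A quick way to check from Linux itself, in case it helps: a non-zero count here means the CPU at least advertises the extensions (svm for AMD, vmx for Intel):

    grep -Ec '(svm|vmx)' /proc/cpuinfo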

But yes, it should work on a single display, as far as I understood from the previous videos.

Looks like it does:

I was considering going for this laptop in particular:

Yeah, the CPU seems to support it. The question then is whether it can be enabled in the UEFI; if it can, you're good to go.

That’s a good question, I have no idea myself as far as this laptop goes.

Let's get that NPT bug bounty fundraiser started! :grinning:

Hi Wendell,

YOU ARE MY HERO!!!

I've had GPU passthrough running on Ubuntu for around 3 weeks now, thanks to your original article on IOMMU & Virtualization with Ryzen, but I have some minor issues. Due to a lack of time I couldn't post an update on the original forum thread. But I think with the guidance of your new HowTo here I can get rid of the issues :wink:

Anyway, I have some notes on your guide here:

Step 2)
Just informational: maybe you don't want to register a kernel PPA? You can easily install the newer kernel manually:

  1. Go to http://kernel.ubuntu.com/~kernel-ppa/mainline/

  2. Choose your desired kernel version, for instance 4.14-rc2: http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.14-rc2/

  3. Download the linux-headers, linux-headers-generic, and linux-image packages. In this example the following:

    • linux-headers-4.14.0-041400rc2_4.14.0-041400rc2.201709242031_all.deb
    • linux-headers-4.14.0-041400rc2-generic_4.14.0-041400rc2.201709242031_amd64.deb
    • linux-image-4.14.0-041400rc2-generic_4.14.0-041400rc2.201709242031_amd64.deb
  4. Open a command line, go to the download directory, and install the packages via:

    sudo dpkg -i linux-*4.14.0-041400rc2*.deb
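
After a reboot you can double-check that the new kernel is actually running:

    uname -r    # should show something like 4.14.0-041400rc2-generic for this example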

Step 4) - Modules
Question: does the redundant config of the vfio_pci stuff in the /etc/modprobe.d and /etc/initramfs-tools/modules files help to load the right module at runtime? Why I ask: I just configured the vfio modules in the /etc/modules file and passed the device IDs for pci_stub or vfio_pci via the kernel parameters in the /etc/default/grub file.
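
For reference, a minimal sketch of that alternative setup; the vendor:device IDs below are placeholders, not my real ones, so take yours from lspci -nn:

    # /etc/modules -- load the vfio stack at boot
    vfio
    vfio_iommu_type1
    vfio_pci

    # /etc/default/grub -- bind the GPU by ID via a kernel parameter (example IDs)
    GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_iommu=on iommu=pt vfio-pci.ids=1002:67ef,1002:aae0"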

Step 4) - before installing qemu-kvm and virt-manager
Hmmm, I'm missing the update of the grub config and the initramfs in your guide. Before you proceed with the VM stuff you should issue the following commands:
sudo grub-mkconfig; sudo update-grub; sudo update-initramfs -k all -c

And now why you are my hero:

“Modify the
/etc/initramfs-tools/modules
file and add the following lines:
softdep amdgpu pre: vfio vfio_pci…”

I've been looking for weeks for such a parameter XD I was stuck blacklisting amdgpu in /etc/modprobe.d/blacklist.conf and loading it via /etc/modules as the last module. Your solution is way more robust! Great! :smiley:
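
For anyone finding this later, a rough sketch of what such a softdep-based /etc/initramfs-tools/modules can look like (this is my reading of the approach, not the guide's exact content; swap amdgpu for whatever host driver grabs your passthrough card):

    # make the host GPU driver wait for the vfio modules
    softdep amdgpu pre: vfio vfio_pci
    vfio
    vfio_iommu_type1
    vfio_pci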

“Edit the /etc/apparmor.d/abstractions/libvirt-qemu…”

Woahh… I missed 4 of the 5 privileges -.- I'm dumb. I ended up removing all USB devices before starting my Windows VM and had to add them back while the VM boots. Thanks again!

I presented my findings about Windows virtualization and IOMMU this Monday at a department event at my company. I managed to do some benchmarks with Unigine Valley and Superposition. Here are my results on a slightly overclocked Sapphire RX 560 Pulse 4GB graphics card:

Unigine Superposition - High preset without AA - 1080p [Points]
    Native:  2579 (FPS: min 15.9; max 23.3; avg 19.3)
    Virtual: 2466 (FPS: min 10.7; max 23.1; avg 18.5)

Unigine Valley - High preset without AA - 1080p [Points]
    Native:  2248 (FPS: min 27.0; max 100.6; avg 53.7)
    Virtual: 2051 (FPS: min 10.9; max 101.9; avg 49.0)

Some of my coworkers now want to set up a fully virtualized Windows too! :wink:

Thanks for your awesome work here! :smiley:

Edit @ 2017-10-05:
Hmmmm … I figured out: if I change the performance settings of a virtual disk (a QCOW2 image, for instance) from the hypervisor default to cache mode: DirectSync and IO mode: Threads,
I can improve my Valley performance by around 1 FPS or a bit more on average, resulting in a score of 2100 up to 2120 points. Even better: the benchmark sequence was noticeably less choppy.
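
In libvirt XML terms that corresponds to something like the following (just a sketch; the image path is a placeholder, but cache= and io= are the attributes virt-manager writes for those two settings):

    <disk type='file' device='disk'>
      <!-- cache='directsync' and io='threads' are the settings described above -->
      <driver name='qemu' type='qcow2' cache='directsync' io='threads'/>
      <source file='/var/lib/libvirt/images/win10.qcow2'/>
      <target dev='vda' bus='virtio'/>
    </disk>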

I tested Xen virtualization 4 weeks ago but I didn't get the pciback module working right to claim my Sapphire RX 560.
Maybe I'll manage to test it again this weekend… :wink:

I appreciate the time spent to write this up, as well as the videos you have made.

I had to replace amdgpu with nvidiafb when I modified /etc/initramfs-tools/modules.
I found out it was nvidiafb by checking the kernel module listed when using lspci -nnv | less
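
The check looks roughly like this; under the GPU's entry, the "Kernel modules:" line names the module to softdep and "Kernel driver in use:" shows what is currently bound:

    lspci -nnv | less
    # or, more targeted (the grep window size is just a convenience):
    lspci -nnk | grep -iA 3 vga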

I followed this guide to a tee, but I selected the latest kernel, which is v4.14-rc4; this resulted in it not using the kernel driver vfio-pci.
Once I installed v4.14-rc3 it started loading the vfio-pci driver.

However, on rc2, rc3, and rc4 my menu bar takes a couple of minutes before it loads, and I have to use Alt+F2 to search for and launch applications from there; rc1 worked without issue.
I am running Ubuntu 17.04 with plasma-desktop (KDE 5).

So I am now at the fun part where I get to test out the VM; however, I can't until tomorrow, when my DisplayPort cable arrives.

Thank you for this! With this I was finally able to get everything to work perfectly. No other resource out there seems to offer these 2 details:

softdep amdgpu pre: vfio vfio_pci

and

AppArmor

Thanks again!

One thing to note about AppArmor is that you can actually run 'aa-complain /etc/apparmor.d/libvirt/*' and it will put the whole libvirt profile set into complain mode, including the dynamic files that are created there. Something you may want to add to your guide, @wendell. I tested it and it reports ALLOWED in dmesg, so we can keep AppArmor going. I'm still suffering an issue with an error that may still be related. Not sure if you can give insight on it.

EDIT: dmesg output:

[63433.893882] audit: type=1400 audit(1531873183.044:8024): apparmor="ALLOWED" operation="file_inherit" profile="/usr/sbin/libvirtd//null-libvirt-dbf05741-fb3e-4564-ba55-1a889636b7dc//null-/usr/bin/kvm" name="/dev/net/tun" pid=56568 comm="kvm-spice" requested_mask="wr" denied_mask="wr" fsuid=1000 ouid=0

Second EDIT: Another thing to note about AppArmor: the command to whitelist everything (by disabling the generated profiles) is 'sudo ln -s /etc/apparmor.d/libvirt/* /etc/apparmor.d/disable/', but you want to time it carefully with things like Looking Glass, since libvirt will delete the profiles the moment the VM stops. So you want to run the command the moment you start the VM, so the symlinks are retained for everything that was there at that moment. This allows you to keep using AppArmor. I'm still investigating the bootup issues, but making progress as I go.
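
To put the two AppArmor approaches from above side by side (paths assume the generated per-VM profiles live under /etc/apparmor.d/libvirt/, which is where libvirt puts them on Ubuntu):

    # option 1: complain mode -- violations get logged as ALLOWED but not blocked
    sudo aa-complain /etc/apparmor.d/libvirt/*

    # option 2: disable the generated profiles outright; run it while the VM is
    # running, because libvirt deletes those profiles the moment the guest stops
    sudo ln -s /etc/apparmor.d/libvirt/* /etc/apparmor.d/disable/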

Tried this guide on my Ryzen 1700 and Vega 64, and it works when the V64 is not the first GPU on my board. When I move it to the first PCIe slot to have the full x16 bandwidth, the VM refuses to boot. It just freezes. I know that the location IDs for the PCIe passthrough change, so I made sure the V64 is still on the right ID. Checked the other devices as well and everything is correct. So why wouldn't the VM work when moving the hardware?

Change the UEFI to init the other GFX card first?

On the Ryzen 1700 the first slot is x8 if the second slot is occupied. It's x16/x0 or x8/x8.

If you want the board to boot from graphics off the chipset's lanes, you need a board that lets you pick the slot init order, like Gigabyte.

I did, Ubuntu 18.04 boots to the secondary card and displays.

Does anything display at all on the card to be passed through?

No, nothing displays on the V64. Removing the GPU from the devices causes the VM to boot normally.

I added a Win10 boot ISO to see if that would boot; nothing shows on the GPU. Adding a SPICE device for some sort of output shows that the ISO bluescreens when trying to boot. When trying to boot the already-installed Win10 drive, the spinning dot circle turns two times and then freezes.

Also, the secondary card is in the last slot of my Asus Prime X370-Pro. There are three x16-sized slots: one x16, one x8, and I think the last is x4. I really need to look up whether that last slot does the same, but right now I want to try getting the first slot working.

Do remember that that last slot is PCIe 2.0 x4, so from a bandwidth perspective that's PCIe 3.0 x2. Some cards refuse to boot with that amount of bandwidth.

Why not just use the x8/x8 configuration though? Current-gen GPUs don't use enough bandwidth to be throttled by that anyway.

/edit
That slot also shares bandwidth with both the x1-slots, so if you have any of those populated you will only get 2 PCIe 2.0 lanes, equalling 1 PCIe 3.0 lane in bandwidth.

That slot is being used by "VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Cedar [Radeon HD 5000/6000/7350/8350 Series]"
for the Ubuntu install. I do plan on getting a beefier card for Ubuntu, and then all this thrashing around would be for naught, but I would like to figure out how to do it anyway. It is purely for giving output, which it does in that slot. That's all I need from it.

I'm not using any other PCIe cards.

I want to figure out how to get it to work because I will be building a computer for my parents that will be two VMs instead of one. The Ubuntu install will be headless, while they will only see the Win10 VMs. I will most likely get two RX 560s to do this. I can pass through anything except whatever is in the first slot.

Decided to update to kernel 4.17.9 from 4.17.8 to see if anything changes, and I see this during the install:

W: Possible missing firmware /lib/firmware/amdgpu/vega12_gpu_info.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/vega12_asd.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/vega12_sos.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/vega12_rlc.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/vega12_mec2.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/vega12_mec.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/vega12_me.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/vega12_pfp.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/vega12_ce.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/vega12_sdma1.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/vega12_sdma.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/vega12_uvd.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/vega12_vce.bin for module amdgpu
W: Possible missing firmware /lib/firmware/amdgpu/vega12_smc.bin for module amdgpu


OK, interesting note: I removed the V64 and added the SPICE GPU to boot the Win10 VM. I changed the boot of the VM to safe mode with networking and basic graphics, shut down, and re-added the V64. The VM then boots into safe mode and shows the V64 card in Device Manager. So I'm wondering if it is an issue with Win10 drivers instead of KVM? If that is the case, why wouldn't it display anything on the V64?


OK, so uninstalling the AMD Radeon drivers in Win10 allows the VM to boot with the V64 attached, but now I am getting this in Device Manager:

Windows has stopped this device because it has reported problems. (Code 43)

Isn't this something that only affects Nvidia cards?


Trying to reinstall the V64 drivers causes the VM to hard lock.

Quick question. Is the guide any different if I am using an RX 560 for Linux and a GTX 780 Ti for Windows? I’m just not sure about the part of the guide where you are modifying module files and initramfs.

No, the only thing different is that you must use the IDs displayed by your 780 Ti for passthrough.
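
For example, something along these lines (the grep filter is just a convenience; use whatever vendor:device pairs lspci actually prints for your card and its HDMI audio function):

    lspci -nn | grep -i nvidia
    # note the two [10de:xxxx] pairs (VGA + Audio) and use them wherever
    # the guide binds the AMD IDs to vfio-pci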