VFIO in 2019 -- Pop!_OS How-To (General Guide though) [DRAFT]

@wendell I have a System76 Oryx pro (2018) laptop.
Intel® Core™ i7-8750H CPU @ 2.20GHz × 12
32 GB RAM
Intel® UHD Graphics 630 (Coffeelake 3x8 GT2)
GeForce GTX 1070 Ti

with Pop!_OS and Win10 in dual boot

Optimus laptops are going to be problematic for passthrough for a number of reasons. You might be able to get it working if you have a Thunderbolt 3 port or your mPCIe slot isn’t whitelisted, but pretty much no option is ideal on these systems.

lightdm is the only reason I’m kinda holding back on Pop OS because it’s required.

I’m an SDDM guy and would like to switch to KDE, but knowing that switching DEs comes with the risk of breaking everything on Pop!_OS isn’t comforting.

for what it’s worth I’ve not had trouble on lightdm in solus, which uses lightdm by default

Could somebody explain what:

vm.hugetlb_shm_group=48

manages in the sysctl.conf file? Should I change the value if I want to use more than 20GB of memory in my vm-guest?
In this example it is 48 but the example in the ‘quicky how to’ is 32. And in the ‘More info here page’ the setting is not mentioned…

A few ideas for improvements on NUMA systems if one wants to stay on one NUMA node:

ls-iommu.sh for NUMA

#!/bin/bash

shopt -s nullglob

for d in /sys/kernel/iommu_groups/*/devices/*; do
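    # Group number is the path component right after iommu_groups/
    n=${d#*/iommu_groups/*}; n=${n%%/*}
    # NUMA node the device is attached to (-1 when the platform reports none)
    m=$(< "$d/numa_node")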
    printf 'IOMMU Group %s NUMA node %s PCI ' "$n" "$m"
    lspci -nns "${d##*/}"
done | sort -V

systemd service to statically allocate 1GB pages on specific NUMA (1 in this example):

[Unit]
Description=Allocate 1G hugepages on NUMA node 1
After=syslog.target
Before=libvirtd.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/bash -c "/bin/echo 16 > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages"
ExecStart=/bin/mkdir -p /dev/hugepages1G
ExecStart=/bin/mount -t hugetlbfs -o pagesize=1G hugetlbfs /dev/hugepages1G

[Install]
WantedBy=multi-user.target

Note that the RAM will be reserved and not used by regular applications. I never tried to allocate hugepages on multiple NUMA nodes bound to different mount points, so this approach may be limited to use with only 1 node.
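
To confirm the unit actually reserved the pages on the intended node, something like the sketch below can help. It only reads the per-node sysfs counters; the function name hugepages_1g and its optional second argument (a sysfs root, there purely so it can be pointed at a test directory) are made up for this example.

```shell
#!/bin/sh
# Report 1 GiB hugepage counters for one NUMA node.
# Usage: hugepages_1g NODE [SYSFS_NODE_ROOT]
# SYSFS_NODE_ROOT is a hypothetical override for testing; it defaults to
# the real sysfs location.
hugepages_1g() {
    dir="${2:-/sys/devices/system/node}/node$1/hugepages/hugepages-1048576kB"
    if [ ! -d "$dir" ]; then
        echo "node$1: no 1G hugepage directory at $dir" >&2
        return 1
    fi
    printf 'node%s total=%s free=%s\n' "$1" \
        "$(cat "$dir/nr_hugepages")" "$(cat "$dir/free_hugepages")"
}

# Node 1 matches the unit above. The call is allowed to fail on machines
# without that node or without 1G page support.
hugepages_1g 1 || true
```

With the unit above, total should read 16 (16 GiB reserved) once the service has run, and free drops as guests start using the pages.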
libvirt config to use it:

  <memoryBacking>
    <hugepages>
      <page size='1048576' unit='KiB'/>
    </hugepages>
  </memoryBacking>

This also needs an update to libvirt’s qemu.conf to add the mount point, like hugetlbfs_mount = ["/dev/hugepages", "/dev/hugepages1G"]
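
Before touching qemu.conf it may be worth checking which hugetlbfs mounts actually exist and what page size each one carries. A minimal sketch, assuming a Linux /proc/mounts; the function name hugetlbfs_mounts and its optional file argument are invented for illustration.

```shell
#!/bin/sh
# List hugetlbfs mount points together with their mount options.
# Usage: hugetlbfs_mounts [MOUNTS_FILE]   (defaults to /proc/mounts)
hugetlbfs_mounts() {
    awk '$3 == "hugetlbfs" { print $2, $4 }' "${1:-/proc/mounts}"
}

# Print whatever is mounted on this machine (prints nothing if none).
hugetlbfs_mounts
```

Each path listed in hugetlbfs_mount should show up here with the expected pagesize= option. libvirtd has to be restarted after editing qemu.conf for the change to take effect.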


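It is the group ID that owns the huge pages: vm.hugetlb_shm_group names the group allowed to create shared memory segments backed by hugepages. It is not a size, so you don’t need to change it to give the guest more memory; the two guides most likely just use different group IDs.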
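
To find the value that fits a given machine, one can look up the GID of the group the VM runs under and compare it with what the kernel currently has. This is only a sketch: the group name kvm is an assumption (use whichever group your QEMU process belongs to), and group_gid is a helper made up here that reads a group(5)-format file.

```shell
#!/bin/sh
# Print the numeric GID of a group from a group(5)-format file.
# Usage: group_gid NAME [FILE]   (FILE defaults to /etc/group)
group_gid() {
    awk -F: -v name="$1" '$1 == name { print $3; exit }' "${2:-/etc/group}"
}

# Show the value the kernel currently uses, if the sysctl is exposed here.
if [ -r /proc/sys/vm/hugetlb_shm_group ]; then
    printf 'current vm.hugetlb_shm_group: %s\n' \
        "$(cat /proc/sys/vm/hugetlb_shm_group)"
fi

# "kvm" is an assumed group name; prints an empty value if it does not exist.
printf 'GID of kvm: %s\n' "$(group_gid kvm)"
```

If the two numbers differ, the GID printed for your group is the one to put after vm.hugetlb_shm_group= in sysctl.conf.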

Great write-up! I’m really enjoying coming back to the forum and seeing this content.

Wow that’s a cool guide! I wish I could try but I’m not sure if it will work on my setup since my cpu only has 16 pcie lanes and 2 gpu need 2x8 plus my main drive is nvme ssd on a pcie card…

Using Pop!_OS 19.04, I have a couple of questions / comments:

  • This distro uses systemd-boot in UEFI mode, so what’s the point of using Grub configuration files and commands instead of editing /boot/efi/loader/entries/Pop_OS-current.conf?
  • Having both a Vega64 and an RTX 2080 on the same machine, I had to use the NVIDIA ISO from System76 (otherwise the OS would freeze on boot). To get VFIO working (after blacklisting NVIDIA), I had to $ sudo apt remove nvidia-dkms-418. Do you know a better way to completely uninstall the NVIDIA driver?
  • Finally, Virt-Manager 2.0 works great… but I have no feedback from the Memory consumption graph (no data coming through)

I actually updated the guide last night – I did a fresh install of 19.04 and found that it used systemd-boot. I think that’s true of 18.10 and 19.04 now for pop, but lts 18.04 was still using grub. The guide should work for that case.

I ran into the same issue with removing the nvidia driver on my system (had to do it via ssh because it borked the console for me).

might want to add a section on the use of kernelstub for people on hardware that needs the ACS patch. Ukuu won’t work for them.
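
On the systemd-boot installs, kernel parameters are normally added through kernelstub (for example sudo kernelstub -a "intel_iommu=on") rather than through Grub files. Whichever route is used, a quick check after reboot shows whether the parameter actually reached the running kernel. A small sketch; cmdline_has is a hypothetical helper name and intel_iommu=on is only an example parameter.

```shell
#!/bin/sh
# Succeed if a kernel parameter is present on the command line.
# Usage: cmdline_has PARAM [CMDLINE_FILE]   (defaults to /proc/cmdline)
cmdline_has() {
    # One parameter per line, then an exact, literal whole-line match.
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

# intel_iommu=on is only an example; substitute whatever you added.
if cmdline_has intel_iommu=on; then
    echo "intel_iommu=on is active"
else
    echo "intel_iommu=on not found on the running kernel command line"
fi
```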


not just that, but have to change the loader conf to pick from a few different kernels/edit the cli because that’s disabled by default too. le sigh

kinda tempted to make the guide “oh, systemd-boot, poor thing, this series of apt commands will undo that” lol


oof. Pop_OS is hitting some not invented here territory

guess that applies to the AGESA 0.0.7.2 quirk for error 127 as well, as something that might need a mention in the patching section.

Grub fan here too.

Though my concern is still AppArmor. To even get Looking Glass working I dealt with SELinux on Fedora, but I hear horror stories about AppArmor.

It’s not as bad as all that, but it is more annoying than it not being there in the first place.

Here’s how I’ve configured AppArmor to fix permission issues.
Works great on Pop!_OS 19.04, as well as 18.04 for EVDEV and Looking Glass:

$ sudo vi /etc/apparmor.d/abstractions/libvirt-qemu

# for usb access
   /dev/bus/usb/** rw,
   /etc/udev/udev.conf r,
   /sys/bus/ r,
   /sys/class/ r,
   /run/udev/data/* rw,
   /dev/input/* rw,

# Looking Glass
   /dev/shm/looking-glass rw,

$ sudo systemctl restart apparmor
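
If a guest still fails to start after adding rules like these, the kernel log usually says which path was refused. The sketch below pulls the denied paths out of log text so they can be added to the abstraction; apparmor_denied_paths is a name made up for this example, and it assumes the usual apparmor="DENIED" ... name="..." audit line format.

```shell
#!/bin/sh
# Print the paths AppArmor denied, one per line, de-duplicated.
# Reads kernel/audit log text from stdin.
apparmor_denied_paths() {
    grep 'apparmor="DENIED"' \
        | sed -n 's/.* name="\([^"]*\)".*/\1/p' \
        | sort -u
}

# Typical use (reading the kernel log generally needs root):
#   sudo dmesg | apparmor_denied_paths
```

Each path it prints is a candidate for a new line in /etc/apparmor.d/abstractions/libvirt-qemu, with r or rw depending on what the guest needs.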


Any special permissions for the ISOs or qcow2s you’re reading from? I know there are issues with that on SELinux.

Nope, nothing on that side using Ubuntu / AppArmor

Hi, great guide, thanks. I just hit an issue on my system (2950X, RX 590 and RX 570).

I get this error when updating initramfs

/etc/initramfs-tools/scripts/init-top/bind_vfio.sh: 5: /etc/initramfs-tools/scripts/init-top/bind_vfio.sh: cannot create /sys/bus/pci/devices/$:DEVS/driver_override: Directory nonexistent

I checked and the directory exists

here is my script

#!/bin/sh
PREREQS=""
DEVS="0000:42:00.0 0000:42:00.1"
for DEV in $DEVS;
do echo "vfio-pci" > /sys/bus/pci/devices/$DEV/driver_override
done

modprobe -i vfio-pci

and here is my lspci -vnn
42:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X] [1002:67df] (rev e1) (prog-if 00 [VGA controller])
Subsystem: Sapphire Technology Limited Nitro+ Radeon RX 580 4GB [1da2:e366]
Flags: bus master, fast devsel, latency 0, IRQ 136
Memory at 80000000 (64-bit, prefetchable) [size=256M]
Memory at 90000000 (64-bit, prefetchable) [size=2M]
I/O ports at 4000 [size=256]
Memory at 9fe00000 (32-bit, non-prefetchable) [size=256K]
Expansion ROM at 000c0000 [disabled] [size=128K]
Capabilities:
Kernel driver in use: amdgpu
Kernel modules: amdgpu

42:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 580] [1002:aaf0]
Subsystem: Sapphire Technology Limited Ellesmere [Radeon RX 570/580] [1da2:aaf0]
Flags: bus master, fast devsel, latency 0, IRQ 164
Memory at 9fe60000 (64-bit, non-prefetchable) [size=16K]
Capabilities:
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel
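
One thing that may explain an error at update time: initramfs-tools calls every script in scripts/init-top with the argument prereqs while it builds the image, and a script without the usual prereqs stanza runs its whole body right then. The literal $:DEVS in the message also hints that the file on disk may not match what was pasted. Below is a sketch of the same loop with the stanza added and a check on each device directory; the SYSFS_PCI variable and the bind_vfio function name are additions for illustration, not part of the guide.

```shell
#!/bin/sh
# Standard initramfs-tools stanza: answer the "prereqs" query and exit
# so the body below does not run while the image is being built.
PREREQ=""
prereqs() { echo "$PREREQ"; }
case "${1:-}" in
    prereqs) prereqs; exit 0 ;;
esac

# SYSFS_PCI is a hypothetical override so the loop can be tried against
# a scratch directory; it defaults to the real sysfs path.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci/devices}"
DEVS="0000:42:00.0 0000:42:00.1"

# Write vfio-pci into driver_override for each device that exists,
# and report (rather than fail on) any address that is not present.
bind_vfio() {
    for DEV in $1; do
        if [ -d "$SYSFS_PCI/$DEV" ]; then
            echo "vfio-pci" > "$SYSFS_PCI/$DEV/driver_override"
        else
            echo "bind_vfio: $SYSFS_PCI/$DEV not found, skipping" >&2
        fi
    done
}

bind_vfio "$DEVS"
```

The modprobe -i vfio-pci line from the original script would follow the final call unchanged. Any "not found" message at boot points at a wrong address in DEVS rather than at a missing sysfs tree.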