VFIO in 2019 -- Pop!_OS How-To (General Guide though) [DRAFT]

A few ideas for improving things on NUMA systems if you want to stay on a single NUMA node:

ls-iommu.sh for NUMA

#!/bin/bash

shopt -s nullglob

for d in /sys/kernel/iommu_groups/*/devices/*; do
    n=${d#*/iommu_groups/*}; n=${n%%/*}    # IOMMU group number
    m=$(< "$d/numa_node")                   # NUMA node of the device (-1 if unknown)
    printf 'IOMMU Group %s NUMA node %s PCI ' "$n" "$m"
    lspci -nns "${d##*/}"
done | sort -V
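Once the list shows which NUMA node the passthrough GPU sits on, the next thing you usually want is the set of host CPUs on that node, for vCPU pinning. Below is a minimal sketch that reads the numa_node and local_cpulist attributes the kernel exposes for each PCI device under /sys/bus/pci/devices/. The function name pci_numa_info and the SYSFS_ROOT override (only there so it can be tried against a scratch directory) are my own hypothetical additions, not part of the script above.

```shell
#!/bin/bash
# Hypothetical helper: pci_numa_info ADDRESS
# Prints the NUMA node and host CPUs local to one PCI device.
# SYSFS_ROOT defaults to /sys; overriding it is only useful for testing.
pci_numa_info() {
    local bdf="$1"
    local dev="${SYSFS_ROOT:-/sys}/bus/pci/devices/$bdf"
    local node cpus

    if [[ ! -d $dev ]]; then
        echo "no such PCI device: $bdf" >&2
        return 1
    fi
    node=$(< "$dev/numa_node")      # -1 means no NUMA affinity was reported
    cpus=$(< "$dev/local_cpulist")  # e.g. 8-15,24-31
    printf 'PCI %s NUMA node %s CPUs %s\n' "$bdf" "$node" "$cpus"
}
```

For example, pci_numa_info 0000:42:00.0 on the GPU from the lspci output further down. The printed CPU list is a reasonable starting point for the cpuset values in libvirt's vcpupin entries, though which cores you actually give the guest is up to you.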

A systemd service to statically allocate 1 GB pages on a specific NUMA node (node 1 in this example):

[Unit]
Description=Allocate 1G hugepages on NUMA node 1
After=syslog.target
Before=libvirtd.service

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/bash -c "/bin/echo 16 > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages"
ExecStart=/bin/mkdir -p /dev/hugepages1G
ExecStart=/bin/mount -t hugetlbfs -o pagesize=1G hugetlbfs /dev/hugepages1G

[Install]
WantedBy=multi-user.target
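Writing to nr_hugepages is a request, not a guarantee: if the node does not have enough free contiguous memory, the kernel allocates fewer pages than asked for and the write still succeeds. A small sketch to confirm what the service actually got, reading the per-node nr_hugepages and free_hugepages counters. check_node_hugepages and the SYSFS_ROOT override are hypothetical names of mine.

```shell
#!/bin/bash
# Hypothetical helper: check_node_hugepages NODE WANTED
# Reports how many 1 GiB pages NODE actually holds; fails if fewer than WANTED.
# Returns 2 if the counters cannot be read. SYSFS_ROOT overrides /sys for testing.
check_node_hugepages() {
    local node="$1" want="$2"
    local dir="${SYSFS_ROOT:-/sys}/devices/system/node/node$node/hugepages/hugepages-1048576kB"
    local have free

    [[ -r $dir/nr_hugepages && -r $dir/free_hugepages ]] || return 2
    have=$(< "$dir/nr_hugepages")
    free=$(< "$dir/free_hugepages")
    # Each page is 1 GiB, so the page count is also the number of GiB reserved.
    printf 'node%s: %s of %s pages (%s GiB reserved), %s free\n' \
        "$node" "$have" "$want" "$have" "$free"
    (( have >= want ))
}
```

With the unit above, check_node_hugepages 1 16 should report 16 of 16 pages after boot; anything lower means node 1 was too fragmented or too small for the request.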

Note that this RAM is reserved and cannot be used by regular applications. I never tried allocating hugepages on multiple NUMA nodes bound to different mount points, so this setup may be limited to a single node.
libvirt config to use it:

  <memoryBacking>
    <hugepages>
      <page size='1048576' unit='KiB'/>
    </hugepages>
  </memoryBacking>

This also requires updating libvirt's qemu.conf to add the mount point, e.g. hugetlbfs_mount = ["/dev/hugepages", "/dev/hugepages1G"]
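Before restarting libvirtd it is worth confirming that the second mount point really is hugetlbfs with 1 GiB pages. A sketch that scans /proc/mounts; as far as I know the kernel reports a 1G hugetlbfs mount there as pagesize=1024M, so both that and pagesize=1G are accepted. is_hugetlbfs_1g is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: is_hugetlbfs_1g MOUNTPOINT [MOUNTS_FILE]
# Succeeds if MOUNTPOINT is mounted as hugetlbfs with 1 GiB pages.
is_hugetlbfs_1g() {
    local target="$1" mounts="${2:-/proc/mounts}"
    local src mnt fstype opts rest

    while read -r src mnt fstype opts rest; do
        [[ $mnt == "$target" && $fstype == hugetlbfs ]] || continue
        # Match pagesize as a whole comma-separated mount option.
        case ",$opts," in
            *,pagesize=1024M,*|*,pagesize=1G,*) return 0 ;;
        esac
    done < "$mounts"
    return 1
}
```

Usage: is_hugetlbfs_1g /dev/hugepages1G && echo "1G hugetlbfs mount OK".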


It is the group ID that owns the huge pages

Great write-up! I’m really enjoying coming back to the forum and seeing this content.

Wow, that’s a cool guide! I wish I could try it, but I’m not sure it will work on my setup: my CPU only has 16 PCIe lanes, two GPUs need x8/x8, and my main drive is an NVMe SSD on a PCIe card…

Using Pop!_OS 19.04, I have a couple of questions / comments:

  • This distro uses systemd-boot in UEFI mode, so what’s the point of using Grub configuration files and commands instead of editing /boot/efi/loader/entries/Pop_OS-current.conf?
  • Having both a Vega64 and an RTX 2080 in the same machine, I had to use the NVIDIA ISO from System76 (otherwise the OS would freeze on boot). To get VFIO working (after blacklisting NVIDIA), I had to run sudo apt remove nvidia-dkms-418. Do you know a better way to completely uninstall the NVIDIA driver?
  • Finally, Virt-Manager 2.0 works great… but the Memory consumption graph shows nothing (no data coming through).

I actually updated the guide last night: I did a fresh install of 19.04 and found that it uses systemd-boot. I think that’s now true of both 18.10 and 19.04 for Pop, but LTS 18.04 still uses GRUB. The guide should work for that case.

I ran into the same issue with removing the nvidia driver on my system (had to do it via ssh because it borked the console for me).

Might want to add a section on using kernelstub, for people whose hardware needs the ACS override patch. Ukuu won’t work for them.
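For the systemd-boot route, kernel parameters on Pop!_OS are managed with kernelstub rather than /etc/default/grub; to my knowledge something like sudo kernelstub -a "amd_iommu=on iommu=pt" adds options to the boot entry (pcie_acs_override=downstream,multifunction only has an effect on a kernel carrying the ACS override patch). After rebooting, a quick sketch to confirm the running kernel actually received them. has_kernel_params and CMDLINE_FILE are hypothetical names.

```shell
#!/bin/bash
# Hypothetical helper: has_kernel_params PARAM...
# Fails (and names the first missing one) unless every PARAM appears as a
# whole word in the running kernel's command line.
# CMDLINE_FILE overrides /proc/cmdline for testing.
has_kernel_params() {
    local cmdline param
    # Pad with spaces so every word, including first and last, is space-delimited.
    cmdline=" $(< "${CMDLINE_FILE:-/proc/cmdline}") "
    for param in "$@"; do
        if [[ $cmdline != *" $param "* ]]; then
            echo "missing: $param" >&2
            return 1
        fi
    done
}
```

Usage: has_kernel_params amd_iommu=on iommu=pt pcie_acs_override=downstream,multifunction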


Not just that, but you also have to change the loader conf to be able to pick from different kernels or edit the kernel command line, because that’s disabled by default too. Le sigh.

Kinda tempted to make the guide say “oh, systemd-boot, poor thing, this series of apt commands will undo that” lol


Oof. Pop!_OS is heading into some not-invented-here territory.

I guess the same goes for the AGESA 0.0.7.2 quirk (error 127): another thing that might need a mention in the patching section.

Grub fan here too.

Though my concern is still AppArmor. To even get Looking Glass working I had to deal with SELinux on Fedora, but I hear horror stories about AppArmor.

It’s not as bad as all that, but it is more annoying than it not being there in the first place.

Here’s how I’ve configured AppArmor to fix permission issues.
Works great on Pop!_OS 19.04, as well as 18.04 for EVDEV and Looking Glass:

$ sudo vi /etc/apparmor.d/abstractions/libvirt-qemu

# for usb access
   /dev/bus/usb/** rw,
   /etc/udev/udev.conf r,
   /sys/bus/ r,
   /sys/class/ r,
   /run/udev/data/* rw,
   /dev/input/* rw,

# Looking Glass
   /dev/shm/looking-glass rw,

$ sudo systemctl restart apparmor
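If you would rather script the edit than open vi, here is a sketch that appends each rule to the abstraction only when it is not already present, so running it twice does not duplicate lines. add_rule_once is a hypothetical helper; run it from a root shell, then restart AppArmor as above.

```shell
#!/bin/bash
# Hypothetical helper: add_rule_once FILE RULE
# Appends RULE to FILE unless a line with the same text (ignoring leading
# whitespace) is already there.
add_rule_once() {
    local file="$1" rule="$2"

    if awk -v r="$rule" '{ sub(/^[ \t]+/, "") } $0 == r { found = 1 } END { exit !found }' "$file"; then
        return 0
    fi
    # Avoid gluing the rule onto a last line that has no trailing newline.
    if [[ -s $file && -n $(tail -c 1 "$file") ]]; then
        printf '\n' >> "$file"
    fi
    printf '   %s\n' "$rule" >> "$file"
}
```

For example: add_rule_once /etc/apparmor.d/abstractions/libvirt-qemu '/dev/shm/looking-glass rw,'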


Any special permissions for the ISOs or qcow2s you’re reading from? I know there are issues with that on SELinux.

Nope, nothing on that side using Ubuntu / AppArmor

Hi, great guide, thanks! I just ran into an issue on my system (2950X, RX 590 and RX 570).

I get this error when updating the initramfs:

/etc/initramfs-tools/scripts/init-top/bind_vfio.sh: 5: /etc/initramfs-tools/scripts/init-top/bind_vfio.sh: cannot create /sys/bus/pci/devices/$:DEVS/driver_override: Directory nonexistent

I checked, and the directory exists.

Here is my script:

#!/bin/sh
PREREQS=""
DEVS="0000:42:00.0 0000:42:00.1"
for DEV in $DEVS;
  do echo "vfio-pci" > /sys/bus/pci/devices/$DEV/driver_override
done

modprobe -i vfio-pci

And here is my lspci -vnn output:
42:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X] [1002:67df] (rev e1) (prog-if 00 [VGA controller])
	Subsystem: Sapphire Technology Limited Nitro+ Radeon RX 580 4GB [1da2:e366]
	Flags: bus master, fast devsel, latency 0, IRQ 136
	Memory at 80000000 (64-bit, prefetchable) [size=256M]
	Memory at 90000000 (64-bit, prefetchable) [size=2M]
	I/O ports at 4000 [size=256]
	Memory at 9fe00000 (32-bit, non-prefetchable) [size=256K]
	Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities:
	Kernel driver in use: amdgpu
	Kernel modules: amdgpu

42:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 580] [1002:aaf0]
	Subsystem: Sapphire Technology Limited Ellesmere [Radeon RX 570/580] [1da2:aaf0]
	Flags: bus master, fast devsel, latency 0, IRQ 164
	Memory at 9fe60000 (64-bit, non-prefetchable) [size=16K]
	Capabilities:
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel

ls /sys/bus/pci/devices/

Is 0000 the right prefix? You have a Threadripper, so it has multiple buses :)

I think it is; I checked /sys/bus/pci/devices like you advised.

Hmm, this line: /sys/bus/pci/devices/$:DEVS/driver_override

Makes me think there’s a typo in the script? Maybe? I’m out at the moment, so I may be overlooking something obvious.

What’s with the rogue colon? It should be $DEV, not $:DEVS.

I checked the script again: no rogue colon there. Also, the audio device is listed as using the snd_hda_intel kernel driver and module.

I copied and pasted the script posted in the discussion and it all went well.
Thanks for the support as always (can’t promise I won’t be back with another issue in a moment though, lol). And keep up the great work!
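For anyone hitting the same "Directory nonexistent" error: it means the path built from DEVS does not exist, so it helps to check the addresses before putting them in the hook. Below is a sketch (missing_pci_devs is a hypothetical helper) that prints every address with no entry under /sys/bus/pci/devices. Also worth knowing: as far as I understand initramfs-tools(7), update-initramfs runs each boot script once with the argument prereqs to work out ordering, which would explain why a script without the usual prereqs case block executes its body, and can fail like this, while the image is being built rather than only at boot.

```shell
#!/bin/bash
# Hypothetical helper: missing_pci_devs ADDRESS...
# Prints each PCI address that has no directory under /sys/bus/pci/devices
# and returns 1 if any were missing. SYSFS_ROOT overrides /sys for testing.
missing_pci_devs() {
    local dev status=0
    for dev in "$@"; do
        if [[ ! -d ${SYSFS_ROOT:-/sys}/bus/pci/devices/$dev ]]; then
            echo "$dev"
            status=1
        fi
    done
    return "$status"
}
```

Usage: missing_pci_devs 0000:42:00.0 0000:42:00.1 || echo "fix DEVS before running update-initramfs"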