VFIO in 2019 -- Pop!_OS How-To (General Guide though) [DRAFT]

So, at first I had my config set up like yours, until I read the Arch wiki a bit. Now I’ve done the following (sorry, this is copied from my notes; I tried to clean it up) -

  1. CPU Pinning - If you have a TR or EPYC processor (or even if you don’t and like pretty pictures) you should -
sudo apt install hwloc

and run lstopo, which will help you figure out which threads have local access to PCI devices and RAM on a NUMA system.
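A couple of concrete invocations, for reference (lscpu comes with util-linux, so it should already be installed):

lstopo                   # graphical topology map under X, a text tree otherwise
lstopo topology.png      # or write the map out to an image file
lscpu -e                 # one line per logical CPU: core, socket, NUMA node, cache sharing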

  2. virsh edit win10
  3. Locate the line
     <vcpu placement='static'>8</vcpu>
  (the number might be different, of course)
  4. For testing I was using a 6-core, so I left 2 cores (4 threads) to the host. Use lscpu -e or lstopo to confirm which CPUs you want the guest to use. In this example, CPUs 0-3 are omitted and left to the host, and 4-11 are pinned to vCPUs 0-7. Pinning 1 vCPU to 1 CPU lets you make the best use of cache; otherwise, when the vCPU process switches to another CPU it has to hit RAM again. iothreadpin is only used if you are using virtio-scsi or virtio-blk devices, which weren’t available in virt-manager for me. emulatorpin, afaik, refers to QEMU itself, so this would pin it to one of the host CPUs to avoid QEMU taking processing time from the guest? At least I think that’s how it works, haven’t confirmed. Add the following lines beneath the line from the step above, changing the values to suit your case.
<iothreads>2</iothreads>
<cputune>
  <vcpupin vcpu='0' cpuset='4'/>
  <vcpupin vcpu='1' cpuset='5'/>
  <vcpupin vcpu='2' cpuset='6'/>
  <vcpupin vcpu='3' cpuset='7'/>
  <vcpupin vcpu='4' cpuset='8'/>
  <vcpupin vcpu='5' cpuset='9'/>
  <vcpupin vcpu='6' cpuset='10'/>
  <vcpupin vcpu='7' cpuset='11'/>
  <emulatorpin cpuset='0-3'/>
  <iothreadpin iothread='1' cpuset='0-1'/>
  <iothreadpin iothread='2' cpuset='2-3'/>
</cputune>

Edit: Oops, realized I missed a couple of things.

Hopefully that is somewhat clear. Not really sure how much it helps, but I figured it’s probably better than letting the vCPUs just roam amongst a pool of assigned CPUs.

It was just an idea. I was thinking either pass through the slot or pass through the controller on the card, figuring maybe it’d show up in its own IOMMU group since it’s an independent controller on an AIC. So no, I didn’t mean a hub, but perhaps that could work too. Really, these are all things people should test.

Yeah, so the VM is just outright skewing the results. That means about the only accurate measurement would be a real-world test.

Right now I’m trying to figure out how to tell Looking Glass which of my two monitors to go fullscreen on with the -F argument. It always fullscreens on my primary, but I want it on my secondary display.

I’m actually not sure how CPU cores behave in a non-virtualized environment - whether tasks stay pinned to specific cores/threads & memory, or swap around like how I originally configured my VM. If the former, I’ll swap my CPU config for yours if it means a little (or substantially) more performance for certain tasks.

Yeah, I was getting some wild variance that probably had to do with caching or optimizations by ZFS on the backend. /shrug. Real world is the only thing that matters at the end of the day.

Sorry, never used it, so I’m not sure.

In hindsight I should have just linked this in the first place, as my notes don’t give the full picture -
https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF#Performance_tuning

The last thing I’m trying to figure out is the virtio-scsi controller. I can create one in virt-manager, but I’m stuck on how (or if) I can get my virtio disks to hang off said virtio-scsi controller. The CD-ROM that’s created by default seems to be attached to the SATA controller, so I was trying to mimic that by editing the XML directly… no dice. For one, a virtio disk can’t have an address of type ‘drive’.

I’m starting to feel like I really need to grab a newer virt-manager in the hope that I can do what I want. The goal is to be able to set the IO mode on my virtio disks to ‘threads’ and then pin the thread in cputune. This is supposed to help with the bottleneck of qcow2 expansion on writes, as it adds some concurrency.
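For reference, this is roughly what I believe it should look like in the domain XML once it works - a sketch only, with a made-up image path. The disk hangs off the virtio-scsi controller via bus='scsi', the IO mode goes on the <driver> element, and the controller’s own iothread assignment is optional:

<controller type='scsi' index='0' model='virtio-scsi'>
  <driver iothread='1'/>   <!-- ties the controller to iothread 1, which cputune can then pin -->
</controller>
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2' io='threads'/>        <!-- the "IO mode: threads" setting -->
  <source file='/var/lib/libvirt/images/win10.qcow2'/>   <!-- example path -->
  <target dev='sda' bus='scsi'/>                         <!-- bus='scsi' attaches it to the virtio-scsi controller -->
</disk>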

I can agree to that.

I THINK I found a solution. One that may magically stop being a solution so don’t immediately take it as one:

I replaced the
-F Borderless fullscreen mode
argument with
-d Borderless mode
and it just magically fullscreens on my 2nd display like I wanted… no idea why, but I’m not going to complain if it keeps doing this consistently.

I don’t know what function these two lines serve:

<emulatorpin cpuset='0-1'/>
<iothreadpin iothread='1' cpuset='0-1'/>

If I can figure out how to edit this for my application I’ll use it. I’m working with 16 cores/32 threads, passing through 8/16 to the VM. Would I have to edit the above to reflect which threads I’ve left to the host? Would it stay the same regardless of core count?

I don’t know how much performance you’ll gain going the iSCSI route. If you want to go through that much hassle, you might as well take Wendell’s advice and just pass through an entire disk. Skip the ZFS pool and give it a SATA boot SSD.
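If anyone does go the whole-disk route, it’s just a block-backed disk in the XML - a sketch, with a placeholder /dev/disk/by-id path you’d swap for the actual drive:

<disk type='block' device='disk'>
  <driver name='qemu' type='raw' cache='none' io='native'/>
  <source dev='/dev/disk/by-id/ata-EXAMPLE_SSD_SERIAL'/>   <!-- placeholder; use your drive's stable by-id name -->
  <target dev='vdb' bus='virtio'/>
</disk>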

Well, my XML file now:

<vcpu placement='static'>16</vcpu>
  <iothreads>1</iothreads>
  <cputune>
    <vcpupin vcpu='0' cpuset='8'/>
    <vcpupin vcpu='1' cpuset='9'/>
    <vcpupin vcpu='2' cpuset='10'/>
    <vcpupin vcpu='3' cpuset='11'/>
    <vcpupin vcpu='4' cpuset='12'/>
    <vcpupin vcpu='5' cpuset='13'/>
    <vcpupin vcpu='6' cpuset='14'/>
    <vcpupin vcpu='7' cpuset='15'/>
    <vcpupin vcpu='8' cpuset='24'/>
    <vcpupin vcpu='9' cpuset='25'/>
    <vcpupin vcpu='10' cpuset='26'/>
    <vcpupin vcpu='11' cpuset='27'/>
    <vcpupin vcpu='12' cpuset='28'/>
    <vcpupin vcpu='13' cpuset='29'/>
    <vcpupin vcpu='14' cpuset='30'/>
    <vcpupin vcpu='15' cpuset='31'/>
    <emulatorpin cpuset='0-1'/>
    <iothreadpin iothread='1' cpuset='0-1'/>
  </cputune>

And the output of vcpupin:

:~$ virsh vcpupin win10prox64 
 VCPU   CPU Affinity
----------------------
 0      8
 1      9
 2      10
 3      11
 4      12
 5      13
 6      14
 7      15
 8      24
 9      25
 10     26
 11     27
 12     28
 13     29
 14     30
 15     31

While setting it up the first way that I found, the thought did cross my mind whether assigning specific threads/cores to each vCPU would help any more than creating a pool that all the vCPUs share.

This may be unrelated, but I’ve just noticed Looking Glass is a little less choppy when moving windows around inside it. I don’t know if pinning the vCPUs to independent threads had anything to do with that, but it’s great to see.

Hopefully it will stay that way.

Hmm, it’s been a long time since I’ve run a multi-monitor setup, as I’ve had two or more PCs running at my desk for various reasons. I seem to recall there were utilities available from both GPU manufacturers to make it easy to move or pin application windows between displays in Windows. There’s gotta be something similar for Linux… Maybe you can get an idea from here -
https://wiki.archlinux.org/index.php/Multihead#Application_support or
https://wiki.archlinux.org/index.php/Xrandr#Manage_2-monitors though the latter is really just for dynamic resolution/orientation changes.

Afaik emulatorpin is for pinning the thread(s) QEMU uses to emulate PC hardware for your VM. Pinning it to cores your VM isn’t using may help when it’s under heavy load, as there won’t be any contention between the two? I’m not sure if it’s more than one thread, so I kept the Arch wiki example’s setup that lets it roam across a few cores.

I’m surprised it would load with iothreadpin, as I thought it was only needed when using a virtio-scsi controller with drives set to ‘threads’ IO mode. Though it may be treated as more of a default to observe when/if QEMU spawns such a thread. Anyway, it’s probably not needed for your setup.
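If you’re curious what’s actually running, virsh can report both without guessing - assuming your domain is named win10:

virsh emulatorpin win10     # current CPU affinity of the QEMU emulator threads
virsh iothreadinfo win10    # lists any I/O threads the guest has, and the CPUs they're pinned to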

Oh, it’s not iSCSI I’m trying to do. I’m just trying to get my local qcow2s set up in my VM using a method that, at least in theory, will give me gains in write speed. It’s not the end of the world if I can’t get it set up. I just found several blog posts suggesting that IO mode, but I haven’t figured out how to make it happen, as setting it directly on the virtio drives gives me an error.

Edit: Ok, I stand corrected. When I originally tried setting up threading on my virtio drives I set my caching at the same time. I swear I tried the settings individually to determine which was causing the error and had decided neither was supported. Turns out it was just the caching option. So now I have my iothreads set up, booted my VM, started a benchmark, and even reads seemed to improve ~20%. Then it got to writes and my whole machine (including the host) locked up. Hmmm… So either I’ve got something set strangely or that hardware issue isn’t resolved after all…

Cool :)

Now that I’ve fixed a hardware issue I hadn’t pinned down yet, I’ll probably switch over to this VFIO machine and get some real-world impressions using it daily. So far I really feel like Win10 runs better in a VM, but to be fair, it’s been almost a year since I gave up running it on bare metal, and I did have an MS background process that was misbehaving on my machine. So it’s completely subjective.

I think the issue is that the Looking Glass application lacks a way to choose which display to go fullscreen on (when using -F). It defaults to whichever is the primary. Now, I could trick Linux into thinking my secondary is my primary, which would in turn fool Looking Glass, but borderless fullscreen (-d) looks identical and seems to open wherever the application window was last opened, which basically gets the same job done.
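For what it’s worth, the “make the secondary display the primary” trick is a one-liner with xrandr - the output name here is a made-up example; use whatever xrandr lists for your second monitor:

xrandr                              # list connected outputs and their current modes
xrandr --output DP-2 --primary      # make the (hypothetical) DP-2 output the primary display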

I might run some benchmarks both with and without those lines. See if it makes any real difference.

Well, keep at it. If I got it going, so can you. I have the worst luck whenever I build things like this - I run into every single error nobody else is experiencing.

For years I tried to diagnose why my Windows desktop would crash with WHEA_UNCORRECTABLE_ERROR. Tried replacing almost every piece of hardware, re-installing the OS, etc. - never fixed it for long. Now Linux has been the most stable OS on my computer, period. And Windows seems to behave much better in a VM, but we’ll see how true that holds as we progress. I was trying to install some drivers and it crashed - something about a THREAD_EXCEPTION - so we’re off to a rocky start.

A little bit of good news: I discovered the USB 3.0 ports just to the right of the ones connected to the controller I passed through are themselves on a different controller. This means I don’t have to use my janky USB 2.0 adapter cable. Sweet. That also means my VM automatically has four 3.0 ports. Even better.
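If you want to double-check which controller a given port hangs off of, the bus-to-controller mapping is visible in sysfs - roughly like this (plug a device into the port in question and see which bus it shows up on):

lsusb -t                                  # each root "Bus NN" in the tree is one host controller
readlink -f /sys/bus/usb/devices/usb1     # PCI path of the controller behind bus 1 (repeat for usb2, usb3, ...)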

Probably some xorg.conf tweaking could be done there? Or maybe this would be easier? -
https://wiki.archlinux.org/index.php/Multihead#Separate_screens

Well, after hours of burn-in I’m pretty confident my VFIO system is both stable and able to save BIOS settings without endless power cycling. Previously I had found a way to bypass the second problem, and as the system seemed stable otherwise I ignored it. I decided to swap my X470 for an X570 that someone just had lying around. Long story short, it was indeed the board at fault and now everything is peachy - except for the part where I can’t pass through my GPU now. Two steps forward, one step back.

I’ve already tried the latest motherboard BIOS and I’m uncertain what to do on the kernel side. Mostly I’m not sure I can even get a newer kernel than 5.0 without cloning Ubuntu’s repo and hoping I don’t break any of Pop!_OS’s tools by installing a custom kernel.

I even considered testing Fedora, but it’s my understanding ZFS support there isn’t great. Not to say you can’t; it just takes extra work, and I doubt they’ll ever support ZFS on root within the installer because of their stance on the CDDL.

Surprisingly, I haven’t had any major glitches with Win10 itself. Even when I was accidentally running the VM on one thread… cough. It was slow as hell, of course, but it didn’t actually crash. Go figure.

Yeah, on my new board it looked like I might be able to isolate a USB port, but I’m not so sure, because it looks like there are two USB controller devices with several functions sprinkled across a few IOMMU groups. I didn’t dig into it, but it’s probably a PCI bridge again? At any rate, I wasn’t sure I could get it to work, so I just slapped all my guest USB devices on a hub and stuck the hub into a USB port on the back of the machine. I considered trying to pass through the hub, but I did the individual devices first. I’ll experiment with that later.
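For mapping out which functions landed in which groups, the usual loop over /sys/kernel/iommu_groups works well (roughly the same script the Arch wiki uses):

#!/bin/bash
# Print every IOMMU group and the PCI devices inside it
shopt -s nullglob
for g in /sys/kernel/iommu_groups/*; do
    echo "IOMMU Group ${g##*/}:"
    for d in "$g"/devices/*; do
        echo -e "\t$(lspci -nns "${d##*/}")"
    done
done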

I could just leave a border on the Looking Glass window, which would let me click-drag it around, but then you get into pixel compression and it just looks terrible, so borderless fullscreen is the way to go. I’m unsure how what you linked would tell Looking Glass which display to use. Really, all that’s needed is an argument like --set-fullscreen-display=1, but I don’t see any information saying that exists.

Did you enable it in the BIOS? Does your CPU have more than one NUMA node? You might have the same issue I ran into. Did you set intel_iommu=on (or amd_iommu=on)?
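If the kernel parameter is the missing piece: Pop!_OS uses systemd-boot, so the options are set with kernelstub rather than GRUB - a sketch, assuming kernelstub’s --add-options flag; on GRUB-based distros you’d add the same option to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and run sudo update-grub instead:

sudo kernelstub --add-options "intel_iommu=on"   # or amd_iommu=on on AMD boards
# after a reboot, confirm it took effect:
cat /proc/cmdline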

I’m on kernel 5.0.0-13 (I think - that’s off the top of my head anyway), so I don’t think you need any more updating there.

When virt-manager automatically set my CPU topology to 16 one-core CPUs, that was a bit of an issue for Win10. I noticed it was god-awful slow and didn’t know why.

Copied from wiki:
With this configuration, it is not possible to move windows between screens, apart from a few special programs like GIMP and Emacs which have multi-screen support. For most programs you must change the DISPLAY environment variable when launching to have the program appear on another screen:

$ DISPLAY=:0.1 command-to-run &

Yeah, it’s enabled. It’s not a TR, so I don’t have that issue. I only moved from X470 to X570, and my Linux install and VM config stayed the same. I just had to update the identifying bits in my config (device IDs and such), so it was pretty much already set up.

I’ve since installed kernel 5.2.5, but I’ve realized that probably isn’t going to work for me, as it seems doubtful I can get ZFS to work on kernel 5.2 - at least out of the box. So at the moment I’m considering loading up Pop!_OS 18.04 on the chance it’s true that AMD contributed some kernel patches there for the new hardware, though I haven’t confirmed that. Just running out of options.

Is that just an argument that automatically works with Looking Glass, or is that something Looking Glass would have to support? Also, if it locks the display away from Linux then it’s not usable for my application. I need the ability to move Linux windows on top of Looking Glass.

While I was researching setting up QEMU, it was stated that IOMMU support was brought in way back around kernel 3.2-something. If you’re already on 5.x, I’m pretty certain updating further isn’t going to do anything for you in terms of GPU passthrough. I’m on Ubuntu 19.04 and the guide worked fairly well, so if you plan on switching OSes you might consider that. Or Linux Mint.
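Either way, the quickest sanity check is the boot log - something like this shows whether the IOMMU actually came up:

sudo dmesg | grep -i -e DMAR -e IOMMU    # expect lines like "DMAR: IOMMU enabled" (Intel) or "AMD-Vi: Found IOMMU" (AMD)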

No idea - I was just giving you reading material on possible solutions.

My problem is, at root, a hardware support issue. I’m simply trying, mostly in vain, to find a kernel that will support my hardware for VFIO while maintaining support for ZFS. I’m 99% sure this is a BIOS issue, as I just got done confirming 18.04 w/kernel 4.15 has the same error when starting a VM with a GPU passed through. As for going with a newer kernel, that is only in the hope that the hardware support for the X570 chipset is better. But afaik my only other choice atm is 5.1, because ZFS doesn’t yet support 5.2. If it did, I’d try 5.2.5, because that apparently fixes the virtualization regression that was plaguing 5.2 - never mind 5.3, which is still an RC. All that said, I’m back on 19.04 and will probably wait for the new BIOS AMD has been promising.

Have you had any luck compiling a kernel for Pop with that patch applied? I’m running into dependency issues because certain dependencies and System76’s versions of those packages don’t mingle well together.

What error are you getting? Is it during compile, or errors applying the patch?

The error occurs during compile. Since I didn’t see a way to get the System76 kernel source code, I tried to compile the Ubuntu Disco Dingo kernel via this guide: https://wiki.ubuntu.com/Kernel/BuildYourOwnKernel

The sudo apt-get build-dep linux-image-$(uname -r) command does not work because of a dependency issue: “builddeps:linux-signed : Depends: linux-libc-dev (>= 5.0.0-23.24) but 5.0.0-21.22+system76 is to be installed”

So I decided to continue anyway, installing the rest of the packages in the guide, cloning, and running the commands… It always stops pretty far in, saying:

/usr/bin/ld: cannot find -liberty
collect2: error: ld returned 1 exit status
make[3]: *** [Makefile.perf:569: perf] Error 1
make[2]: *** [Makefile.perf:215: sub-make] Error 2
make[1]: *** [Makefile:70: all] Error 2
make[1]: Leaving directory '/home/dominino/ubuntu-disco/debian/build/tools-perarch/tools/perf'
make: *** [debian/rules.d/2-binary-arch.mk:642: /home/dominino/ubuntu-disco/debian/stamps/stamp-build-perarch] Error 2

$ sudo apt-get install libiberty-dev

?

Good point. I have one for now (-d); if it stops working I’ll have to explore another.

What motherboard is it?

Eh, patch? Is there a patch for the error “unable to power on device, stuck in D3”? Did I fail at Google? I hadn’t come across one.

It got me farther; now I get:

install: cannot stat '/home/dominino/ubuntu-disco/debian/build/tools-perarch/tools/perf/libperf-jvmti.so': No such file or directory
make: *** [debian/rules.d/2-binary-arch.mk:685: install-perarch] Error 1

Update: Tried libperformance-dev and cleaning then rebuilding - same error. I then noticed there were packages with my kernel version and modules, so I installed those… booted OK, but now there’s always a conflict with any other package I try to install.

Last update: Instead of mucking with the kernel any longer, I followed this YouTube guide to downgrade the BIOS - a bit of a leap of faith because NONE of my checks passed, but I proceeded anyway, and voilà! PCI passthrough works: https://www.youtube.com/watch?v=ZzqwjVDKAnU

Downgrading the AGESA/Combo PI(?) worked.

I was referring to the PCI error 127 patch thing.