VFIO in 2019 -- Pop!_OS How-To (General Guide though) [DRAFT]

The output of that command:

AnonHugePages:         0 kB
ShmemHugePages:        0 kB
HugePages_Total:   16384
HugePages_Free:    16384
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:        33554432 kB

Are you referring to the hint? When I checked the /etc/sysctl.conf file, neither of the lines starting with vm. was there, so I added them. Outside of that I have no idea.
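For reference, the two vm. lines in question would look something like this. The page count matches the meminfo output above (16384 × 2048 kB pages = 32 GB reserved); the group id is whatever your kvm group happens to be on your distro, so treat 48 as an assumption and check with getent group kvm:

```
# /etc/sysctl.conf
vm.nr_hugepages = 16384         # 16384 x 2048 kB pages = 32 GB reserved
vm.hugetlb_shm_group = 48       # gid allowed to use hugepage shared memory (assumed kvm group)
```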

Also, would you happen to know how to actually tie down the CPU cores? He doesn’t explain it.

Was just looking at this link Wendell had and it looks like you need to make some changes to fstab and such? Dunno if you caught that -

He did mention that, but he said it was optional of some sort. So I did a little research and rechecked both of the hugepages files:

cat /sys/kernel/mm/transparent_hugepage/defrag
cat /sys/kernel/mm/transparent_hugepage/enabled
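For what it’s worth, writes to those sysfs files take effect immediately but don’t survive a reboot, which would explain values “resetting to defaults.” A sketch of checking and setting them by hand (the madvise value is just an example, not a recommendation):

```shell
# check current values -- the active setting is shown in [brackets]
cat /sys/kernel/mm/transparent_hugepage/enabled
cat /sys/kernel/mm/transparent_hugepage/defrag

# set until the next reboot; to persist you'd need the kernel cmdline
# option (transparent_hugepage=...) or a boot-time script
echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
```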

Even though I edited them, they had somehow reset to defaults ¯\_(ツ)_/¯. After going over them again and editing my VM’s .XML file, it took the <memoryBacking><hugepages/></memoryBacking> edit. So it’s working.

Now gotta figure out how to pin CPU cores. I got lstopo installed, but it’s not as clear an image as wendell’s. I should have a NUMA Node #0 & NUMA Node #1, but it’s just showing up as one large Package, so I don’t know which cores/threads are the ones I’m supposed to use.


If you have what you did with the mountpoints, that might be helpful. I am running a script that automounts things, and I think it knows about /hugepages, because I was surprised I didn’t have to do anything once it was enabled. My setup is slightly custom, so if you had to do extra steps, or something is unclear, clarify for me and I’ll amend the guide.

The guide overall is perfectly fine. I expected to run into hiccups as I’m using a different OS (Ubuntu 19.04) and hardware so I knew I’d have to adapt some of the finer details to suit my setup.

Next issue: I’m using a 1950X, so I do have 2 NUMA nodes, but Linux is reporting it as 1 NUMA node with 32 threads. Both lstopo and lscpu show only one. Don’t know if that’s a BIOS setting issue or what, so I don’t have any way of knowing (to my knowledge) which cores are closest to my GPU.
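Once the BIOS is reporting both nodes, sysfs can also tell you which node a PCI device hangs off of directly; the PCI address below is a placeholder, substitute the one lspci shows for your passed-through GPU:

```shell
# find the GPU's PCI address
lspci -nn | grep -i vga

# 0000:0a:00.0 is a placeholder address -- prints the device's NUMA node
# number, or -1 when the platform only exposes a single node
cat /sys/bus/pci/devices/0000:0a:00.0/numa_node
```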

As for how to tie them down Google gave some answers:

# virsh vcpupin mytestvm
VCPU: CPU Affinity
0: 0-15
1: 0-15
2: 0-15
3: 0-15

# virsh vcpupin mytestvm 0 0,1,2,3,8,9,10,11
# virsh vcpupin mytestvm 1 0,1,2,3,8,9,10,11
# virsh vcpupin mytestvm 2 0,1,2,3,8,9,10,11
# virsh vcpupin mytestvm 3 0,1,2,3,8,9,10,11

# virsh vcpupin mytestvm
VCPU: CPU Affinity
0: 0-3,8-11
1: 0-3,8-11
2: 0-3,8-11
3: 0-3,8-11


According to their guide that would tie the VM’s vcpu 0-3 to the physical CPU threads 0-3 & 8-11.

If you had a simpler method in mind, then all the better. Since I’m not using the top slot for the GPU I’ve passed through, I’m going to assume it’s running on NUMA Node 2, which should be cores 8-15, or threads 8-15 & 24-31 (according to lstopo).

ah, yes, in the BIOS under AMD/CBS you can set interleave to something else and it will break up the nodes. With Linux as the host OS, separate nodes work great. On Windows, UMA is the setup I’d recommend. :smiley:

It’s the Memory Interleave BIOS option that controls this. Try some different settings.

Glad you got it worked out. Dunno if I’ll get to it myself today. Today’s one of those days - plumbing to fix. I’m keeping track of what I do in a text file, so when I’m done I might post it so you all can laugh and point at my newbness. :stuck_out_tongue:

Hey, I’m fairly new to this too (with GPU pass-through at least), but wendell’s guide is pretty thorough in holding your hand through the process (unlike A LOT of online guides that expect you to have a heap of prior knowledge). You’ll get it.

That little bit of information saved me likely days of head scratching and Googling. So I took that, found the setting like you stated, then Googled which setting would be necessary/optimal. A Reddit post discussing Memory Interleave and NUMA nodes stated Channel is what I wanted, and sure enough, back in Linux (running natively) both lscpu and lstopo now display NUMA Node 0 & 1.

While I was in the BIOS I did accidentally stumble across an option just called IOMMU, with Enable | Auto as options. I enabled it while I was in there (may or may not have made a change).

Checking lstopo, my guess that I’d want threads 8-15 & 24-31 was actually correct. This also explains why enabling Enumerate all IOMMU in IVRS was necessary: NUMA Node 1 (or 2) was in control of the GPU, so that BIOS setting was stopping that NUMA node from exposing its IOMMU requests or groups (not actually sure what they’re called).

So it would appear using virsh vcpupin to pin the CPU cores to the VM isn’t ideal for two reasons.

  1. The VM needs to be running before virsh vcpupin can be used.

  2. When the VM is turned off and back on, the CPU assignment resets to using all CPU cores.

So it seems some versions of virt-manager do come with a CPU pinning option under CPU/Processor. NOT IN THE VERSION UBUNTU DECIDED TO DOWNLOAD FOR ME! It stops at Topology. Nothing is beneath that.

So here’s the fix I found here: edit the .XML file using virsh edit and find the line saying <vcpu placement='static'>#</vcpu>. You can define which physical cores/threads get assigned to the group of vCPUs by appending cpuset='#-#' after 'static'. I don’t know what purpose the static value serves; it may or may not be important. If you need to assign, say, threads 0-3 and 8-11 to the VM, you can edit cpuset to look like this: cpuset='0-3,8-11'

This is what I wrote in my .XML file:
<vcpu placement='static' cpuset='8-15,24-31'>16</vcpu>

This is the output of virsh vcpupin:

 VCPU   CPU Affinity
 0      8-15,24-31
 1      8-15,24-31
 2      8-15,24-31
 3      8-15,24-31
 4      8-15,24-31
 5      8-15,24-31
 6      8-15,24-31
 7      8-15,24-31
 8      8-15,24-31
 9      8-15,24-31
 10     8-15,24-31
 11     8-15,24-31
 12     8-15,24-31
 13     8-15,24-31
 14     8-15,24-31
 15     8-15,24-31

Now, regardless of rebooting the VM, the CPU assignment remains the same. wendell, if you don’t see anything wrong with this, it might be worth adding for those who don’t have the GUI menu option.
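As an aside, cpuset on <vcpu> only constrains the whole set of vCPUs to that range; libvirt also supports a <cputune> block in the same XML (edited via virsh edit) that pins each vCPU to a specific thread if you want a strict 1:1 mapping. A sketch using the same thread numbers as above, assuming threads 8 and 24 (etc.) are SMT siblings, which lstopo can confirm:

```
<cputune>
  <vcpupin vcpu='0' cpuset='8'/>
  <vcpupin vcpu='1' cpuset='24'/>
  <vcpupin vcpu='2' cpuset='9'/>
  <vcpupin vcpu='3' cpuset='25'/>
  <!-- ...and so on through vcpu 15 -->
</cputune>
```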


I am so close to having this working. Using what seems like the official Looking Glass website, I’ve got Windows running through it, and it is quite smooth (a glitch here or there, but not bad). Unfortunately it’s not accepting any keyboard or mouse inputs… still troubleshooting this.

It seems the looking-glass-host.exe tool that runs inside the Windows guest will try to use the Microsoft Basic Display Adapter (which runs as part of the QXL video output in virt-manager). The only fix I found was to disable it, and as a result the video feed of your VM will stop working in virt-manager. After that, though, it lets Looking Glass talk to the VM just fine.

I know the NVIDIA fan club is pretty huge, so most won’t run into the following issue, but being someone who grew up on AMD graphics, that’s what I’ve continued to use today:

AMD GPUs later than the Radeon HD series have had a nasty little feature built in: if no display is plugged into the GPU, it shuts off the output. For an application like this, that’s a problem. (I should know - Looking Glass kept dying every time I unplugged my monitor from the Windows VM’s GPU.) I get the feeling the dummy plug/resistor trick I’ve heard about crypto-miners using would fix this issue fine, but what also appears to work is just plugging the GPU into a port on a monitor. It doesn’t have to be the active input; just being plugged into a display at all keeps the video out running, which by extension keeps Looking Glass going.

In the meantime, while I try to diagnose Looking Glass not accepting K/M inputs, the virt-manager KVM (even though I killed video out) does still accept them, so I can fullscreen virt-manager and run Looking Glass on my second display.

A half-decent temporary fix, I’d say.

Yeah, that’s what is typically done once you’ve verified the passed GPU is working. Personally I’m using a second monitor for the Win10 VM, so I don’t have the same problem (RX 580), but one of those dummy/headless adapters would work if you didn’t want to spare the monitor/extra input. (Nothing says you can’t have the VM’s GPU connected to the host’s monitor if it has a spare input.)

As far as the kb/mouse, I passed through a dedicated pair. Mind you, it makes a little more sense for me, as I have a second screen to put them in front of. Using Looking Glass you might need something like Synergy? Without the virtual graphics and SPICE server, I don’t think those SPICE-based redirectors will work?

Actually, wendell mentions Synergy at the beginning of the guide when mentioning Looking Glass. Well, not the beginning - it’s under the heading Configuring the Virtual Machine.

I want both of my displays to work for Linux, so I can’t just dedicate one to Windows, but I can sacrifice a port on one of them just to keep the signal running.

I have a little 2-computer KVM switch, so it wouldn’t be out of the question for me to go that route. I’d have to figure out how to pass through USB ports though. There’s a flag when launching Looking Glass that will either leave SPICE enabled or disable it. Although keyboard input works fine this way, mouse control is non-existent. If you click somewhere, the mouse will just teleport each time you click; it doesn’t actually move as you move the mouse.

It looks like Synergy costs money, so that’s not going to be my first choice. So a KVM switch is the way to go for me, I guess. I’ll have to figure out USB pass-through.

Sorry, I know how I managed it, but it was 10x harder than it needed to be. I came across a script somewhere that made it easy to identify your ports to be added to the VM, but I seem to have misplaced the link.

Nvm, found it -

That made pretty quick work of what I thought would be a lot more work. I wanted to pass through a USB 2.0 controller by just connecting an internal header to a couple of USB Type-A ports.

Running that really long command, however, showed that it was entangled in a mass of other devices all in the same IOMMU group. What was isolated in its own group was a USB 3.0 controller - one I wanted to keep for the host. So trying to pass through the 2.0 controller just resulted in a heap of errors, while giving in and handing the 3.0 controller to the guest met no resistance at all. It started right up, and my KVM switch is able to switch between the host and the guest.
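For anyone following along, handing a whole controller to the guest ends up as a PCI <hostdev> entry in the domain XML (virt-manager’s Add Hardware > PCI Host Device writes this for you). The bus/slot/function below is a made-up example; use the address from your own lspci output:

```
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <!-- example address only: match it to your USB controller in lspci -->
    <address domain='0x0000' bus='0x0b' slot='0x00' function='0x3'/>
  </source>
</hostdev>
```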

So that’s it. It’s all set up and working: VM, GPU passthrough, Looking Glass, and a KM solution.

I’ll run this setup for a few weeks or months and post back about how well (or the lack thereof) it has run.

Yeah, I plan to do the same for at least a week before flashing my BIOS with support for Ryzen 3000, so I can test whether anything related to VFIO was broken by MSI in the update. Before that I need to get everything tuned to see if there were any mistakes in my plan.

Unfortunately I’m worried about I/O. It’s a lot better since I got the virtio drivers loaded, but there were still some latency hiccups that were causing games to stutter. Well, a game. I only downloaded Fallout 4 because I remember it having some pretty bad load times. But after running it once, I can’t get it to load again. Dunno if this is some typical Bethesda goodness or if it’s something specific to the Win10 install in the VM. Well, I’ll get my CPUs pinned and hugepages enabled before I go down that rabbit hole.

edit Well, I got hugepages and CPU pinning sorted, probably. For some reason I couldn’t set the suggested advanced settings on my disk devices. However, I might have noticed a small (big) mistake in my VM config that might (totally) have been the problem with performance. oO; Oops.

I’m wondering about vm.hugetlb_shm_group=48. The two links you provided about hugepages don’t explain it, but looking here -
suggests that setting it isn’t necessary for libvirt. The Ubuntu wiki makes no mention of setting up a filesystem in fstab, but the blog post does. However, the latter makes no use of permissions in fstab despite setting a group with vm.hugetlb_shm_group.

I feel like these might be unnecessary steps, and you did say something along the lines of being surprised it worked without having to do anything extra once you enabled hugepages themselves. I might be misunderstanding something. Any ideas?

edit Well, I tried it without vm.hugetlb_shm_group=48 or the fstab bit, and it seems to work fine (you did say the latter was optional, I see now). Dunno if there are any security issues with the former though.

I don’t plan on gaming on this setup. I don’t know how stable the frame-rate will be for my hardware/software config. I just want something that runs smoothly for my 3D design applications.

I expect with USB controller pass-through the I/O latency shouldn’t be bad at all for keyboard/mouse input. I would imagine programs like Synergy rely heavily on the reliability of your network, which would be subject to connection issues or packet loss.

If you mean disk I/O, I also installed the virtio driver. Using CrystalDiskMark, raw performance was improved significantly after switching to this from an emulated disk. I have noticed that disk responsiveness when installing things seems slower though, so I think IOPS were impacted significantly. That explains why, if possible, we should have passed through an entire disk.

Oops, yeah. Meant disk. I had the virtio drivers loaded for both disk and network. Now that I fixed my other problem, I’m not getting the stutter I had before when loading games. At least in brief testing. It still bugs me that I can’t set my IO mode, as I wouldn’t be surprised if I had some issues on the write side due to using thin provisioning. Time will tell.
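If virt-manager keeps erroring out, those settings map to attributes on the disk’s <driver> element and can be set with virsh edit instead. cache='none' with io='native' is the combination usually suggested for VM disks, but whether it actually helps on thin-provisioned qcow2 over ZFS is exactly the open question, so treat this as an experiment rather than a fix:

```
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2' cache='none' io='native'/>
  <!-- source/target elements unchanged from the existing config -->
</disk>
```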

I haven’t done any benchmarking yet. Hopefully I can get to that today. My qcow2 is sitting on a ZFS pool with 2 mirror sets, so it shouldn’t be horrible, but it won’t be anywhere near the IOPS of an NVMe. I’m fine with that if I can have my snapshots and redundancy. I know I could do an NVMe mirror, but I decided I didn’t want to spend that kind of money for the capacity I wanted. I could still end up creating another pool with a couple of cheap 512 GB NVMe drives for especially disk-heavy games, but I think I might be satisfied with this.

In my brief testing last night I noticed the impact of having around ~8 GB of read cache back on the host when I did repeated loading of the same area. Of course it made me want to upgrade to 64 GB of RAM, but first I gotta prove this setup actually works for me before I spend more.

I had planned on passing through a USB controller like you have done, but wasn’t able to due to IOMMU groups. So far, passing the individual USB devices seems to work well enough. We’ll see if I have any complaints while gaming later. I doubt I will though. I honestly don’t have the reflexes I used to, and I don’t play many online multiplayer games where any sort of input lag would be noticeable.

My virtio disk is sitting on my boot drive using ext4. It’s a 1 TB NVMe M.2 drive. Using the virtio driver as opposed to an emulated disk, I’m getting 6 GB/s reads & 2.5 GB/s writes within the VM. The VM has to be using RAM as a cache somehow, because the NVMe drive is not capable of those read speeds. Paired with the SFP+ NIC bridge, I can have both Windows and Linux talking to my file server at a full 10 Gig simultaneously. (Dual-port NIC.)

The groups screwed me over too on my specific motherboard. However, if you were to install a USB AIC, it is my assumption that it would appear in its own group and you could pass it through with ease. I went the controller route because I need one keyboard/mouse for the host & guest. I just happened to have a 2-computer KVM switch, so all I have to do is press a button and I can move between the two.

AIC? (Ohh… duh… Add-in Card?) If you mean a separate PCIe card, I have one. The problem is that none of my available PCIe slots are isolated in their own IOMMU group. With two double-slot GPUs, 4 of 6 slots are used/blocked, and the last two get grouped with a bunch of chipset devices, unfortunately. Or do you mean a USB hub or something?

Well, just did three passes of CrystalDiskMark on my two virtio drives sitting on ZFS. Reads aren’t bad, averaging a typical NVMe ~3500 MB/s to ~3900 MB/s. Writes, well, can’t compare to NVMe, especially since they’re mirrors, so writes aren’t the strong point. Fortunately not many games need fast disk writes: 350 MB/s to 450 MB/s. I really have no idea if that’s good or bad; I just pulled up some NVMe benchmarks to compare with. /shrug. I might be able to do better if I can figure out how to set the caching and IO mode, but virt-manager throws an error if I start with anything but the defaults.

Hmm… dunno how I got a 3300 MB/s write speed on a 4th test. That’s definitely some weird outlier.