VFIO/Passthrough in 2023 - Call to Arms

Sneak Preview :smiley:

5 Likes

I feel like the uninitiated would want something in-depth explaining the nuances, at least for the types that learn that way. I tried it once on a laptop and the experience was interesting. I've only got experience with Nvidia. I passed through the 980 Ti once but the experience wasn't great - not VFIO or Looking Glass's fault, I was the idiot doing this in a hardened Arch build.

But yeah, this project is ultimately the coolest, or at least one of the coolest, projects I've seen on this forum. Man, to get away from dual-booting would be bliss, particularly on a laptop. My Lenovo Legion exposes everything. I wonder if I should try to wrangle its 3070 into submission for the task and have Debian use the integrated Radeon. This could go horribly wrong depending on which GPU truly controls the output to the eDP screen.

Is single-GPU multiple passthrough a thing yet? As in, can the host and VM use the same GPU, or is that work still far off?

1 Like

That’s called SR-IOV. It’s a supported feature in high-end professional cards like Radeon Pro and Quadros, but people have had it working with normal consumer Nvidia cards as well. No such luck with consumer AMD cards unfortunately.
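If you want to check whether a particular card even advertises the capability, lspci will show an SR-IOV block when it does (the PCI address here is just an example, substitute your card's):

sudo lspci -vvv -s 01:00.0 | grep -i -A 4 'SR-IOV'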

3 Likes

Hell yes. I'm all in on Nvidia anyway.

I wonder what the latest status quo on AVIC is. Is it dead, or has someone lucky got it working in 2023? I believe it would be a great addition to VFIO/passthrough on AMD platforms.

The thread I linked just came up in a quick search. I think I saw an earlier and longer thread, as well as a long Reddit thread by the same author. I was able to get AVIC working in Linux 5.y.x (exactly which, I forget). After a couple of Linux releases it no longer worked.

Few folks seem to care about it on the Internet. I mean, isn't it just about the greatest thing in PCIe passthrough for hyperscalers in the cloud? Yet there seems to be so little discussion about it.
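For anyone who wants to poke at it, the knob itself is just the kvm_amd module's avic parameter; this is roughly what I had been toggling (there may be additional firmware/IOMMU requirements I'm not covering here):

# /etc/modprobe.d/kvm-avic.conf (filename is arbitrary)
options kvm_amd avic=1

# or reload at runtime with no VMs running:
# sudo modprobe -r kvm_amd && sudo modprobe kvm_amd avic=1

# check what the module is currently using:
cat /sys/module/kvm_amd/parameters/avic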

Been using it for about 8 months now, no issues. Some hardware doesn’t play nice with it, but it’s working for the most part.

1 Like

I am still using a dated i9-9900K with a GeForce RTX 3080 Ti and a Radeon RX 6600 with Proxmox. I realise that I am still pretty much a Linux newbie and don't know much about the inner workings or most of the various Linux CLI commands - but I am very happy with the daily driver I built.

What didn’t work:

  • Doing VFIO in a desktop distro (Arch / Manjaro): too much latency, too many workarounds (hugepages setup, core separation / pinning, the need to recompile kernels after a change, the added 2 min boot time with VMs which have more than 16 GB RAM) and a lot of other issues which required constant maintenance
  • Passthrough of legacy hardware (Windows XP, Win7 with older non-UEFI GPU cards) via SeaBIOS - my goal was to build a retro PC based on (more) modern hardware (an Intel i7 8700K CPU and mainboard at the time), but no HW / software or KVM solution could produce a successful GPU passthrough > I am better off with PCem nowadays for these old games
  • Hyper-V nested virtualisation tanks performance in KVM

What worked:

  • 2 gamers, one PC setup using cheapish consumer hardware
  • using .link files to get rid of unpredictable network interface names while moving PCIe devices around (for example, replacing an M.2 drive would isolate my Proxmox host because the network interface name would no longer match the network config): NetworkInterfaceNames - Debian Wiki (see the example .link file after this list)
  • ZFS - creating, maintaining and expanding pools is a breeze
  • passing through SATA controllers to use Windows Storage Spaces in a VM (a workaround for a short time after importing my bare-metal system back in 2021)
  • a Proxmox NAS with various hardware passed through
  • no need to isolate HW with vfio-pci in Proxmox
  • macOS VM with GPU passthrough
  • the janky-ass PCIe-to-USB(-like) extenders, which I also use to connect a 10 Gbit network card to a 1x PCIe slot (hey, still faster than a vanilla 1 Gbit connection) and my USB 3.0 PCIe cards
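On the link-file point above: the whole trick is one small file per NIC that matches on the MAC address, so the name stays put no matter which slot things end up in. A minimal sketch (MAC and name are placeholders):

# /etc/systemd/network/10-lan0.link
[Match]
MACAddress=aa:bb:cc:dd:ee:ff

[Link]
Name=lan0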

Edit:

  • UEFI flashing a passed-through LSI (mini-SAS) card with OFW in a VM
  • Building a second storage PC with Proxmox which uses the flashed LSI controller

Here are older pics of my system

I guess this is a failure story.

If there is any takeaway I want @wendell or @gnif to have before I start my rants, it is this: on behalf of budding Linux enthusiasts, I am going to ask you to dumb it down even further, if possible, to a 1-click install or, failing that, a step-by-step wizard tailored for people thinking about a Windows exodus.

IIRC I first encountered L1T during LTT's first crossover episode, and that was about VFIO and Looking Glass! I remember seeing the Looking Glass page that day thinking "Okay! Let's get this thing started!" So I went to the Installation page and this greeted me in literally the first few lines:

What LG said

Installation

libvirt/QEMU configuration

This article assumes you already have a fully functional libvirt domain with PCI passthrough working.

If you use virt-manager, this guide also applies to you, since virt-manager uses libvirt as its back-end.

It assumed a lot of abilities and information I did not have at the time. I didn't know these words back then and, to be honest, I only have a surface-level understanding of what they exactly are even now: libvirt, virt-manager, QEMU, etc.

This intimidated me greatly and eventually dissuaded me at the time, as I was barely 6 months into Linux and I saw that this was waaaay over my head. I should have RTFMed more, but I felt helpless and I didn't know what I didn't know in order to move forward. Looking back, I wish there had been an easy, step-by-step, wizard-like install for people like me who are experienced enough in Windows but not enough in Linux to pull their own weight.

Perhaps it doesn't help that my preconceived notion of how easy VMs should be to use comes from VirtualBox. It was intuitive and felt easy to explore on my own, versus what I didn't know Looking Glass would demand of me.


Time and life went by, and my use cases for Looking Glass have greatly diminished over time…

I don’t really play competitive online multiplayer games anymore, at least those that strictly require Windows for anti-cheat.

  • Decreased desire to play games and support the growth of companies like EA, Activision-Blizzard, Ubisoft, Gearbox, Tencent subsidiaries and the like - publishers of games that are actively hostile towards Linux VMs.
  • Rampant cheating made online multiplayer competitive games less desirable.
    • Cheat software is now monetized and industrialized, guaranteeing a progressively sh*t experience for even marginally popular online games, and it only gets worse as a game becomes more popular.
  • A dedicated gaming PC isolated from the network seems to be a good way forward if you have the money to spare. Microsoft is also making sure their telemetry is in my games even though I have migrated away from their ecosystem, and I can't even escape my old identities to play Minecraft anonymously. At this point, how is owning a network-segregated Xbox or a PlayStation a lesser experience? Now I also have a PC for Adobe productivity stuff (graphic design and layout).
  • Valve has also done a great deal of work to make Linux gaming more accessible, even for online competitive multiplayer: Planetside 2 comes to mind, and I love that it works fine on Linux.
  • My reflexes aren’t the same anymore because I’m older or maybe I am just making excuses.

So I am suffering a loss of imagination as to why I need LG these days. I look to the community for inspiration when I see your cool use cases.

This is not to take away from the good work @gnif has done. It is more than well appreciated, and I may still have a use case for it in the future. Looking at Starfield right now

2 Likes

NVIDIA RTX 6000 ADA here. I hate it.

Apparently, passing through a consumer-grade GPU to a VM is a piece of cake, but these professional cards result in Code 43. Lurking around other forums, I've gathered that these professional cards can only be passed through via SR-IOV, and that requires a subscription license.

So, NVIDIA’s reward for buying their most expensive graphics card is the privilege to pay more. :roll_eyes:

4 Likes

As much as we would love to make it this simple, there are an absolute ton of edge cases that prevent this becoming a reality. For example:

  • Number of CPU cores available
  • Core numbering differences depending on CPU and/or microcode versions (breaks pinning, etc)
  • Brand of CPU (AMD vs Intel)
  • Brand of GPU for Guest usage
  • iGPU vs dGPU for the host
  • Laptop users with muxed vs muxless hardware
  • Storage medium (block device, passthrough, qcow2, iSCSI)
  • Windows 11 TPM configuration
  • Problematic passthrough GPUs
    • Laptop dGPUs
    • AMD GPUs
  • Problematic motherboards
    • Ones that have very poor IOMMU grouping, requiring the ACS override patch for the kernel (see the snippet at the end of this post for a quick way to check your groups)
    • Ones that just refuse to properly work for VFIO at all for unknown reasons.
  • QEMU/OVMF 64-bit bug with IVSHMEM (this is a recent issue)
  • Audio configuration (though this is easier since LG B6)

I am sure there are many things here I am missing. There is no way to simply make a single guide to rule them all; there are too many possible configurations which cause too many edge-case issues to document.
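(On the IOMMU grouping point above: a quick way to see how a board groups its devices is to walk /sys/kernel/iommu_groups. If the GPU you want to pass through shares a group with unrelated devices, you are in ACS override territory.)

for g in /sys/kernel/iommu_groups/*; do
  echo "IOMMU group ${g##*/}:"
  for d in "$g"/devices/*; do
    echo -e "\t$(lspci -nns "${d##*/}")"
  done
done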

2 Likes

The old method of hiding the KVM leaf and setting a random hv-vendor-id is all that is needed. You actually still need to do this anyway for consumer cards if you don’t want to cripple the custom resolution feature.
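In libvirt domain XML that boils down to something like this, merged into your existing <features> block (the vendor_id value is arbitrary, anything non-default up to 12 characters):

<features>
  <hyperv>
    <vendor_id state='on' value='1234567890ab'/>
  </hyperv>
  <kvm>
    <hidden state='on'/>
  </kvm>
</features>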

2 Likes

You may be interested in quickemu, which is the absolute simplest way to get a wide variety of VMs (Linux, Windows, even macOS) up and running.
https://github.com/quickemu-project/quickemu.

A nice visual overview of how simple and quick it is:
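In practice it is roughly two commands to go from nothing to a booting Windows guest (going from memory here, so double-check the exact invocation against the project README):

quickget windows 11
quickemu --vm windows-11.conf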

For the reasons gnif described, it doesn’t currently handle GPU passthrough. That said, there is one guy exploring adding the functionality as a related project:
https://github.com/quickemu-project/quickemu/issues/688
https://github.com/HikariKnight/QuickPassthrough

2 Likes

For me, VFIO has been working quite alright ever since I built a home server back when the very first Ryzen (1700) chips hit the market.

One of my goals was to have a headless host machine. The machine has no graphics output at all for the Linux host, and only passes through the Radeon R5 230 (yes, it predates even the Polaris architecture).

This seemed to be a major pain point for many back in the day, but I could make it work quite easily (on an MSI B350 Mortar Arctic). I had to resort to compiling the pci-stub module into the kernel to ensure that it loads before anything else can grab the GPU, but that's pretty much it. I have successfully used this with Windows, Linux, BSD, etc. guests. If I had to name one annoyance, it is that the very first time the host boots, the GPU stays in a powered-on state (that is, it outputs a black screen, because the firmware has initialized it) until I run and power off a guest VM that knows how to "shut down" the GPU.
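For anyone wanting to replicate the built-in pci-stub approach, it boils down to a kernel command-line entry like the one below; the vendor:device ID is a placeholder, look up yours with lspci -nn:

pci-stub.ids=1002:abcd

# after boot, confirm which driver owns the card (same placeholder ID):
lspci -nnk -d 1002:abcd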


What worked much less satisfactorily for me was attaching… other stuff… to the guests. I found plenty of situations where attaching individual USB devices to the VMs (instead of passing through an entire USB hub) would not work quite right. Xbox controllers come to mind as an example: Windows makes connected Xbox controllers reset for some dogforsaken reason. That detaches the controller from the host (and thus the VM) until it restarts. A few moments later the host notices a new USB device and handshakes with it, it gets connected to the VM again, Windows resets the controller…

And since I do not use VMs all that frequently, pci-stub-ing all of my best (built-in) USB ports on the machine was quite a hard pill to swallow just to make a controller work. It seems that if you're thinking of using USB devices with both host and guest systems, you should be looking for a hub<->port layout that makes sense to you (or plan for a USB add-in card).

Nowadays an SSD makes more sense. I would love to learn how to put 10 SSD drives into a mainstream PC.

Still not working.
Even with this script.

I wrote down my experience here:

Something broke my PCIe passthrough.
The VFIO unbinding, to be precise.
It worked before but now it doesn't, and every time I get it to work it starts breaking again.
Feels a bit like someone is behind it.
I got it to work once when I turned Firefox off, but not a second time.
The UEFI shell is missing again in my BIOS, but I did not change anything except an update for something that looked like JavaScript.

  • MSI X570-A PRO, 128 GiB RAM
  • Ryzen 5950X
  • AMD Radeon HD 5400 (passively cooled) for the host
  • AMD Radeon RX 590 for the Windows guest

I use Looking Glass and hope that they fix the AMD issue so it works without a connected screen.

When my neck is killing me I use a projector aimed at the ceiling.
I built a frame so it can work in any orientation.

My second system broke this year.
The motherboard or CPU seems to be broken.

If I win the giveaway please only AMD parts.

Wish me luck for the lottery so I can buy the stuff myself :wink:

I've had trouble with the latest kernel and the vfio driver, but being on Ubuntu I had to read up and come up with my own solution, because no one uses Ubuntu with passthrough anymore!

It's super simple: instead of vfio, I just pci-stub the devices I want to pass through via kernel boot parameters and then hand them over to the vfio driver.
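The gist of it, with placeholder IDs and a placeholder PCI address - claim the devices with pci-stub at boot, then rebind them to vfio-pci before starting the VM:

# kernel command line (e.g. in GRUB_CMDLINE_LINUX_DEFAULT):
pci-stub.ids=10de:abcd,10de:abce

# later, hand a device over to vfio-pci:
sudo modprobe vfio-pci
echo vfio-pci | sudo tee /sys/bus/pci/devices/0000:01:00.0/driver_override
echo 0000:01:00.0 | sudo tee /sys/bus/pci/drivers/pci-stub/unbind
echo 0000:01:00.0 | sudo tee /sys/bus/pci/drivers_probe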

Same issue here with the latest Ubuntu 23.04.
I would love to learn how to do that.

The 6.0 or 6.1 kernel “broke” the vfio-pci.ids module parameter/command-line argument. Fortunately the Ubuntu initramfs scripts are pretty smart and you can get it working again fairly easily. For each device you want to bind to vfio-pci at boot, you add a softdep for its default driver. update-initramfs picks this up, so at boot vfio-pci gets loaded first and binds the listed IDs before the default driver has a chance to claim them.

Here's my /etc/modprobe.d/vfio.conf for two Samsung NVMe drives and an NVIDIA audio device, on Ubuntu 23.04 with the 6.4.6 kernel:

$ cat /etc/modprobe.d/vfio.conf 
softdep nvme pre: vfio-pci
softdep snd-hda-intel pre: vfio-pci
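# ids: 10de:1aef is the NVIDIA audio function, 144d:a808 and 144d:a80c are the two Samsung NVMe drives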
options vfio-pci disable_vga=1 ids=10de:1aef,144d:a808,144d:a80c

And just for the sake of completeness, run this command to update initramfs for all installed kernels after making any changes:

$ sudo update-initramfs -k all -u

That got everything working again. HTH.

N.B. I use driverctl for the actual GPU device itself so I can unbind it as necessary.
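For anyone unfamiliar with driverctl, it is a one-liner each way (the PCI address is a placeholder):

sudo driverctl set-override 0000:01:00.0 vfio-pci
sudo driverctl unset-override 0000:01:00.0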

2 Likes