I wrote something on ACS, IOMMU and IOVA here. If you’d like to read the “Conceptual Discussion” segment of the OP, it will likely answer your question.
Intro
In this installment we are going to be discussing the technology behind PCI passthrough to VMs. The concept of passthrough is relatively simple: you take a physical device and forward its memory registers to the VM. A simple idea, however, doesn't make for a simple implementation. A lot goes into passthrough, and a bunch of extremely talented people have put a lot of time into the software to bring passthrough to the point where it's a lot easier than it was before, but still not quite plug 'n' play.
Conceptual Discussion
Now, what goes into it exactly? To give a brief overview: first there's the hardware support for passthrough, the IOMMU (Input-Output Memory Management Unit), which has to be supported by both the motherboard and the CPU. (More info on that here.) Then there's the Linux driver, VFIO, which is bound to the device at boot so that the device's regular driver never claims and initializes it. This will help us when it comes to passing our GPU into the VM. If the GPU is bound to another driver, passthrough won't succeed, because QEMU won't be able to get exclusive access to the GPU's resources.
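To make the driver-binding point concrete, here is a minimal Python sketch that checks which driver currently owns a PCI device by reading its `driver` symlink in sysfs. It assumes a Linux host with sysfs mounted at `/sys`; the address `0000:01:00.0` is only a placeholder, and the helper names `bound_driver` and `ready_for_passthrough` are mine for illustration, not part of VFIO or QEMU.

```python
import os
from pathlib import Path


def bound_driver(pci_addr, sysfs_root="/sys"):
    """Return the name of the driver bound to a PCI device, or None if unbound."""
    # Each PCI device directory holds a "driver" symlink while a driver owns it.
    link = Path(sysfs_root) / "bus" / "pci" / "devices" / pci_addr / "driver"
    if not link.is_symlink():
        return None  # no driver bound, or the device does not exist
    # The symlink target ends in the driver's name, e.g. ".../drivers/vfio-pci".
    return os.path.basename(os.readlink(link))


def ready_for_passthrough(pci_addr, sysfs_root="/sys"):
    """True only if the device is held by vfio-pci rather than a host driver."""
    return bound_driver(pci_addr, sysfs_root) == "vfio-pci"


if __name__ == "__main__":
    addr = "0000:01:00.0"  # placeholder address; substitute your own from lspci
    print(addr, "->", bound_driver(addr))
```

If this reports the GPU's regular host driver instead of `vfio-pci`, the device was claimed before VFIO got to it, which is the situation described above.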
Now, let's talk about the PCIe bus. The machine I'm going to be using as a reference has an ASUS Z170-A and a 6700K. This gives me 16 PCIe lanes on the CPU to play with. Most GPUs will be happy with 8 lanes, so we shouldn't have bandwidth issues here. I am going to be passing two devices to my VM: a GPU and a USB 3 controller. The passthrough GPU will be using an 8x connection on the CPU and the USB controller will be using a 4x connection on the PCH. This will allow the GPU I'm using for Linux output to be connected by an 8x connection as well.
Handling PCIe passthrough isn't 100% straightforward. There are limitations and rules about how the IOMMU sees a device and its IO Virtual Addresses (IOVA). Some devices will alias to the same IOVA space, which makes the IOMMU unable to distinguish between them. This becomes problematic when dealing with tr…
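On Linux, devices the IOMMU can't tell apart end up in the same IOMMU group, and the groups are exposed under `/sys/kernel/iommu_groups`. Here is a minimal Python sketch for listing them, assuming a Linux host with the IOMMU enabled; the function names `iommu_groups` and `group_mates` are hypothetical helpers of mine, not an existing API.

```python
from pathlib import Path


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted device addresses it contains."""
    groups = {}
    base = Path(root)
    if not base.is_dir():
        return groups  # IOMMU disabled, or the path is not available
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        # Group directories are named by number and hold a "devices" subdirectory.
        if not group_dir.name.isdigit() or not devices_dir.is_dir():
            continue
        groups[int(group_dir.name)] = sorted(d.name for d in devices_dir.iterdir())
    return groups


def group_mates(pci_addr, groups):
    """Return the other devices sharing a group with pci_addr (empty if alone)."""
    for members in groups.values():
        if pci_addr in members:
            return [m for m in members if m != pci_addr]
    raise KeyError(pci_addr)


if __name__ == "__main__":
    for number, members in sorted(iommu_groups().items()):
        print(f"group {number}: {', '.join(members)}")
```

Anything that shows up in the same group as the device you want to pass through can't be isolated from it by the IOMMU, so it's worth checking the group mates before you plan which slot each device goes in.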
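As for your “From what I understand”, you’re pretty much spot on, with the exception of the mention of a dumb switch. Certain smart switches can use ARP to detect where a device is and avoid making the packets go all the way back to the router (root port). Apply this thinking to the PCIe fabric: a PCIe switch can route traffic between two devices below it without that traffic ever reaching the root port, which is the behavior ACS controls.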