To get it out of the way first: my main PC is purpose-built for VFIO GPU passthrough so I can play games in Windows.
32 GB DDR4-3200 RAM
RX Vega 56 guest GPU
RX 580 4 GB host GPU
MSI B350M Mortar motherboard
The idea with this build was to create a dual-head-ready gaming rig that could, if ever needed, be used by two people to game at the same time. Of course, most of the time I use my VM to game and the Linux host to browse the internet and multitask a little. The separation of the two has a lot of benefits, as it eliminates many of the downsides of multi-monitor setups.
Lately I’ve been playing and streaming some heavily modded Minecraft. I did some preliminary testing and found I was getting much better performance in Linux than in Windows for this, so I went ahead and moved all of my Minecraft streaming to Linux. I soon found I was at the limit of my RAM, since in my typical setup I reserve half of it for the guest at startup. So I set up an alternate boot entry that removes all of my VFIO kernel flags. This came with the added benefit that my Vega was no longer being held in reserve by vfio-pci.
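For anyone who wants to do something similar, a pair of boot entries along these lines can be set up in `/etc/default/grub`. This is only a sketch, not my exact config: the vfio-pci device IDs and the hugepage count are assumptions (check `lspci -nn` for your own IDs, and reserving guest RAM as hugepages is just one common way to hold memory back at startup).

```shell
# /etc/default/grub -- sketch only; the vfio-pci IDs and hugepage count
# below are illustrative assumptions, not guaranteed to match your hardware.

# Passthrough entry: bind the Vega to vfio-pci at boot and reserve half
# of 32 GB of RAM as 2 MB hugepages for the guest (8192 * 2 MB = 16 GB).
GRUB_CMDLINE_LINUX_DEFAULT="amd_iommu=on iommu=pt vfio-pci.ids=1002:687f,1002:aaf8 hugepages=8192"

# Host-only entry: the same line with the VFIO flags dropped, so the Vega
# stays on amdgpu and all of the RAM is available to the host.
#GRUB_CMDLINE_LINUX_DEFAULT="amd_iommu=on iommu=pt"
```

After editing, regenerate the config (e.g. `grub-mkconfig -o /boot/grub/grub.cfg`) so both entries show up in the boot menu.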
Once I hit this point, I figured I could look into leveraging the additional power of my Vega to get even better performance, and I quickly found that doing so would be weird. I can easily use my 580 and Vega to drive two monitors in the same X session. What I was not expecting was how Linux handles the render load. Because my 580 is the boot GPU, all accelerated work goes to it by default. So if I launch Minecraft as normal on the display driven by my Vega, it runs terribly, and the moment I drag it back to the display powered by the 580, it’s fine.
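You can see how X has enumerated the two GPUs with xrandr; the boot GPU normally shows up as provider 0 and receives all accelerated work by default. A quick check (the fallback message is just so the command degrades gracefully when run without a display):

```shell
# List the render/output providers X knows about. On a setup like this one
# you'd expect two entries, with the boot GPU (the 580) as provider 0.
xrandr --listproviders 2>/dev/null || echo "no X display available"
```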
Investigating this issue quickly led me to the environment variable DRI_PRIME. By setting this variable when launching an application, I can specify which GPU I want to do the rendering work for that application. I’ve tested this out with Minecraft and Unigine Superposition. What I found was that when using the Vega to do the render work, I get the same performance no matter which display it’s going to, but the 580 struggles if it’s not on its native display. I think part of this disparity can be attributed to PCIe bandwidth: the Vega is in a full-speed PCIe 3.0 x16 slot, whereas my 580 is in a PCIe 2.0 x4 chipset slot. So while the 580 does fine for playing games in this configuration, it doesn’t have the spare bandwidth to pass that data on to the Vega.
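A quick way to confirm which GPU is actually doing the rendering, assuming `glxinfo` from mesa-utils is installed. Note that `DRI_PRIME=1` selecting the Vega is an assumption here; the index depends on how the GPUs were enumerated on your system.

```shell
# Compare the renderer Mesa picks by default vs. with PRIME offload.
# DRI_PRIME=0 (or unset) is the boot GPU; DRI_PRIME=1 is the offload GPU.
default_r=$(glxinfo 2>/dev/null | grep "OpenGL renderer" || echo "no display")
offload_r=$(DRI_PRIME=1 glxinfo 2>/dev/null | grep "OpenGL renderer" || echo "no display")
echo "default: $default_r"
echo "offload: $offload_r"

# Launching a game on the offload GPU is then just, e.g.:
#   DRI_PRIME=1 ./minecraft-launcher
```

Newer Mesa versions also accept a PCI bus tag instead of an index (e.g. `DRI_PRIME=pci-0000_03_00_0`), which avoids any ambiguity about which GPU is which.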
There is still something I am curious about: how the pipeline looks when using DRI_PRIME to make the Vega do the render work. The functionality here is very similar to NVIDIA Optimus, where the rendered frames are passed back to the display GPU. The way I understand it, the 580 is actually compositing the whole desktop on both monitors, since it is one X session; only after it does that work does it send some of the display data to the Vega to output. So if I use DRI_PRIME for an application shown on the 580’s monitor, the rendering is done by the Vega, which sends the frames to the 580 to output. But if I am understanding this right, that means when the Vega is rendering something for its own display, it has to send that data to the 580, which then does the full desktop composition and sends the final data back to the Vega. Am I right here, or does it work in a smarter way and get to put that output on its own display directly?
Edit: I had one more thought. If the GPUs do in fact play hot potato with their output like I described at the end, I wonder how much latency is introduced.