An Exploration into Leveraging Multiple GPUs in Linux

To get it out of the way first: my main PC is purpose-built to do VFIO GPU passthrough for playing games in Windows.
R7 2700X
32 GB DDR4-3200 RAM
RX Vega 56 Guest GPU
RX 580 4 GB Host GPU
MSI B350M Mortar motherboard

The idea with this build was to create a dual-head-ready gaming rig that could, if ever needed, be used by two people to game at the same time. Of course, most of the time I use my VM to game and the Linux host to browse the internet and multitask a little. The separation of the two has a lot of benefits, as it eliminates a lot of the downsides of multi-monitor setups.

Lately I’ve been playing and streaming some heavily modded Minecraft. I did some preliminary testing and found I was getting much better performance in Linux than Windows for this, so I went ahead and have been doing my Minecraft streaming all in Linux. I soon found I was at the limits of my RAM, since in my typical setup I reserve half of it for the guest at startup. So I set up an alternate boot entry that removes all of my VFIO kernel flags. This came with the added benefit that my Vega was no longer being held in reserve by vfio-pci.
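For anyone curious, the alternate boot entry amounts to duplicating the default GRUB entry and dropping the VFIO bits. A sketch of the idea — the kernel paths, root device, and PCI IDs below are placeholders for illustration, not taken from my actual system:

```shell
# /etc/grub.d/40_custom — hypothetical example entry without VFIO reservation.
# Paths and IDs are illustrative only.
menuentry "Linux (no VFIO)" {
    linux  /vmlinuz-linux root=/dev/nvme0n1p2 rw amd_iommu=on iommu=pt
    initrd /initramfs-linux.img
}

# The default (passthrough) entry would additionally carry something like:
#   vfio-pci.ids=1002:687f,1002:aaf8
# which binds the guest GPU to the vfio-pci driver at boot and keeps the
# host graphics stack from ever touching it.
```

After editing, regenerate the config with `grub-mkconfig -o /boot/grub/grub.cfg` (path varies by distro).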

Once I hit this point I figured I could look into leveraging the additional power of my Vega to get even better performance, and I quickly found that doing so would be weird. I can easily use my 580 and Vega to drive two monitors in the same X session. What I was not expecting was how Linux handles the render load. Because my 580 is the boot GPU, all accelerated work goes to it by default. So if I launch Minecraft as normal on the display being driven by my Vega, it runs awful, and the moment I drag it back to the display powered by the 580 it’s fine.
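A quick way to see how X has wired this up (output varies by system; this assumes the amdgpu/modesetting drivers and a running X server):

```shell
# List the GPUs ("providers") known to the running X server.
# Provider 0 is the boot/default GPU; a GPU that can accept offloaded
# render work advertises "Sink Output" / "Source Offload" capabilities.
xrandr --listproviders
```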

Investigating this issue quickly led me to the environment variable DRI_PRIME. Using this variable when launching an application, I can specify which GPU does the render work for that application. I’ve tested this out with Minecraft and Unigine Superposition. What I found was that when using the Vega to do the render work I get the same performance no matter which display it’s going to, but the 580 struggles if it’s not on its native display. I think part of this disparity can be attributed to PCIe bandwidth: the Vega is in a full-speed PCIe 3.0 x16 slot, whereas my 580 is in a PCIe 2.0 x4 chipset slot. So while the 580 does fine for playing games in this configuration, it doesn’t have the spare bandwidth to pass that data on to the Vega.
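For anyone wanting to reproduce this, the usage is just an environment variable on the launch command. The benchmark path below is illustrative, not my actual install location:

```shell
# Confirm which GPU Mesa picks by default vs. with offload enabled:
glxinfo | grep "OpenGL renderer"              # boot GPU (the 580 here)
DRI_PRIME=1 glxinfo | grep "OpenGL renderer"  # secondary GPU (the Vega)

# Launch an application with the secondary GPU doing the render work:
DRI_PRIME=1 ./Superposition
```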

There is still something I am curious about: how the pipeline looks when using DRI_PRIME to make the Vega do the render work. The functionality here is very similar to NVIDIA Optimus, where the rendered frames are passed back to the display GPU. The way I understand it, the 580 is actually the one compositing the whole desktop on both monitors, since it is one X session; only after it does that work does it send some of the display data to the Vega to output. So if I use DRI_PRIME for an application shown on the 580’s monitor, the rendering is done by the Vega, which sends the frames to the 580 to output. But if I am understanding this right, that means when the Vega renders something for its own display, it has to send that data to the 580, which then does the full desktop composition and sends the final data back to the Vega. Am I right here, or does it work in a smarter way and get to put that output out directly?

Edit: I had one more thought. If the GPUs do in fact play hot potato with their output like I described at the end, I wonder how much latency is introduced.


What you’re describing does not seem far-fetched, though it would probably require a comment from someone who knows Xorg by heart. Keep in mind that Xorg is quite dated (not in a good way) and there is significant movement in some Linux distros to phase it out. Wayland (aka Waynotgonnaland at one point) has actually landed in Fedora, for example. It promised a lot of streamlining of the API and might have eliminated this problem. Maybe it’s worth exploring this on Wayland as well.

I actually haven’t tried Wayland on KDE yet. I just installed it and I’m going to give it a go.

Well, I’ve played around with Wayland a bit. DRI_PRIME still works as expected, and I noticed that I get slightly improved scores in Superposition as well. However, I can’t test the mixed-GPU display output path, as that doesn’t appear to be available using Wayland, in KDE at least. I do note that desktop performance and refresh rates are a lot cleaner.