A journey through the rabbit hole

This is a follow-up to, or the conclusion of, the story that began here and here.

tl;dr is I wanted to run three GPUs in my system, just because I can - or couldn’t. I wanted to make good use of the 4-port dual display DP 1.2 KVM from Level1techs with a single computer, hence this all started with buying a beefy power supply. The initial plan was to have a Manjaro host, an Ubuntu guest and a Windows guest. The Windows guest is for gaming and has a 3090 in the first PCIe slot on my Gigabyte X570 Aorus Master. The Ubuntu guest was to use a 2080ti from my previous build in PCIe slot 2, and the host was using an AMD FirePro W7100 in the 3rd slot.

Into the rabbit hole.

The first thing I noticed was that the 2080ti (it’s a Gigabyte Gaming something with a beefy cooler) was actually a little bit wider than two slots - dang. I desperately wanted to use a 20-series card, since those provide a USB port which can conveniently (or not) be passed through to the VM along with the card. Otherwise I wouldn’t have a USB controller in its own IOMMU group - the only motherboard controller that sits in its own group is already assigned to the Windows guest. Turns out 30-series cards don’t have USB ports anymore …
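For anyone following along: what decides whether a controller can be handed to a VM on its own is the IOMMU group it ends up in. Here is a minimal sketch of how I check that, assuming the IOMMU is enabled in the BIOS and on the kernel command line - it just walks /sys/kernel/iommu_groups, nothing more:

```python
#!/usr/bin/env python3
"""List IOMMU groups and the PCI devices in each one.

Minimal sketch: assumes the IOMMU is enabled (amd_iommu=on / intel_iommu=on)
and does no ACS-override trickery - it only reads what sysfs exposes.
"""
from pathlib import Path

GROUPS = Path("/sys/kernel/iommu_groups")

def pci_description(dev: Path) -> str:
    """Return 'vendor:device class' read from the device's sysfs entries."""
    vendor = (dev / "vendor").read_text().strip()
    device = (dev / "device").read_text().strip()
    cls = (dev / "class").read_text().strip()
    return f"{vendor}:{device} class {cls}"

def main() -> None:
    if not GROUPS.is_dir():
        raise SystemExit("No IOMMU groups found - is the IOMMU enabled?")
    for group in sorted(GROUPS.iterdir(), key=lambda p: int(p.name)):
        print(f"IOMMU group {group.name}:")
        for dev in sorted((group / "devices").iterdir()):
            print(f"  {dev.name}  {pci_description(dev)}")

if __name__ == "__main__":
    main()
```

Anything that shares a group with the controller has to be passed through (or stubbed) together with it, which is exactly why the controller built into the GPU was so attractive.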

Crazy begins.

First, I bought a 2080ti FE from eBay, which, at least space-wise, fit perfectly between the 3090 and the AMD card. But only space-wise. It got hot like a stove, even when not in use (read: VM not started), and the whole system would ramp up all of its fans. Very annoying. To add to that, the USB on the card behaved weirdly. HID devices would somehow “disconnect”: the OS wouldn’t register anything, but they would no longer work. I had to physically unplug them and plug them back in to make them work again. This happened many times a day - very annoying, again.

Next, I tried a Windows guest for the 2080ti. That did handle the HID issues somewhat better: things would still stop working sometimes, but Windows would recover after a few seconds. Less annoying, but still. And I have no use for Windows other than gaming, so I don’t need two VMs for that. I tried different distros and kernels for the Linux guest, all with the same issues.

Crazy continues.

Ok, I thought, that USB controller on the GPU isn’t working; I need a USB add-in card. So I bought one, with a VIA VL805 chipset. But where do I put it? The first idea was to use a PCIe x1 riser cable to reach the one x1 slot that was buried under the 2080ti - not a feasible solution. Ok, what next? A single-slot GPU. I bought an AMD Radeon Pro WX 5100, 8 GB, slot-powered (that power supply didn’t seem such a smart choice at this moment, but what the heck).

I put the WX5100 in the second slot and the USB card in the little 1x slot beneath it. Beautiful. Or not.

That worked initially, but the GPU wouldn’t reset properly on a reboot or shutdown of the VM. Turns out that is a well-known issue - dang. I tried the vendor-reset module; it changed the error, but didn’t help.
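For completeness, this is roughly how I hooked vendor-reset up - a sketch only, and note it did not solve the problem for me. It assumes the vendor_reset module is already installed, a kernel new enough to expose the reset_method attribute, and the PCI address 0000:0a:00.0 is just a placeholder for wherever your card sits:

```python
#!/usr/bin/env python3
"""Sketch of wiring up vendor-reset (run as root).

Assumptions: the vendor_reset module is installed, the kernel exposes
reset_method in sysfs, and the address below is a placeholder.
"""
import subprocess
from pathlib import Path

GPU_ADDR = "0000:0a:00.0"  # placeholder PCI address - replace with your card's

def enable_vendor_reset(addr: str) -> None:
    # Load the out-of-tree module that adds the device-specific reset quirk.
    subprocess.run(["modprobe", "vendor_reset"], check=True)
    # Tell the kernel to prefer that quirk over FLR/bus reset for this device.
    reset_method = Path(f"/sys/bus/pci/devices/{addr}/reset_method")
    reset_method.write_text("device_specific\n")
    print(f"{addr}: reset_method ->", reset_method.read_text().strip())

if __name__ == "__main__":
    enable_vendor_reset(GPU_ADDR)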

Starting to dig my way up again.

I tried to “swap” the two AMD cards - well, assign one to the host at boot and the other to the VM. Hey, the W7100 is a little older than the WX5100, can’t hurt, right? But it didn’t work. Well, the next purchase was an NVIDIA Quadro P620. Funny little thing. It went straight into slot 2, and I reassigned the W7100 in the 3rd slot to the host again. Now that little Quadro gave me all sorts of weird issues. It would, for instance, not always use the same DP output; I had to juggle the outputs around after each re/boot of the VM. Annoying, again.

Light.

At some point, I guess out of desperation, I managed to end up with this configuration:

slot 1: 3090, Windows guest, worked all the time
slot 2: WX5100, Manjaro host
slot 3: Quadro P620, Ubuntu guest + x1 USB card

This works beautifully now, all three machines can coexist in peace - and quite well cooled. The WX5100 and the P620 are both PCIe powered.

Takeaways:

PPPPPP (prior planning prevents piss poor performance) - though I couldn’t have planned it all without the experience of actually trying it. It cost quite some money, and I now have a few useless components that didn’t make it into the final system, but they add up to nearly a full computer, so maybe I’ll build one. I did go way overboard with the power supply (the initial reasoning was to use the 3090 for gaming while the 2080ti chewed on some datasets), but for what I ended up with, my 750 W unit would have been fine.

That’s it. I hope this helps, or at least was fun to read for someone.


I don’t quite get why it started working for you with the latest config - but great that it did.

I also had some weird behaviour with slot-powered NVIDIA cards, for example a GTX 1050 Ti. It wouldn’t show POST during the first boot of the VM no matter what - which drove me crazy initially because I thought the passthrough didn’t work. I got video after Windows loaded, and POST remained visible after rebooting. Later I switched to an RTX 2070 because I use BitLocker on this VM and need a working POST screen.

PCIe extender: I got a 90° angled extender with a micro USB 3 connection (yes, these do exist) and a matching cable from a Chinese seller on Amazon. It appears to be high quality, shielded, and less janky than most x1 PCIe extenders. So I am able to use the blocked x1 PCIe slots under my 3080ti. It helps if you have vertical PCI slots in the case to put the additional cards into ;).

Desktop Linux: I switched from Manjaro as my host OS to Proxmox. I got fed up with bending a desktop distro to my will; updates kept breaking my VMs or my VFIO setup. I had lots of little issues like very high IO latency, memory latency and FPS stutter in games. Workarounds included separating memory (huge pages) between the VMs and the host system, pinning CPU cores, and editing VM config files for various fixes. Proxmox worked without a flaw. I tried it with a spare PC as host, a Windows 10 VM and a passed-through GTX 970. I was surprised how smoothly everything performed. The only thing I had to do was blacklist the NVIDIA and AMD drivers (which I don’t need on the host anyway).
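A small sanity check I like to run after the blacklisting (my own throwaway script, nothing Proxmox ships): it lists every display-class PCI device and the driver currently bound to it, so the passthrough cards should show vfio-pci and only the host GPU should show a real display driver.

```python
#!/usr/bin/env python3
"""Print which kernel driver is bound to each display-class PCI device."""
from pathlib import Path

def bound_driver(dev: Path) -> str:
    """Resolve the 'driver' symlink; '(none)' if nothing has claimed the device."""
    driver = dev / "driver"
    return driver.resolve().name if driver.exists() else "(none)"

def main() -> None:
    for dev in sorted(Path("/sys/bus/pci/devices").iterdir()):
        pci_class = (dev / "class").read_text().strip()
        # 0x03xxxx is the PCI display-controller class (VGA, 3D, etc.).
        if pci_class.startswith("0x03"):
            print(f"{dev.name}  class={pci_class}  driver={bound_driver(dev)}")

if __name__ == "__main__":
    main()
```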

Late reply …

Well, the system, while initially working, started to develop erratic symptoms. It froze multiple times a day, without warning.

After I manhandled it in inappropriate ways - apparently, I have anger issues - I had to buy a new CPU, mainboard and cooler. Thankfully no other components were damaged!

Now I am back to multiple systems, each serving their respective function. I finally decided I needed my time for work, not tinkering.

I think it was all related to PBO. The next CPU I got started to experience random crashes while sleeping - it would have just rebooted by the morning. Sure enough, that one had PBO enabled as well.

This time, I disabled PBO instead of destroying the system. Stable again. Huh.

So maybe those rumors about Ryzen deteriorating with PBO enabled are true after all? I wonder how long it will last now with PBO disabled. If I have to get a new one, I will definitely adhere to AMD’s warning in the BIOS and not enable it.

Maybe it was all PBO-related? I may never find out - I just don’t have the time to shuffle everything around again.