Best way to run a Windows VM w/ GPU acceleration in Linux as of now?

I’ve been using KVM on a home server for some time now (running FreeNAS and pfSense on a CentOS host) and I’m quite happy with it. Now I want to explore deploying a Windows VM on my desktop, with the intent to play “games” on it, since unfortunately many of the games I like are from Ubisoft.

Current hardware:
motherboard: x570 aorus elite
cpu: Ryzen 3700x (may consider upgrading to the 5000 series given pretty much any excuse :slight_smile: )
gpu: Nvidia 2070 Super (big navi is looking good, might consider flipping it if/once linux driver support is mature)
NIC: HP/QLogic 1/10G SFP+ card
RAM: 4x8GB = 32GB RAM
drives: 1.2TB Intel U.2 (NVMe, connected to top M.2 slot), 512GB M.2 (SATA, w/ Windows install, on bottom slot), misc other SATA SSDs/HDDs

Ideally I’d have a dedicated GPU and USB controller for guest and host. However I don’t have enough physical slots for another GPU. I originally built my system with the intent of doing this later, and later is now :smiley:
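Whichever slot arrangement wins out, passthrough also lives and dies by IOMMU grouping, so it’s worth dumping the groups before buying anything. A minimal sketch, assuming IOMMU is enabled on the kernel command line (`amd_iommu=on iommu=pt` on this platform):

```shell
# Print each IOMMU group with the PCI addresses it contains.
# The GPU you pass through must sit in a group containing nothing
# else you want to keep on the host (bridge functions aside).
list_iommu_groups() {
    base="${1:-/sys/kernel/iommu_groups}"   # standard sysfs location
    for dev in "$base"/*/devices/*; do
        [ -e "$dev" ] || continue           # no groups -> IOMMU is off
        group="${dev%/devices/*}"
        echo "group ${group##*/}: ${dev##*/}"
    done
}

list_iommu_groups
```

Feed each printed address to `lspci -nns` to see what device it actually is.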

Do there exist standalone PCIe switches that can take a single PCIe gen4 x16 slot and “split” this into 2 PCIe gen 3 x16 slots? or convert PCIe gen4 x8 into PCIe gen3 x16 to make optimal use of available bandwidth? This solution would complicate my choices for a PC case (may have to fabricate a custom one) but that’s fine. Or perhaps a motherboard with the hardware already integrated that has this capability?

On mobile right now, but let me give you a quick rundown. I can elaborate where you need it later.

You’ve got solid hardware. 32GB is kinda the midrange of what I’d recommend. You can squeak by with 16GB, but 32 makes it comfortable.

Now, you’ll need another GPU. You can get a bifurcated riser that splits gen 3 x16 into gen 3 x8/x8. That’s my recommendation, and it’s actually what I’m in the process of doing with an ITX sandwich build right now :smiley:

No, there isn’t. Well, the X570 chipset can do that sort of thing, IIRC, but outside of that it really doesn’t exist. But really, you won’t need more than x8. 2080 Tis are known not to saturate a gen 3 x8 connection, so even if you do bottleneck, it will only be slight and not really noticeable.

Something you might consider is switching to an APU, like the 4750G. It’s a really solid contender and is also an 8-core. This would eliminate your need for a second discrete GPU and allow you decent acceleration on Linux. Wendell published a review of that chip on Level1.

Thanks for the detailed reply!

Keep in mind that (as far as I’m aware) my top PCIe slot is running at x8 as of now.

This is because I have a 10 gigabit NIC installed. If I install a bifurcated riser, my assumption is only one slot will work. Or nothing will? Not sure. It’s why I specifically mentioned a PCIe switch, similar in spirit to a network switch. I think Linus used those on his ridiculous over-the-top “X gamers, 1 CPU” builds (that I’m jealous of, lol).

Although thinking about it more, I might be able to avoid a very expensive active switch/mux by moving my SATA M.2 SSD over to a SATA cable via an adapter I have, then moving the U.2 SSD down to the slower chipset-connected M.2 slot, and finally using my top PCIe x4 slot for a new AMD GPU (5700 or newer) at PCIe gen4 x4? Perhaps? That should be equivalent to PCIe gen3 x8, if it actually works.
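For what it’s worth, the gen4-x4 ≈ gen3-x8 math checks out: usable bandwidth is roughly transfer rate × lanes × 128/130 (encoding overhead) ÷ 8 bits. A quick awk sketch of that arithmetic:

```shell
# Approximate usable PCIe bandwidth in GB/s:
#   GT/s per lane * lane count * 128b/130b encoding efficiency / 8 bits
bw() { awk -v gts="$1" -v lanes="$2" \
    'BEGIN { printf "%.1f GB/s\n", gts * lanes * (128 / 130) / 8 }'; }

bw 8 8    # gen3 x8
bw 16 4   # gen4 x4
```

Both calls print 7.9 GB/s, so the two configurations really are equivalent in raw bandwidth.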

If that pans out, it might be the cheapest solution; not sure. The biggest open question I have, I suppose, is which GPU should be passed through. On one hand Nvidia does not play nice with Linux, but then again passing it through is also tricky due to Nvidia’s attempts to prevent this.

Nope, your top slot is 16x.

If your motherboard supports bifurcation (there would be an option in the bios), it’ll work, otherwise it won’t.

I’m fairly confident that crazy 8 editors 1 cpu rig (or whatever) was using bifurcation, but I’m not positive. I’d have to get more details.

More or less.

I would recommend moving your 10g nic to the 4x slot coming off the chipset. Then you can bifurcate (if available) the top (CPU) 16x slot and run the 2 gpus off that.

Depends on your needs. I mentioned the APU above. If you only need light acceleration on the Linux side, that’s the way to go. Otherwise, we’ll need to have the “which to pass through” discussion when you’re ready to pull the trigger on a new GPU.
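(For when that discussion happens: reserving a card for the guest usually comes down to binding it to vfio-pci at boot. A sketch of the modprobe config, where `10de:1e84` and `10de:10f8` are placeholder vendor:device IDs for a 2070 Super’s GPU and HDMI-audio functions; confirm yours with `lspci -nn` before using.)

```
# /etc/modprobe.d/vfio.conf -- placeholder IDs, check with `lspci -nn`
options vfio-pci ids=10de:1e84,10de:10f8
# make sure vfio-pci claims the card before the nvidia driver loads
softdep nvidia pre: vfio-pci
```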

I have the Nvidia 2080 Super installed in the top slot, but the other x16 slot is populated with an HP/QLogic [HP NC523SFP] dual-port 1/10Gbase SFP+ card, from which I have a fiber and copper run over to my 10Gbps managed switch.

Do you mean to tell me that, with this configuration and bifurcation enabled in the BIOS, I would actually STILL have x16 of PCIe bandwidth on my top slot, even with the QLogic card taking 8 more lanes (electrically) in the next PCIe x16 (physical) slot?

If so then I’m stoked! :smiley: I chose the perfect platform for this! But my inherent pessimism leads me to think that isn’t the case. How do I verify?

Right, that second x16 slot is electrically x4 and running off the chipset.

This is your motherboard:

```
================    CPU
====------------    CHIPSET

=  electrical lane
-  physical connector size only
```

The top slot on your board is 16x electrically and physically, connected to your CPU.

The second slot is 16x physically, but only with pins for 4x electrically.

The bottom two x1 slots are just that: x1.

The top slot is gen 4, and the second slot is gen 4 as well (it hangs off the X570 chipset, which does gen 4 downstream).

You don’t lose any lanes from the top slot, no matter what.

So let’s run through this image:

You’ve got 24 PCIe 4.0 lanes off the CPU.

4 go to the chipset.
16 go to the top slot.
4 go to the M.2 connector. (M2A_SOCKET)

From the chipset:

4x gen4 lanes to the second slot.
1x gen4 lane to the third slot.
1x gen4 lane to the fourth slot.
1x gen4 lane to SATA5.
1x gen4 lane to SATA6.
4x gen4 lanes to M2B_SOCKET.
4x gen4 lanes to SATA1 through SATA4.

All of the above from the chipset shares the single x4 gen4 uplink to the CPU.

I hope this helps. (I really enjoy reverse-engineering boards to see exactly how they configured the chipset)
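And to answer the “how do I verify” question directly: lspci reports both the slot’s maximum (LnkCap) and negotiated (LnkSta) link width. A small sketch; the helper just pulls the width number out of that output, and `0a:00.0` is a placeholder address (find your GPU’s with `lspci | grep -i vga`):

```shell
# Pull the negotiated lane count out of an lspci "LnkSta:" line.
link_width() { sed -n 's/.*LnkSta:.*Width x\([0-9]*\).*/\1/p'; }

# Typical usage (root is needed to read the Lnk* capability lines):
#   sudo lspci -vv -s 0a:00.0 | link_width
# If the top slot prints 16, you really do have all 16 lanes.
```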


OHH :facepalm:

I remember now. For some reason I was thinking (maybe from back in the Z77 days, Intel…) that you either have 1 x16 slot (electrically) or 2 x8 slots, and I tried my damnedest to find an AMD mobo that did the same, since I didn’t like the idea of “only” 4 PCIe lanes on the bottom. I guess I failed (will need to RTFM my specific mobo to verify this), but now I understand that that’s a good thing, at least for what I’m trying to do.

OK then a simple bifurcation adapter card may be all I need in the end.

Next thing to sort out will be a case. Currently I have an H510. I don’t know how I could make this one work without really modifying it.

Since this is a full ATX mobo and I am using 3 PCIe cards, potentially with one of the new graphics card offerings coming out by the end of the year, what case are you aware of that might fit my needs? (I’m thinking if the H510 has no resale value, I’ll modify it to fit the cards vertically.)

What I would do if I were you is get 1x bifurcation riser and 1x standard x4-to-x16 riser. From there, you modify the case: rotate all the PCIe slots 90 degrees so you can fit 2 GPUs and your 10G NIC in there, and you’re good.

Obviously, this is easier said than done, but that’s what I’d start by looking into.

Yup, that’s exactly what I had in mind :wink: Might have to break out the dremel, maybe 3D print or CNC a few parts, and design a custom PCB as needed for the most elegant implementation :smirk:

If you can cut the bracket just so, you might be able to trim it down to the number of slots you need, then weld it back in.

And then just 3d print some brackets to hold up the GPUs.

The cost of PCIe bifurcation cards is actually substantially higher than I expected. Shady eBay listings ship from overseas and cost upwards of $100! WTF!? I expected those to cost around $30 or so, at least on eBay.

And looking at the PCB, it really does nothing more than route the first half of the differential pairs to one slot and the other half to the other, plus a pretty basic-looking IC that probably just “tells” the CPU how to configure the PCIe lanes for the card’s hardware implementation, with no high-speed logic, probably just I2C or SMBus.

Am I looking in the wrong places? where can I get one fast and cheap?