GPU passthrough working but still disappointing performance

Back to playing with this stuff after a long break.

I have a Dell R720 with Proxmox running, and I’d like to have a Windows VM running there for the (thankfully few) games I have that won’t run well on Linux. The GPU I have installed is a Quadro K4000, which I don’t think is all that great, but I figure it should at least give me reasonable performance.

I think the thing is “working” - when I run GPU-Z without the PCI card passed through, I get no information, but with the card passed through and the virtual display disabled, GPU-Z reports the Quadro K4000 hardware info. I don’t have the “Error 43” or any other error reported in Device Manager after installing the drivers and setting the card as the primary GPU with PCI-Express enabled in Proxmox. So all appears well as far as I can tell. Videos play fine.

But it still sucks!

I haven’t bothered testing any actual game since just scrolling random webpages, such as this one, is far too sluggish and nowhere near native performance.

So what could be the bottleneck here? I was hoping for “near native” performance suitable for running some games with rather low graphics demands.

I’m accessing the VM over the network from a Linux PC via Remmina, is that a major factor? Do I need to look at using something like Looking Glass?

Maybe it’s the CPU? When I scroll in a browser, the resource that stands out is generally CPU. This Dell box has an idiotic number of cores - 32 (the only other thing I’m running is an idle Nextcloud VM):

32 x Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz (2 Sockets)

I’m giving the VM 4 of these cores - is this CPU just awful for desktop computing? Did I miss some optimization? Task Manager reports the correct CPU… I even tried bumping the VM up to 10 cores just to see if it would help (as expected, it did not).

Here’s the Proxmox configuration I’m using:

args: -device virtio-mouse-pci -device virtio-keyboard-pci
agent: 1
audio0: device=AC97,driver=spice
bios: ovmf
boot: order=ide2;scsi0
cores: 4
cpu: host,flags=+pcid
cpuunits: 10000
efidisk0: local-lvm:base-101-disk-1,size=4M
hostpci0: 0000:42:00,pcie=1,x-vga=1
ide2: local:iso/virtio-win-0.1.229.iso,media=cdrom,size=522284K
machine: pc-q35-5.2
memory: 16128
name: win10vm
net0: virtio=62:CB:62:83:01:04,bridge=vmbr0
numa: 1
ostype: win10
scsi0: local-lvm:base-101-disk-0,backup=0,cache=writeback,iothread=1,replicate=0,size=55G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=a1b54fdc-b5ce-4d35-9627-7404aed1aca5
sockets: 1
template: 1
vga: none
vmgenid: 8c129392-09b9-4804-b77a-9be8103d2622

Well, it’s like giving the VM 4 E (efficiency) cores from a modern Intel 12th or 13th gen chip. Don’t expect miracles.

Ha, I didn’t realize the E meant that (or even think about the E). So you think it’s really just crummy single threaded CPU performance?

Yeah. These are Sandy Bridge cores from 2012, so roughly 10 years old. They still work for stuff that scales well across cores (typical server workloads - 16 cores were great in 2012), but games want single-thread performance.

I’m not saying it’s only the CPU, but this seems like the most obvious problem.

And CPU pinning can reduce latency. Proxmox has a GUI for this in recent versions. It pins the VM’s vCPUs to specific host cores so they don’t get shuffled onto other cores; that CPU hopping is usually bad for latency-sensitive stuff like games. It’s the Linux equivalent of setting CPU affinity on Windows.
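If you’d rather do it from the shell, it looks roughly like this - a sketch only, assuming a Proxmox version new enough to have the affinity option (around 7.3 onward), and using VM ID 101 and cores 0-3 purely as placeholders:

qm set 101 --affinity 0-3

# or the equivalent line directly in /etc/pve/qemu-server/101.conf:
affinity: 0-3

Either way the idea is the same as the GUI option: the VM’s threads only ever run on those four host cores.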

I don’t doubt it too much. I came here with that as the top suspect, myself. Although I’m not sure about the remote desktop performance hit either - it doesn’t seem to impact video at all which leads me to think it’s not the Remmina connection.

Still, it’s disappointing these things can’t even render a modern webpage properly. :confused: (Maybe that says more about the state of the modern Internet than anything…)

So maybe we’ll just keep loading it up with the other “server stuff” we have planned and I’ll need a 2nd Proxmox node for “high end” desktop computing…

Other insights or optimization suggestions welcome.

Remote desktop connections are always laggy, some more than others. That’s usually fine for most stuff, but as soon as fluid movement, video or games are on the table, it stops feeling smooth. That’s why we plug our monitor into the GPU and not into our network switch :wink: Bandwidth and latency.

The best option is to not remote into it at all and instead plug an HDMI cable into a KVM switch between the machines for a native connection. I’ve seen people running 20m HDMI cables across their house to do just that :wink:

If you can manage to test the “lagginess” with a direct connection to the server’s GPU, you’ll know what is causing the trouble.

I was running a “gaming VM” on my home server last year, using keyboard+mouse USB passthrough plus HDMI out on the server. It was a butter-smooth desktop experience. I tweaked some CPU pinning stuff, but otherwise got 98% of the gaming performance as advertised.

Oh I’m not above running wires at all, but I’m trying to get even better convenience than a KVM provides. (Plus I already have the KVM setup. :rofl: ) I just rarely boot to Windows on the other box, since it’s just for work running Linux now.

Subconsciously maybe I’m just doing this for the nerd factor or the fun of it. I’m going for maximum convenience, by essentially making the Windows VM act like just another “program” window. Plus I don’t think any KVM is going to beat an RDP window for back-and-forth switching speed.

Also MS Windows does not deserve to run on bare metal in my house and I feel dirty whenever I select the damn thing in a GRUB menu. I want MS out of my life as much as possible!

Err anyway… the type of “games” I’m talking about aren’t your usual Call of Duty or whatever competitive gaming stuff. For Star Citizen or something, yeah, you want bare metal. But I mostly play lower-paced war strategy or nerd stuff that I think I could live with a bit of latency on.

Yes I should probably try a direct connection just to see what it impacts.

I wonder, is it worth trying a CPU upgrade?

It doesn’t seem like LGA2011 offers a whole lot more, going by benchmarks.

The 2.90GHz is what I have now, and I doubt 20% better single-thread performance will do much.

https://www.cpubenchmark.net/compare/2154vs2009vs1223vs2120/Intel-Xeon-E5-2667-v2-vs-Intel-Xeon-E5-2697-v2-vs-Intel-Xeon-E5-2690-vs-Intel-Xeon-E5-1660-v2

(Also I was mistaken about the number of cores I have currently; it’s just 16 across the two CPUs.)

Okay, surprising result on the direct-connect monitor test. Turns out the CPU seems to be fine; it appears it was the remote desktop causing the lag.

I guess I need to learn more about Looking Glass or look into other remote access solutions, maybe that would help…
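From a quick skim of the docs, Looking Glass works by adding an IVSHMEM shared-memory device to the VM so frames can be pulled out of the guest without going through a display protocol. On Proxmox that apparently means extending the args line in the VM config - a rough sketch only, where the 32M buffer size is an assumption that suits roughly 1080p (higher resolutions need a larger power-of-two size, per the Looking Glass docs):

args: -device virtio-mouse-pci -device virtio-keyboard-pci -device ivshmem-plain,memdev=ivshmem,bus=pcie.0 -object memory-backend-file,id=ivshmem,share=on,mem-path=/dev/shm/looking-glass,size=32M

The looking-glass-host service then runs inside the Windows guest, and the client reads that shared-memory file - which also means the client has to run on the Proxmox host itself rather than over the network, so it wouldn’t replace Remmina/RDP on its own.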

I’ve yet to see any remote desktop solution that can handle running games (even slower strategy ones) particularly well. Could something like Steam streaming from the Windows VM be an option?

I was thinking of not putting Steam on there at all and just copying what I need over, but that is a thought to add to the investigation list.

Since you use Proxmox, I’ll leave this here; it’s the most up-to-date and comprehensive guide I know of for falling down the rabbit hole of resource pinning and isolation in Proxmox: [TUTORIAL] - Hey Proxmox & Community - Let's talk about resources isolation | Proxmox Support Forum
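For a quick taste of what it covers: the usual first step is to fence off a block of host cores on the kernel command line and then pin the VM onto them. A minimal sketch, with cores 8-15 and VM ID 101 purely as examples (the guide goes much further, e.g. steering interrupts and kernel threads off those cores):

# /etc/default/grub on the Proxmox host - keep the scheduler off cores 8-15
GRUB_CMDLINE_LINUX_DEFAULT="quiet isolcpus=8-15 nohz_full=8-15 rcu_nocbs=8-15"
# regenerate the grub config and reboot
update-grub
# then pin the VM onto the isolated cores
qm set 101 --affinity 8-15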


Nice link, will come in useful once I’m done with the upgrade to 7.4!

Can you please test it quickly with Steam Link or Parsec? I think that’s your problem.


If you mean the remote Remmina/RDP part, yes, I think so. I was able to take that out as a factor with a direct connection, and the lag was virtually gone.