Good benchmarks, poor gaming performance w/ RTX 4090: VFIO, Proxmox

Hi everyone!

Thanks to the power of this community, a year ago I was able to learn, install and use Proxmox with multiple LXCs and VMs, and my passion for homelab stuff has grown ever since.

*It’s not a big issue, but it has been on my mind for a while: my gaming VM’s gaming performance is not that good. It’s a 4090, so I’m still doing great, but given the hardware and the bare-metal performance it is passable at best.*

What got me to write this post is that the synthetic benchmarks are fine, within the margin of virtualization loss, and that raises more questions. So here is some information.

Hardware

  • CPU Ryzen 9 5950X 16 cores 32 threads
  • GPU Nvidia RTX 4090 FE
  • RAM 128 GB DDR4-3200 ECC
  • SSD Crucial 4TB NVMe M.2 (raidz1, used only for VMs, not for Proxmox itself)

Software

  • Freshly installed this week for the purpose of benchmarking
  • Latest Windows 11 Update (August 2024)
  • Nvidia driver 560.81
  • Latest Cinebench, 3DMark, Warzone, etc.
  • Eco mode disabled, Nvidia settings to performance, no KVM hide

Proxmox VM Specs

#<div align='center'>
#
#  # Windows VM
#  
#IP = DHCP
#<br />
#Gaming VM with GPU passthrough 
#
#</div>
bios: ovmf
boot: order=ide2;ide0;scsi0
cores: 24
cpu: host
efidisk0: vms:vm-100-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
hostpci0: 0000:0f:00,pcie=1,x-vga=1
ide0: none,media=cdrom
ide2: none,media=cdrom
machine: pc-q35-8.1
memory: 65536
meta: creation-qemu=8.1.5,ctime=1713376038
name: windows
net0: virtio=BC:24:11:F5:13:CE,bridge=vmbr0,firewall=1
numa: 0
ostype: win11
scsi0: vms:vm-100-disk-1,backup=0,cache=writeback,discard=on,iothread=1,size=1000G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=bda2f1b4-0ba4-4a02-a8ce-4b4aa1889f1a
sockets: 1
tags: vms
tpmstate0: vms:vm-100-disk-2,size=4M,version=v2.0
usb0: host=5-4.1
usb1: host=5-4.2
usb2: host=5-4.4
usb3: host=045e:0b00
vmgenid: bbfb4beb-93f5-44c0-9da8-5b1d2441ea0c

Expected Performance
On bare metal, for both gaming and benchmarks, I get optimal GPU utilization, with results within 1 to 5 percent of other online benchmarks on similar hardware

Current Performance

In benchmarks, like Cinebench, I get good GPU utilization

Same for 3dMark Nomad

But in gaming the GPU doesn’t go beyond 40 to 60 percent depending on the game (older screenshot but the point remains even with current versions)

I found two or three other posts that talk about this, but the suggestions are mostly not applicable to my case

https://www.reddit.com/r/VFIO/comments/kelcqs/poor_performance_in_cyberpunk_possibly_bad_cpu/

If someone has any ideas, hints, things to try I’d be happy to experiment as long as needed!

I’ve seen something similar (low CPU+GPU utilization in Cyberpunk despite no framerate limiter), but with libvirtd rather than Proxmox, and on Intel 13th Gen. In my case, it was primarily due to HPET being on in my configuration. Disabling HPET in libvirt alone was not enough for some reason; it wasn’t until I disabled HPET inside the Windows VM (and rebooted) that I saw the improvement.

bcdedit /deletevalue useplatformclock 
# to revert:
# bcdedit /set useplatformclock yes

Proxmox should have HPET disabled by default when ostype is set to win11 (easiest way to check: ps ax|grep --color hpet=off) so perhaps try updating Boot Configuration Data using the above command and reboot.
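
The check above can also be scripted. This is a minimal Python sketch, assuming you already have the QEMU command line as text (for example copied from `ps ax`). The helper name `hpet_disabled` and both sample command lines are made up for illustration, and it only looks for the literal `hpet=off` option mentioned above.

```python
import re

def hpet_disabled(cmdline: str) -> bool:
    """Return True if a QEMU command line contains the hpet=off option."""
    # Require a space, comma, or string boundary on both sides so that
    # something like "nohpet=offx" does not count as a match.
    return re.search(r"(?:^|[\s,])hpet=off(?:$|[\s,])", cmdline) is not None

# Hypothetical command lines, not taken from the thread.
with_hpet_off = "/usr/bin/kvm -id 100 -machine hpet=off,type=pc-q35-8.1 -cpu host"
without = "/usr/bin/kvm -id 100 -machine type=pc-q35-8.1 -cpu host"

print(hpet_disabled(with_hpet_off), hpet_disabled(without))
```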

Something else I did, though I’m not sure whether it helped, was enabling Hyper-V enlightenments (but again, Proxmox already enables these by default when running Windows).

Try disabling core isolation and virtualisation-based security. The easiest way to do it may be to just disable svm. It’s in the cpu section when using virsh; I’m not sure where it is on Proxmox.

I’m just waiting for Wayland to get proper HDR / ICC profile support so I can play Cyberpunk on bare-metal Linux. I can’t get it past 26 fps and I’ve tried every trick in the book. The game’s scheduling on Windows must be broken or something.

E: disable svm (AMD Hyperthreading) and Cyberpunk runs okay but I’d rather have that on for productivity tasks…

SVM = Secure Virtual Machine
SMT = simultaneous multithreading (Hyper-Threading is Intel’s marketing name for this)

SVM is relevant because, if it is enabled inside the VM, Windows may rely on nested virtualisation for its security features, which tanks performance in some applications, notably games.

Proxmox is not so easy when it comes to a single high-speed, low-latency VM, at least if you only use the options you can access via the GUI.
Check the CPU topology in your VM with Coreinfo: are the cache size and its allocation to the cores correct?
What is your memory latency in AIDA64?

As far as I can remember, this should be in your config
args: -cpu 'host,topoext=on' -smp

Oh yeah, I quickly tried disabling that, got excited by having over 200 fps on the menus and promptly posted without checking the facts :grimacing:

Thanks everyone, here are the first results

Before any changes here is the CoD MW3 Multiplayer benchmark

Here is the same benchmark after doing the following

  • Adding the args => args: -machine hpet=off -cpu 'host,topoext=on'
  • Disabling HPET inside Windows 11 (already done but just in case)
  • Restarting

Disappointingly, it’s the same, but the bottleneck shifted to the CPU for some reason?

I have not tried disabling core isolation / svm yet, and Hyper-V seems to be on by default on Proxmox. I will report back once I do. Meanwhile, thanks to @sirn @quilt @vvk and @Janos

Have you tried pinning a single CCD (8c/16t) to the VM?

I had two issues.
First, I OC’d 4 sticks of RAM, which caused a ton of issues, from drive corruption to performance problems.
Second, for whatever reason, the card didn’t like being in the 2nd PCIe slot. It may have been a motherboard thing; it shouldn’t have mattered, but it did for Proxmox/KVM.

I eventually moved to an Intel system, I wanted something with integrated graphics to free up a pcie slot.

What are you using for graphics for the host?

I got a i7-13700K, and despite all the goings on, it’s been rock solid.
I do live in daily terror and am pretty sure my resale value is shot.
If only I had waited for another AMD gen, I could have had integrated graphics for the host. The last AMD I had was the AMD FX-8150, so I guess I just suck at picking hardware, except for my i7 4790k. I love that CPU.

Keep us posted!

Since you’re getting good results in synthetic benchmarks but poor performance in actual games, I would put hardware very low on the list of possible causes (a hardware problem should also show up in the synthetic benchmarks, and that’s not the case here).

Some possible courses from here:

  • Can you try some CPU benchmark and see if you’re getting the expected numbers?
  • Checking memory latency, as Janos mentioned, is also a good idea.
  • Benchmark a clock source with TimerBench

Alright I started from a clean install once more to document some other components

VM Info

  • Latest VirtIO drivers

agent: 1
bios: ovmf
boot: order=scsi0;ide0;ide2;net0
cores: 24
cpu: x86-64-v3,flags=+virt-ssbd;+amd-ssbd;+pdpe1gb
efidisk0: vms:vm-100-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
hostpci0: 0000:0f:00,pcie=1,x-vga=1
ide0: local:iso/virtio-win-0.1.262.iso,media=cdrom,size=708140K
ide2: local:iso/Win11_23H2_English_x64v2.iso,media=cdrom,size=6653034K
machine: pc-q35-8.1
memory: 65536
meta: creation-qemu=8.1.5,ctime=1723891443
name: windows
net0: virtio=BC:24:11:FF:40:A4,bridge=vmbr0,firewall=1
numa: 0
ostype: win11
scsi0: vms:vm-100-disk-1,backup=0,cache=writeback,discard=on,iothread=1,size=1000G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=f717dd3e-1afe-489e-9420-657d485690d9
sockets: 1
tags: vms
tpmstate0: vms:vm-100-disk-2,size=4M,version=v2.0
usb0: host=5-4.1
usb1: host=5-4.4
usb2: host=5-3
vmgenid: 07d41542-bd38-477f-b713-41ae69e83496

OS / Drivers Info

  • Latest (August 16, 2024) Windows 11 Pro ISO from Microsoft, fully activated
  • OS Build 22631.4037 / Version 23H2
  • Zero bloatware, not connected to Microsoft, no third party cleaning tools
  • Nvidia driver 555.99 (almost latest but stable)

TimerBench

CineBench

CrystalDiskMark

Cyberpunk

Current observations

This CPU type with these flags gained me ~35 fps in the MW3 automated benchmark, but that does not carry over to open-world fps at all, and in the last screenshot you can see what looks like an artificial cap on GPU utilization in the Cyberpunk benchmark

Thanks @sirn & @chromefinch. @vvk I will definitely try pinning and a few other tricks and report even more

Sorry for repeating myself, but try “-svm” in the CPU flags. It solved similar issues for me in games (low GPU usage – low framerate). It’s easy to try.

Sorry for missing that, @quilt. I have tried it now:

agent: 1
bios: ovmf
boot: order=scsi0;ide0;ide2;net0
cores: 24
cpu: x86-64-v3,flags=+virt-ssbd;+amd-ssbd;+pdpe1gb;-svm
efidisk0: vms:vm-100-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
hostpci0: 0000:0f:00,pcie=1,x-vga=1
ide0: local:iso/virtio-win-0.1.262.iso,media=cdrom,size=708140K
ide2: local:iso/Win11_23H2_English_x64v2.iso,media=cdrom,size=6653034K
machine: pc-q35-8.1
memory: 65536
meta: creation-qemu=8.1.5,ctime=1723891443
name: windows
net0: virtio=BC:24:11:FF:40:A4,bridge=vmbr0,firewall=1
numa: 0
ostype: win11
scsi0: vms:vm-100-disk-1,backup=0,cache=writeback,discard=on,iothread=1,size=1000G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=f717dd3e-1afe-489e-9420-657d485690d9
sockets: 1
tags: vms
tpmstate0: vms:vm-100-disk-2,size=4M,version=v2.0
usb0: host=5-4.1
usb1: host=5-4.4
usb2: host=5-3
vmgenid: 07d41542-bd38-477f-b713-41ae69e83496

I feel like the flag isn’t being used?

root@pve:~# journalctl -f
Aug 17 23:53:11 pve pvedaemon[524866]: worker exit
Aug 17 23:53:11 pve pvedaemon[2260]: worker 524866 finished
Aug 17 23:53:11 pve pvedaemon[2260]: starting 1 worker(s)
Aug 17 23:53:11 pve pvedaemon[2260]: worker 608787 started
Aug 17 23:53:16 pve kernel: usb 5-4.1: reset full-speed USB device number 21 using xhci_hcd
Aug 17 23:53:16 pve kernel: usb 5-4.4: reset high-speed USB device number 22 using xhci_hcd
Aug 17 23:53:16 pve kernel: usb 5-3: reset high-speed USB device number 19 using xhci_hcd
Aug 17 23:53:20 pve pvedaemon[608787]: vm 100 - unable to parse value of 'cpu' - VM-specific CPU flags must be a subset of: pcid, spec-ctrl, ibpb, ssbd, virt-ssbd, amd-ssbd, amd-no-ssb, pdpe1gb, md-clear, hv-tlbflush, hv-evmcs, aes
Aug 17 23:54:28 pve pvedaemon[540793]: vm 100 - unable to parse value of 'cpu' - VM-specific CPU flags must be a subset of: pcid, spec-ctrl, ibpb, ssbd, virt-ssbd, amd-ssbd, amd-no-ssb, pdpe1gb, md-clear, hv-tlbflush, hv-evmcs, aes
Aug 17 23:55:10 pve pveproxy[604549]: vm 100 - unable to parse value of 'cpu' - VM-specific CPU flags must be a subset of: pcid, spec-ctrl, ibpb, ssbd, virt-ssbd, amd-ssbd, amd-no-ssb, pdpe1gb, md-clear, hv-tlbflush, hv-evmcs, aes

From what I’ve seen, it is disabled unless added with +svm, and it is something that can also be disabled in the BIOS. Any hints on how to use -svm?
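
The journal message lists the flags Proxmox accepts in a per-VM cpu line, and svm is not among them. Below is a minimal Python sketch that reports which flags of a cpu value fall outside that list. It assumes the allow-list is exactly the one printed in the message above (it may differ between Proxmox versions), and `rejected_flags` is a hypothetical helper, not a Proxmox API.

```python
# Allow-list copied from the pvedaemon message in the journal output above.
ALLOWED_FLAGS = {
    "pcid", "spec-ctrl", "ibpb", "ssbd", "virt-ssbd", "amd-ssbd",
    "amd-no-ssb", "pdpe1gb", "md-clear", "hv-tlbflush", "hv-evmcs", "aes",
}

def rejected_flags(cpu_value: str) -> list[str]:
    """Return the flag names in a 'cpu:' value that are not in the allow-list."""
    rejected = []
    for part in cpu_value.split(","):
        if not part.startswith("flags="):
            continue  # skip the CPU type (e.g. "host") and other options
        for flag in part[len("flags="):].split(";"):
            name = flag.lstrip("+-")  # drop the leading enable/disable sign
            if name and name not in ALLOWED_FLAGS:
                rejected.append(name)
    return rejected

print(rejected_flags("x86-64-v3,flags=+virt-ssbd;+amd-ssbd;+pdpe1gb;-svm"))  # ['svm']
print(rejected_flags("x86-64-v3,flags=+virt-ssbd;+amd-ssbd;+pdpe1gb"))       # []
```

Flags outside that list would presumably have to go through a raw args line, like the -cpu 'host,topoext=on' one used earlier in the thread. That is an assumption, not something confirmed here.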

So I changed to host with this config

agent: 1
bios: ovmf
boot: order=scsi0;ide0;ide2;net0
cores: 24
cpu: host,flags=+virt-ssbd;+amd-ssbd;+pdpe1gb;-svm
efidisk0: vms:vm-100-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
hostpci0: 0000:0f:00,pcie=1,x-vga=1
ide0: local:iso/virtio-win-0.1.262.iso,media=cdrom,size=708140K
ide2: local:iso/Win11_23H2_English_x64v2.iso,media=cdrom,size=6653034K
machine: pc-q35-8.1
memory: 65536
meta: creation-qemu=8.1.5,ctime=1723891443
name: windows
net0: virtio=BC:24:11:FF:40:A4,bridge=vmbr0,firewall=1
numa: 0
ostype: win11
scsi0: vms:vm-100-disk-1,backup=0,cache=writeback,discard=on,iothread=1,size=1000G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=f717dd3e-1afe-489e-9420-657d485690d9
sockets: 1
tags: vms
tpmstate0: vms:vm-100-disk-2,size=4M,version=v2.0
usb0: host=5-4.1
usb1: host=5-4.4
usb2: host=5-3
vmgenid: 07d41542-bd38-477f-b713-41ae69e83496

Same warning in the journal:

Aug 18 00:05:28 pve pvedaemon[619590]: vm 100 - unable to parse value of 'cpu' - VM-specific CPU flags must be a subset of: pcid, spec-ctrl, ibpb, ssbd, virt-ssbd, amd-ssbd, amd-no-ssb, pdpe1gb, md-clear, hv-tlbflush, hv-evmcs, aes

In both cases the Cyberpunk benchmark result is the same. If you have any other suggestions, I’d love to try them

The CPU benchmark number seems a little low to me; I expected a Ryzen 5950X to score around 80~90 in single core. However, the multicore number looks alright for a 5950X (a 32t 5950X should score around 1400, so scaling that linearly to 24t gives roughly 1000). Lack of vCPU pinning might be the reason here.
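
The back-of-the-envelope estimate above, written out in Python. It assumes the multicore score scales linearly with thread count, which is only a rough approximation, and `scaled_score` is a name chosen for illustration.

```python
def scaled_score(full_score: float, full_threads: int, vm_threads: int) -> float:
    """Scale a multicore score linearly with thread count (a rough estimate only)."""
    return full_score * vm_threads / full_threads

# ~1400 at 32 threads is the ballpark figure quoted above, not a measured value.
print(scaled_score(1400, 32, 24))  # 1050.0
```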

On a dual-CCD 5950X, it is a good idea to either pin the vCPUs evenly across the two CCDs (e.g. 6c/12t+6c/12t) or pin 8c/16t to a single CCD. This improves L3 access, since L3 is not shared between CCDs. Without vCPU pinning, the Linux kernel may also reschedule a thread to a different CCD at any time.
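
As a sketch of what those two layouts look like as host CPU lists, here is a small Python helper. It ASSUMES the host numbers physical cores 0-15 with SMT siblings at 16-31, and that CCD0 owns cores 0-7 and CCD1 owns cores 8-15. That is not confirmed anywhere in this thread, so check it with lscpu -e before using the output. The function names are made up for illustration.

```python
def ccd_cpus(ccd: int, cores: int, cores_per_ccd: int = 8, total_cores: int = 16) -> list[int]:
    """Host CPU ids for the first `cores` cores of a CCD plus their SMT siblings.

    ASSUMPTION: core n has SMT sibling n + total_cores, and CCD k owns cores
    k*cores_per_ccd .. (k+1)*cores_per_ccd - 1. Verify with `lscpu -e` first.
    """
    if not 0 < cores <= cores_per_ccd:
        raise ValueError("cores must be between 1 and cores_per_ccd")
    first = ccd * cores_per_ccd
    primary = list(range(first, first + cores))
    return primary + [c + total_cores for c in primary]

def to_ranges(cpus: list[int]) -> str:
    """Collapse CPU ids into a compact list such as '0-7,16-23'."""
    cpus = sorted(set(cpus))
    parts = []
    start = prev = cpus[0]
    for cpu in cpus[1:]:
        if cpu != prev + 1:  # gap found: close the current run
            parts.append(f"{start}-{prev}" if start != prev else str(start))
            start = cpu
        prev = cpu
    parts.append(f"{start}-{prev}" if start != prev else str(start))
    return ",".join(parts)

single_ccd = to_ranges(ccd_cpus(0, 8))              # 8c/16t on one CCD
split = to_ranges(ccd_cpus(0, 6) + ccd_cpus(1, 6))  # 6c/12t + 6c/12t
print(single_ccd)  # 0-7,16-23
print(split)       # 0-5,8-13,16-21,24-29
```

A list like this could go into the VM’s affinity option if your Proxmox version has it. Note that affinity restricts the whole VM process to that CPU set rather than pinning each vCPU one-to-one.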

On Windows, you can use Sysinternals Coreinfo to check the cache mapping. Ideally, the Level 3 cache lines should be uniform (e.g. in Coreinfo, for 4c/8t, either one long ******** or a split ****---- and ----****). On the Linux side, use lscpu -e or lstopo to check. (Generally, cpu: host,topoext is a better idea because it properly passes this cache information to the VM.)
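
On the Linux side, the same L3 grouping can also be read from sysfs. This is a minimal Python sketch, assuming the usual x86 layout where cache/index3 is the L3 cache (the level file next to it can confirm that). The sample mapping is illustrative and not taken from this thread.

```python
from collections import defaultdict
from pathlib import Path

def l3_groups(shared: dict[int, str]) -> dict[str, list[int]]:
    """Group CPU ids by the L3 'shared_cpu_list' string each one reports."""
    groups = defaultdict(list)
    for cpu, cpu_list in sorted(shared.items()):
        groups[cpu_list].append(cpu)
    return dict(groups)

def read_l3_shared(root: str = "/sys/devices/system/cpu") -> dict[int, str]:
    """Read the L3 sharing list per CPU from sysfs (index3 is assumed to be L3)."""
    result = {}
    for path in Path(root).glob("cpu[0-9]*/cache/index3/shared_cpu_list"):
        cpu = int(path.parts[-4][3:])  # directory name "cpu12" -> 12
        result[cpu] = path.read_text().strip()
    return result

# Illustrative data for a 4c/8t layout split over two L3 domains.
example = {0: "0-3", 1: "0-3", 2: "0-3", 3: "0-3",
           4: "4-7", 5: "4-7", 6: "4-7", 7: "4-7"}
print(l3_groups(example))
```

On a real Linux machine, `l3_groups(read_l3_shared())` gives one entry per L3 domain; two entries of equal size would correspond to the split ****---- / ----**** picture.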

The 4090 number in Cinebench is also a little low; it should be around 33,000~34,000 pts. Still, I would focus on the CPU side of things first.

TimerBench looks alright to me, although switching to Invariant TSC might be a good idea (C-states may affect how the TSC ticks, whereas ITSC is not affected by them. In Proxmox I think it is cpu: host,invtsc)
