Help with KVM VM tuning!

Hi guys, I am suffering from some horrible CPU performance within my VM. It is a Windows 10 VM with GPU passthrough, made primarily for gaming. After experiencing such poor performance I tried my hand at CPU pinning and setting up hugepages, but ended up with the same performance.

My specs and set up are as follows:
AMD 8350 (4 cores assigned to VM)
AMD R9 390(Guest)
Nvidia 610 (Host)
Memory 16GB (8GB assigned to guest)
Current OS: Ubuntu 16.04
Virt Manager 1.4 with latest OVMF & virtio 0.1.126 PV drivers.

*This was set up primarily following the guides at the VFIO blogspot and Graywolf's tutorial. I initially attempted this with Debian and Fedora 24 but had the same CPU performance issue.

CPU PINNING:

I have 4 cores isolated and assigned to the VM as seen below:

<vcpu placement='static'>4</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='4'/>
    <vcpupin vcpu='1' cpuset='5'/>
    <vcpupin vcpu='2' cpuset='6'/>
    <vcpupin vcpu='3' cpuset='7'/>
    <emulatorpin cpuset='4-7'/>
  </cputune>
…
…
 <cpu mode='host-passthrough'>
    <topology sockets='1' cores='4' threads='1'/>
  </cpu>

*Info taken from https://forum.level1techs.com/t/gpu-passthrough-with-kvm-have-your-cake-and-eat-it-too/82250/195

When using System Monitor on Ubuntu, these cores sit at 0% usage until the VM is started. Once the VM is started the core usage climbs, and as soon as I start running a game it basically maxes out at 80-100% on all 4 cores, giving horrible performance.

HUGEPAGES:

I have also tried setting up hugepages following https://help.ubuntu.com/community/KVM%20-%20Using%20Hugepages and adding it to the VM as well:

marlon@MBLPC:~$ cat /proc/meminfo | grep Huge
AnonHugePages:    512000 kB
HugePages_Total:     946
HugePages_Free:      946
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

and added this to the VM XML:

<memoryBacking>
    <hugepages/>
  </memoryBacking>

When the VM is running, the number of free hugepages is still equal to the total amount, so I am assuming they are not being used?
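One quick way to confirm is to snapshot the counters before and after starting the VM; if the memory backing is actually working, HugePages_Free should drop once QEMU maps the guest's RAM. A sketch, assuming the default 2 MiB page size:

```shell
# Run once before starting the VM and once after, then compare.
# With hugepage backing working, HugePages_Free drops by roughly
# (guest RAM in MiB) / 2 once QEMU allocates guest memory.
grep -E 'HugePages_(Total|Free|Rsvd)' /proc/meminfo || echo "no hugepage counters found"
```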

Any ideas on what I may be missing and why the CPU cores are maxing out?

I wish I had the solution but I do not, if you look at the post above where you linked me...

You will see I had a similar problem, but it was because I had configured my CPU wrong in QEMU/virt-manager, basically only giving the KVM 1 core and 2 threads... performance sucked. With the help of @mythicalcreature I got the configuration right and working using all the passed-through cores, but I did then and still do now have those passed-through cores running between 80-100% while playing games or doing anything intense.

Game performance-wise it is well within my standards. Yeah, the FPS could be better and higher resolutions would be nice, but I play most games on standard settings and never go above 1080p, which is the max resolution for my monitors anyway. I never had to do the CPU/core pinning, and while I did at one point use hugepages, I never really saw much of an improvement.

One thing that might help you is more system memory. I give my KVM running Win7 16g and 6 cores of my 8370 CPU; I have a total of 32g of memory, and 2 cores and 16g is more than ample to run the host Fedora system and use both concurrently.

Like I said wish I had the answer but sadly I do not.

I only have a few moments here, so I haven't had a chance to read through the thread very well. But...

Best practices:
CPU pinning is generally considered evil. CPU scheduling is hard, and pinning virtual CPUs makes it harder.

For virtual CPU configuration, sockets > cores > leaving it all alone > threads. Prefer to have more sockets before adding more cores to each socket.

If you're seeing a high CPU load in virt-manager but low to no CPU usage within the virtual machine itself, you're likely running into something that is forcing a lot of context switching. SQL Server in a VM is a GREAT example of this.

Well, that sucks! lol. I only tested a couple of games, but a low-resource game like Strider 2 runs like it would natively, while others like Dragon's Dogma are just unplayable.

With or without CPU pinning and isolating the CPUs, the performance seems the same to me. Most of the success stories (near-native performance) I saw and read about were with Intel hardware, which would make me think the 8350 isn't powerful enough, but on bare-metal Windows it performs very well in games at 1080p.

I guess the only thing left to try would be to get the hugepages working. They seem to be properly set up on Ubuntu, but when the VM is running the free hugepages are equal to the total amount, indicating to me that they are not being used. Hugepages are completely foreign to me, so I may have messed up the config somewhere, but I am currently looking into it.

So the problem you're having is latency? I think @mythicalcreature could help you with the hugepages, which is why I've tagged him. I play mostly older games, but I played Fallout 4 and just loaded up Just Cause 3 (playing JC2 right now) and haven't really had many issues that I'd attribute to the 8370 I'm using, which isn't that much different from your 8350. But as I said, I am running Win7 with twice the amount of RAM, and I give it 6 cores instead of 4. I did four cores on a couple of the KVMs I built but finally settled on giving it 6...

Other than video driver problems with my R9 270's I've not had a lot of gaming issues, hopefully someone will come along with some help.

Hmmm, with the way the VM is running I am pretty sure I won't even be able to run games like Fallout 4 and Just Cause, even on low settings.

The cores max out at 100% in Ubuntu's System Monitor when a game is running, and virt-manager shows almost max usage, but Windows 10 Task Manager fluctuates from 60% and up.

This is a bit frustrating as everything else works without issue: PCI passthrough, USB passthrough, and even my integrated sound passed through. It's just that the VM's CPU performance is bad. Just moving the mouse around in the VM can spike the usage from under 10% to 50%.

That's odd; JC2 has run flawlessly. In fact I don't start and stop my KVM, it runs all the time the host is running (I just turn the monitors off). I've even let JC2 run for days without shutting the game down, just going out to the save screen to save my progress... super stable. I had 0% issues with Fallout 4, but again at normal settings nothing maxed out. I normally average between 30-60 FPS, which is low by a lot of people's standards, but it's playable to me, i.e. no latency.

So are you sharing a mouse, passing through one, or just connecting to a passed through USB controller?

I am actually using a separate mouse/keyboard combo for the VM at the moment and pass through the individual USB devices, 360 controller etc. I just had to disable apparmor in Ubuntu to be able to pass USB devices without issues. The plan is to setup synergy so I can use one shared mouse and keyboard.

30-60FPS with everything maxed out is the norm for me :-). I use a 1080p 60Hz dual input monitor anyway.

Besides the CPU performance, the only other thing I had absolutely no luck with was audio: I couldn't get an emulated 5.1 setup working on the VM. To be honest, I don't even know if that was possible to begin with. I gave up on it and just passed the integrated audio through. No blacklisting was required, so once the VM is shut off the host regains audio. Would have liked audio from both at the same time, but compromises...

Not sure what is causing this CPU performance issue, as it happened on both Debian Stretch and Fedora 24 as well. Maybe a hardware setting on the motherboard? Will double-check this.


Hi, this might not solve your problem but, have you tried using qemu/kvm without virt-manager, just from the commandline or maybe a script?

I just couldn't make sound work with virt-manager unless I passed through the onboard card; running through the command line there was no issue whatsoever. Just as a note, I'm not using PulseAudio, just ALSA.

As for the CPU, the performance wasn't awful, but adding the hv flags on the script (hv_vapic,hv_time,hv_relaxed,hv_spinlocks=0x1fff,hv_vendor_id) helped a LOT. I don't know how to add them (or if you need to) to virt-manager's XML, though.
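For reference, those qemu command-line hv_* flags map onto libvirt's <hyperv> feature element, with hv_time corresponding to the hypervclock timer. A sketch of the XML equivalent, untested on this exact setup (8191 is 0x1fff in decimal, and the vendor_id element needs a reasonably recent libvirt):

```xml
<features>
  <hyperv>
    <relaxed state='on'/>
    <vapic state='on'/>
    <spinlocks state='on' retries='8191'/>
    <vendor_id state='on' value='whatever'/>
  </hyperv>
</features>
<clock offset='localtime'>
  <timer name='hypervclock' present='yes'/>
</clock>
```

These go inside the <domain> element of the VM's XML (e.g. via virsh edit).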

That's interesting, I will definitely take a look into this. I have only used virt manager and virsh on the command line.

As for the current audio setup, passing through the onboard sound works perfectly in the VM. I just lose audio on the host while the VM is running, but once the VM is stopped I regain host audio without issues.

Try removing the emulatorpin tag from the XML file. That caused worse performance for me.

Unfortunately it didn't make a difference for me. Still working on it.


I modified my CPU configuration on virt-manager to match the output of lscpu (1 socket, 4 cores, 2 threads):

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 21
Model: 2
Model name: AMD FX(tm)-8350 Eight-Core Processor
Stepping: 0
CPU MHz: 3400.000
CPU max MHz: 4000.0000
CPU min MHz: 1400.0000
BogoMIPS: 7999.30
Virtualization: AMD-V
L1d cache: 16K
L1i cache: 64K
L2 cache: 2048K
L3 cache: 8192K
NUMA node0 CPU(s): 0-7

This increased performance ever so slightly, thanks to all the cores being assigned to the VM. Instead of maxing out at 100% when running a game, it now dropped to the 90% range... however...

I gained a huge performance increase by enabling the Hyper V enlightenments on the VM following this tutorial : https://scottlinux.com/2016/03/21/enable-hyper-v-enlightenments-in-kvm-for-better-windows-vm-performance/

The CPU usage has now dropped to the 40-50% range when running a game. I only tested this on two somewhat lightweight games, Dark Souls 2 and RE Revelations 2. I plan to test with The Witcher 3, as it is probably the most demanding game I own.

At the moment I would say the performance is about 80-85% of bare-metal Windows performance. I encountered no performance issues having all 8 cores assigned to the VM, so I figured I will leave this as is... permanently.

I removed the CPU isolation and pinning as it didn't seem to increase performance for my setup. At the moment, besides the typical VM setup, I have the Hyper-V enlightenments and hugepages configured. Although I still think the hugepages are not being used, because when I check the output of "cat /proc/meminfo | grep Huge" while a game is running, the number of free hugepages is still equal to the total available. I am not sure if this is how it should be?

I am still trying to get this up to the 95% of bare-metal performance that most people with this setup were able to achieve. Is there anything else I can try that maybe I missed? Also, can someone provide some info on the hugepages setup?

I was very pleased with the results after testing The Witcher 3, which actually performed very close to bare metal, and this is with all graphical settings maxed and the Nvidia stuff disabled. CPU usage also averaged around 40-50% in System Monitor.

Since then I have been testing out different games, relatively pleased with the results, until I finally stumbled upon a particular game with horrible performance compared to bare metal. This was Dragon's Dogma, which ran at around 12 FPS while bare metal is an almost constant 60 FPS. I am honestly not sure what caused this, as all system resource usage seemed relatively low when running the game.

At this point the only other setting I think I can play around with is storage. I currently have two qcow2 containers assigned to the Windows VM. These are on partitions of physical drives shared with my Ubuntu installation. I am not sure how much this hampers performance. I would eventually like to test with either an LVM setup or just giving the VM its own HDD. Anyone got any thoughts on this?

Did you ever get hugepages working? I don't actually have benchmark comparisons but I've heard it can make a noticeable difference.

I'm guessing you already have something like this?

<memoryBacking>
  <hugepages/>
</memoryBacking>

Are you using the default /dev/hugepages path? Does it exist?
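You can check both from a shell; something like this (paths are the Ubuntu defaults):

```shell
# Is a hugetlbfs filesystem mounted, and where?
grep hugetlbfs /proc/mounts || echo "hugetlbfs not mounted"
# Does the default mount point libvirt/QEMU expects actually exist?
ls -ld /dev/hugepages 2>/dev/null || echo "/dev/hugepages missing"
```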

For storage, again I don't have benchmarks, but based on what I've been reading, virtio with iothreads seems to be a good performer.

I'm using a config like this:

<iothreads>1</iothreads>
<iothreadids>
  <iothread id='1'/>
</iothreadids>

with iothread assigned to disk:

<disk type='file' device='disk'>
  <driver name='qemu' type='raw' iothread='1'/>
  <source file='/imagepath.img'/>
  <target dev='vda' bus='virtio'/>
</disk>

Other things that come to mind:

I have hugepages set up, running at the default /dev/hugepages mount location. I also have the memory-backing entry in the XML and KVM_HUGEPAGES=1 in /etc/default/qemu-kvm. I don't see the hugepages being used when the VM is running, so I am not too sure it is working as it should.

I actually did not have the Hyper-V enlightenments in my XML. Once I added them, performance increased dramatically. CPU usage doesn't seem to be an issue anymore; it is usually within 40-50% in a regular game and 40-60% in a heavy game like The Witcher 3. I wouldn't say it is 95% of bare-metal performance, but it is pretty good.

I have not tested iothreads though, so I will look into this. I will also look into reinstalling with raw instead. The ultimate setup, once I have finished testing, is an LVM partition.

I also tried enabling hardware-assisted paging together with CPU pinning, with some interesting results. Upon starting the VM with this setup I began getting some memory errors, but the CPU idle usage dropped to 0-3%, almost like bare metal, which was amazing. After a couple of restarts the memory errors went away, but the CPU performance returned to what it was before? I have since removed this.

Excellent guide over at https://tekwiki.beylix.co.uk/index.php/VGA_Passthrough_with_UEFI%2BVirt-Manager

I have been trying to replicate some of the stuff found under the "For tips in improving performance on qemu" section. Some people noted that disabling nested page tables increased performance, but I am having difficulties doing this.

Unfortunately all of this seems to depend on your specific hardware, so there is a lot of trial and error involved. But I can say for sure that out of everything I tried so far, the Hyper-V enlightenments made a huge difference in performance. I believe I read somewhere that these are supposed to be added automatically when a VM is created on newer versions of virt-manager, but that did not happen for me with virt-manager 1.4.

HugePages_Free should definitely be lower while the VM is running.
I just noticed from the original post that you are showing 946 hugepages total. That number should be at least 4096 for 8GB of guest memory, since each hugepage is only 2MB. You can change this by setting

vm.nr_hugepages=4096
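That page count falls straight out of the page size. A sketch of the arithmetic, with the usual apply step commented out (sysctl needs root, and the exact count depends on your guest's memory):

```shell
# 8 GiB of guest RAM divided by the 2 MiB hugepage size = 4096 pages
guest_mib=$(( 8 * 1024 ))   # guest memory in MiB
page_mib=2                  # Hugepagesize: 2048 kB
echo $(( guest_mib / page_mib ))   # prints 4096
# Apply at runtime, then persist the setting in /etc/sysctl.conf:
#   sysctl -w vm.nr_hugepages=4096
```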

It is odd, though, because I'm pretty sure the guest would fail to boot if hugepage memory backing were enabled with insufficient hugepages.

Does your /etc/libvirt/qemu.conf have something non default for this?

hugetlbfs_mount = "/dev/hugepages"

Also, one thing I did not immediately spot in the guide is to try setting

iommu=pt

in grub.
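For anyone following along, that means editing the kernel command line in /etc/default/grub and regenerating the config. A sketch (keep whatever options you already have; amd_iommu=on is an assumption for an FX board):

```shell
# /etc/default/grub -- add iommu=pt (and amd_iommu=on on AMD) to the kernel line:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_iommu=on iommu=pt"
# Then regenerate the grub config and reboot:
#   sudo update-grub
```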

I've actually stopped doing CPU pinning altogether. It is tricky to get right, and I was more than likely doing something wrong half the time. One thing I learned since the post linked above is that the emulator and vCPUs should not be pinned to the same cores.

Take a look at this post by a VFIO dev
https://www.redhat.com/archives/vfio-users/2015-September/msg00041.html

Ah yes, I had initially set the hugepage amount incorrectly. Strangely, when checking the hugepages today I got:

marlon@MBLPC:~$ cat /proc/meminfo | grep Huge
AnonHugePages:    385024 kB
HugePages_Total:    4500
HugePages_Free:      404
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB

So it seems it was probably working all along. I honestly can't remember doing anything differently since it was first set up.

My /etc/libvirt/qemu.conf actually has the hugetlbfs_mount line commented out. I will try uncommenting it to see what happens.


I have been trying CPU pinning on and off any time I make new changes to the XML, with no real performance difference. It does look difficult to set up properly.

Also, I had to set iommu=pt in grub; otherwise I would get a lot of messages during boot and shutdown with respect to the USB and PCI devices I had passed through, which caused the machine to take a very long time booting and powering off.

It looks like hugepages are working. hugetlbfs_mount commented out should be fine; it defaults to /dev/hugepages if not set. CPU pinning may affect latency more than anything else.

Otherwise I think I'm out of ideas.