Virt-manager CPU topology

Hello,

I have a question about the CPU topology settings in virt-manager - which configuration would give the best performance?

I have a CPU with 4 cores and 8 threads, so I’ve been setting the topology to 1 CPU (socket), 2 cores, 2 threads (2 threads per core -> 4 vCPUs in the VM).
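
For reference, here’s roughly what those settings end up as in the libvirt domain XML - just a little illustrative sketch, not something virt-manager emits verbatim. The sockets x cores x threads product has to equal the vCPU count the guest sees; the second call shows the all-sockets layout suggested in the Red Hat guide linked below:

```python
# Sketch of how the virt-manager topology numbers map onto the libvirt domain
# XML. Illustrative only: sockets * cores * threads = vCPUs seen by the guest.

def topology_xml(sockets: int, cores: int, threads: int) -> str:
    vcpus = sockets * cores * threads
    return (
        f"<vcpu placement='static'>{vcpus}</vcpu>\n"
        "<cpu mode='host-model'>\n"
        f"  <topology sockets='{sockets}' cores='{cores}' threads='{threads}'/>\n"
        "</cpu>"
    )

# My current setting: 1 socket, 2 cores, 2 threads per core -> 4 vCPUs.
print(topology_xml(1, 2, 2))

# The Red Hat suggestion: sockets only, single core, single thread -> also 4 vCPUs.
print(topology_xml(4, 1, 1))
```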

But I’ve come across this: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/virtualization_tuning_and_optimization_guide/sec-virt-manager-tuning-cpu-topology

"selecting any desired number of sockets, but with only a single core and a single thread usually gives the best performance results."

If this is true, then why does setting the topology to something that doesn’t mimic the underlying CPU in any way give the best results?

If KVM works like other hypervisors (and I would assume it probably does - it’s not like it will be using different virtualisation instructions), you want to replicate your real-world socket topology as closely as possible.

VMware recommend, for example, that if you have one socket, you select one socket and X cores for a VM.

If you have two sockets, select 2 sockets and divide the cores you want between them.

What you DON’T want to do in the two-socket case, for example, is select 3 (or any other odd number of) cores… or put any number of cores other than one into a single virtual socket.

If you have more than 2 sockets… well, that isn’t many people these days, and I’d suggest you again match the socket count (or select an easily divisible number of sockets) unless your VM has a single core.

Basically, I think a workable “rule of thumb” is: if your host has multiple sockets and your VM has multiple cores, spread those cores across multiple sockets in the VM.
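
As a toy illustration of that rule of thumb (my own sketch, nothing out of the VMware docs - the function name and the numbers are made up):

```python
# Toy sketch of the "match sockets, divide cores evenly" rule of thumb.
# Purely illustrative.

def suggest_topology(host_sockets: int, wanted_vcpus: int) -> tuple[int, int]:
    """Return (virtual_sockets, cores_per_socket) for a VM with wanted_vcpus."""
    if wanted_vcpus == 1:
        return (1, 1)                      # single-core VM: one socket is fine
    if wanted_vcpus % host_sockets != 0:
        raise ValueError(
            f"{wanted_vcpus} vCPUs don't divide evenly across "
            f"{host_sockets} sockets - pick a count that does"
        )
    return (host_sockets, wanted_vcpus // host_sockets)

print(suggest_topology(2, 6))   # (2, 3): 3 cores on each of 2 virtual sockets
print(suggest_topology(1, 4))   # (1, 4): single-socket host, keep one socket
```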

There’s some sort of scheduling benefit to that: if you select multiple sockets inside the VM, the guest can be “aware” that its cores may be on different sockets, and the guest’s scheduler can make latency optimisations with regard to where it schedules threads inside itself (even if everything ends up on the same socket for all threads, I suspect letting it assume/be aware of the worst case is better for threading purposes inside it).

source: it’s somewhere in the VMware ESXi documentation, I read up on this a year or two back.

Without reading into the context too much I can re-phrase this as “don’t add cores unless you know they are needed”.

Why is this?

Because - definitely in VMware, and likely in other hypervisors (they all have the same scheduling problems to deal with and there’s no “magic” algorithm for this) - a VM will not be scheduled its time slice to actually run until ALL THE CORES IT HAS BEEN ALLOCATED ARE AVAILABLE AT THE SAME TIME.

i.e., if there is a time slot where one core on the host is parked and waiting for work, but your VM has more than one core allocated, that host core will sit there idling and your VM will stall until all of the required cores are available. Even if the VM doesn’t have workload INTERNALLY scheduled on all of its cores, the hypervisor won’t schedule it until every core it is configured with is free.
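
To make that concrete, here’s a deliberately crude toy model of strict co-scheduling (my own sketch, not how any real hypervisor is implemented internally): a VM only gets a time slot when enough physical cores are free for all of its vCPUs at once, and the slots where it is ready but blocked are counted as “co stop”:

```python
# Crude toy model of strict co-scheduling: in each time slot a VM runs only if
# ALL of its vCPUs can be placed on free physical cores simultaneously.
# Not how KVM or ESXi actually work internally - just an illustration of the
# "co stop" effect. Every VM is assumed to want to run in every slot.

import random

PHYSICAL_CORES = 8
SLOTS = 10_000

vms = {          # VM name -> number of vCPUs allocated
    "small-1": 1,
    "small-2": 1,
    "big-1": 4,
    "big-2": 4,
    "big-3": 4,
}

costop = {name: 0 for name in vms}   # slots spent ready-but-waiting
ran = {name: 0 for name in vms}      # slots actually run

for _ in range(SLOTS):
    free = PHYSICAL_CORES
    # Randomise which VM the scheduler looks at first in each slot.
    for name in random.sample(list(vms), len(vms)):
        need = vms[name]
        if need <= free:
            free -= need          # all vCPUs fit: the VM runs this slot
            ran[name] += 1
        else:
            costop[name] += 1     # ready to run, but not enough cores line up

for name in vms:
    pct = 100 * costop[name] / SLOTS
    print(f"{name}: ran {ran[name]} slots, co-stopped {costop[name]} ({pct:.0f}%)")
```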

This isn’t so much a problem on end user/workstation hosts, but on servers where you have a large number of VMs all doing work, it can be quite a drag. The metric for measuring this effect in VMware is the “co stop” or “ready” metric in the VM stats.

The “co stop” time is time where the VM is essentially ready to run - and would run if it had a single core - but can’t be scheduled because it is waiting on one or more of its “other” cores to become available. You may see this even if CPU utilisation graphs look low on the host, because a CPU that is not “used” (even though nothing could be scheduled on it) doesn’t show up in the utilisation metrics/graphs. It’s potentially sitting there doing nothing, waiting for workload that won’t fit into a time slot (which shows up as “co stop”).

Even if a guest is idle, the hypervisor still needs to schedule it onto however many cores it has configured and give it the chance to run. So idle VMs with lots of cores allocated are super bad.

If that happens a lot, your VM may run slower with more cores than if you took cores off it, or reduced the VM core counts across the cluster. Essentially the host is sitting there burning CPU time just waiting for enough cores to line up for its VMs to run. It may be worth living with slower performance from a reduced core count during (presumably brief) peak times in order not to uselessly burn scheduling resources on idle VMs (and kill performance for everything else on the cluster) the rest of the time.

TLDR:
Be stingy with core count in an oversubscribed environment (basically any host machine with more virtual cores running on it than real ones).

Start with one core per VM and only allocate more if the VM actually needs it - and needs it frequently. Throwing more cores at a VM “just because” will make it harder for the hypervisor to schedule and may actually reduce performance - definitely in terms of responsiveness/latency.

If you have just as many real cores as virtual ones - go nuts, it doesn’t really matter. But the whole point of virtualisation traditionally has been oversubscription in order to make more efficient use of machines that used to typically sit 99% idle almost all day.
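
If you want to check where a KVM host sits on that spectrum, something along these lines would do it (a sketch using the libvirt Python bindings - it assumes qemu:///system is the right connection URI and skips error handling):

```python
# Rough overcommit check for a KVM host: total vCPUs of running guests vs
# logical host CPUs. Sketch only - assumes the libvirt Python bindings are
# installed and qemu:///system is the connection you want.

import libvirt

conn = libvirt.open("qemu:///system")

host_cpus = conn.getInfo()[2]          # getInfo() -> [model, MB RAM, CPUs, ...]

total_vcpus = 0
for dom in conn.listAllDomains(libvirt.VIR_CONNECT_LIST_DOMAINS_ACTIVE):
    # info() -> (state, maxMem, memory, nrVirtCpu, cpuTime)
    total_vcpus += dom.info()[3]

ratio = total_vcpus / host_cpus
print(f"{total_vcpus} vCPUs on {host_cpus} logical CPUs (ratio {ratio:.2f})")
if ratio > 1:
    print("Oversubscribed - being stingy with per-VM core counts matters here.")

conn.close()
```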

I understand that the basics of vCPU scheduling will apply to any hypervisor; however, it looks like you’re basing everything mostly on the VMware documentation.
I’m not really sure that all of the vCPUs have to run at the same time, since KVM supports overcommitting of resources(?).

I’ve also found this: https://ieeexplore.ieee.org/document/6684443
(I don’t have the premium subscription so I don’t know any specifics from this link)

In a multi-CPU Virtual Machine(VM), virtual CPUs (VCPUs) are not guaranteed to be scheduled simultaneously.[…]

Unfortunately I can’t find any KVM-specific information that isn’t paywalled.

Given that I came across this while searching for overcommitment, I’m not sure that’s what it’s called.

Edit

After re-reading thro’s reply, and after reading the documentation in the link above, I think I finally understand it (yes, I was confused when I read it yesterday). He basically answered your question of which topology is faster on a given processor in the first few paragraphs, but then went off on a tangent about vCPU overcommitment (it’s a tangent because allocating more than half of the available logical cores was never being considered to begin with, at least as I understand it).

Also, the documentation you found is for Red Hat Enterprise Linux 6, which has veeeeeeeeery old packages. Correct me if I’m wrong, but I’d say documentation that old is not likely to reflect current versions of the software, unless you’re on the oldest supported version of Ubuntu or something of the sort.

Have a nice day!

My guess is better flexibility and cache locality. If you have two physical sockets and want 6 threads, then splitting them into 6 single-core sockets lets the host run them split 6-0, 5-1, 4-2, 3-3, 2-4, 1-5, or 0-6. The trade-off is that the guest OS is less likely to actively balance and migrate its program threads across sockets: the downside being uneven core utilization, the upshot being less L1/L2 cache thrash. On single-socket hosts, core pinning can increase the effect.
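
For the pinning part, a sketch with the libvirt Python bindings might look like this (the domain name “my-vm” and the pin layout for a 4-core/8-thread host are just examples - check your own sibling-thread numbering with lscpu before copying anything):

```python
# Sketch of vCPU pinning via the libvirt Python bindings. Example layout for a
# 4-core/8-thread host where CPUs 0-1, 2-3, 4-5, 6-7 are sibling threads;
# adjust to your actual topology. Applies to a running guest.

import libvirt

HOST_CPUS = 8
PIN = {0: 2, 1: 3, 2: 4, 3: 5}   # vCPU -> host CPU (leave 0-1 for the host)

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("my-vm")          # example domain name

for vcpu, host_cpu in PIN.items():
    cpumap = tuple(i == host_cpu for i in range(HOST_CPUS))
    dom.pinVcpu(vcpu, cpumap)             # pin this vCPU to one host thread

print(dom.vcpuPinInfo())                  # show the resulting pinning
conn.close()
```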

Of course, your particular OS scheduler(s) and workload may produce different results from Red Hat’s there. The only way to be sure is to benchmark with a close representation of the intended workload.

Nah, it’s pretty relevant, because if you are not overcommitted and have idle cores, the scheduler’s job is pretty fucking easy - just run the VM on the cores you allocated, which WILL be available because you aren’t committing more virtual cores than you have physical ones.

The big problems start when you have a VM with multiple cores and only one physical core (or rather, fewer than the VM’s core count) is idle when it needs to be scheduled. At that point the VM will stall and WAIT, sitting there until the other cores in its allocation become available.

That’s how the VMware scheduler works. I’m not sure whether KVM is the same or not, but given they’re having to do the same sorts of things on the same architecture, I wouldn’t rule it out.

With regard to sockets - there’s normally a memory access penalty for crossing sockets, and/or cache access differences, so it makes sense to keep related threads on the same socket if the VM is able to do that.
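
If you want to see which host CPUs share a socket (and so avoid that cross-socket penalty), the kernel exposes it under sysfs - a quick sketch:

```python
# Group host logical CPUs by physical package (socket) using sysfs, to see
# which ones share a socket and avoid the cross-socket penalty mentioned above.

from collections import defaultdict
from pathlib import Path

packages = defaultdict(list)
for topo in sorted(Path("/sys/devices/system/cpu").glob("cpu[0-9]*/topology")):
    cpu = int(topo.parent.name.removeprefix("cpu"))
    package = int((topo / "physical_package_id").read_text())
    packages[package].append(cpu)

for package, cpus in sorted(packages.items()):
    print(f"socket {package}: CPUs {sorted(cpus)}")
```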

The original poster asked which topology results in the best performance. It is reasonable to assume that he does not intend to increase the number of logical cores he wants to allocate to his virtual machine(s), which does make it a tangent in this case (though not in general), as he is not overcommitting his computer in the first place.

Post Scriptum

The overcommitment segment of your reply was really educational and interesting to read. It’s sad that I won’t have anywhere to apply the newly acquired knowledge any time soon though.

Have a nice day!