If KVM works like other hypervisors (and I would assume it probably does; it's not like it's using different virtualisation instructions), you want to replicate your real-world socket topology as closely as possible.
VMware recommends, for example, that if you have one socket, you select one socket and X cores for a VM.
If you have two sockets, select two sockets and divide the cores you want between them.
What you DON'T want to do in the two-socket case, for example, is select three (or any other odd number of) cores, or put any number of cores other than one into a single virtual socket.
If you have more than two sockets… well, that isn't many people these days, and I'd suggest you again match the socket count (or select an easily divisible number of sockets) unless your VM has a single core.
Basically, I think a workable rule of thumb is: if your host has multiple sockets and your VM has multiple cores, spread those cores across multiple sockets in the VM.
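As a sketch, that rule of thumb could be captured in a small helper (hypothetical, not any hypervisor's actual API): given the host's socket count and the vCPUs you want, return a (virtual sockets, cores per socket) split that mirrors the host where possible.

```python
def suggested_topology(host_sockets, vcpus):
    """Hypothetical helper: split a VM's vCPUs into (sockets, cores_per_socket)
    following the 'mirror the host topology' rule of thumb."""
    if vcpus == 1:
        return (1, 1)  # single-vCPU VMs always get a single virtual socket
    if vcpus % host_sockets == 0:
        # Match the host's socket count and divide cores evenly between them.
        return (host_sockets, vcpus // host_sockets)
    # Otherwise fall back to the largest socket count that divides evenly.
    for sockets in range(host_sockets - 1, 0, -1):
        if vcpus % sockets == 0:
            return (sockets, vcpus // sockets)
```

For instance, on a two-socket host a four-vCPU VM comes out as two virtual sockets of two cores each, while three vCPUs (which won't split evenly) fall back to one socket of three.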
There's a scheduling benefit to that. If you select multiple sockets inside the VM, the guest can be "aware" that its cores may be on different sockets, and the guest's scheduler can make latency optimisations about where to schedule threads inside itself. Even if all its threads end up on the same physical socket, I suspect letting it assume/be aware of the worst case is better for threading purposes internally.
source: it’s somewhere in the VMware ESXi documentation, I read up on this a year or two back.
Without reading into the context too much, I can rephrase this as "don't add cores unless you know they are needed".
Why is this?
Because, certainly in VMware and likely in other hypervisors (they all have the same scheduling problems to deal with, and there's no "magic" algorithm for this), a VM will not be given its time slice to actually run until ALL THE CORES IT HAS BEEN ALLOCATED ARE AVAILABLE AT THE SAME TIME.
i.e. if there is a time slot where one core on the host is parked and waiting for work, but your VM has more than one core allocated, that host core will sit there doing nothing and your VM will stall until all of its required cores are available at once. Even if the VM doesn't have workload INTERNALLY scheduled on all of its cores, the hypervisor won't schedule it until every core it is configured with is free.
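A toy simulation of that strict "all cores at once" behaviour (a deliberate simplification, not how any real scheduler is implemented) makes the effect visible: two 3-vCPU VMs on a 4-core host keep blocking each other, even though a 1-vCPU VM in the same slot would run every time.

```python
def simulate(pcpus, vms, ticks):
    """Toy strict co-scheduler: in each tick a VM runs only if enough
    physical cores are free for ALL of its vCPUs simultaneously.
    `vms` is a list of (name, vcpu_count) pairs."""
    ran = {name: 0 for name, _ in vms}
    costop = {name: 0 for name, _ in vms}
    for t in range(ticks):
        # Rotate priority each tick so no VM starves permanently.
        order = vms[t % len(vms):] + vms[:t % len(vms)]
        free = pcpus
        for name, vcpus in order:
            if vcpus <= free:
                free -= vcpus
                ran[name] += 1
            else:
                costop[name] += 1  # ready, but not enough cores line up
    return ran, costop
```

With `simulate(4, [("a", 3), ("b", 3)], 10)`, each VM only gets to run for 5 of the 10 ticks and spends the other 5 co-stopped; shrink "b" to a single vCPU and both VMs run in every tick.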
This isn't so much a problem on end-user/workstation hosts, but on servers where you have a large number of VMs all doing work, it can be quite a drag. The metric for measuring this effect in VMware is the "co-stop" (or "ready") metric in the VM stats.
The "co-stop" time is time where the VM is essentially ready to run, and would run if it had a single core, but can't be scheduled because it is waiting for one or more of its "other" cores to become available. You may see this even if the CPU utilisation graphs on the host look low: a core that couldn't be scheduled doesn't show as "used" on the utilisation metrics/graphs, even though it's potentially sitting there doing nothing, waiting for a workload that won't fit into a time slot (which shows up as "co-stop").
Even if a guest is idle, the hypervisor still needs to schedule however many cores the guest has configured and give it the chance to run. So idle VMs with lots of cores allocated are super bad.
If that happens a lot, your VM may run slower with more cores than if you took cores off it, or reduced core counts across the cluster. Essentially the host is sitting there burning CPU time just waiting for enough cores to line up so its VMs can run. It may be worth accepting slower performance from a reduced core count during (presumably brief) peak times in order not to uselessly burn scheduling resources on idle VMs (and kill performance for everything on the cluster) the rest of the time.
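A back-of-envelope way to see why over-allocating hurts: if each physical core is independently busy with some probability (a big simplifying assumption; real core availability is correlated), the chance that enough cores are free at the same instant drops quickly as the vCPU count grows.

```python
from math import comb

def p_schedulable(pcpus, busy, vcpus):
    """Probability that at least `vcpus` of `pcpus` physical cores are idle
    at a given instant, if each core is busy independently with probability
    `busy`. (Crude model, for intuition only.)"""
    idle = 1.0 - busy
    return sum(comb(pcpus, k) * idle**k * busy**(pcpus - k)
               for k in range(vcpus, pcpus + 1))
```

On an eight-core host where each core is busy half the time, a single vCPU can almost always be placed immediately, while a four-vCPU gang finds a slot noticeably less often; the gap widens as host load rises.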
TLDR:
Be stingy with core count in an oversubscribed environment (basically any host machine with more virtual cores running on it than real ones).
Start with one core per VM and only allocate more if the VM actually needs it, and needs it frequently. Throwing more cores at a VM "just because" will actually make it harder for the hypervisor to schedule, and may well reduce performance, definitely in terms of responsiveness/latency.
If you have just as many real cores as virtual ones - go nuts, it doesn’t really matter. But the whole point of virtualisation traditionally has been oversubscription in order to make more efficient use of machines that used to typically sit 99% idle almost all day.