High kernel process usage on CPU 0 isolcpus

The problem with this is that I can't post a link to the original site. Also, since most of us are using virt-manager, our script needs to be a bit different. We can put the script in the hooks folder under /etc/libvirt/hooks.

This way it is automatically run when you launch the VM.

This script resolves the issue of the missing task scheduler on the isolated CPU cores and properly pins the QEMU threads.

It is a simple shell script which uses the debug-threads QEMU argument and taskset to find the vCPU threads and pin them according to an affinity list set elsewhere in the script.

There is a chance I can find a script that does what we need, but it's going to take a lot of Googling. The issue with isolcpus is already pretty deep to begin with.

Below is code taken from null-src. Credit to them.

#!/bin/bash

# clear options
OPTS=""

# set vm name
NAME="PARASITE"

# host affinity list
THREAD_LIST="8,9,10,11,12,13,14,15,24,25,26,27,28,29,30,31"

# qemu options
OPTS="$OPTS -name $NAME,debug-threads=on"
OPTS="$OPTS -enable-kvm"
OPTS="$OPTS -cpu host"
OPTS="$OPTS -smp 16,cores=8,sockets=1,threads=2"
OPTS="$OPTS -m 32G"
OPTS="$OPTS -drive if=virtio,format=raw,aio=threads,file=/vms/disk-images/windows-10.img"

function run-vm {
# specify which host threads to run QEMU parent and worker processes on
taskset -c 0-7,16-32 qemu-system-x86_64 $OPTS
}

function set-affinity {
# sleep for 20 seconds while QEMU VM boots and vCPU threads are created
sleep 20 &&
HOST_THREAD=0
# for each vCPU thread PID
for PID in $(pstree -pa $(pstree -pa $(pidof qemu-system-x86_64) | grep $NAME | cut -d',' -f2 | cut -d' ' -f1) | grep CPU | sort | awk -F',' '{print $2}')
do
    let HOST_THREAD+=1
    # set each vCPU thread PID to next host CPU thread in THREAD_LIST
    echo "taskset -pc $(echo $THREAD_LIST | cut -d',' -f$HOST_THREAD) $PID" | bash
done
}

set-affinity &
run-vm

Yeah, that’s exactly what I meant earlier. Also, I see a problem why this wouldn’t work as a hook for libvirt as-is.

The first problem to solve is finding out whether you can pin CPUs like taskset does here:

taskset -c 0-7,16-32 qemu-system-x86_64 

but do it AFTER the VM is already running. It might be possible to set up in libvirt’s XML, I can’t remember now.
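For the record, taskset can change the affinity of an already-running process via -p, so the "after the VM is running" part is at least mechanically possible from a hook. A minimal sketch, demonstrated on the current shell itself so it's safe to try without a VM (CPU 0 is assumed to exist):

```shell
#!/bin/bash
# Re-pin an already-running process with taskset -p.
# Target is the current shell ($$), so no root is required.
taskset -pc 0 $$      # move the running process onto CPU 0
taskset -pc $$        # query it back: prints "pid NNN's current affinity list: 0"
```

On a VM you would substitute the QEMU PID (or the individual thread TIDs) for `$$`.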

The rest seems pretty straightforward. I will have a look at this, unless someone beats me to it :slight_smile: But don’t get your hopes up, because it seems way too easy, so there may be some quirk that prevents it from working.

NVM, just remembered that you posted your XML. So I’m gonna just use that.

The taskset needs to be used outside of libvirt; the vCPUs are not able to properly pin themselves across guest CPUs together with the QEMU system process. The command you posted is what you need, but I believe it needs all the PIDs of the process to work. I haven’t tried this as a libvirt hook though… so I think I am going to give it a try. But you see where I am stuck now! We need more fellow forum members on this issue, considering most people aren’t doing it properly with isolcpus.

Yeah, I know, that’s the first thing I said when I saw your script.
But you posted the pinning in your XML:

<cputune>
    <vcpupin vcpu="0" cpuset="0"/>
    <vcpupin vcpu="1" cpuset="1"/>
    <vcpupin vcpu="2" cpuset="2"/>
    <vcpupin vcpu="3" cpuset="3"/>
    <vcpupin vcpu="4" cpuset="4"/>
    <vcpupin vcpu="5" cpuset="5"/>
    <vcpupin vcpu="6" cpuset="6"/>
    <vcpupin vcpu="7" cpuset="7"/>

So you mean this doesn’t work?

It works halfway, but when using isolcpus to pin the CPU threads, you still need taskset to correctly pin the kernel threads. The Windows threads get pinned properly, but not the kernel threads. This is my understanding from the documentation I have read in the Red Hat and Arch wiki guides.

Most people aren’t doing it correctly.

Let me see if I can give you a real-world example, 1 sec.

The emulator thread also needs the Linux scheduler, which is disabled on the isolated cores.

So yes, most people you see here are not doing it correctly.

Ok, any example will be helpful, because now I have doubts, because:

Yet the CPU seems to be configured as smp=16,cores=8

And later taskset goes through that thread list, assigning each vCPU thread PID to the next host thread:

So I have a hard time reconciling why you would try to pin them to cores 8-15 and 24-31, since you only have 16 threads on the CPU…

If THREAD_LIST was something like:
“4,5,6,7,12,13,14,15”
Then all would be clear
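To make the indexing in that script concrete: it treats THREAD_LIST as a 1-indexed array via `cut -f$N`, so the N-th vCPU thread it finds gets the N-th list entry. A standalone sketch using the suggested list and made-up PIDs (no taskset is actually executed, it only prints what would run):

```shell
#!/bin/bash
# Walk a comma-separated affinity list the way the null-src script does:
# entry N of THREAD_LIST goes to the N-th vCPU thread found.
THREAD_LIST="4,5,6,7,12,13,14,15"
N=0
for PID in 101 102 103 104; do            # hypothetical vCPU thread PIDs
    let N+=1
    HOST=$(echo "$THREAD_LIST" | cut -d',' -f$N)
    echo "would run: taskset -pc $HOST $PID"
done
# prints:
# would run: taskset -pc 4 101
# would run: taskset -pc 5 102
# would run: taskset -pc 6 103
# would run: taskset -pc 7 104
```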

This was an example I posted from another website to show you what I meant by using taskset to properly pin your CPUs. It’s not specific to my CPUs, and the script can be written in many ways. I am just not a pro at bash, but I may give it a try here in a sec and see what I can come up with for us. Are you using isolcpus too? Have you finished getting all the performance out of your KVM that you possibly can? I could help you get more performance if need be, if you can help me find a solution to this :slight_smile:

Let’s focus on one thing :slight_smile:

I almost have a working hook, but only for the second part, the second taskset I just posted:

I have a list of PIDs too and I can assign them. But the list in the example is confusing.

Without bash.
I made a test VM with 4 CPUs; the threads show as:

qemu-system-x86,450625 -name guest=usbtest,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-11-usbtest/master-key.aes -machine pc-i440fx-5.1,accel=kvm,usb=off,vmport=off
  ├─{qemu-system-x86},450639
  ├─{qemu-system-x86},450640
  ├─{qemu-system-x86},450641
  ├─{qemu-system-x86},450643
  ├─{qemu-system-x86},450644
  ├─{qemu-system-x86},450645
  ├─{qemu-system-x86},450646
  ├─{qemu-system-x86},450647
  └─{qemu-system-x86},450649

So should I assign those PIDs to cores 4-7 and 12-15? Is this correct?
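With debug-threads=on there is an easier way to tell the vCPU threads apart than counting pstree lines: each vCPU thread is named like "CPU 0/KVM", and the names are readable from /proc, so you can match vCPU number to TID directly instead of relying on sort order. A sketch (the helper name is mine; it is demonstrated on the current shell since it works on any PID):

```shell
#!/bin/bash
# Print "TID name" for every thread of a process. For a QEMU started
# with -name guest=...,debug-threads=on the vCPU threads show up with
# names like "CPU 0/KVM", so they are easy to grep out of the list.
list_threads() {
    local pid=$1 t
    for t in /proc/"$pid"/task/*; do
        echo "$(basename "$t") $(cat "$t/comm")"
    done
}
list_threads $$      # on the VM you would pass the QEMU main PID instead
```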

Let’s keep focused on one thing at a time :slight_smile:

I deleted my questions because I found your source:

I’m gonna read it and get back to you later.

Nice job, dude. Yeah, you need to pin the tasks evenly across the cores.

Please run some benchmarks and give LatencyMon a go too.

The script needs to search for those PIDs and evenly pin them.

It can be confusing because the PIDs change. Do you have a Discord I can contact you on? It looks like we could work together on a nice guide for people.

No need to delete your thread, that was some good information. People could use it.

Yeah I undeleted that, since you already replied with quotes :slight_smile:

Is this what you needed? I’m stressing 8 threads (9-16) through the VM running CPU-Z, and it’s a tad faster than an i7-6700K:

If yes then post your isolcpus line.

Can you post the code you used for taskset?

I’m not using taskset. I need your isolcpus line to give you a solution.

rcu_nocbs=0-7 nohz_full=0-7 isolcpus=0-7

Erm, if you are isolating 0-7 then everything works as it’s supposed to…

But if you want an effect like mine (or like that guide on null-src), then you have to at least change:
isolcpus=8-15

And change your cputune VM config to:

 <cputune>
    <vcpupin vcpu="0" cpuset="8"/>
    <vcpupin vcpu="1" cpuset="9"/>
    <vcpupin vcpu="2" cpuset="10"/>
    <vcpupin vcpu="3" cpuset="11"/>
    <vcpupin vcpu="4" cpuset="12"/>
    <vcpupin vcpu="5" cpuset="13"/>
    <vcpupin vcpu="6" cpuset="14"/>
    <vcpupin vcpu="7" cpuset="15"/>
...

And libvirt pins everything correctly, as it’s supposed to. No need for taskset.
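An easy way to double-check that on a running VM is to read every thread's affinity straight out of /proc rather than calling taskset per PID, then compare the output against your cputune block. A sketch (the helper name is mine; demonstrated on the current shell, since it works on any PID):

```shell
#!/bin/bash
# Dump "TID name: affinity" for all threads of a process by reading
# /proc/<pid>/task/<tid>/status instead of invoking taskset per thread.
show_affinity() {
    local pid=$1 t
    for t in /proc/"$pid"/task/*; do
        printf '%s %s: %s\n' "$(basename "$t")" "$(cat "$t/comm")" \
            "$(awk '/^Cpus_allowed_list/ {print $2}' "$t/status")"
    done
}
show_affinity $$     # for the VM, pass the QEMU main PID instead of $$
```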

Also, this original script from null-src is broken. I ran the VM with it from the shell, without libvirt, and it just does the first spawn with taskset. Then it waits 20 seconds and exits, because it doesn’t find any processes. So it’s smoke and mirrors. I tested that VM with CPU-Z and it’s a bit slower than through libvirt on 8 cores and a bit faster on a single core.

Anyway, I have more about this, but it’s after 7am here so I need to sleep :slight_smile:

Ok, as promised, below is a hook for libvirt/QEMU. BUT like I said before, in my case (tested on Manjaro 20.2) it’s pointless, because everything you need for pinning can be set from libvirt’s XML VM file.
So to make it at least somewhat useful, I implemented logging of CPU affinities. Someone may find it helpful for debugging settings in the XML, or just as an example of a hook for doing other things.

You need to put this script in /etc/libvirt/hooks/qemu and change VM1NAME to the desired string. The location of the log can also be changed.

#!/bin/bash
LOG="/var/log/libvirt_qemu_hook.log"
VM1NAME="w10test"

if [[ $1 == $VM1NAME ]] && [[ $2 == "started" ]]; then
  echo "VM $VM1NAME started" >> $LOG
  #Finds main PID of qemu
  for CPID in $(pidof qemu-system-x86_64);do
    MPID=$(pstree -pa $CPID | grep $VM1NAME | awk -F',' '{print $2}' | awk '{print $1}')
    if [ -n "$MPID" ];then echo "Found $VM1NAME pid $MPID" >> $LOG; break; fi
  done
  #Loops over all qemu threads
  for CPID in $(pstree -pa $MPID | cut -d',' -f2 | cut -d' ' -f1); do
    #log affinity for pids
     taskset -pc $CPID >> $LOG
  done

fi
if [[ $1 == $VM1NAME ]] && [[ $2 == "stopped" ]]; then
  echo "VM $VM1NAME stopped" >> $LOG
fi
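One gotcha worth spelling out (per the libvirt hooks documentation): the hook file must be named exactly qemu, must be executable, and libvirtd only notices it after a restart; libvirt then calls it with the guest name as $1 and the operation as $2, which is exactly what the checks in the script rely on. Installation would look something like this (the source file name here is just a placeholder):

```shell
# Install the hook (destination file name must be exactly "qemu"):
sudo mkdir -p /etc/libvirt/hooks
sudo cp qemu-hook.sh /etc/libvirt/hooks/qemu    # "qemu-hook.sh" is a placeholder name
sudo chmod +x /etc/libvirt/hooks/qemu
sudo systemctl restart libvirtd                 # libvirtd must be restarted to pick up a new hook
```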

Using vcpupin, emulatorpin and iothreadpin in the XML you can change the pinning as you like. For example, pinning 8 vCPUs to CPU threads 8-15 like this:

<cputune>
    <vcpupin vcpu="0" cpuset="8"/>
   (... pins 1 through 6 ...)
    <vcpupin vcpu="7" cpuset="15"/>
    <emulatorpin cpuset="6-7"/>
    <iothreadpin iothread="1" cpuset="5"/>
  </cputune>

You should get the following log:

VM w10test started
Found w10test pid 347930
pid 347930's current affinity list: 6,7
pid 347945's current affinity list: 6,7
pid 347946's current affinity list: 5
pid 347947's current affinity list: 6,7
pid 347949's current affinity list: 6,7
pid 347950's current affinity list: 8
pid 347951's current affinity list: 9
pid 347952's current affinity list: 10
pid 347953's current affinity list: 11
pid 347954's current affinity list: 12
pid 347955's current affinity list: 13
pid 347956's current affinity list: 14
pid 347957's current affinity list: 15
pid 347960's current affinity list: 6,7
VM w10test stopped

Of course, instead of logging the taskset output you can pin the threads some other way, but like I said it’s pointless/redundant; I made this script basically for practice.

Also, I made another hook that is way more useful, which replaces the need for using “isolcpus”. Rebooting each time I need all cores for working on the host, and rebooting again to run the VM, is just too much hassle. Just like the FLR bug, thanks to @gnif and @belfrypossum, hopefully a thing of the past.

This script needs the cset tool from https://github.com/lpechacek/cpuset and allows thread isolation for the VM.
Again, you have to replace VM1NAME, and VM1ISOL has to be the list of threads that you want to reserve exclusively for the VM. The example is for the VM above, so 8-15:

#!/bin/bash
LOG="/var/log/libvirt_qemu_hook.log"
VM1NAME="w10test"
VM1ISOL="8-15"

if [[ $1 == $VM1NAME ]] && [[ $2 == "prepare" ]]; then
  echo "VM $VM1NAME preparing" >> $LOG
  cset shield -c $VM1ISOL >> $LOG
fi
if [[ $1 == $VM1NAME ]] && [[ $2 == "stopped" ]]; then
  echo "VM $VM1NAME stopped" >> $LOG
  cset shield --reset >> $LOG
fi

Like in the first script, there’s a log you can check if something goes wrong with cset.
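One cset flag worth knowing in the context of this thread: shield can also migrate movable kernel threads off the shielded CPUs with -k on, which addresses part of the kernel-thread complaint above (per-CPU kernel threads still cannot be moved, which is the remaining gap versus isolcpus). Run manually, it would look like this (based on the cpuset documentation):

```shell
# Manual equivalents of what the hook does, plus kernel-thread migration:
sudo cset shield -c 8-15 -k on   # shield CPUs 8-15, move user AND movable kernel threads away
sudo cset shield                 # show current shield status
sudo cset shield --reset         # tear the shield down again
```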


cset is very different to isolcpus; there are benefits to using isolcpus over cset, such as nohz and keeping interrupts off those cores as well. It is not a direct replacement and does not give reliably low latency when devices get busy. If you are setting up cores that are dedicated to a VM, isolcpus is always preferred.

Yeah, I know, I was gonna mention that, just forgot. But the reboot needed with isolcpus rules it out for me, especially since games work OK on my Windows console even without any isolation. So cset, if anything, is a step up.
Thanks for pointing that out though.

Edit: Also forgot to add that the same thing is supposed to be achievable directly from libvirt with cgroups. But I couldn’t get it working, hence the hook.
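For what it’s worth, on a cgroup-v2 host the cgroups route mentioned above can be approximated without a hook by clamping the host’s systemd slices while the VM runs. This assumes a systemd host with cgroup v2, and the CPU ranges follow the 8-15 example:

```shell
# Keep host processes off CPUs 8-15 while the VM runs
# (--runtime means nothing persists across reboots):
sudo systemctl set-property --runtime -- user.slice AllowedCPUs=0-7
sudo systemctl set-property --runtime -- system.slice AllowedCPUs=0-7
sudo systemctl set-property --runtime -- init.scope AllowedCPUs=0-7

# ...start the VM pinned to 8-15 via cputune...

# Give the CPUs back to the host afterwards:
sudo systemctl set-property --runtime -- user.slice AllowedCPUs=0-15
sudo systemctl set-property --runtime -- system.slice AllowedCPUs=0-15
sudo systemctl set-property --runtime -- init.scope AllowedCPUs=0-15
```

These set-property calls could also live in the "prepare" and "stopped" branches of a hook like the cset one above.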