How can I increase my W10 vm's performance?

Hey all. Thanks to the many posts here, and elsewhere on the internet, I’ve successfully virtualized my Windows 10 install on a Linux host. It’s been running well for months. But for the past couple of days I’ve been home sick, so I took the opportunity to try to tweak my VM configuration for better performance. Sadly, my tweaks ended up making things worse…

Any tips or suggestions would be appreciated.

Ok, I had better list out my setup.

Hardware:

  • PC Part Picker hardware list user/jerrac/saved/#view=MHPgJx (First post, so can’t add links. Stick the appropriate domain in front of user. :slight_smile: )

My boot drive is the 2 TB 660p.

I have one GPU passed through to W10.

One 1 TB SATA SSD is passed through to W10 for faster data storage.

The USB 3.0 PCIe card is passed through to W10.

W10’s boot disk is a qcow2 image on my host boot drive.

The five HDDs are passed through to a FreeNAS VM.

Host Software:

  • OS: Pop OS 18.04 (Basically Ubuntu 18.04)
  • libvirtd (libvirt) 4.0.0
  • kvm --version
    QEMU emulator version 2.11.1(Debian 1:2.11+dfsg-1ubuntu7.19)
  • virt-manager 1.5.1

VM config files

Since my FreeNAS vm is most important, I’ll start with it:

<!--
WARNING: THIS IS AN AUTO-GENERATED FILE. CHANGES TO IT ARE LIKELY TO BE
OVERWRITTEN AND LOST. Changes to this xml configuration should be made using:
  virsh edit xunlaichest
or other application using the libvirt API.
-->

<domain type='kvm'>
  <name>xunlaichest</name>
  <uuid>f60e4e5f-6808-4250-aebf-f8238d1027f5</uuid>
  <memory unit='KiB'>33554432</memory>
  <currentMemory unit='KiB'>33554432</currentMemory>
  <vcpu placement='static'>8</vcpu>
  <iothreads>4</iothreads>
  <cputune>
    <vcpupin vcpu='0' cpuset='24'/>
    <vcpupin vcpu='1' cpuset='25'/>
    <vcpupin vcpu='2' cpuset='26'/>
    <vcpupin vcpu='3' cpuset='27'/>
    <vcpupin vcpu='4' cpuset='28'/>
    <vcpupin vcpu='5' cpuset='29'/>
    <vcpupin vcpu='6' cpuset='30'/>
    <vcpupin vcpu='7' cpuset='31'/>
    <emulatorpin cpuset='24-31'/>
    <iothreadpin iothread='4' cpuset='0-3'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc-q35-2.11'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/xunlaichest_VARS.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
    </hyperv>
    <vmport state='off'/>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='4' threads='2'/>
    <cache level='3' mode='emulate'/>
  </cpu>
  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/bin/kvm-spice</emulator>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <target dev='sda' bus='sata'/>
      <readonly/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/xunlaichest.qcow2'/>
      <target dev='sdb' bus='sata'/>
      <boot order='1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' io='threads'/>
      <source dev='/dev/disk/by-id/ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K5RYSC6A'/>
      <target dev='sdc' bus='sata'/>
      <address type='drive' controller='0' bus='0' target='0' unit='2'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' io='threads'/>
      <source dev='/dev/disk/by-id/ata-WDC_WD40EZRZ-00GXCB0_WD-WCC7K0CPN273'/>
      <target dev='sdd' bus='sata'/>
      <address type='drive' controller='0' bus='0' target='0' unit='3'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' io='threads'/>
      <source dev='/dev/disk/by-id/ata-ST4000DM004-2CV104_ZFN1W2AQ'/>
      <target dev='sde' bus='sata'/>
      <address type='drive' controller='0' bus='0' target='0' unit='4'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' io='threads'/>
      <source dev='/dev/disk/by-id/ata-ST4000DX001-1CE168_Z307QKBL'/>
      <target dev='sdf' bus='sata'/>
      <address type='drive' controller='0' bus='0' target='0' unit='5'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='threads'/>
      <source dev='/dev/disk/by-id/ata-ST4000DM004-2CV104_ZFN0EJV3'/>
      <target dev='sdg' bus='sata'/>
      <address type='drive' controller='1' bus='0' target='0' unit='0'/>
    </disk>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <controller type='sata' index='1'>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x03' function='0x0'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='dmi-to-pci-bridge'>
      <model name='i82801b11-bridge'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1e' function='0x0'/>
    </controller>
    <controller type='pci' index='2' model='pci-bridge'>
      <model name='pci-bridge'/>
      <target chassisNr='2'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0x10'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0x11'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0x12'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0x13'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
    </controller>
    <controller type='pci' index='7' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='7' port='0x14'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x4'/>
    </controller>
    <controller type='pci' index='8' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='8' port='0x15'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x5'/>
    </controller>
    <controller type='pci' index='9' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='9' port='0x16'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x6'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </controller>
    <controller type='scsi' index='0'>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </controller>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1d' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1d' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1d' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1d' function='0x2'/>
    </controller>
    <interface type='network'>
      <mac address='52:54:00:f7:18:64'/>
      <source network='default'/>
      <model type='e1000'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='spicevmc'>
      <target type='virtio' name='com.redhat.spice.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='spice' autoport='yes'>
      <listen type='address'/>
      <image compression='off'/>
    </graphics>
    <sound model='ich6'>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x02' function='0x0'/>
    </sound>
    <video>
      <model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='1' primary='yes'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
    </video>
    <redirdev bus='usb' type='spicevmc'>
      <address type='usb' bus='0' port='1'/>
    </redirdev>
    <redirdev bus='usb' type='spicevmc'>
      <address type='usb' bus='0' port='2'/>
    </redirdev>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </memballoon>
  </devices>
</domain>

My W10 vm looks like:

<!--
WARNING: THIS IS AN AUTO-GENERATED FILE. CHANGES TO IT ARE LIKELY TO BE
OVERWRITTEN AND LOST. Changes to this xml configuration should be made using:
  virsh edit win10-nessa
or other application using the libvirt API.
-->

<domain type='kvm'>
  <name>win10-nessa</name>
  <uuid>38e65caf-2343-4d67-bd34-302a7acc5078</uuid>
  <memory unit='KiB'>16777216</memory>
  <currentMemory unit='KiB'>16777216</currentMemory>
  <vcpu placement='static'>8</vcpu>
  <iothreads>2</iothreads>
  <cputune>
    <vcpupin vcpu='0' cpuset='16'/>
    <vcpupin vcpu='1' cpuset='17'/>
    <vcpupin vcpu='2' cpuset='18'/>
    <vcpupin vcpu='3' cpuset='19'/>
    <vcpupin vcpu='4' cpuset='20'/>
    <vcpupin vcpu='5' cpuset='21'/>
    <vcpupin vcpu='6' cpuset='22'/>
    <vcpupin vcpu='7' cpuset='23'/>
    <emulatorpin cpuset='4-5'/>
    <iothreadpin iothread='1' cpuset='6'/>
    <iothreadpin iothread='2' cpuset='7'/>
  </cputune>
  <os>
    <type arch='x86_64' machine='pc-q35-2.11'>hvm</type>
    <loader readonly='yes' secure='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE.secboot.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/win10-nessa_VARS.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vendor_id state='on' value='abcde12345'/>
    </hyperv>
    <kvm>
      <hidden state='on'/>
    </kvm>
    <vmport state='off'/>
    <smm state='on'/>
    <ioapic driver='kvm'/>
  </features>
  <cpu mode='host-model' check='partial'>
    <model fallback='allow'/>
    <topology sockets='1' cores='4' threads='2'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
    <timer name='hypervclock' present='yes'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/bin/kvm-spice</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/win10-nessa.qcow2'/>
      <target dev='vdb' bus='virtio'/>
      <boot order='2'/>
      <address type='pci' domain='0x0000' bus='0x06' slot='0x00' function='0x0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <target dev='sdb' bus='sata'/>
      <readonly/>
      <boot order='1'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source dev='/dev/disk/by-id/ata-CT1000MX500SSD1_1824E14428DC'/>
      <target dev='sdd' bus='sata'/>
      <address type='drive' controller='0' bus='0' target='0' unit='3'/>
    </disk>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='dmi-to-pci-bridge'>
      <model name='i82801b11-bridge'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1e' function='0x0'/>
    </controller>
    <controller type='pci' index='2' model='pci-bridge'>
      <model name='pci-bridge'/>
      <target chassisNr='2'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0x10'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0x11'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0x12'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
    </controller>
    <controller type='pci' index='6' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='6' port='0x13'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
    </controller>
    <controller type='pci' index='7' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='7' port='0x14'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x4'/>
    </controller>
    <controller type='pci' index='8' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='8' port='0x8'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='9' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='9' port='0x9'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <controller type='pci' index='10' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='10' port='0xa'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </controller>
    <controller type='usb' index='0' model='nec-xhci'>
      <address type='pci' domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/>
    </controller>
    <interface type='direct'>
      <mac address='52:54:00:30:29:ca'/>
      <source dev='enp5s0' mode='bridge'/>
      <model type='e1000'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x0'/>
    </interface>
    <interface type='network'>
      <mac address='52:54:00:ae:00:19'/>
      <source network='default'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <channel type='spicevmc'>
      <target type='virtio' name='com.redhat.spice.0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <sound model='ac97'>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x02' function='0x0'/>
    </sound>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x09' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x09' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x09' slot='0x00' function='0x0'/>
    </hostdev>
    <redirdev bus='usb' type='spicevmc'>
      <address type='usb' bus='0' port='2'/>
    </redirdev>
    <redirdev bus='usb' type='spicevmc'>
      <address type='usb' bus='0' port='3'/>
    </redirdev>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </memballoon>
    <shmem name='looking-glass'>
      <model type='ivshmem-plain'/>
      <size unit='M'>256</size>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x03' function='0x0'/>
    </shmem>
  </devices>
</domain>

(Note: I’m not actually using looking-glass, that’s residual config from when I tried it.)

What I’ve been trying

So, to improve performance, I wanted to fix how I was pinning my CPUs. I had an issue where, if my W10 VM was running, my Emby server (which stores data on my FreeNAS VM) would have problems. Looking at my old config (yes, I backed it up. :slight_smile: ) I definitely had an issue there. I’m 90% sure <vcpu placement='static'>16</vcpu> needed to be <vcpu placement='static'>8</vcpu>.

In searching for a better configuration, I found an article by Mathias Hueber titled CPU-pinning and further performance tweaks for virtual machines on AMD Ryzen CPUs. (First post, so no links; please search for it.)

It showed a configuration based on keeping the pinned CPU cores in the same chiplet.

The idea makes sense to me, so I’ve tried counting backwards from 31 in the cpuset parameter.

I also applied the changes to the features and cpu sections that didn’t cause validation errors.

Interestingly, when I set <cpu mode='host-passthrough' check='none'> in my W10 VM, W10 recognized my CPU as a Threadripper 1950X. Prior to that it was shown as an EPYC processor, and with the current config it shows as an EPYC processor again.

To test performance, I’d shut the W10 VM down, edit the XML, then start it up again. After booting, I’d run 3DMark’s demo Time Spy test.
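
In virsh terms, the loop was essentially:

    virsh shutdown win10-nessa
    virsh edit win10-nessa
    virsh start win10-nessa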

I experimented with different numbers of pinned iothreads, the <vcpusched> settings listed in the Mathias Hueber article, and a few other things. Right now I’m still getting numbers in 3DMark that are lower than what I started with.

Questions

  • Will pinning cpu cores in the same chiplet help performance?
  • Is my method of counting backwards from 31 actually going to keep all the cores in the same chiplet?
  • Is there a way to list which cpusets are in each chiplet?
  • Would the <vcpu placement='static'>16</vcpu> to <vcpu placement='static'>8</vcpu> change account for my lower cpu scores in 3DMark? Even though I only had 8 <vcpupin> items in my <cputune> settings at the time?
  • Would upgrading my virtualization software help?
    • If I upgraded to Pop OS 19.04 (or 19.10 when it is out), would that be enough?
    • Is there a reputable PPA out there for libvirtd/kvm/qemu?
    • Would upgrading add full support for my 1950x?
  • One issue I noticed is that HWiNFO does not report my CPU ever boosting beyond the base 3.4 GHz speed. Is there a way to enable boosting?

Thanks in advance!


For CPU pinning:

Ryzen/Threadripper threads are numbered as ‘core 1 is threads 0 and 16, core 2 is threads 1 and 17, etc.’ This differs from Intel, where ‘core 1 is threads 0 and 1, core 2 is threads 2 and 3, etc.’, so keep this in mind when doing CPU pinning.
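
If you want to double-check that layout on your own machine, the sibling pairs are visible in sysfs (a quick sketch; these are standard paths on any recent kernel):

    # Each line lists the SMT threads that share one physical core
    grep . /sys/devices/system/cpu/cpu*/topology/thread_siblings_list
    # Or the tabular view:
    lscpu -e=CPU,CORE,SOCKET,NODE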

Additionally, make sure you leave at least one core that isn’t pinned to anything, so the host can do its own work without getting the scheduler too involved. Otherwise you can see performance degradation when all threads become busy and the host needs to do something.

cpuset corresponds to thread, right?

So, to pin the first chiplet, I’d want to set 0, 16, 2, 17, 3, 18, 4, and 19?

    <vcpupin vcpu='0' cpuset='0'/>
    <vcpupin vcpu='1' cpuset='16'/>
    <vcpupin vcpu='2' cpuset='2'/>
    <vcpupin vcpu='3' cpuset='17'/>
    <vcpupin vcpu='4' cpuset='3'/>
    <vcpupin vcpu='5' cpuset='18'/>
    <vcpupin vcpu='6' cpuset='4'/>
    <vcpupin vcpu='7' cpuset='19'/>

I’m intending on pinning 2 chiplets, maybe part of another, and leaving the rest free.

Thanks to Arch Linux’s wiki, I rediscovered lscpu -e. :slight_smile: That helped.

I think I’ve settled on 7 cores for Windows, plus one iothreadpin and one emulatorpin.
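
For reference, a sketch of what that plan could look like in the Windows domain XML. The thread numbers are illustrative and assume the “core n is threads n and n+16” layout described above; verify yours with lscpu -e first:

    <vcpu placement='static'>14</vcpu>
    <iothreads>1</iothreads>
    <cputune>
      <!-- 7 cores = 14 host threads, keeping SMT siblings paired -->
      <vcpupin vcpu='0' cpuset='1'/>
      <vcpupin vcpu='1' cpuset='17'/>
      <vcpupin vcpu='2' cpuset='2'/>
      <vcpupin vcpu='3' cpuset='18'/>
      <vcpupin vcpu='4' cpuset='3'/>
      <vcpupin vcpu='5' cpuset='19'/>
      <vcpupin vcpu='6' cpuset='4'/>
      <vcpupin vcpu='7' cpuset='20'/>
      <vcpupin vcpu='8' cpuset='5'/>
      <vcpupin vcpu='9' cpuset='21'/>
      <vcpupin vcpu='10' cpuset='6'/>
      <vcpupin vcpu='11' cpuset='22'/>
      <vcpupin vcpu='12' cpuset='7'/>
      <vcpupin vcpu='13' cpuset='23'/>
      <!-- core 0 (threads 0 and 16) left for the emulator and iothread -->
      <emulatorpin cpuset='0'/>
      <iothreadpin iothread='1' cpuset='16'/>
    </cputune>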

FreeNAS is getting 2 cores, plus 2 iothreadpins, and 2 emulatorpins.

I’m still curious about the possibilities offered by newer versions of libvirt and QEMU. Would newer versions provide better support for my 1950X?

I’m also curious about the lack of boosting I’m seeing. AMD says my 1950X should be able to boost a core or two up to 4 GHz, but lscpu -e lists the max at 3400 MHz. And when testing my Windows VM, HWiNFO shows all cores maxing out around 3.39 GHz. Any thoughts on that? I’m not even sure how to search on this issue…

Here are some of my enhancements that helped me:

    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='16384'/>
      <vpindex state='on'/>
      <runtime state='on'/>
      <synic state='on'/>
      <stimer state='on'/>
      <reset state='on'/>
      <vendor_id state='on' value='other'/>
      <frequencies state='on'/>
      <reenlightenment state='on'/>
      <tlbflush state='on'/>
      <ipi state='off'/>
      <evmcs state='off'/>
    </hyperv>
    <kvm>
      <hidden state='on'/>
    </kvm>
    <vmport state='off'/>
    <ioapic driver='kvm'/>
  </features>
  <cpu mode='custom' match='exact' check='full'>
    <model fallback='forbid'>EPYC-IBPB</model>
    <vendor>AMD</vendor>
    <topology sockets='1' cores='4' threads='2'/>
    <feature policy='require' name='tsc-deadline'/>
    <feature policy='require' name='hypervisor'/>
    <feature policy='require' name='tsc_adjust'/>
    <feature policy='require' name='arch-capabilities'/>
    <feature policy='require' name='cmp_legacy'/>
    <feature policy='require' name='perfctr_core'/>
    <feature policy='require' name='virt-ssbd'/>
    <feature policy='require' name='skip-l1dfl-vmentry'/>
    <feature policy='disable' name='monitor'/>
    <feature policy='disable' name='x2apic'/>
    <feature policy='require' name='topoext'/>
    <feature policy='require' name='invtsc'/>
    <feature policy='disable' name='svm'/>
  </cpu>

There are a couple of things you can try:

  • disable AMD Cool’n’Quiet
  • remove nohz_full and rcu_nocbs in GRUB (see the sketch below); removing them will allow boosting. You can keep isolcpus.

Even if the VM reports your CPU’s base clock, it should still boost on the host.
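
On an Ubuntu-based host those flags live in /etc/default/grub. A sketch of what the line might look like after removing nohz_full and rcu_nocbs but keeping isolcpus (the CPU list is a placeholder for whatever your system uses):

    # /etc/default/grub
    GRUB_CMDLINE_LINUX_DEFAULT="quiet splash isolcpus=16-31"
    # then regenerate the config and reboot:
    #   sudo update-grub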

So, it’s been a while since I tried to work on this. Since I had time today, I finally did some research. Now I have questions.

Erm, I have no idea what that is. Can you clarify? I don’t recall anything in the BIOS like that.

I dug through /boot, /etc/grub.d, and /etc/default/grub trying to find any instances of those settings. There are none.
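
For anyone checking the same thing, the quickest way to see what the running kernel was actually booted with:

    cat /proc/cmdline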

From the docs, nohz_full takes a list of CPUs, so I don’t see any way to set it to “off”. Unless nohz=off would do it? But I’m a bit unclear on what exactly the setting does. “Boottime enable/disable dynamic ticks” makes me think it’s more likely to enable clock boosting when on, rather than disable it. And the default is on.

The docs’ description of rcu_nocbs gives me no idea how it would actually affect boosting.

So, can you give me more detail on why you recommend changing those settings?

Thanks!

Excuse me for not explaining each option I suggested. I’ll explain to the best of my ability. I’m no expert on what’s going on under the hood; I can only tell you what I see.

First off, the Cool’n’Quiet option was removed in later versions of AGESA, so that suggestion is now obsolete. The reason I disabled it in the first place was some weird performance degradation I found while diagnosing my system. This is detailed in my thread.

Secondly, I retract my suggestion to remove rcu_nocbs from GRUB, because I found out that it DIDN’T affect boosting. However, nohz_full (not nohz) is the one you’ll have to remove from GRUB. From what I can see, it just locks the CPU at its base frequency. I firmly believe it isn’t a placebo, because a CPU-intensive benchmark like Cinebench R20 really does show a lower score with it set.

I have a separate thread on fixing CPU core boosting on isolated CPUs (i.e. isolcpus):
https://forum.level1techs.com/t/isolated-cpu-cores-not-boosting-fix/152806/2

Edit:
I have to say that I misread your question. It was:

One issue I noticed is that HWiNFO does not report my CPU ever boosting beyond the base 3.4 GHz speed. Is there a way to enable boosting?

You only see your CPU’s base frequency in the virtualized environment, in this case Windows 10. However, your CPU is indeed boosting, as long as you don’t have nohz_full in your GRUB config. You can validate that with this command on Linux:

watch -n.1 "grep '^cpu MHz' /proc/cpuinfo"

Or a slower version of the above that breaks things out per CPU core:

watch -n.1 cpupower monitor

I’ll try to answer your remaining questions:

  • Will pinning cpu cores in the same chiplet help performance?

Not sure on this one; it needs someone else to chime in. I can tell you one thing: look into NUMA node tuning, because as far as I can remember the memory controller is tied to each CPU die, at least in the first and second generations of Zen.

  • Is my method of counting backwards from 31 actually going to keep all the cores in the same chiplet?

No, in fact you may even be pinning the wrong CPU cores! If you want to see which CPU cores to pin, use lstopo. The package is hwloc and the GUI package is hwloc-gui. Package names can vary between distros!

  • Is there a way to list which cpusets are in each chiplet?

See above for the lstopo package names. Use lstopo to see which CPU cores sit together in one CCX. From there you’ll have to work out whether they’re on the same CPU die.
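
A sketch of getting and running it on a Debian/Ubuntu-based distro (package names may differ elsewhere):

    sudo apt install hwloc hwloc-gui
    # text-mode topology, hiding I/O devices for readability:
    lstopo-no-graphics --no-io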

  • Would the <vcpu placement='static'>16</vcpu> to <vcpu placement='static'>8</vcpu> change account for my lower cpu scores in 3DMark? Even though I only had 8 <vcpupin> items in my <cputune> settings at the time?

No, it wouldn’t lower performance at all.

  • Would upgrading my virtualization software help?
    • If I upgraded to Pop OS 19.04 (or 19.10 when it is out), would that be enough?
    • Is there a reputable PPA out there for libvirtd/kvm/qemu?
    • Would upgrading add full support for my 1950x?

Would upgrading my virtualization software help? My question is: help in what way? It wouldn’t help performance per se, so upgrading Pop OS won’t matter much in terms of performance. There’s more to it than just performance, though.

As for full support for the 1950X: it’s already supported, so there’s no need to upgrade.

I would also like to clarify that a chiplet is not the same thing as a CCX or a CPU die. “Chiplet” mainly applies to third-generation Zen and shouldn’t be confused with the older architectures.
The 1950X is first-generation Zen: it uses two CPU dies, and each die is connected to its own memory controller, hence the need for NUMA-aware CPU pinning for better performance. Each CPU die contains two CCXes of either 3 or 4 cores; you have 4 cores in each CCX.
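
For what it’s worth, a minimal sketch of NUMA memory pinning in a libvirt domain — the nodeset is illustrative; match it to whichever die your vCPUs are pinned on (numactl --hardware on the host shows which CPUs belong to each node):

    <numatune>
      <memory mode='strict' nodeset='0'/>
    </numatune>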

Feel free to ask any further questions!

@FutureFade Thanks for the detailed responses!

It appears that installing cpufreqd solved my boost issue.

After digging around a bit more, I discovered that my scaling_max_freq was set to 3.4 GHz:

# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq 
3400000

Along the way I installed cpufreqd and didn’t think it had done anything. The indicator applet lists 3.40 GHz as the max selectable value, and when I checked /proc/cpuinfo everything was still maxing out at 3.4 GHz.

But after restarting to test another change, cpuinfo was showing values over 3.4 GHz. So I backed my changes off until cpuinfo showed a max of 3.4 again, and then enabled cpufreqd. After that, cpuinfo showed values up to 4.0 GHz.
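
For anyone following along, these are the sysfs knobs I was poking at (assuming the usual cpufreq interface; the exact driver varies):

    cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
    cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
    # or the summary view:
    cpupower frequency-info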

The indicator applet still lists 3.4 as the max selectable value, and /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq is still 3.4 GHz, but cpuinfo now looks more like this:

# cat /proc/cpuinfo | grep MHz
cpu MHz		: 3753.503
cpu MHz		: 3686.372
cpu MHz		: 3681.312
cpu MHz		: 3376.084
cpu MHz		: 3353.151
cpu MHz		: 3797.309
cpu MHz		: 2973.637
cpu MHz		: 4001.666
cpu MHz		: 3853.652
cpu MHz		: 3766.672
cpu MHz		: 3813.336
cpu MHz		: 3880.735
cpu MHz		: 2941.177
cpu MHz		: 3653.888
cpu MHz		: 3243.804
cpu MHz		: 3352.945
cpu MHz		: 3731.801
cpu MHz		: 3632.213
cpu MHz		: 3825.221
cpu MHz		: 3694.119
cpu MHz		: 3541.035
cpu MHz		: 3694.868
cpu MHz		: 3701.389
cpu MHz		: 3844.192
cpu MHz		: 3702.765
cpu MHz		: 3699.280
cpu MHz		: 3694.352
cpu MHz		: 4061.625
cpu MHz		: 3952.496
cpu MHz		: 3653.450
cpu MHz		: 3875.335
cpu MHz		: 3648.099

My one concern right now is that the Phoronix Test Suite tests I ran before and after getting better cpuinfo results only showed a couple of points of improvement. Of course, I’ve never used that software before now, so…

I ran the git/x265, system/libreoffice, and system/blender tests, and told the ones that asked to target only the CPU. If anyone knows that test suite well: would those have been affected by the increased available clock speed?

So, as for the chiplet (or, rather, CPU die) settings I asked about: I did discover that I was counting wrong. The Arch wiki article on GPU passthrough helped there. I had actually read it before I posted this topic, so I was kicking myself for not remembering…

If I recall correctly, when I switched to keeping my CPU cores as close to each other as possible, I did see at least a little performance improvement in my W10 VM.

That said, I will double check my settings using the tools @FutureFade suggested.

Here’s a screenshot of my results. FIRSTTRY was before making any changes. TEST2 was after getting cpufreqd enabled and cpuinfo showing better clock speeds.

So, effectively no change…

@pantato Thanks. That did help a bit. Unfortunately, libvirt 4.0.0 didn’t support all of the options, so I ended up with:

<features>
  <acpi/>
  <apic/>
  <hyperv>
    <relaxed state='on'/>
    <vapic state='on'/>
    <spinlocks state='on' retries='16384'/>
    <vpindex state='on'/>
    <runtime state='on'/>
    <synic state='on'/>
    <reset state='on'/>
    <vendor_id state='on' value='other'/>
  </hyperv>
  <kvm>
    <hidden state='on'/>
  </kvm>
  <vmport state='off'/>
  <smm state='on'/>
  <ioapic driver='kvm'/>
</features>
<cpu mode='host-passthrough' check='none'>
  <topology sockets='1' cores='7' threads='2'/>
  <feature policy='require' name='tsc-deadline'/>
  <feature policy='require' name='hypervisor'/>
  <feature policy='require' name='tsc_adjust'/>
  <feature policy='require' name='cmp_legacy'/>
  <feature policy='require' name='perfctr_core'/>
  <feature policy='require' name='virt-ssbd'/>
  <feature policy='disable' name='monitor'/>
  <feature policy='disable' name='x2apic'/>
  <feature policy='require' name='topoext'/>
  <feature policy='require' name='invtsc'/>
  <feature policy='disable' name='svm'/>
</cpu>

date | test | gpu | cpu | overall
2020-01-30 | Test 1 (Before domain tweaks) | 7581 | 5834 | 7255
2020-01-30 | Test 2 (Before domain tweaks) | 7564 | 5811 | 7236
2020-01-30 | Test 3 (Before domain tweaks) | 7582 | 5722 | 7229
2020-01-30 | Test 4 (After domain tweaks) | 7592 | 6020 | 7305
2020-01-30 | Test 5 (After domain tweaks) | 7562 | 6090 | 7297
2020-01-30 | Test 6 (After domain tweaks) | 7557 | 6063 | 7287

One remaining question, hwinfo in my W10 vm still says that my cpu clock speed maxes out at 3.4GHz. Any idea why?

Mine does the same thing: it doesn’t report accurately in Windows, but if I look on Linux it’s definitely boosting. Your scores look pretty good, though. Are you sure you’re not getting the performance you should be? Are you having stutter issues in games, for example? The qcow image might be affecting things. I use a native install of Win10 on an NVMe drive myself.

@jerrac

From what I can deduce, changing scaling_max_freq doesn’t do anything. Your CPU was already boosting, and changing the settings didn’t make any difference, as shown in the Phoronix testing you did. If you reset your changes and just run the command I gave, I can assure you it’ll be boosting.

My main goal was just general improvement in performance. I wasn’t having any major issues with games or other applications. The domain tweaks seem to have helped with that.

(I forgot to mention, those numbers are from 3DMark Time Spy.)

My qcow image is on an NVMe drive, so it has good performance.

@FutureFade Resetting my changes is exactly how I tested that cpufreqd made a difference in the output of cpuinfo (that’s the command you gave me). The CPU was boosting before, but it would stop at 3.4 GHz. After installing cpufreqd, it goes up to 4.0 GHz.

If you don’t have any other qcow images on that particular drive, you might be able to turn it into a native install on that same drive: mount it, cp the qcow image to another folder, then write the image back onto the bare device (no partition number). One caveat: plain dd of a qcow2 file won’t boot, since it copies the qcow2 container format rather than the disk contents; converting to raw with qemu-img is the safer route. The performance is definitely better when you load a native install as a VFIO VM rather than a qcow image.
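
A sketch of that conversion — the paths and device name are placeholders, and this overwrites the entire target device, so triple-check the name with lsblk first:

    # Convert the qcow2 container to raw, writing straight onto the device
    sudo qemu-img convert -p -O raw /path/to/image.qcow2 /dev/nvme0n1

After that, point the domain at the device with a <disk type='block'> entry instead of the file-backed qcow2 disk.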