Could large RAM slow Windows guest VM boot time?

I have a working Windows 10 guest VM on my Arch Linux host, with GTX 1070 Ti passthrough and a virtual USB keyboard/mouse for input. Everything works fine except guest boot time. It took about 5 minutes to boot with 4 cores and 16GB RAM. That's kind of slow, but I did not use Windows that much.
However, I recently needed to test some Windows-specific software and decided to give my Windows guest 10 cores and 64GB RAM. The boot time went from 5 to 16 minutes, and CPU usage was sky high during boot. Does large RAM slow guest boot time? Is there any setting I need to tweak?

XML from virt-manager:

<domain type="kvm">
  <name>win10Basic</name>
  <uuid>b6db01b3-ab59-450b-a004-2b0e1423a237</uuid>
  <metadata>
    <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
      <libosinfo:os id="http://microsoft.com/win/10"/>
    </libosinfo:libosinfo>
  </metadata>
  <memory unit="KiB">67108864</memory>
  <currentMemory unit="KiB">67108864</currentMemory>
  <memoryBacking>
    <hugepages/>
  </memoryBacking>
  <vcpu placement="static">20</vcpu>
  <os>
    <type arch="x86_64" machine="pc-i440fx-4.0">hvm</type>
    <loader readonly="yes" type="pflash">/usr/share/ovmf/x64/OVMF_CODE.fd</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/win10Basic_VARS.fd</nvram>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state="on"/>
      <vapic state="on"/>
      <spinlocks state="on" retries="8191"/>
      <vendor_id state="on" value="TestBenc123"/>
    </hyperv>
    <kvm>
      <hidden state="on"/>
    </kvm>
    <vmport state="off"/>
  </features>
  <cpu mode="host-model" check="partial">
    <topology sockets="1" dies="1" cores="10" threads="2"/>
  </cpu>
  <clock offset="localtime">
    <timer name="rtc" tickpolicy="catchup"/>
    <timer name="pit" tickpolicy="delay"/>
    <timer name="hpet" present="no"/>
    <timer name="hypervclock" present="yes"/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled="no"/>
    <suspend-to-disk enabled="no"/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type="file" device="disk">
      <driver name="qemu" type="raw"/>
      <source file="/mnt/Work/vmPool/win10Base.img"/>
      <target dev="vda" bus="virtio"/>
      <boot order="1"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x07" function="0x0"/>
    </disk>
    <disk type="file" device="cdrom">
      <driver name="qemu" type="raw"/>
      <source file="/var/lib/libvirt/images/Win10_1903_V1_English_x64.iso"/>
      <target dev="hda" bus="ide"/>
      <readonly/>
      <address type="drive" controller="0" bus="0" target="0" unit="0"/>
    </disk>
    <disk type="file" device="cdrom">
      <driver name="qemu" type="raw"/>
      <source file="/var/lib/libvirt/images/virtio-win-0.1.171.iso"/>
      <target dev="hdb" bus="ide"/>
      <readonly/>
      <address type="drive" controller="0" bus="0" target="0" unit="1"/>
    </disk>
    <controller type="usb" index="0" model="qemu-xhci" ports="15">
      <address type="pci" domain="0x0000" bus="0x00" slot="0x05" function="0x0"/>
    </controller>
    <controller type="pci" index="0" model="pci-root"/>
    <controller type="ide" index="0">
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x1"/>
    </controller>
    <controller type="virtio-serial" index="0">
      <address type="pci" domain="0x0000" bus="0x00" slot="0x06" function="0x0"/>
    </controller>
    <interface type="network">
      <mac address="52:54:00:d8:b8:d0"/>
      <source network="default"/>
      <model type="virtio"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x0"/>
    </interface>
    <input type="mouse" bus="ps2"/>
    <input type="keyboard" bus="ps2"/>
    <hostdev mode="subsystem" type="pci" managed="yes">
      <source>
        <address domain="0x0000" bus="0x03" slot="0x00" function="0x0"/>
      </source>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x09" function="0x0"/>
    </hostdev>
    <hostdev mode="subsystem" type="pci" managed="yes">
      <source>
        <address domain="0x0000" bus="0x03" slot="0x00" function="0x1"/>
      </source>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x0a" function="0x0"/>
    </hostdev>
    <hostdev mode="subsystem" type="usb" managed="yes">
      <source>
        <vendor id="0x046d"/>
        <product id="0xc534"/>
      </source>
      <address type="usb" bus="0" port="1"/>
    </hostdev>
    <memballoon model="virtio">
      <address type="pci" domain="0x0000" bus="0x00" slot="0x08" function="0x0"/>
    </memballoon>
  </devices>
</domain>

Run something like glances or atop on the host and see if you spot a bottleneck causing that high CPU usage.
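Both should be in the Arch repos; a quick sketch of how I'd watch the host while the guest boots (package names assumed from the official repos):

sudo pacman -S glances atop
glances        # live view of CPU%, MEM% and iowait while the guest boots
atop 2         # or sample everything at 2-second intervals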

Does your guest run apps so highly threaded that it needs 20 threads? How many are left over for the host? You could try pinning specific core/thread pairs instead of letting the host decide, as in the sketch below. Also pin your IO to a specific core.
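Here's a minimal sketch of what that pinning looks like in the domain XML. The cpuset numbers are purely illustrative; pick pairs from your own lscpu -e output where two CPUs share a physical core:

<iothreads>1</iothreads>   <!-- define at least one IO thread to pin -->
<cputune>
  <vcpupin vcpu="0" cpuset="2"/>    <!-- vCPU 0 on host CPU 2 -->
  <vcpupin vcpu="1" cpuset="16"/>   <!-- vCPU 1 on CPU 2's hyperthread sibling -->
  <vcpupin vcpu="2" cpuset="3"/>
  <vcpupin vcpu="3" cpuset="17"/>
  <emulatorpin cpuset="0,14"/>      <!-- keep QEMU's emulator threads off the guest cores -->
  <iothreadpin iothread="1" cpuset="1,15"/>   <!-- give disk IO its own core/thread pair -->
</cputune>
<!-- the disk's <driver> element also needs iothread="1" to route its IO through that thread -->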

I ran atop while booting up my VM. All I saw was 2000% CPU usage by qemu-system-x86, dropping to 5x% after boot. I have a Xeon E5-2683 v3, so 4 cores are left for the host. I probably do not need all 10 cores for testing purposes, but most of the 3D rendering software I use can max out all 10 cores while rendering a heavy scene.
I will try CPU pinning this weekend and see how it goes. Regarding pinning IO to a specific core, can you explain a bit more, or maybe share a link?

Right, the CPU is pegged, but something is causing a bottleneck; it's not just raw computation bogging down your CPU.

What's your iowait look like? Glances is easier to read than atop, and it highlights and logs issues like iowait. What are the overall specs of the host machine?

A few links to study. Most of the KVM tuning documentation out there is driven by gamers, but it applies to most workloads, and even to non-Windows virtual machines.
https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF#CPU_pinning

I installed glances and monitored the VM boot. It's 0.0% iowait most of the time, 4-5% at the end of boot, then it drops back to 0.0%. 7x% CPU and 7x% MEM usage during boot.

Host spec:

CPU : Xeon E5-2683 v3
MB : ASRock Taichi X99
RAM : 96GB DDR4 2133
GPU0 : GTX 970 for host
GPU1 : GTX 1070 for passthrough
HDD : 512G NVMe for host system
HDD : 60G raw image on 240G SATA SSD for guest
Host OS : Arch Linux 5.7.2-arch1-1
Guest OS : Windows 10 1903

Thanks for the link, it seems like a good read, really appreciate it. I will try to optimize my config this weekend.

What CPU is your system running? It may be that your hypervisor is running into an issue while waiting for resource availability from the CPU or another NUMA node.

I have a Xeon E5-2683 v3.
It's set to Haswell-noTSX-IBRS in virt-manager under CPU configuration.

CPU pinning seems to have no effect on guest boot time. My guest VM still took 1x minutes to boot up after pinning.

lscpu -e
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE    MAXMHZ    MINMHZ
  0    0      0    0 0:0:0:0          yes 3000.0000 1200.0000
  1    0      0    1 1:1:1:0          yes 3000.0000 1200.0000
  2    0      0    2 2:2:2:0          yes 3000.0000 1200.0000
  3    0      0    3 3:3:3:0          yes 3000.0000 1200.0000
  4    0      0    4 4:4:4:0          yes 3000.0000 1200.0000
  5    0      0    5 5:5:5:0          yes 3000.0000 1200.0000
  6    0      0    6 6:6:6:0          yes 3000.0000 1200.0000
  7    0      0    7 7:7:7:0          yes 3000.0000 1200.0000
  8    0      0    8 8:8:8:0          yes 3000.0000 1200.0000
  9    0      0    9 9:9:9:0          yes 3000.0000 1200.0000
 10    0      0   10 10:10:10:0       yes 3000.0000 1200.0000
 11    0      0   11 11:11:11:0       yes 3000.0000 1200.0000
 12    0      0   12 12:12:12:0       yes 3000.0000 1200.0000
 13    0      0   13 13:13:13:0       yes 3000.0000 1200.0000
 14    0      0    0 0:0:0:0          yes 3000.0000 1200.0000
 15    0      0    1 1:1:1:0          yes 3000.0000 1200.0000
 16    0      0    2 2:2:2:0          yes 3000.0000 1200.0000
 17    0      0    3 3:3:3:0          yes 3000.0000 1200.0000
 18    0      0    4 4:4:4:0          yes 3000.0000 1200.0000
 19    0      0    5 5:5:5:0          yes 3000.0000 1200.0000
 20    0      0    6 6:6:6:0          yes 3000.0000 1200.0000
 21    0      0    7 7:7:7:0          yes 3000.0000 1200.0000
 22    0      0    8 8:8:8:0          yes 3000.0000 1200.0000
 23    0      0    9 9:9:9:0          yes 3000.0000 1200.0000
 24    0      0   10 10:10:10:0       yes 3000.0000 1200.0000
 25    0      0   11 11:11:11:0       yes 3000.0000 1200.0000
 26    0      0   12 12:12:12:0       yes 3000.0000 1200.0000
 27    0      0   13 13:13:13:0       yes 3000.0000 1200.0000

CPU pinning XML:

<vcpu placement="static">20</vcpu>
<iothreads>2</iothreads>
<cputune>
    <vcpupin vcpu="0" cpuset="4"/>
    <vcpupin vcpu="1" cpuset="5"/>
    <vcpupin vcpu="2" cpuset="6"/>
    <vcpupin vcpu="3" cpuset="7"/>
    <vcpupin vcpu="4" cpuset="8"/>
    <vcpupin vcpu="5" cpuset="9"/>
    <vcpupin vcpu="6" cpuset="10"/>
    <vcpupin vcpu="7" cpuset="11"/>
    <vcpupin vcpu="8" cpuset="12"/>
    <vcpupin vcpu="9" cpuset="13"/>
    <vcpupin vcpu="10" cpuset="18"/>
    <vcpupin vcpu="11" cpuset="19"/>
    <vcpupin vcpu="12" cpuset="20"/>
    <vcpupin vcpu="13" cpuset="21"/>
    <vcpupin vcpu="14" cpuset="22"/>
    <vcpupin vcpu="15" cpuset="23"/>
    <vcpupin vcpu="16" cpuset="24"/>
    <vcpupin vcpu="17" cpuset="25"/>
    <vcpupin vcpu="18" cpuset="26"/>
    <vcpupin vcpu="19" cpuset="27"/>
    <emulatorpin cpuset="1-2,15-16"/>
    <iothreadpin iothread="1" cpuset="1,15"/>
    <iothreadpin iothread="2" cpuset="2,16"/>
</cputune>
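For reference, the live placement can be double-checked from the host with stock libvirt commands:

virsh vcpupin win10Basic        # list which host CPU each vCPU landed on
virsh emulatorpin win10Basic    # show the emulator threads' cpuset
virsh iothreadinfo win10Basic   # show each IO thread's CPU affinity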

There was a discussion on Reddit mentioning that Arch Linux's preemption setting causes slow Windows 10 VM boot on an Arch Linux host. I might give it a try, although this seems to require compiling the kernel, which I've never done before.
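Before committing to a rebuild, I found I can at least check what my current kernel was built with; this assumes the kernel exposes its build config at /proc/config.gz, which the stock Arch kernel does:

zgrep 'CONFIG_PREEMPT' /proc/config.gz   # CONFIG_PREEMPT=y means full preemption is enabled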

What's the NUMA layout look like for your system? I forget the command, but there's one you can run that gives you a nice PNG diagram.
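I think the one I'm remembering is lstopo from hwloc; a sketch, assuming the hwloc package from the Arch repos:

sudo pacman -S hwloc
lstopo numa-layout.png   # renders the CPU/memory topology to a PNG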

High CPU and a delay right at boot sounds like the UEFI trying to initialize something or gather resources, and running into a bottleneck.

@gordonthree This is my lstopo output
[screenshot: lstopo output]

I think you'll need to hunt around your BIOS and see what options you have for the CPU configuration. I'm reading that this CPU used something called Cluster-on-Die to segregate cores and memory controllers. I'm not finding a lot about it other than stuff like "contact your OEM for more information" :frowning:

For example
https://kb.vmware.com/s/article/2142499
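One quick way to see whether CoD is actually in effect is to check how many NUMA nodes the firmware exposes; a sketch, assuming numactl from the Arch repos:

numactl --hardware   # with Cluster-on-Die enabled, one socket can show up as two NUMA nodes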

Maybe hugepages can help?
https://wiki.archlinux.org/index.php/KVM#Enabling_huge_pages

Thank you @gordonthree,
I already enabled hugepages, so I guess that's not the reason. But I will definitely look into my BIOS and the CoD stuff, and see if I can find something related to VM boot time.

Yes, more guest RAM does increase startup time, but it shouldn't be minutes; that's just ridiculous.

I have a 64 GB machine where I run a 32 GB Win10 guest. It starts up in about 30 seconds (NVMe drive). I’ve adjusted the RAM settings in the guest from 8 to 32 GB at various times. The startup time is affected by how much RAM qemu has to clear out during startup. If it decides to start swapping things it slows down. I believe there’s some slowdown because of huge page compaction as well, since it starts doing page defrag during startup.
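If you want to check whether compaction is what's eating the CPU, the kernel exports compaction counters in /proc/vmstat; a rough sketch (domain name taken from the XML above):

grep compact /proc/vmstat > /tmp/compact.before
virsh start win10Basic
grep compact /proc/vmstat > /tmp/compact.after
diff /tmp/compact.before /tmp/compact.after   # big jumps in compact_stall mean heavy defrag work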

Hey, wait a minute. If I understand this correctly, this config requires hugepages; they're not optional. No wonder startup takes so long! Linux has to do very time-consuming memory defrag operations to get enough hugepages.

It'll be a lot better to just let qemu do its own thing. It already marks pages for transparent huge page allocation. It will get as many as it can at start, and in the background kernel threads will acquire more huge pages as they can.

You only want to require hugepages if you've preallocated 64 GiB or more of them at kernel start. If you do that, then that memory isn't available for any other use.
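A quick way to watch transparent huge pages doing their thing for qemu (paths from the stock kernel):

cat /sys/kernel/mm/transparent_hugepage/enabled   # [always] or [madvise] means THP is available
grep AnonHugePages /proc/meminfo                  # grows as qemu's guest RAM gets backed by THP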

Sorry, I did not mention that I followed this archwiki and this to set up hugepages. I tried removing the memoryBacking tag in the XML, but my host system froze after the guest started.
I will check my hugepages setup again to see if I set it wrong, or maybe remove my hugepage setup for now, because I saw no hugepage Rsvd and Surp usage at all while my guest VM was running.
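If I do remove it, I suppose the preallocated pool should also be released back to the host; a sketch, assuming the pool was sized via vm.nr_hugepages rather than a hugepages= boot parameter:

echo 0 | sudo tee /proc/sys/vm/nr_hugepages   # release the 2MB hugepage pool
grep HugePages_Total /proc/meminfo            # should drop back toward 0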

#before vm start
#grep Huge /proc/meminfo 
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:   32800
HugePages_Free:    32800
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:        67174400 kB

#after vm starts, during high cpu usage, before seeing Tiano logo
#grep Huge /proc/meminfo 
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:   32800
HugePages_Free:       32
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:        67174400 kB

Over my head and beyond where I’ve tread in the land of KVM … Hopefully you will find a fix.

Any interesting-looking CPU options in the BIOS? I wonder if there are side effects running a server CPU on an HEDT chipset, versus the full C602 (whatever) server chipset?

I am not familiar with BIOS setup related to VMs, but I will keep it in mind while searching for an answer. I will try disabling the hugepages setting first this weekend and see how it goes.

This looks like you did have preallocated hugepages. That means they shouldn't be the cause of the slowdown, since no work has to be done to get them.

Hi!

Had the same issue last weekend; my VM was taking around 1 to 2 minutes to boot to Win10.

I resolved it by setting CONFIG_PREEMPT_VOLUNTARY=y in my kernel options and recompiling.

With this set, I managed to reduce the boot time to ~10 seconds.

I was able to compile the kernel with this option pretty easily, and it was my first time too.

Simply download the PKGBUILD of your desired kernel from the AUR (mine was linux-zen-vfio).

yay -G linux-zen-vfio

Then edit the config file in the downloaded folder and change those lines (#100 and #101 for me):

CONFIG_PREEMPT_VOLUNTARY=y
#CONFIG_PREEMPT=y

To avoid any checksum errors when compiling, change the checksum of this config file to SKIP in the PKGBUILD file.
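Alternatively, if pacman-contrib is installed, updpkgsums can regenerate the sums to match the edited config instead of skipping the check:

updpkgsums   # rewrites the checksums in the PKGBUILD from the files on disk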

Then launch the compilation with:

makepkg -si

It takes some time (around 2 to 3 hours here), but it works!

Hope this helps.

This looks really promising.
I did try it a couple of days ago, using linux-vfio from the AUR; it failed to compile on my host, but worked in a clean chroot. I was going to test it out yesterday, but I accidentally overwrote my initramfs without noticing. While I was trying to figure out what was going on, my BIOS CMOS battery died. I do believe in Murphy's law now.
Anyway, I got my system back and am ready to do this again. I hope this will work, knock on wood.
BTW, cloning from the Arch Linux git repo took forever. Is 150.00 KiB/s a normal speed for cloning from Arch Linux git? A speed test shows 45Mb/s for my internet download speed.
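I might also try a shallow clone to speed that up; the URL below is just a placeholder for whatever repo the PKGBUILD points at:

git clone --depth 1 <kernel-repo-url>   # fetch only the latest snapshot, not the full history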