Very slow Windows Performance in VM

Hello!

First, my specs:

System:    Host: archlinux Kernel: 5.1.2-arch1-1-ARCH x86_64 bits: 64 compiler: gcc v: 8.3.0 Desktop: Gnome 3.32.2 
           Distro: Arch Linux 
Machine:   Type: Desktop Mobo: ASUSTeK model: ROG ZENITH EXTREME v: Rev 1.xx serial: <filter> UEFI: American Megatrends 
           v: 1701 date: 01/09/2019 
CPU:       Topology: 16-Core (2-Die) model: AMD Ryzen Threadripper 1950X bits: 64 type: MT MCP MCM arch: Zen rev: 1 
           L2 cache: 8192 KiB 
           flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm bogomips: 256077 
           Speed: 3999 MHz min/max: 2200/4000 MHz Core speeds (MHz): 1: 4000 2: 2000 3: 2032 4: 2079 5: 2199 6: 2127 7: 2853 
           8: 2125 9: 4000 10: 2000 11: 2000 12: 2000 13: 2000 14: 4000 15: 2163 16: 2000 17: 2000 18: 2075 19: 2664 20: 2097 
           21: 2064 22: 2117 23: 2160 24: 2000 25: 2360 26: 2062 27: 4000 28: 2378 29: 2000 30: 2000 31: 2090 32: 3728 
Graphics:  Device-1: NVIDIA GP102 [GeForce GTX 1080 Ti] vendor: ASUSTeK driver: N/A bus ID: 08:00.0 
           Device-2: Advanced Micro Devices [AMD/ATI] Lexa XT [Radeon PRO WX 3100] driver: amdgpu v: kernel bus ID: 42:00.0 
           Device-3: NVIDIA GP102 [GeForce GTX 1080 Ti] vendor: ASUSTeK driver: N/A bus ID: 43:00.0 
           Display: wayland server: X.Org 1.20.4 driver: N/A tty: N/A 
           Message: Unable to show advanced data. Required tool glxinfo missing. 
Audio:     Device-1: NVIDIA GP102 HDMI Audio vendor: ASUSTeK driver: snd_hda_intel v: kernel bus ID: 08:00.1 
           Device-2: Advanced Micro Devices [AMD] Family 17h HD Audio vendor: ASUSTeK driver: snd_hda_intel v: kernel 
           bus ID: 0a:00.3 
           Device-3: AMD Baffin HDMI/DP Audio [Radeon RX 550 640SP / RX 560/560X] driver: snd_hda_intel v: kernel 
           bus ID: 42:00.1 
           Device-4: NVIDIA GP102 HDMI Audio vendor: ASUSTeK driver: snd_hda_intel v: kernel bus ID: 43:00.1 
           Sound Server: ALSA v: k5.1.2-arch1-1-ARCH 
Network:   Device-1: Intel I211 Gigabit Network vendor: ASUSTeK driver: igb v: 5.6.0-k port: 2000 bus ID: 03:00.0 
           IF: enp3s0 state: up speed: 1000 Mbps duplex: full mac: <filter> 
           Device-2: Aquantia AQC107 NBase-T/IEEE 802.3bz Ethernet [AQtion] vendor: ASUSTeK driver: atlantic v: 2.0.4.0-kern 
           port: 2000 bus ID: 05:00.0 
           IF: enp5s0 state: up speed: 10000 Mbps duplex: full mac: <filter> 
           IF-ID-1: macvtap0 state: up speed: 1000 Mbps duplex: full mac: <filter> 
Drives:    Local Storage: total: 2.73 TiB used: 180.25 GiB (6.5%) 
           ID-1: /dev/nvme0n1 vendor: Samsung model: SSD 970 EVO 1TB size: 931.51 GiB 
           ID-2: /dev/nvme1n1 vendor: Samsung model: SSD 970 EVO 1TB size: 931.51 GiB 
           ID-3: /dev/sda vendor: Western Digital model: WD10EADS-42P6B0 size: 931.51 GiB 
           ID-4: /dev/sdb vendor: Apple model: HDD WD10EZES-40UFAA0 size: 931.51 GiB 
Partition: ID-1: / size: 915.60 GiB used: 180.19 GiB (19.7%) fs: ext4 dev: /dev/nvme1n1p2 
           ID-2: /boot size: 299.4 MiB used: 57.1 MiB (19.1%) fs: vfat dev: /dev/nvme1n1p1 
Sensors:   System Temperatures: cpu: 52.5 C mobo: N/A gpu: amdgpu temp: 63 C 
           Fan Speeds (RPM): cpu: 0 gpu: amdgpu fan: 1967 
Info:      Processes: 493 Uptime: 15m Memory: 62.83 GiB used: 22.85 GiB (36.4%) Init: systemd Compilers: gcc: 8.3.0 
           clang: 8.0.0 Shell: bash v: 5.0.7 inxi: 3.0.34 

IOMMU, SVM, and everything else are enabled.

However, Windows performance in the VM is very laggy and often slow. It took over 70 minutes to install Windows 10.

It takes over 5 minutes to start Windows 10.

I use virtio devices for both the network and the hard disk. The drivers are installed (otherwise it wouldn't boot).

Caching for the virtual disk is disabled and the I/O mode is set to native.

I use the latest OVMF release and virt-manager (QEMU/KVM/libvirtd).

Can somebody help me?

For testing, I also installed VirtualBox 6.

With VirtualBox it is super fast.

Post your XML.

Curious why the 1080 Ti shows up with two different bus IDs. Have you tried running the Windows 10 VM without the 1080 Ti to see whether the performance turns out alright?

I'm far from an expert. VirtualBox VMs, and to an extent QEMU VMs, can be super slow because of the drive emulation.

I focused on QEMU because it was more elegant for me. Passing a storage device through to the VM natively is night and day in terms of performance.

I guess you need to run disk performance tests on the various VMs you want and experiment to find the best performance.

When I'm home, I'll post the XML. It's the standard XML generated by virt-manager. I didn't change anything besides devices like the hard disk and network (virtio).

Because I have two 1080 Tis installed, plus an AMD WX 3100. On Linux I use only the WX 3100.

The Win10 VM runs without any attached PCI devices.

That was my first assumption too. But the disk performance is fine (NVMe).

Post your XML.

Yeah, you said that already :)

I'm home now:

<!--
WARNING: THIS IS AN AUTO-GENERATED FILE. CHANGES TO IT ARE LIKELY TO BE
OVERWRITTEN AND LOST. Changes to this xml configuration should be made using:
  virsh edit win10
or other application using the libvirt API.
-->

<domain type='kvm'>
  <name>win10</name>
  <uuid>929dd0ad-5ac9-42c7-a54e-54ca538fce87</uuid>
  <metadata>
    <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
      <libosinfo:os id="http://microsoft.com/win/10"/>
    </libosinfo:libosinfo>
  </metadata>
  <memory unit='KiB'>16777216</memory>
  <currentMemory unit='KiB'>16777216</currentMemory>
  <vcpu placement='static'>8</vcpu>
  <os>
    <type arch='x86_64' machine='pc-q35-4.0'>hvm</type>
    <loader readonly='yes' type='pflash'>/usr/share/ovmf/ovmf_code_x64.bin</loader>
    <nvram>/var/lib/libvirt/qemu/nvram/win10_VARS.fd</nvram>
    <bootmenu enable='yes'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
    </hyperv>
    <vmport state='off'/>
  </features>
  <cpu mode='host-model' check='partial'>
    <model fallback='allow'/>
    <topology sockets='1' cores='4' threads='2'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
    <timer name='hypervclock' present='yes'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source file='/var/lib/libvirt/images/win10.img'/>
      <target dev='vda' bus='virtio'/>
      <boot order='1'/>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/home/fsddfsd/Downloads/Win10_1809Oct_v2_German_x64.iso'/>
      <target dev='sdb' bus='sata'/>
      <readonly/>
      <boot order='2'/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <controller type='usb' index='0' model='qemu-xhci' ports='15'>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x00' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x10'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='2' port='0x11'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0x12'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0x13'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0x14'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x4'/>
    </controller>
    <interface type='direct'>
      <mac address='52:54:00:e6:09:04'/>
      <source dev='enp3s0' mode='bridge'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <input type='tablet' bus='usb'>
      <address type='usb' bus='0' port='1'/>
    </input>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='spice' autoport='yes'>
      <listen type='address'/>
      <image compression='off'/>
    </graphics>
    <sound model='ich9'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1b' function='0x0'/>
    </sound>
    <video>
      <model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='1' primary='yes'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </memballoon>
  </devices>
</domain>

If you use a RAW image + the Red Hat VirtIO controller + a VirtIO SCSI device during the Windows install, you can increase performance considerably, in my experience. On my NVMe SSD that gives me up to 100 MB/s read/write at queue depth 1. The problem is when you have lots of small files in the queue; then it shits itself. My plan is to forward the SATA controller to VMs and keep the NVMe controller for the host. Still, that raw+virtio boost is usually enough to make Windows feel equivalent to a higher-end laptop.
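
A minimal sketch of that layout in libvirt domain XML, assuming the image path from the XML posted above: a virtio-scsi controller plus a raw disk on the `scsi` bus, replacing the existing virtio-blk `<disk>` element. The target name `sda` is an arbitrary choice here. Windows setup will only see this disk after the `vioscsi` driver from the virtio-win ISO is loaded.

```xml
<devices>
  <!-- other devices unchanged -->
  <!-- the "Red Hat VirtIO SCSI" controller as seen from the Windows guest -->
  <controller type='scsi' index='0' model='virtio-scsi'/>
  <disk type='file' device='disk'>
    <!-- raw image, host page cache bypassed, native AIO -->
    <driver name='qemu' type='raw' cache='none' io='native'/>
    <source file='/var/lib/libvirt/images/win10.img'/>
    <!-- attach to the virtio-scsi controller instead of bus='virtio' -->
    <target dev='sda' bus='scsi'/>
    <boot order='1'/>
  </disk>
</devices>
```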

Also keep in mind that merely switching to virtio AFTER Windows is installed doesn't work. You have to install the drivers DURING the Windows install by selecting the proper .inf files in the proper order. Real pain in the ass.
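
One way to make those drivers available during setup is to attach the virtio-win ISO as a second CD-ROM, as the second XML in this thread does on the IDE bus. Below is a sketch for the q35/SATA layout of the first XML, assuming the ISO is at `/usr/share/virtio/virtio-win.iso` (the path used in that second XML; the actual location depends on your distro package). At the "Where do you want to install Windows?" step, choose "Load driver" and browse to the `viostor` (virtio-blk) or `vioscsi` (virtio-scsi) folder for your Windows version.

```xml
<devices>
  <!-- other devices unchanged -->
  <!-- second CD-ROM holding the virtio-win driver ISO -->
  <disk type='file' device='cdrom'>
    <driver name='qemu' type='raw'/>
    <source file='/usr/share/virtio/virtio-win.iso'/>
    <!-- sdb is already used by the Windows ISO, so take the next SATA name -->
    <target dev='sdc' bus='sata'/>
    <readonly/>
  </disk>
</devices>
```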

But yeah, VMware and VirtualBox do have virsh beat on virtual drive performance. However, I'm not sure they have the correct fixes to enable AMD SVM if you're using AMD, which means that with CPU pinning, static huge pages, and CPU isolation you could potentially get more performance than those hypervisors with the right tweaks.
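
A sketch of CPU pinning plus static huge pages for the 8-vCPU, 16 GiB guest from the XML above. Assumptions not taken from the thread: the guest gets host cores 4-7 and their SMT siblings 20-23, with the intent of keeping it on one die of the 1950X (check the actual core/sibling/die numbering with `lscpu -e` on your machine), and 2 MiB huge pages are reserved on the host beforehand, for example 16777216 KiB / 2048 KiB = 8192 pages via `vm.nr_hugepages`. CPU isolation (such as the `isolcpus` kernel parameter) is configured on the host and is not shown.

```xml
<domain type='kvm'>
  <!-- name, uuid, memory and the rest unchanged -->
  <vcpu placement='static'>8</vcpu>
  <cputune>
    <!-- assumed numbering: host CPU N and N+16 are SMT siblings of one core -->
    <!-- vCPU pairs (0,1), (2,3), ... match topology cores='4' threads='2' -->
    <vcpupin vcpu='0' cpuset='4'/>
    <vcpupin vcpu='1' cpuset='20'/>
    <vcpupin vcpu='2' cpuset='5'/>
    <vcpupin vcpu='3' cpuset='21'/>
    <vcpupin vcpu='4' cpuset='6'/>
    <vcpupin vcpu='5' cpuset='22'/>
    <vcpupin vcpu='6' cpuset='7'/>
    <vcpupin vcpu='7' cpuset='23'/>
    <!-- keep QEMU's emulator threads off the guest's cores -->
    <emulatorpin cpuset='0-1,16-17'/>
  </cputune>
  <memoryBacking>
    <!-- back guest RAM with the pre-reserved huge pages -->
    <hugepages/>
  </memoryBacking>
</domain>
```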

I did exactly that. See the XML above.

I even tried CPU pinning, huge pages, and a lot more of what the Arch Wiki and other sites say are the best settings for high performance.

Still: in VMware Workstation, Windows runs like native. In VirtualBox, Windows runs like native. Only in virt-manager/KVM/QEMU is it a pain in the ass.

(No worries: whenever I try another solution and it doesn't work, I dd an image back to the starting point. I took this image directly after installing Arch Linux.)

Passing through a whole hard disk (an SSD 850 Pro) doesn't make a difference. The VM is still so slow that installing Windows takes a long time.

That's super weird. Give me a few minutes to do some testing of my own and I'll post back. I'll record a video of my performance if you'd like, and you can tell me how slow it is compared to your VM. I mean, I think the speed is acceptably slow in my case, but perhaps it's too slow for you?

No, I know what you mean. I know how it feels.

I don't know how to record a video on Linux, but maybe this description is the best way to show you.

I mean: the VM is actually very slow! The cursor runs at 2-3 FPS, and if I click a button it takes a minute to respond. In virt-manager's overview window, CPU usage is only around 10%. After the installation finally finishes (over an hour later), the Windows desktop appears, and everything I open (like a folder on the desktop) takes very long: first the blue loading circle spins for about a minute, and then the folder opens. Sometimes the VM “wakes up” and runs like native for 30 seconds to a minute, but then it's super slow again.

It runs so slowly that Windows is totally unusable. Windows 7, Windows 8.1, and even XP: all super slow.

I'm now trying Linux guests like Ubuntu, Manjaro & co. to see whether the performance is just as bad there.

Yeah, mine is definitely far more responsive than that. Let me post my .xml:

<domain type='kvm'>
  <name>win10</name>
  <uuid>942b9133-7b3a-4bfc-8dbd-d556acdf8410</uuid>
  <metadata>
    <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
      <libosinfo:os id="http://microsoft.com/win/10"/>
    </libosinfo:libosinfo>
  </metadata>
  <memory unit='KiB'>4194304</memory>
  <currentMemory unit='KiB'>4194304</currentMemory>
  <vcpu placement='static'>6</vcpu>
  <os>
    <type arch='x86_64' machine='pc-i440fx-3.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
    </hyperv>
    <vmport state='off'/>
  </features>
  <cpu mode='host-passthrough' check='none'>
    <topology sockets='1' cores='3' threads='2'/>
    <feature policy='require' name='topoext'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
    <timer name='hypervclock' present='yes'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none' io='native'/>
      <source file='/home/maxr/KVM/win10.img'/>
      <target dev='hda' bus='ide'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='lolololWINDOZE'/>
      <target dev='hdb' bus='ide'/>
      <readonly/>
      <address type='drive' controller='0' bus='0' target='0' unit='1'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <source file='/usr/share/virtio/virtio-win.iso'/>
      <target dev='hdc' bus='ide'/>
      <readonly/>
      <address type='drive' controller='0' bus='1' target='0' unit='0'/>
    </disk>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'/>
    <controller type='ide' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
    </controller>
    <interface type='network'>
      <mac address='52:54:00:2c:d5:ae'/>
      <source network='default'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <input type='tablet' bus='usb'>
      <address type='usb' bus='0' port='1'/>
    </input>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='spice' autoport='yes'>
      <listen type='address'/>
      <image compression='off'/>
    </graphics>
    <sound model='ich6'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </sound>
    <video>
      <model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='1' primary='yes'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <redirdev bus='usb' type='spicevmc'>
      <address type='usb' bus='0' port='2'/>
    </redirdev>
    <redirdev bus='usb' type='spicevmc'>
      <address type='usb' bus='0' port='3'/>
    </redirdev>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </memballoon>
  </devices>
</domain>


I'm not an expert, but to me it looks the same.

Come to think of it, 10 MB/s is maxing out my RAW virtio throughput as well. This is pretty shit. There has to be a better way than passing through the whole SATA controller. I think the only difference is that I'm running a fixed 4.2 GHz all-core overclock on water cooling. The only other suggestion: maybe just let it sit and run in the background for an hour or two, so it can cache stuff, install updates, and so on?

That sounds like a problem with the virtual display rather than the drive.

Note: you can also pass through a partition or drive with VirtIO instead of an image file to increase performance, and it has the advantage of being directly readable by the host if you need to do offline file management.
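
A sketch of that in libvirt XML: a `type='block'` disk whose source is a host block device instead of an image file. The path `/dev/disk/by-id/EXAMPLE-DRIVE-ID` is a hypothetical placeholder; point it at your own drive or partition, and don't mount it on the host while the guest is running.

```xml
<devices>
  <!-- replaces the file-backed virtio disk -->
  <disk type='block' device='disk'>
    <driver name='qemu' type='raw' cache='none' io='native'/>
    <!-- hypothetical path: use the stable /dev/disk/by-id name of your drive or partition -->
    <source dev='/dev/disk/by-id/EXAMPLE-DRIVE-ID'/>
    <target dev='vda' bus='virtio'/>
    <boot order='1'/>
  </disk>
</devices>
```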

You could try increasing the QXL video memory settings if you're trying to run at high resolutions.
Currently the framebuffer (vgamem) is set to only 16 MB.
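
The sizes on libvirt's `<model>` element are in KiB, so `vgamem='16384'` in the posted XML is that 16 MB framebuffer. A sketch that doubles it; `32768` is an example value rather than a tested recommendation, and `ram` is kept at twice `vgamem`.

```xml
<video>
  <!-- sizes in KiB: vgamem 32768 = 32 MiB framebuffer, ram kept at 2x vgamem -->
  <model type='qxl' ram='65536' vram='65536' vgamem='32768' heads='1' primary='yes'/>
</video>
```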

Another tweak you can try is evdev passthrough of your keyboard and mouse, which should make them about as responsive as they'd be on bare metal.
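
A sketch of evdev passthrough using QEMU's `input-linux` objects through libvirt's `qemu:commandline` escape hatch, which requires declaring the `qemu` namespace on the root `<domain>` element. The two `/dev/input/by-id/...` paths are hypothetical placeholders (list yours with `ls /dev/input/by-id/`). Depending on your setup, QEMU may also need permission to read those devices, for example via `cgroup_device_acl` in `/etc/libvirt/qemu.conf`. By default, pressing both Ctrl keys toggles the grab between host and guest.

```xml
<domain type='kvm' xmlns:qemu='http://libvirt.org/schemas/domain/qemu/1.0'>
  <!-- existing elements unchanged -->
  <qemu:commandline>
    <!-- keyboard (hypothetical path); grab_all=on toggles all input-linux devices together,
         repeat=on passes key auto-repeat through -->
    <qemu:arg value='-object'/>
    <qemu:arg value='input-linux,id=kbd1,evdev=/dev/input/by-id/usb-EXAMPLE_Keyboard-event-kbd,grab_all=on,repeat=on'/>
    <!-- mouse (hypothetical path) -->
    <qemu:arg value='-object'/>
    <qemu:arg value='input-linux,id=mouse1,evdev=/dev/input/by-id/usb-EXAMPLE_Mouse-event-mouse'/>
  </qemu:commandline>
</domain>
```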


I already tried passing through a whole drive: same performance.
I also tried it without a virtual display or virtual GPU, with only a passed-through GPU: same performance.
And at the same time I tried an external GPU, plus an extra mouse and keyboard (via a passed-through PCI USB controller).

Nothing helped.

I'm setting up pfSense for my homelab right now. When I'm done with it, I'll try different Linux distributions in virt-manager to see if there is a difference.

In general, you shouldn't run the VM with SPICE/QXL if you have a GPU in the guest.
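
A sketch of the GPU-only variant, assuming the 1080 Ti at host address 08:00.0 and its HDMI audio function at 08:00.1 (the bus IDs from the inxi output at the top), with both already bound to vfio-pci on the host: pass both functions through, delete the `<graphics type='spice'>` element, and replace the QXL `<video>` with `model type='none'` (supported on reasonably recent libvirt versions).

```xml
<devices>
  <!-- other devices unchanged; the <graphics type='spice'> element is removed -->
  <!-- GPU function 08:00.0 -->
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
      <address domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
    </source>
  </hostdev>
  <!-- HDMI audio function 08:00.1 of the same card -->
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
      <address domain='0x0000' bus='0x08' slot='0x00' function='0x1'/>
    </source>
  </hostdev>
  <!-- no emulated display adapter; the passed-through GPU is the only one -->
  <video>
    <model type='none'/>
  </video>
</devices>
```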

Did you install a clean Windows system, or pass through one that was already used on bare metal? That can cause transient issues.

So can scheduler conflicts between the host and guest.

https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF#Performance_tuning

Have you tried q35?

I don't run the VM with SPICE/QXL when I test it with the GPU.

Yes, I reinstalled Windows every time from the official ISO.

I already tried this. No change. (Like I said, I've already tried everything from the wiki.)

q35 is the default setting, so yes. And I also tried i440fx (? I forgot the exact name).

No change there either.

Sorry, yeah, the other guy's XML was i440fx, not yours.

Have you tried using cset to bind CPUs?

What about disabling MSI interrupts in the guest?