VM Config Sanity Check & Performance Improvement Recommendations

Hi Guys,

Quick intro, seeing as this is my first post here…

Long-time user of all things virtual… I've worked with everything from VMware to Docker to App-V over my career.
Started using QEMU/libvirt when it was introduced in Unraid a few years ago.
Stepped up from my old HP N54L box to a dual Xeon E5-2670 setup on an ASRock EP2C602-4L/D16 a few years ago, and I've gradually got to the point where I have two main VMs I use day to day.

Hypervisor Specs:
Unraid 6.6.6
QEMU 3.0.0
Libvirt: 4.7.0
Linux Kernel: 4.18.20
Boot Args: isolcpus=1-5,17-22 intel_iommu=on iommu=pt pci-stub.ids=10de:1288,10de:0e0f,13f6:8788,1002:68e0,1002:aa68,1106:3483,10b5:8112,06:00.0,07:00.0,08:00.0,111d:8018,8086:10bc,144d:a808,10de:1b80,10de:10f0
IOMMU: https://pastebin.com/e4gAwXc8

VM1 - PfSense
https://pastebin.com/LKCt6djK
Passed through 4 port NIC

VM2 - Windows10
https://pastebin.com/VtAY4Qwk
Passed through NVMe
Passed through GTX1080
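
To save anyone digging through the pastebin for the passthrough bits, the devices are handed over with plain VFIO hostdev entries along these lines (a sketch only; the bus/slot address below is a placeholder, not my actual GPU address):

  <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
      <!-- host PCI address of the device being passed through (placeholder) -->
      <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </source>
  </hostdev>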

While both VMs are working great, I'm aware that I'm nowhere near bare-metal performance on my W10 VM. The VM is noticeably laggy for a short while after logging in, and FPS isn't what I'd expect while gaming, despite the GTX 1080 and NVMe being passed through.

I've experimented with a Q35 version of my W10 VM: https://pastebin.com/WEFVjhu4
I've tested emulator pinning, L3 cache emulation, and extra PCIe root ports with the 1080 and NVMe device assigned to them… none of which made much of a difference. In fact, emulator pinning cripples performance (no idea why…).
I have a feeling my main issue is down to NUMA configuration…
My hands are tied in a sense, as I'm using every PCIe slot available on my motherboard, and the 1080 covers the next slot down because of its size, so it has to go in the bottom slot.
Also, I use PCIe bifurcation to split an x16 slot into x4/x4/x4/x4 to house four NVMe devices, and that feature is only available on a couple of my PCIe slots.
So as a result, I'm pretty sure my Windows VM has its NVMe on PCIe lanes belonging to one NUMA node and the 1080 on the other.
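
For anyone curious, the extra root port experiment in the Q35 XML looked roughly like this (a sketch rather than a copy-paste from my config; the index, chassis and port values are just examples):

  <controller type='pci' index='8' model='pcie-root-port'>
    <target chassis='8' port='0x10'/>
  </controller>
  <controller type='pci' index='9' model='pcie-root-port'>
    <target chassis='9' port='0x11'/>
  </controller>
  <!-- the GPU and NVMe <hostdev> entries then get an <address type='pci' bus='0x08'.../>
       (or '0x09') so they sit on those root ports rather than directly on the root complex -->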

So…
Any advice on what I can do to improve performance would be much appreciated. I'm not really a Linux guy, but I'm not afraid to get my hands dirty and learn as I go.
If anything is glaringly wrong with my W10 Q35 XML, I'd appreciate the criticism/shaming!

lstopo attached:

After putting all this down in a post, it's given me some clarity on what's possibly hampering performance.

I'm going to attempt the following (a rough XML sketch of the W10 pinning is below the lists):
W10 VM:
Update CPU sets to: 4,20,5,21,6,22,7,23
Add emulatorpin to: 3,19

pfSense VM:
Update CPU sets to: 14,30,15,31
Add emulatorpin to: 13,29
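
In libvirt XML terms, the W10 change would end up as something like this in the cputune block (a sketch based on the plan above; it assumes the VM has 8 vCPUs numbered 0-7 and that 20-23 are the hyperthread siblings of cores 4-7):

  <cputune>
    <vcpupin vcpu='0' cpuset='4'/>
    <vcpupin vcpu='1' cpuset='20'/>
    <vcpupin vcpu='2' cpuset='5'/>
    <vcpupin vcpu='3' cpuset='21'/>
    <vcpupin vcpu='4' cpuset='6'/>
    <vcpupin vcpu='5' cpuset='22'/>
    <vcpupin vcpu='6' cpuset='7'/>
    <vcpupin vcpu='7' cpuset='23'/>
    <!-- keep the emulator threads off the vCPU cores -->
    <emulatorpin cpuset='3,19'/>
  </cputune>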

I'm also going to look into:
  • Swapping the physical locations of the NVMe riser card and the PCIe NIC passed through to pfSense, so that the hardware passed to each VM sits on the same NUMA node as the CPUs assigned to it.
  • Memory: I've purchased an additional 32GB of DDR3 to add to the system. With my current total being 32GB (16GB on each NUMA node), I'm concerned there's some overlap, as my Windows VM is assigned 12GB, leaving only 4GB on the first NUMA node (see the numatune sketch after this list).
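
The numatune sketch mentioned above, for anyone following along. It assumes the W10 cores sit on host node 0, which is my assumption rather than something I've verified; worth double-checking against the lstopo output first:

  <numatune>
    <!-- keep all guest RAM on the same host node as the pinned cores -->
    <memory mode='strict' nodeset='0'/>
  </numatune>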

Couple of questions…
What's the best way to work out whether my VM is using resources from more than one NUMA node?
Does QEMU/libvirt by default only assign memory to a VM from the same NUMA node as the CPUs it's pinned to? Or do I need to specify that in my XML inside some tags?

Further to my previous posts, I've changed the following…

  • CPU pinning updated to make things a little tidier.
  • Dropped memory down to 8GB, which eliminated NUMA misses completely.
  • Added emulator pinning to take the load off CPU 0.
  • Added more Hyper-V enlightenments, which seem to have helped with latency (see the sketch after this list).
  • Updated the CPU stub list to reflect the new CPU core assignments in the VMs.
  • XML: https://pastebin.com/xn1w0n4q
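
The sort of Hyper-V enlightenment block I mean looks like this (a sketch from memory rather than a copy of my XML; the exact set I'm running is in the pastebin above):

  <features>
    <acpi/>
    <hyperv>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
      <vpindex state='on'/>
      <synic state='on'/>
      <stimer state='on'/>
    </hyperv>
  </features>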

ToDo:

  • NVMe PCIe slot swap, so I'm using PCIe lanes on the same NUMA node as my CPU cores.

I've not used any IOThreads, as my disk is a passed-through PCIe NVMe, and correct me if I'm wrong, but IOThreads are only really used for virtio-based controllers with disks as img files?
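
For completeness, my understanding of what an IOThread setup would look like for an image-backed virtio disk is below (a sketch only; the file path and pinning are made up, and it's not something I'm using since my disk is a passed-through NVMe):

  <iothreads>1</iothreads>
  <cputune>
    <iothreadpin iothread='1' cpuset='3,19'/>
  </cputune>
  <devices>
    <disk type='file' device='disk'>
      <!-- the iothread attribute ties this virtio disk to IOThread 1 -->
      <driver name='qemu' type='raw' iothread='1'/>
      <source file='/mnt/user/domains/Win10/vdisk1.img'/>
      <target dev='vda' bus='virtio'/>
    </disk>
  </devices>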

I have the ‘issue’ with x1 PCIe ports being presented to passed-through devices, as detailed here: Increasing VFIO VGA Performance
So I'm patiently waiting for fixes to hit an official release. Am I correct in assuming that because my hypervisor is Unraid, I'm not able to apply any of the patches being discussed in that thread?