I would start by diffing the original VM’s libvirt domain XML against the new one to see what has changed. You might have added some tuning options a few years ago and forgotten about them. What did the guest CPU utilization look like before the upgrades?
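If you have shell access, `virsh` makes this easy; the domain name here is a placeholder for yours:

```sh
# List defined domains, then dump the current definition
virsh list --all
virsh dumpxml my-vm > new.xml

# Diff against a copy of the old definition, if you still have one
diff -u old.xml new.xml
```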
I don’t know how much control TrueNAS SCALE gives you, but I think it uses libvirt/QEMU on the backend, and that’s what I use on plain ol’ Linux. For the best guest performance you should pin the cores and prevent the host from using them. There are several ways to accomplish the latter; the VFIO pages on the Arch wiki are helpful. You also want to make sure the topology lines up between the host and the guest. For example, if you decide to pin two cores with two threads each, make sure vcpu 0 and vcpu 1 line up with the first physical core and its hyperthread sibling, make sure vcpu 2 and vcpu 3 line up with the second physical core and its hyperthread sibling, and set the vCPU topology to `sockets="1" cores="2" threads="2"`. The next step would be to consider pinning the emulator thread and, depending on the guest’s storage configuration, possibly one or more I/O threads as well.
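In domain-XML terms, that mapping looks roughly like the sketch below. The host CPU numbers are assumptions (on an 8-core/16-thread host, core 0’s HT sibling is usually CPU 8); check your actual layout with `lscpu -e` or `virsh capabilities`:

```xml
<vcpu placement="static">4</vcpu>
<cputune>
  <!-- vcpus 0-1 on physical core 0 and its HT sibling -->
  <vcpupin vcpu="0" cpuset="0"/>
  <vcpupin vcpu="1" cpuset="8"/>
  <!-- vcpus 2-3 on physical core 1 and its HT sibling -->
  <vcpupin vcpu="2" cpuset="1"/>
  <vcpupin vcpu="3" cpuset="9"/>
  <!-- keep the emulator thread off the pinned cores -->
  <emulatorpin cpuset="2-3"/>
</cputune>
<cpu mode="host-passthrough">
  <topology sockets="1" cores="2" threads="2"/>
</cpu>
```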
CPU usage is high in Chrome, which makes me wonder whether GPU rendering is actually working. Since you’re passing through a GPU, make sure it’s properly excluded from the host OS. This usually means adding it to the `vfio_pci.ids` list, possibly blacklisting the `nouveau` module, and possibly adding `vfio_pci.disable_vga=1` and/or `video=efifb:off` to the kernel command line. If the system is trying to initialize the GT 1030 as the primary graphics adapter on boot, going into the BIOS and forcing the primary graphics to the onboard graphics (assuming you have that) can sometimes fix this.
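As a rough sketch of what that host configuration looks like (the PCI IDs below are typical for a GT 1030 and its HDMI audio function, but verify yours with `lspci -nn`):

```sh
# Kernel command line additions (GRUB_CMDLINE_LINUX_DEFAULT or your platform's equivalent)
vfio_pci.ids=10de:1d01,10de:0fb8 vfio_pci.disable_vga=1 video=efifb:off
```

```sh
# /etc/modprobe.d/blacklist.conf -- keep the host driver off the card
blacklist nouveau
```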
Finally, I want to mention that while the IPC, efficiency, and feature set of the 2620v4 are all improved over the 2667v2, it might actually be a bit of a downgrade for you. The 2620v4 is a lower-TDP part and it doesn’t boost nearly as high as the 2667v2 (3.0 GHz vs. 4.0 GHz turbo). I don’t think this is the cause of your problems, but it’s worth keeping in mind: your performance expectation should be “the same or slightly less” rather than “greatly improved.”
There are lots of other things that could be contributing to the issue, but this is where I’d start. If I have a VM that I want to be really fast and responsive, I make sure that it has its own PCIe-passed-through GPU, NVMe drive, and NIC, and that its cores are pinned and isolcpus’d (see the sketch below). Ideally the VM would have its own NUMA node or at least its own L3 cache region as well, but that’s not possible on your architecture (unless you have a dual-socket system and failed to mention it).
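The isolation piece is a host kernel command line change; a minimal sketch, assuming the same cores pinned in the XML above:

```sh
# Keep the host scheduler and kernel housekeeping off the VM's cores
# (0,1 and their HT siblings 8,9 in this example)
isolcpus=0,1,8,9 nohz_full=0,1,8,9 rcu_nocbs=0,1,8,9
```

Good luck!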
ETA: Oh, there are more Meltdown/Spectre-class mitigation routines active on Broadwell than on Ivy Bridge, and you might be running into slowdowns due to those as well. If you’re in a secure-enough environment where it’s safe to do so, you might consider adding `mitigations=off` to the kernel command line of the host and/or the guest. Please be aware that this carries a security risk.
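You can see exactly what you’d be disabling before committing to it:

```sh
# List the mitigations currently active on the host
grep . /sys/devices/system/cpu/vulnerabilities/*
```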