I have been having quite the issue with my virtual machines lately. After a few hours, they completely hang and need a Force Off before they can be restarted.
I leave htop running inside the VM to monitor it. When it freezes, htop shows minimal CPU and memory usage, yet the VM's CPU usage graph in virt-manager shows it maxed out. htop on my host does not reflect this load either.
This always happens, even when the VM is idling.
My host is Arch Linux with the i3 window manager.
I am using KVM/QEMU/libvirt/virt-manager.
Hardware is a Ryzen 9 5950X (not overclocked) on a Pro WS X570 Ace motherboard, with a Radeon 580X GPU for the host and an RTX 2070 Super for (some of) the VMs. Memory was 64 GB of 3200 MHz Corsair "gaming" RAM, recently upgraded to 128 GB of 3200 MHz Micron ECC. Storage is five 16 TB Seagate Exos drives in a RAIDZ2 ZFS pool, the host OS on a Samsung 970 EVO Plus NVMe SSD, and a few 240 GB Crucial 2.5" SSDs for (some of) the VMs. Power comes from a Seasonic 1000 W supply. Everything sits in a Meshify XL case with good airflow from six Noctua NF-A14 iPPC-3000 fans in a push-pull configuration.
I have tried a variety of operating systems for the VMs: vanilla Arch with i3, ArcoLinux, Manjaro, Debian (KDE), Batocera, and Windows 10. On all of these I have tried both a passed-through GPU and the Spice display.
Some of these VMs run off a dedicated SSD; others use a qcow "disk" image.
Some use the virtual network; others have the Intel network card on the motherboard passed through to the VM.
I have disabled “sleep” and other power states in case that was the cause.
I have run them with and without access to a filesystem share on the ZFS pool (except Windows 10).
I have updated my motherboard BIOS to the latest version and made sure the host and all VM operating systems are up to date.
I upgraded to ECC memory this past month, but that has not changed anything; the problem has happened with both kits.
I have given the VMs anywhere from just a few cores to most of them, and likewise for memory.
I have also tried different CPU topologies and configurations, to no avail.
My trial and error with these different VMs and configurations leads me to believe this is an issue with the host. From here, I do not know how to troubleshoot further and determine what is causing the hangs that render the VMs useless.
If I were to run dmesg, what should I be looking for?
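A minimal sketch of the kind of filtering I could start with, assuming the host's dmesg output is first saved to a text file (for example with dmesg > dmesg.txt). The keyword list, the category names, and the script name scan_dmesg.py are my own assumptions, not an authoritative list of what matters for this hang:

```python
import re
import sys

# Assumed keyword list: substrings that often show up in kernel messages about
# lockups, RCU stalls, hardware errors, KVM, and IOMMU/passthrough problems.
# This is a guess at a starting point, not an exhaustive or official list.
PATTERNS = {
    "lockup": re.compile(r"soft lockup|hard lockup|blocked for more than", re.I),
    "rcu_stall": re.compile(r"rcu.*stall", re.I),
    "hardware_error": re.compile(r"mce:|machine check|hardware error", re.I),
    "kvm": re.compile(r"\bkvm", re.I),
    "passthrough": re.compile(r"vfio|amd-vi|iommu", re.I),
}


def scan(lines):
    """Return (category, line) pairs; a line matching several categories
    appears once per matching category."""
    hits = []
    for line in lines:
        for category, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((category, line.rstrip("\n")))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_dmesg.py dmesg.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for category, line in scan(f):
            print(f"[{category}] {line}")
```

Saving a copy right after a hang (before rebooting the host) and scanning it would at least show whether the kernel logged anything in these categories around the time a VM froze.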
Any help would be greatly appreciated!