Diagnose virtual machine freezing - KVM

Issue:
I am having quite the issue with my virtual machines lately. After a few hours, they will completely hang, requiring a Force Off to restart them.

I leave htop running on the VM to monitor it. When it freezes, htop inside the guest shows minimal CPU and memory usage, but in virt-manager the VM's CPU usage graph shows it maxed out. htop on my host does not reflect that load either.
This happens every time, even when the VM is idling.


Setup:
My host is Arch Linux with the i3 window manager.
I am using KVM/QEMU/libvirt/virt-manager.
Hardware is a 5950X (not overclocked), a Pro WS X570-Ace board, a Radeon 580X GPU on the host, and a 2070 Super on (some) VMs. 64 GB of 3200 Corsair "gaming" memory, recently upgraded to 128 GB of 3200 Micron ECC memory. 5x 16 TB Exos drives in a RAIDZ2 ZFS array. Host OS on a Samsung 970 Evo Plus NVMe SSD, and a few 240 GB Crucial 2.5" SSDs for (some of) the VMs. A Seasonic 1000 W power supply. All in a Meshify XL case with great airflow via six Noctua NF-A14 iPPC-3000 fans in a push-pull configuration.


Troubleshooting:
I have tried a variety of different OSs for the VMs: vanilla Arch with i3, ArcoLinux, Manjaro, Debian KDE, Batocera, and Windows 10. On all of these I have tried both with a GPU passed through and with a virtual display over Spice.

Some of these VMs run off a dedicated SSD, others are in a qcow "disk" image.

Some use the virtual network, others have the Intel network card on the motherboard passed through to the VM.

I have disabled “sleep” and other power states in case that was the cause.

I have run them with and without access to a filesystem share on the ZFS pool (except W10).

I have updated my motherboard BIOS to the latest version, and ensured the host and all VM OSs were updated.

I upgraded my RAM to ECC memory this past month, but that has not changed this issue; the problem has happened on both kits.

I have given the VMs anywhere from just a few cores up to most of them, and likewise with the memory.

And I have tried different CPU topologies and configurations to no avail.


My trial and error with the variety of VMs and their configurations leads me to believe this is an issue with the host. From here, I do not know how to troubleshoot further and determine what is causing it to hang, rendering the VMs useless.

If I were to run dmesg, what should I be looking for?

Any help would be greatly appreciated!

I've seen very similar behavior with a bad DIMM, and I've always been able to reproduce it: make a VM, give it almost all of the memory and almost all of the CPUs, and run stress-ng on it.

I run this command; tweak the CPU count to whatever you want:

stress-ng --vm 18 --vm-bytes 90% --vm-method all --verify -t 96h -v

That will use 90% of memory across 18 workers and run for 96 hours. But usually we see results in just 10 minutes.
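If the memory run comes back clean, a CPU-only pass can help rule out the cores themselves. This is just my own variation on the same tool, not something you need for the memory theory:

stress-ng --cpu 16 --cpu-method all --verify -t 96h -v

Same idea as before: --verify checks each worker's results, so silent corruption shows up as a reported error rather than just a crash.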

I know you said you already swapped them, but I’d be interested to see if this test makes the issue come back. At least then you can reproduce it

thank you! I’ll give this a try soon

ps -Ao pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:32,comm

The wait channel (wchan) might give you a clue about what the qemu process is waiting on. If it's not running, then it's stuck waiting on something.
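That lists every process on the system; piping it through grep (just a convenience filter I'd add, not part of the original command) narrows it to the qemu side of things:

ps -Ao pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:32,comm | grep -i qemu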

So it's been running fine for half an hour now, and hasn't crashed or frozen. I don't know if that's a good or a bad sign

That’s interesting

Can you maybe turn off one VM at a time and let it run, and see if a specific workload is causing the issue?

oh I've only been running one VM at a time through all this since the issue started. I haven't tried 2+ VMs at once for months now lol

EDIT: I spoke too soon, about an hour later the VM froze as usual

So your original comment about bad DIMMs got me thinking. Although I replaced the RAM, both kits were four sticks running at 3200. At least I thought the previous one was. I remember now that when I upgraded my old kit from 2x16 to 4x16, the system was no longer stable at 3200 and I had to bring it down a notch (I believe I've read something about the 5000 series not doing well at full speed when using four DIMMs).

With that in mind, I brought my new Micron ECC kit down to 2666. I can happily report that I've had a VM running under a medium load for 5 hours now with no issue! If it's still running by morning, I think I'll be safe in saying that I've found the culprit.
And if that's the case, I'll slowly bring my speeds back up one notch at a time to see how close to 3200 I can get before I run into issues again.
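For reference, to confirm what the DIMMs are actually negotiating after each BIOS change, something like this on the host (assuming dmidecode is installed) reports both the rated and the configured speed per module:

sudo dmidecode -t memory | grep -i speed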

How many CPUs are you dedicating to your VMs?
More than half of your available 16? Try limiting the core count per VM so you're using at most 8 of your 16. Yeah, I know you have 16 cores x 2 threads…
but just try it.
If you find performance improves, then start playing with your thread count and see which VMs need a little more CPU and which you can scrimp by with the minimum for, setting each one up appropriately; a rough way to do that from the command line is sketched below.
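A minimal virsh sketch, assuming a guest named debian11 (swap in your own domain name) and that the VM is shut down while you change its stored definition:

# give the guest 8 vCPUs in its stored definition (takes effect on next boot)
virsh setvcpus debian11 8 --config

# optionally pin vCPU 0 to host core 2 (repeat per vCPU as needed)
virsh vcpupin debian11 0 2 --config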

As for your RAM, you might have run into the problem of fully populating your slots while trying to run faster-than-native RAM speeds.
When you exceed the memory controller's native speed and try to use two DIMMs per channel, you can run into stability problems, as motherboards will often limit you to one DIMM per channel at OC-profile RAM speeds.
If your board's manual says 3200 (OC), then you're likely going to be limited to one DIMM per channel,
with the only fix for stability being to turn the RAM down to its native speed so it will run with two DIMMs per channel again.

Your best option as a dual-channel CPU user is two high-density sticks rather than four smaller ones with lower-density chips.

I usually use 6-8 cores on a VM, and don't have more than one running at a time.


Could you push me in the right direction? I still only know enough Linux to be a danger to myself lol, so I'm unsure what to do with your comment.

I have just tried CPU pinning, both as a range and one-to-one; both times the VM froze as before.

I may try a different hypervisor or reinstalling this one to see if it makes a difference.

@DrLucky, if you decide to ditch your current hypervisor, I would suggest VirtualBox; if you want to spend money on a hypervisor, I would recommend VMware Workstation Pro. I have used both, and they work pretty well. The only advantage VMware Workstation Pro has over VirtualBox is that the guest VMs will run faster with Workstation Pro.

Just type that in a terminal and, when a VM hangs, post the relevant rows for the qemu process that hangs. When things hang like you are experiencing, more than likely they are waiting on I/O or a lock. The wchan column tells you which queue inside the kernel the process or thread is waiting on. It might help narrow down the cause.

Looking at wchan is the very first thing I do when a process hangs. That might point to a deadlock, a hung filesystem, etc. From there you can look further, for example at the files it has open. If you run the command several times and wchan is changing, then the process is not hung entirely but in some kind of wait loop. You could use strace to watch what system calls it's issuing in that case. On the other hand, if the process is hung somewhere, then you might attach to the process with gdb and run "thread apply all bt" to dump out all the thread stacks.
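One easy way to do the "run it several times" part, assuming you've already grabbed the qemu PID, is just to wrap the ps call in watch:

watch -n 2 "ps -p <qemu pid> -Lo tid,wchan:32,pcpu,comm"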

Don’t want to take you down into the weeds if you aren’t that comfortable on the command line!

Example of not stuck:

  ps -ALo   ppid,pid,tid,wchan:40,comm | less

  1  101087  101087 -                                        qemu-system-x86
  1  101087  101093 -                                        qemu-system-x86
  1  101087  101102 -                                        IO mon_iothread
  1  101087  101103 -                                        CPU 0/KVM
  1  101087  101104 -                                        CPU 1/KVM
  1  101087  101106 -                                        SPICE Worker

A trivial stuck example:

Run in one terminal, stuck waiting on my input…

  cat >/tmp/asdf

In another term

  ps -p 97524 -Lo   ppid,pid,tid,wchan:40,comm
  PPID     PID     TID WCHAN                                    COMMAND
  97223   97524   97524 wait_woken                               cat

Which is not helpful as wait_woken is a catchall.

So, look at the user space thread stack…

  gdb -p 97524

  (gdb) thread apply all bt

  #0  0x00007fe191d14992 in __GI___libc_read (fd=0, buf=0x7fe191eee000, nbytes=131072) at ../sysdeps/unix/sysv/linux/read.c:26
  #1  0x000055576b465ba6 in ?? ()
  #2  0x000055576b4654ee in ?? ()
  #3  0x00007fe191c29d90 in __libc_start_call_main (main=main@entry=0x55576b4646a0, argc=argc@entry=1, argv=argv@entry=0x7fffe3339ce8) at ../sysdeps/nptl/libc_start_call_main.h:58
  #4  0x00007fe191c29e40 in __libc_start_main_impl (main=0x55576b4646a0, argc=1, argv=0x7fffe3339ce8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffe3339cd8) at ../csu/libc-start.c:392

So it's "stuck" in a read of fd=0. So what is fd 0? fd 0 is usually stdin; use lsof to find out…

 lsof -p 97524

 COMMAND   PID    USER   FD   TYPE DEVICE SIZE/OFF    NODE NAME
 cat     97524 mith  cwd    DIR  259,2     4096 7077890 /home/mith
 cat     97524 mith  rtd    DIR  259,2     4096       2 /
 cat     97524 mith  txt    REG  259,2    35280 3276949 /usr/bin/cat
 cat     97524 mith  mem    REG  259,2  5712208 3277702 /usr/lib/locale/locale-archive
 cat     97524 mith  mem    REG  259,2  2216304 3278945 /usr/lib/x86_64-linux-gnu/libc.so.6
 cat     97524 mith  mem    REG  259,2   240936 3278937 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
 cat     97524 mith    0u   CHR  136,4      0t0       7 /dev/pts/4
 cat     97524 mith    1w   REG  259,2       12 8650973 /tmp/asdf
 cat     97524 mith    2u   CHR  136,4      0t0       7 /dev/pts/4

So fd 0 is /dev/pts/4. Pts is the slave side of a pseudo terminal. Who has the master?

 lsof /dev/ptmx

 gnome-ter 10606 mith   17u   CHR    5,2      0t0   89 /dev/ptmx
 gnome-ter 10606 mith   19u   CHR    5,2      0t0   89 /dev/ptmx
 gnome-ter 10606 mith   24u   CHR    5,2      0t0   89 /dev/ptmx
 gnome-ter 10606 mith   25u   CHR    5,2      0t0   89 /dev/ptmx
 sudo      15111    root   12u   CHR    5,2      0t0   89 /dev/ptmx
 sudo      15381    root   12u   CHR    5,2      0t0   89 /dev/ptmx

So it's a gnome-terminal. But to verify, we have to query the slave pts from the process that opened the master. gdb again…

gdb -p 10606

(gdb) print ptsname(17)
$1 = 0x7fb1ab827e30 <buffer> "/dev/pts/0"
(gdb) print ptsname(19)
$2 = 0x7fb1ab827e30 <buffer> "/dev/pts/2"
(gdb) print ptsname(24u)
$3 = 0x7fb1ab827e30 <buffer> "/dev/pts/4"

Bingo, found it. Cat is hung because it is waiting to read data from a gnome-terminal on the other side of a pseudo tty.

Anyway, that's a contrived example, but you use those sorts of breadcrumbs to find out why things are stuck. Usually it's a file or a socket of some sort. I didn't describe strace, but that is a majorly useful tool to identify the system calls a process makes. The info strace spits out includes the file descriptors and filenames that are opened, read, written, etc., so you can use that info to spelunk as well.
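As a rough example of what I mean by strace (attach to the process, follow all of its threads, timestamp each call; the PID is whatever your stuck qemu process is):

sudo strace -f -tt -p <qemu pid> -e trace=ppoll,ioctl,read,write

Detach with Ctrl-C; strace leaves the traced process running.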

Thanks so much! It's been running fine for 12 hours now, so, ya know, any minute now and I'll be able to try it out lol.

I’m certainly comfortable on the command line, I just wasn’t familiar with those, and I haven’t dug deep into system processes like that before.

oh wow! I appreciate all the detail

The vm hung, so I ran
ps -Ao pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:32,comm

the only related thing (I think) I could find in the wchan column was
285177 285177 TS - 0 19 22 0.9 Ssl do_sys_poll virt-manager

so I ran
gdb -p 285177

but I’m not sure what to do with this output:
Attaching to process 285177
[New LWP 285178]
[New LWP 285179]
[New LWP 285180]
[New LWP 285181]
[New LWP 285191]
[New LWP 285989]
[New LWP 285990]
[New LWP 287430]
[New LWP 287434]
[New LWP 287435]
[New LWP 287436]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
0x00007f794fd16e9f in poll () from /usr/lib/libc.so.6

What 'category' of Linux knowledge does this fall under? This is a side of Linux I haven't come across before and need to learn.

If you run "ps auxww | grep <your vm name>" you'll find the right qemu process. Its command line is extremely long; you'll know it when you see it. What you grabbed above is virt-manager, not the VM's qemu process.

Once you've identified the qemu process that corresponds to your stuck VM, run the ps command like this instead to get every thread in the process. wchan is "-" in my case because nothing is stuck. Note the thread names: I have two virtual CPUs assigned to this particular VM and they show up as threads.

    ps -p 26324 -Lo pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:20,comm
   PID     TID CLS RTPRIO  NI PRI PSR %CPU STAT WCHAN                COMMAND
   26324   26324 TS       -   0  19   1  0.0 Sl   -                    qemu-system-x86
   26324   26331 TS       -   0  19  10  0.0 Sl   -                    qemu-system-x86
   26324   26340 TS       -   0  19  12  0.0 Sl   -                    IO mon_iothread
   26324   26341 TS       -   0  19   0  0.8 Sl   -                    CPU 0/KVM
   26324   26342 TS       -   0  19  12  0.8 Sl   -                    CPU 1/KVM
   26324   26345 TS       -   0  19  23  0.0 Sl   -                    SPICE Worker

wchan may not be helpful, but the stack dump you get from gdb by running "thread apply all bt" is very useful. Attach that to the topic and I'll have a look. If it's not obvious to me (I'm not a qemu/kvm developer), you could post it to a forum or Discord, wherever they hang out.

Example of running gdb to get all the stacks:

sudo gdb -p 26324 --batch -ex "thread apply all bt"

It's wading into deeper waters. Knowing about processes, threads, thread stacks, and syscalls is general knowledge that helps with any OS. All this info is just more accessible in Linux with the tools that come out of the box.

Another thing you should be doing is looking at qemu's logs. Have you run journalctl to view them? There could be info lurking in there. Look for errors and warnings.

journalctl -b0 | egrep "libvirt|qemu"
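And since you're specifically after errors and warnings, a narrower pass (my own variation on the same idea) filters by priority as well:

journalctl -b0 -p warning | egrep -i "libvirt|qemu"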

So my VM is currently stuck. I found the correct process with ps auxww | grep debian11, and got this output from the ps command

❯ ps -p 182831 -Lo pid,tid,class,rtprio,ni,pri,psr,pcpu,stat,wchan:20,comm
    PID     TID CLS RTPRIO  NI PRI PSR %CPU STAT WCHAN                COMMAND
 182831  182831 TS       -   0  19  22  0.2 SLl  -                    qemu-system-x86
 182831  182834 TS       -   0  19   8  0.0 SLl  -                    qemu-system-x86
 182831  182842 TS       -   0  19  25  0.0 SLl  -                    IO mon_iothread
 182831  182843 TS       -   0  19  26  8.6 RLl  -                    CPU 0/KVM
 182831  182844 TS       -   0  19  27 12.0 RLl  -                    CPU 1/KVM
 182831  182845 TS       -   0  19  28  8.5 RLl  -                    CPU 2/KVM
 182831  182846 TS       -   0  19  29  8.3 RLl  -                    CPU 3/KVM
 182831  182847 TS       -   0  19  30 10.7 RLl  -                    CPU 4/KVM
 182831  182848 TS       -   0  19  31 10.0 RLl  -                    CPU 5/KVM
 182831  182971 TS       -   0  19   3  0.1 SLl  -                    SPICE Worker

wchan still seems to be coming up blank

running the gdb command, I got

❯ sudo gdb -p 182831 --batch -ex "thread apply all bt"
[New LWP 182834]
[New LWP 182842]
[New LWP 182843]
[New LWP 182844]
[New LWP 182845]
[New LWP 182846]
[New LWP 182847]
[New LWP 182848]
[New LWP 182971]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
0x00007f66ec1c1f96 in ppoll () from /usr/lib/libc.so.6

Thread 10 (Thread 0x7f5ec3fff6c0 (LWP 182971) "SPICE Worker"):
#0  0x00007f66ec1c1e9f in poll () at /usr/lib/libc.so.6
#1  0x00007f66ec5a2f68 in  () at /usr/lib/libglib-2.0.so.0
#2  0x00007f66ec54c1cf in g_main_loop_run () at /usr/lib/libglib-2.0.so.0
#3  0x00007f66ea8fbce7 in  () at /usr/lib/libspice-server.so.1
#4  0x00007f66ec14d78d in  () at /usr/lib/libc.so.6
#5  0x00007f66ec1ce8e4 in clone () at /usr/lib/libc.so.6

Thread 9 (Thread 0x7f5ed9ffa6c0 (LWP 182848) "CPU 5/KVM"):
#0  0x00007f66ec1c39ef in ioctl () at /usr/lib/libc.so.6
#1  0x00005593c04a8d96 in kvm_vcpu_ioctl () at ../qemu-7.0.0/accel/kvm/kvm-all.c:3053
#2  0x00005593c04acfc9 in kvm_cpu_exec () at ../qemu-7.0.0/accel/kvm/kvm-all.c:2879
#3  0x00005593c04ae9a6 in kvm_vcpu_thread_fn () at ../qemu-7.0.0/accel/kvm/kvm-accel-ops.c:49
#4  0x00005593c069c848 in qemu_thread_start () at ../qemu-7.0.0/util/qemu-thread-posix.c:556
#5  0x00007f66ec14d78d in  () at /usr/lib/libc.so.6
#6  0x00007f66ec1ce8e4 in clone () at /usr/lib/libc.so.6

Thread 8 (Thread 0x7f5eda7fb6c0 (LWP 182847) "CPU 4/KVM"):
#0  0x00007f66ec1c39ef in ioctl () at /usr/lib/libc.so.6
#1  0x00005593c04a8d96 in kvm_vcpu_ioctl () at ../qemu-7.0.0/accel/kvm/kvm-all.c:3053
#2  0x00005593c04acfc9 in kvm_cpu_exec () at ../qemu-7.0.0/accel/kvm/kvm-all.c:2879
#3  0x00005593c04ae9a6 in kvm_vcpu_thread_fn () at ../qemu-7.0.0/accel/kvm/kvm-accel-ops.c:49
#4  0x00005593c069c848 in qemu_thread_start () at ../qemu-7.0.0/util/qemu-thread-posix.c:556
#5  0x00007f66ec14d78d in  () at /usr/lib/libc.so.6
#6  0x00007f66ec1ce8e4 in clone () at /usr/lib/libc.so.6

Thread 7 (Thread 0x7f5edaffc6c0 (LWP 182846) "CPU 3/KVM"):
#0  0x00007f66ec1c39ef in ioctl () at /usr/lib/libc.so.6
#1  0x00005593c04a8d96 in kvm_vcpu_ioctl () at ../qemu-7.0.0/accel/kvm/kvm-all.c:3053
#2  0x00005593c04acfc9 in kvm_cpu_exec () at ../qemu-7.0.0/accel/kvm/kvm-all.c:2879
#3  0x00005593c04ae9a6 in kvm_vcpu_thread_fn () at ../qemu-7.0.0/accel/kvm/kvm-accel-ops.c:49
#4  0x00005593c069c848 in qemu_thread_start () at ../qemu-7.0.0/util/qemu-thread-posix.c:556
#5  0x00007f66ec14d78d in  () at /usr/lib/libc.so.6
#6  0x00007f66ec1ce8e4 in clone () at /usr/lib/libc.so.6

Thread 6 (Thread 0x7f5edb7fd6c0 (LWP 182845) "CPU 2/KVM"):
#0  0x00007f66ec1c39ef in ioctl () at /usr/lib/libc.so.6
#1  0x00005593c04a8d96 in kvm_vcpu_ioctl () at ../qemu-7.0.0/accel/kvm/kvm-all.c:3053
#2  0x00005593c04acfc9 in kvm_cpu_exec () at ../qemu-7.0.0/accel/kvm/kvm-all.c:2879
#3  0x00005593c04ae9a6 in kvm_vcpu_thread_fn () at ../qemu-7.0.0/accel/kvm/kvm-accel-ops.c:49
#4  0x00005593c069c848 in qemu_thread_start () at ../qemu-7.0.0/util/qemu-thread-posix.c:556
#5  0x00007f66ec14d78d in  () at /usr/lib/libc.so.6
#6  0x00007f66ec1ce8e4 in clone () at /usr/lib/libc.so.6

Thread 5 (Thread 0x7f5edbffe6c0 (LWP 182844) "CPU 1/KVM"):
#0  0x00007f66ec1c39ef in ioctl () at /usr/lib/libc.so.6
#1  0x00005593c04a8d96 in kvm_vcpu_ioctl () at ../qemu-7.0.0/accel/kvm/kvm-all.c:3053
#2  0x00005593c04acfc9 in kvm_cpu_exec () at ../qemu-7.0.0/accel/kvm/kvm-all.c:2879
#3  0x00005593c04ae9a6 in kvm_vcpu_thread_fn () at ../qemu-7.0.0/accel/kvm/kvm-accel-ops.c:49
#4  0x00005593c069c848 in qemu_thread_start () at ../qemu-7.0.0/util/qemu-thread-posix.c:556
#5  0x00007f66ec14d78d in  () at /usr/lib/libc.so.6
#6  0x00007f66ec1ce8e4 in clone () at /usr/lib/libc.so.6

Thread 4 (Thread 0x7f66e8e666c0 (LWP 182843) "CPU 0/KVM"):
#0  0x00007f66ec1c39ef in ioctl () at /usr/lib/libc.so.6
#1  0x00005593c04a8d96 in kvm_vcpu_ioctl () at ../qemu-7.0.0/accel/kvm/kvm-all.c:3053
#2  0x00005593c04acfc9 in kvm_cpu_exec () at ../qemu-7.0.0/accel/kvm/kvm-all.c:2879
#3  0x00005593c04ae9a6 in kvm_vcpu_thread_fn () at ../qemu-7.0.0/accel/kvm/kvm-accel-ops.c:49
#4  0x00005593c069c848 in qemu_thread_start () at ../qemu-7.0.0/util/qemu-thread-posix.c:556
#5  0x00007f66ec14d78d in  () at /usr/lib/libc.so.6
#6  0x00007f66ec1ce8e4 in clone () at /usr/lib/libc.so.6

Thread 3 (Thread 0x7f66e96676c0 (LWP 182842) "IO mon_iothread"):
#0  0x00007f66ec1c1e9f in poll () at /usr/lib/libc.so.6
#1  0x00007f66ec5a2f68 in  () at /usr/lib/libglib-2.0.so.0
#2  0x00007f66ec54c1cf in g_main_loop_run () at /usr/lib/libglib-2.0.so.0
#3  0x00005593c04f3cc2 in iothread_run () at ../qemu-7.0.0/iothread.c:73
#4  0x00005593c069c848 in qemu_thread_start () at ../qemu-7.0.0/util/qemu-thread-posix.c:556
#5  0x00007f66ec14d78d in  () at /usr/lib/libc.so.6
#6  0x00007f66ec1ce8e4 in clone () at /usr/lib/libc.so.6

Thread 2 (Thread 0x7f66eb3ff6c0 (LWP 182834) "qemu-system-x86"):
#0  0x00007f66ec1c756d in syscall () at /usr/lib/libc.so.6
#1  0x00005593c069c703 in qemu_futex_wait () at /usr/src/debug/qemu-7.0.0/include/qemu/futex.h:29
#2  qemu_event_wait () at ../qemu-7.0.0/util/qemu-thread-posix.c:481
#3  0x00005593c06afddd in call_rcu_thread () at ../qemu-7.0.0/util/rcu.c:261
#4  0x00005593c069c848 in qemu_thread_start () at ../qemu-7.0.0/util/qemu-thread-posix.c:556
#5  0x00007f66ec14d78d in  () at /usr/lib/libc.so.6
#6  0x00007f66ec1ce8e4 in clone () at /usr/lib/libc.so.6

Thread 1 (Thread 0x7f66eb7b1600 (LWP 182831) "qemu-system-x86"):
#0  0x00007f66ec1c1f96 in ppoll () at /usr/lib/libc.so.6
#1  0x00005593c06bbca3 in ppoll () at /usr/include/bits/poll2.h:64
#2  qemu_poll_ns () at ../qemu-7.0.0/util/qemu-timer.c:348
#3  0x00005593c06c6b07 in os_host_main_loop_wait () at ../qemu-7.0.0/util/main-loop.c:250
#4  main_loop_wait () at ../qemu-7.0.0/util/main-loop.c:531
#5  0x00005593c00f650a in qemu_main_loop () at ../qemu-7.0.0/softmmu/runstate.c:727
#6  0x00005593c00a3d72 in main () at ../qemu-7.0.0/softmmu/main.c:50
[Inferior 1 (process 182831) detached]

I checked the logs too. There is no new entry since I started up the VM about 12 hours ago.

Good job. Nothing obvious here. The virtual CPU threads of the process are not totally stuck. Your VM is doing something, so that's a clue at least. The problem is you can't interact with it. I don't think it's really stuck; rather, Spice is somehow non-responsive.

On one host I have, QXL is an issue for a Fedora guest. The console will freeze up after a while, but the VM is still functional, as I can ssh into it. I had to switch that VM's display from QXL to virtio. You might try that (rough sketch below).
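If you want to make that change outside the virt-manager GUI, a hedged sketch (guest name taken from earlier in the thread) is to edit the domain XML directly:

virsh edit debian11
# inside the <video> device, change the model element to:
#   <model type='virtio'/>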

Also, have you tried to ssh into the guests after they are stuck?

Another way to interact with a Linux guest is through the serial console. You'll have to enable the serial console first. On Ubuntu/Debian, that looks like this inside the guest:

 systemctl enable --now serial-getty@ttyS0.service

Be sure you have a serial port defined for your VM, of course. Then under View->Consoles you can switch to the serial console and log in via a tty, old-school Unix style. If Spice is locked up, you may still be able to switch to the serial console to verify that the VM is still running.
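You can also attach to that serial console straight from the host, without the virt-manager window, assuming the same guest name as before:

virsh console debian11
# exit the console with Ctrl+]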

Finally, the logs for your VM are probably under /var/log/libvirt/qemu/. Take a look at the log for your guest and see if anything suspicious shows up around the time it hangs.
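Tailing that log while the VM runs (guest name assumed again) makes it easier to catch whatever gets written at the moment of the hang:

tail -f /var/log/libvirt/qemu/debian11.log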

I wonder if I'm not getting the printout for the correct process.
I forgot to include it in the original post, but I have tried ssh-ing into the virtual machine after it freezes, and I always get the 'not available' message. I haven't tried the serial console, however, so that is next!

I can try changing the display, but I know across several different VMs I have used QXL, virtio, and a discrete GPU passed through to the guest.

The log file was at /var/log/libvirt/qemu/! I'm combing through it now. It doesn't seem to be timestamped, so I might just have to catch it whenever it crashes again.

I'm pretty sure I've run into the issue with the VM set up on a virtual network, but I've switched to that now just in case. I usually have the VM on the motherboard's second Ethernet port with VFIO passthrough.

I have been trying to determine when all of this started. Only recently have I had VMs running constantly, so if it was an issue before, I wouldn't have noticed. On my last OS install, I purposely didn't include a DE and just went with a window manager, so I am wondering if I missed something essential that could be causing this.

Again, thank you for your help! I not only appreciate the time you've put into your replies, but also the learning I'm getting from them. :)