Hey, so this is a loooong shot, but I’m looking for any input I can get.
Last week, VMs running SLES 15 on VMware ESXi started crashing. These are very important VMs running SAP, and this is the 4th occurrence in 10 days. The VM just freezes and needs to be force reset. The last message in the VMware events is:
The CPU has been disabled by the guest operating system. Power off or reset the virtual machine.
We were able to generate .vmem and .vmss files from the VM in its frozen state, which in theory gives us a core dump to analyze.
Nobody has any idea so far what to do with this dump or how to interpret it. VMware support is shrugging and directing us to SUSE support, and SUSE… well, it’s SUSE…
I myself have hardly any experience analyzing a dump. From what I could gather so far, you gotta run gdb with a debug kernel matching the VM’s kernel and then feed in the dump.
I’m getting a SIGSEGV at an address that gdb can’t resolve to a symbol, but nothing more so far:
[xxx@yyyyyy zzz]$ gdb vmlinux-5.14.21-150500.55.68-default.debug vmss.core
GNU gdb (GDB) Red Hat Enterprise Linux 8.2-20.el8
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from vmlinux-5.14.21-150500.55.68-default.debug...done.
warning: core file may not match specified executable file.
[New LWP 12345]
Core was generated by `GuestVM'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0xffffffffb604c55d in ?? ()
(gdb)
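One sanity check I tried on the frame #0 address, in case it helps someone spot the problem. This is only a sketch: the three constants are my assumptions about the usual x86_64 Linux layout (start of the kernel text mapping, a 1 GiB image window when KASLR is on, and the default unrandomized text base). None of them were read from this dump, and `describe` is just a helper name I made up.

```python
# Assumed x86_64 Linux layout constants (NOT taken from the dump itself).
KERNEL_MAP_BASE = 0xFFFFFFFF80000000    # assumed start of the kernel text mapping
KERNEL_IMAGE_WINDOW = 1 << 30           # assumed 1 GiB image window with KASLR enabled
DEFAULT_TEXT_BASE = 0xFFFFFFFF81000000  # assumed text base without randomization


def describe(pc: int) -> dict:
    """Report where an address sits relative to the assumed kernel layout."""
    in_window = KERNEL_MAP_BASE <= pc < KERNEL_MAP_BASE + KERNEL_IMAGE_WINDOW
    return {
        "in_kernel_window": in_window,
        "offset_from_default_text": pc - DEFAULT_TEXT_BASE,
    }


# Address gdb printed for frame #0 above.
info = describe(0xFFFFFFFFB604C55D)
print(info["in_kernel_window"], hex(info["offset_from_default_text"]))
```

If those assumptions hold, the address is inside the kernel image window but roughly 848 MiB above the default text base, which is far bigger than any kernel image. So my guess is that the running kernel was relocated (KASLR) and gdb is looking up symbols at the wrong addresses, but I have not been able to confirm that.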
Any ideas or tips on how I could analyze further or why there is no backtrace available in this dump?