How to debug kernel panics?

My new mini PC (Proxmox) server keeps crashing and going offline at random, and I am not sure why. The network drops (no remote access) and the system needs a hard reboot to get back on its feet (keyboard and mouse don’t work after the crash).

Here’s the error shown on the screen:

Does anyone have tips or helpful resources on how to debug this?

I logged back in and ran journalctl -p err..alert to get the list of errors below:

Sep 10 18:48:29 m6 kernel: usbhid 1-1:1.3: couldn't find an input interrupt endpoint
Sep 10 18:48:30 m6 kernel: Bluetooth: hci0: Failed to load Intel firmware file intel/ibt-19-0-4.sfi (-2)
Sep 10 18:48:30 m6 kernel: Bluetooth: hci0: Failed to read MSFT supported features (-56)
Sep 10 18:48:37 m6 pvecm[1320]: got inotify poll request in wrong process - disabling inotify
Sep 10 18:50:40 m6 kernel: watchdog: watchdog0: watchdog did not stop!
-- Boot 7afb7e39ce044b4eb0348bba5d340e9a --
Sep 10 18:51:14 m6 kernel: usbhid 1-1:1.3: couldn't find an input interrupt endpoint
Sep 10 18:51:15 m6 kernel: Bluetooth: hci0: Failed to load Intel firmware file intel/ibt-19-0-4.sfi (-2)
Sep 10 18:51:15 m6 kernel: Bluetooth: hci0: Failed to read MSFT supported features (-56)
Sep 10 18:51:15 m6 smartd[895]: Device: /dev/nvme0, number of Error Log entries increased from 432 to 434
Sep 10 18:52:28 m6 kernel: watchdog: watchdog0: watchdog did not stop!
-- Boot 88ba8251bd8f4c79a49aecec65a0c1b4 --
Sep 10 18:53:02 m6 kernel: usbhid 1-1:1.3: couldn't find an input interrupt endpoint
Sep 10 18:53:03 m6 kernel: Bluetooth: hci0: Failed to load Intel firmware file intel/ibt-19-0-4.sfi (-2)
Sep 10 18:53:03 m6 kernel: Bluetooth: hci0: Failed to read MSFT supported features (-56)
Sep 10 18:53:03 m6 smartd[921]: Device: /dev/nvme0, number of Error Log entries increased from 434 to 436
Sep 10 18:57:23 m6 pveproxy[22413]: got inotify poll request in wrong process - disabling inotify
Sep 10 18:58:03 m6 kernel: watchdog: watchdog0: watchdog did not stop!
-- Boot 61ed2537ca1240ba8500a4d441e24856 --
Sep 10 18:58:38 m6 kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PR00._CPC], AE_NOT_FOUND (20210730/psargs-330)
Sep 10 18:58:38 m6 kernel: ACPI Error: Aborting method \_SB.PR01._CPC due to previous error (AE_NOT_FOUND) (20210730/psparse-529)
Sep 10 18:58:38 m6 kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PR00._CPC], AE_NOT_FOUND (20210730/psargs-330)
Sep 10 18:58:38 m6 kernel: ACPI Error: Aborting method \_SB.PR02._CPC due to previous error (AE_NOT_FOUND) (20210730/psparse-529)
Sep 10 18:58:38 m6 kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PR00._CPC], AE_NOT_FOUND (20210730/psargs-330)
Sep 10 18:58:38 m6 kernel: ACPI Error: Aborting method \_SB.PR03._CPC due to previous error (AE_NOT_FOUND) (20210730/psparse-529)
Sep 10 18:58:38 m6 kernel: usbhid 1-1:1.3: couldn't find an input interrupt endpoint
Sep 10 18:58:39 m6 kernel: Bluetooth: hci0: Failed to load Intel firmware file intel/ibt-19-0-4.sfi (-2)
Sep 10 18:58:39 m6 kernel: Bluetooth: hci0: Failed to read MSFT supported features (-56)
Sep 10 18:58:39 m6 smartd[915]: Device: /dev/nvme0, number of Error Log entries increased from 436 to 438
Sep 10 20:10:20 m6 pmxcfs[23896]: [quorum] crit: quorum_initialize failed: 2
Sep 10 20:10:20 m6 pmxcfs[23896]: [quorum] crit: can't initialize service
Sep 10 20:10:20 m6 pmxcfs[23896]: [confdb] crit: cmap_initialize failed: 2
Sep 10 20:10:20 m6 pmxcfs[23896]: [confdb] crit: can't initialize service
Sep 10 20:10:20 m6 pmxcfs[23896]: [dcdb] crit: cpg_initialize failed: 2
Sep 10 20:10:20 m6 pmxcfs[23896]: [dcdb] crit: can't initialize service
Sep 10 20:10:20 m6 pmxcfs[23896]: [status] crit: cpg_initialize failed: 2
Sep 10 20:10:20 m6 pmxcfs[23896]: [status] crit: can't initialize service
Sep 10 22:12:05 m6 pveproxy[76889]: got inotify poll request in wrong process - disabling inotify
Sep 11 03:47:16 m6 pmxcfs[23896]: [confdb] crit: cmap_dispatch failed: 2
Sep 11 03:47:16 m6 pmxcfs[23896]: [quorum] crit: quorum_dispatch failed: 2
Sep 11 03:47:16 m6 pmxcfs[23896]: [dcdb] crit: cpg_dispatch failed: 2
Sep 11 03:47:16 m6 pmxcfs[23896]: [dcdb] crit: cpg_leave failed: 2
Sep 11 03:47:16 m6 pmxcfs[23896]: [status] crit: cpg_dispatch failed: 2
Sep 11 03:47:16 m6 pmxcfs[23896]: [status] crit: cpg_leave failed: 2
Sep 11 03:47:16 m6 pmxcfs[23896]: [quorum] crit: quorum_initialize failed: 2
Sep 11 03:47:16 m6 pmxcfs[23896]: [quorum] crit: can't initialize service
Sep 11 03:47:16 m6 pmxcfs[23896]: [confdb] crit: cmap_initialize failed: 2
Sep 11 03:47:16 m6 pmxcfs[23896]: [confdb] crit: can't initialize service
Sep 11 03:47:16 m6 pmxcfs[23896]: [dcdb] crit: cpg_initialize failed: 2
Sep 11 03:47:16 m6 pmxcfs[23896]: [dcdb] crit: can't initialize service
Sep 11 03:47:16 m6 pmxcfs[23896]: [status] crit: cpg_initialize failed: 2
Sep 11 03:47:16 m6 pmxcfs[23896]: [status] crit: can't initialize service
Sep 11 03:47:18 m6 pmxcfs[23896]: [quorum] crit: quorum_finalize failed: 9
Sep 11 03:47:18 m6 pmxcfs[23896]: [confdb] crit: cmap_track_delete nodelist failed: 9
Sep 11 03:47:18 m6 pmxcfs[23896]: [confdb] crit: cmap_track_delete version failed: 9
Sep 11 03:47:18 m6 pmxcfs[23896]: [confdb] crit: cmap_finalize failed: 9
-- Boot a2a3306ecccd471187974e068087520a --
Sep 11 03:51:44 m6 kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PR00._CPC], AE_NOT_FOUND (20210730/psargs-330)
Sep 11 03:51:44 m6 kernel: ACPI Error: Aborting method \_SB.PR01._CPC due to previous error (AE_NOT_FOUND) (20210730/psparse-529)
Sep 11 03:51:44 m6 kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PR00._CPC], AE_NOT_FOUND (20210730/psargs-330)
Sep 11 03:51:44 m6 kernel: ACPI Error: Aborting method \_SB.PR02._CPC due to previous error (AE_NOT_FOUND) (20210730/psparse-529)
Sep 11 03:51:44 m6 kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PR00._CPC], AE_NOT_FOUND (20210730/psargs-330)
Sep 11 03:51:44 m6 kernel: ACPI Error: Aborting method \_SB.PR03._CPC due to previous error (AE_NOT_FOUND) (20210730/psparse-529)
Sep 11 03:51:44 m6 kernel: usbhid 1-1:1.3: couldn't find an input interrupt endpoint
Sep 11 03:51:45 m6 kernel: Bluetooth: hci0: Failed to load Intel firmware file intel/ibt-19-0-4.sfi (-2)
Sep 11 03:51:45 m6 kernel: Bluetooth: hci0: Failed to read MSFT supported features (-56)
Sep 11 03:51:45 m6 smartd[856]: Device: /dev/nvme0, number of Error Log entries increased from 438 to 439
Sep 11 03:51:46 m6 pmxcfs[1187]: [quorum] crit: quorum_initialize failed: 2
Sep 11 03:51:46 m6 pmxcfs[1187]: [quorum] crit: can't initialize service
Sep 11 03:51:46 m6 pmxcfs[1187]: [confdb] crit: cmap_initialize failed: 2
Sep 11 03:51:46 m6 pmxcfs[1187]: [confdb] crit: can't initialize service
Sep 11 03:51:46 m6 pmxcfs[1187]: [dcdb] crit: cpg_initialize failed: 2
Sep 11 03:51:46 m6 pmxcfs[1187]: [dcdb] crit: can't initialize service
Sep 11 03:51:46 m6 pmxcfs[1187]: [status] crit: cpg_initialize failed: 2
Sep 11 03:51:46 m6 pmxcfs[1187]: [status] crit: can't initialize service
Sep 11 04:22:37 m6 kernel: igc 0000:01:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
Sep 11 04:22:37 m6 kernel: igc 0000:01:00.0:   device [8086:15f3] error status/mask=00004000/00000000
Sep 11 04:22:37 m6 kernel: igc 0000:01:00.0:    [14] CmpltTO
Sep 11 04:22:37 m6 kernel: genirq: Flags mismatch irq 127. 00000000 (enp1s0) vs. 00000000 (enp1s0)
Sep 11 04:22:37 m6 kernel: kernel BUG at drivers/pci/msi.c:369!
Sep 11 04:22:37 m6 kernel: kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
Sep 11 04:22:37 m6 kernel: BUG: unable to handle page fault for address: ffffb9d68054ff08
Sep 11 04:22:37 m6 kernel: #PF: supervisor instruction fetch in kernel mode
Sep 11 04:22:37 m6 kernel: #PF: error_code(0x0011) - permissions violation
Sep 11 04:22:37 m6 kernel: Fixing recursive fault but reboot is needed!
-- Boot 71318f51408240119e7a927127a8c2f0 --
Sep 11 04:26:46 m6 kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PR00._CPC], AE_NOT_FOUND (20210730/psargs-330)
Sep 11 04:26:46 m6 kernel: ACPI Error: Aborting method \_SB.PR01._CPC due to previous error (AE_NOT_FOUND) (20210730/psparse-529)
Sep 11 04:26:46 m6 kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PR00._CPC], AE_NOT_FOUND (20210730/psargs-330)
Sep 11 04:26:46 m6 kernel: ACPI Error: Aborting method \_SB.PR02._CPC due to previous error (AE_NOT_FOUND) (20210730/psparse-529)
Sep 11 04:26:46 m6 kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PR00._CPC], AE_NOT_FOUND (20210730/psargs-330)
Sep 11 04:26:46 m6 kernel: ACPI Error: Aborting method \_SB.PR03._CPC due to previous error (AE_NOT_FOUND) (20210730/psparse-529)
Sep 11 04:26:46 m6 kernel: usbhid 1-1:1.3: couldn't find an input interrupt endpoint
Sep 11 04:26:46 m6 kernel: Bluetooth: hci0: Failed to load Intel firmware file intel/ibt-19-0-4.sfi (-2)
Sep 11 04:26:46 m6 kernel: Bluetooth: hci0: Failed to read MSFT supported features (-56)
Sep 11 04:26:47 m6 smartd[2097]: Device: /dev/nvme0, number of Error Log entries increased from 439 to 441
Sep 11 04:26:48 m6 pmxcfs[2411]: [quorum] crit: quorum_initialize failed: 2
Sep 11 04:26:48 m6 pmxcfs[2411]: [quorum] crit: can't initialize service
Sep 11 04:26:48 m6 pmxcfs[2411]: [confdb] crit: cmap_initialize failed: 2
Sep 11 04:26:48 m6 pmxcfs[2411]: [confdb] crit: can't initialize service
Sep 11 04:26:48 m6 pmxcfs[2411]: [dcdb] crit: cpg_initialize failed: 2
Sep 11 04:26:48 m6 pmxcfs[2411]: [dcdb] crit: can't initialize service
Sep 11 04:26:48 m6 pmxcfs[2411]: [status] crit: cpg_initialize failed: 2
Sep 11 04:26:48 m6 pmxcfs[2411]: [status] crit: can't initialize service
Sep 11 05:10:28 m6 pvedaemon[343090]: VM is locked (create)
Sep 11 05:10:28 m6 pvedaemon[2581]: <[email protected]> end task UPID:m6:00053C32:0004071D:631DA604:qmdestroy:105:[email protected]: VM is locked (create)
Sep 11 05:10:39 m6 pvedaemon[343170]: MAX 4 vcpus allowed per VM on this node
Sep 11 05:10:39 m6 pvedaemon[2582]: <[email protected]> end task UPID:m6:00053C82:00040B44:631DA60F:qmstart:106:[email protected]: MAX 4 vcpus allowed per VM on this node
Sep 11 05:12:16 m6 QEMU[343324]: kvm: terminating on signal 15 from pid 2088 (/usr/sbin/qmeventd)
Sep 11 05:12:17 m6 pvedaemon[2582]: VM 106 qmp command failed - VM 106 not running
Sep 11 05:12:55 m6 pvedaemon[2579]: VM 106 qmp command failed - VM 106 qmp command 'guest-ping' failed - got timeout
Sep 11 05:13:14 m6 pvedaemon[2581]: VM 106 qmp command failed - VM 106 qmp command 'guest-ping' failed - got timeout
Sep 11 05:13:33 m6 pvedaemon[2582]: VM 106 qmp command failed - VM 106 qmp command 'guest-ping' failed - got timeout
Sep 11 05:13:52 m6 pvedaemon[2581]: VM 106 qmp command failed - VM 106 qmp command 'guest-ping' failed - got timeout
Sep 11 05:14:12 m6 pvedaemon[2582]: VM 106 qmp command failed - VM 106 qmp command 'guest-ping' failed - got timeout
Sep 11 05:15:22 m6 qm[429710]: VM is locked (create)
Sep 11 05:15:22 m6 qm[429514]: <[email protected]> end task UPID:m6:00068E8E:000479C9:631DA72A:qmdestroy:105:[email protected]: VM is locked (create)
Sep 11 05:16:50 m6 QEMU[360846]: kvm: terminating on signal 15 from pid 2088 (/usr/sbin/qmeventd)
Sep 11 05:16:51 m6 pvedaemon[430008]: VM 106 qmp command failed - VM 106 not running
Sep 11 05:18:22 m6 pmxcfs[2411]: [confdb] crit: cmap_dispatch failed: 2
Sep 11 05:18:22 m6 pmxcfs[2411]: [quorum] crit: quorum_dispatch failed: 2
Sep 11 05:18:22 m6 pmxcfs[2411]: [dcdb] crit: cpg_dispatch failed: 2
Sep 11 05:18:22 m6 pmxcfs[2411]: [dcdb] crit: cpg_leave failed: 2
Sep 11 05:18:22 m6 pmxcfs[2411]: [status] crit: cpg_dispatch failed: 2
Sep 11 05:18:22 m6 pmxcfs[2411]: [status] crit: cpg_leave failed: 2
Sep 11 05:18:23 m6 pmxcfs[2411]: [quorum] crit: quorum_initialize failed: 2
Sep 11 05:18:23 m6 pmxcfs[2411]: [quorum] crit: can't initialize service
Sep 11 05:18:23 m6 pmxcfs[2411]: [confdb] crit: cmap_initialize failed: 2
Sep 11 05:18:23 m6 pmxcfs[2411]: [confdb] crit: can't initialize service
Sep 11 05:18:23 m6 pmxcfs[2411]: [dcdb] crit: cpg_initialize failed: 2
Sep 11 05:18:23 m6 pmxcfs[2411]: [dcdb] crit: can't initialize service
Sep 11 05:18:23 m6 pmxcfs[2411]: [status] crit: cpg_initialize failed: 2
Sep 11 05:18:23 m6 pmxcfs[2411]: [status] crit: can't initialize service
Sep 11 05:18:24 m6 pmxcfs[2411]: [quorum] crit: quorum_finalize failed: 9
Sep 11 05:18:24 m6 pmxcfs[2411]: [confdb] crit: cmap_track_delete nodelist failed: 9
Sep 11 05:18:24 m6 pmxcfs[2411]: [confdb] crit: cmap_track_delete version failed: 9
Sep 11 05:18:24 m6 pmxcfs[2411]: [confdb] crit: cmap_finalize failed: 9
Sep 11 05:18:25 m6 kernel: watchdog: watchdog0: watchdog did not stop!
-- Boot 8e3edef52e164d958138494b06d9025c --
Sep 11 05:19:02 m6 kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PR00._CPC], AE_NOT_FOUND (20210730/psargs-330)
Sep 11 05:19:02 m6 kernel: ACPI Error: Aborting method \_SB.PR01._CPC due to previous error (AE_NOT_FOUND) (20210730/psparse-529)
Sep 11 05:19:02 m6 kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PR00._CPC], AE_NOT_FOUND (20210730/psargs-330)
Sep 11 05:19:02 m6 kernel: ACPI Error: Aborting method \_SB.PR02._CPC due to previous error (AE_NOT_FOUND) (20210730/psparse-529)
Sep 11 05:19:02 m6 kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PR00._CPC], AE_NOT_FOUND (20210730/psargs-330)
Sep 11 05:19:02 m6 kernel: ACPI Error: Aborting method \_SB.PR03._CPC due to previous error (AE_NOT_FOUND) (20210730/psparse-529)
Sep 11 05:19:02 m6 kernel: usbhid 1-1:1.3: couldn't find an input interrupt endpoint
Sep 11 05:19:02 m6 kernel: Bluetooth: hci0: Failed to load Intel firmware file intel/ibt-19-0-4.sfi (-2)
Sep 11 05:19:02 m6 kernel: Bluetooth: hci0: Failed to read MSFT supported features (-56)
Sep 11 05:19:03 m6 smartd[1051]: Device: /dev/nvme0, number of Error Log entries increased from 441 to 443
Sep 11 05:19:04 m6 pmxcfs[1362]: [quorum] crit: quorum_initialize failed: 2
Sep 11 05:19:04 m6 pmxcfs[1362]: [quorum] crit: can't initialize service
Sep 11 05:19:04 m6 pmxcfs[1362]: [confdb] crit: cmap_initialize failed: 2
Sep 11 05:19:04 m6 pmxcfs[1362]: [confdb] crit: can't initialize service
Sep 11 05:19:04 m6 pmxcfs[1362]: [dcdb] crit: cpg_initialize failed: 2
Sep 11 05:19:04 m6 pmxcfs[1362]: [dcdb] crit: can't initialize service
Sep 11 05:19:04 m6 pmxcfs[1362]: [status] crit: cpg_initialize failed: 2
Sep 11 05:19:04 m6 pmxcfs[1362]: [status] crit: can't initialize service
Sep 11 05:37:22 m6 QEMU[2183]: kvm: vfio: Cannot reset device 0000:00:14.5, no available reset mechanism.
Sep 11 05:37:22 m6 QEMU[2183]: kvm: vfio: Cannot reset device 0000:00:14.2, no available reset mechanism.
Sep 11 05:37:22 m6 QEMU[2183]: kvm: vfio: Cannot reset device 0000:00:14.0, no available reset mechanism.
Sep 11 05:37:22 m6 QEMU[2183]: kvm: vfio: Cannot reset device 0000:00:14.5, no available reset mechanism.
Sep 11 05:37:22 m6 QEMU[2183]: kvm: vfio: Cannot reset device 0000:00:14.2, no available reset mechanism.
Sep 11 05:37:22 m6 QEMU[2183]: kvm: vfio: Cannot reset device 0000:00:14.0, no available reset mechanism.
Sep 11 05:37:22 m6 QEMU[2183]: kvm: vfio: Cannot reset device 0000:00:14.5, no available reset mechanism.
Sep 11 05:37:22 m6 QEMU[2183]: kvm: vfio: Cannot reset device 0000:00:14.2, no available reset mechanism.
Sep 11 05:37:22 m6 QEMU[2183]: kvm: vfio: Cannot reset device 0000:00:14.0, no available reset mechanism.
Sep 11 05:37:22 m6 QEMU[2183]: kvm: vfio: Cannot reset device 0000:00:14.5, no available reset mechanism.
Sep 11 05:37:22 m6 QEMU[2183]: kvm: vfio: Cannot reset device 0000:00:14.2, no available reset mechanism.
Sep 11 05:37:22 m6 QEMU[2183]: kvm: vfio: Cannot reset device 0000:00:14.0, no available reset mechanism.
Sep 11 05:39:34 m6 kernel: igc 0000:01:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
Sep 11 05:39:34 m6 kernel: igc 0000:01:00.0:   device [8086:15f3] error status/mask=00004000/00000000
Sep 11 05:39:34 m6 kernel: igc 0000:01:00.0:    [14] CmpltTO
Sep 11 05:39:35 m6 kernel: genirq: Flags mismatch irq 126. 00000000 (enp1s0) vs. 00000000 (enp1s0)
Sep 11 05:39:35 m6 kernel: kernel BUG at drivers/pci/msi.c:369!
Sep 11 05:39:35 m6 kernel: kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
Sep 11 05:39:35 m6 kernel: BUG: unable to handle page fault for address: ffff9d97c054ff08
Sep 11 05:39:35 m6 kernel: #PF: supervisor instruction fetch in kernel mode
Sep 11 05:39:35 m6 kernel: #PF: error_code(0x0011) - permissions violation
Sep 11 05:39:35 m6 kernel: Fixing recursive fault but reboot is needed!
Sep 11 05:39:44 m6 pmxcfs[1362]: [dcdb] crit: received write while not quorate - trigger resync
Sep 11 05:39:44 m6 pmxcfs[1362]: [dcdb] crit: leaving CPG group
Sep 11 05:39:44 m6 pve-ha-lrm[1574]: unable to write lrm status file - unable to open file '/etc/pve/nodes/m6/lrm_status.tmp.1574' - Permission denied
Sep 11 05:39:44 m6 pmxcfs[1362]: [dcdb] crit: cpg_join failed: 14
Sep 11 05:39:44 m6 pmxcfs[1362]: [dcdb] crit: can't initialize service
Sep 11 05:39:44 m6 kernel: DMAR: DRHD: handling fault status reg 3
Sep 11 05:39:44 m6 kernel: DMAR: [DMA Write NO_PASID] Request device [01:00.0] fault addr 0xfffa8000 [fault reason 0x05] PTE Write access is not set
Sep 11 05:39:44 m6 kernel: DMAR: DRHD: handling fault status reg 2
Sep 11 05:39:44 m6 kernel: DMAR: [DMA Write NO_PASID] Request device [01:00.0] fault addr 0xffcf5000 [fault reason 0x05] PTE Write access is not set
Sep 11 05:39:44 m6 kernel: DMAR: DRHD: handling fault status reg 3
Sep 11 05:39:44 m6 kernel: DMAR: [DMA Write NO_PASID] Request device [01:00.0] fault addr 0xffcf6000 [fault reason 0x05] PTE Write access is not set
Sep 11 05:39:45 m6 kernel: DMAR: DRHD: handling fault status reg 3
Sep 11 05:40:09 m6 pvescheduler[116984]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
Sep 11 05:40:09 m6 pvescheduler[116983]: replication: cfs-lock 'file-replication_cfg' error: no quorum!
Sep 11 05:41:09 m6 pvescheduler[117673]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
Sep 11 05:41:09 m6 pvescheduler[117672]: replication: cfs-lock 'file-replication_cfg' error: no quorum!
Sep 11 05:42:09 m6 pvescheduler[118396]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
Sep 11 05:42:09 m6 pvescheduler[118395]: replication: cfs-lock 'file-replication_cfg' error: no quorum!
Sep 11 05:43:09 m6 pvescheduler[119096]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
Sep 11 05:43:09 m6 pvescheduler[119095]: replication: cfs-lock 'file-replication_cfg' error: no quorum!
Sep 11 05:44:09 m6 pvescheduler[122756]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
Sep 11 05:44:09 m6 pvescheduler[122755]: replication: cfs-lock 'file-replication_cfg' error: no quorum!
Sep 11 05:45:09 m6 pvescheduler[123164]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
Sep 11 05:45:09 m6 pvescheduler[123163]: replication: cfs-lock 'file-replication_cfg' error: no quorum!
Sep 11 05:46:09 m6 pvescheduler[123630]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
Sep 11 05:46:09 m6 pvescheduler[123629]: replication: cfs-lock 'file-replication_cfg' error: no quorum!
Sep 11 05:47:09 m6 pvescheduler[124030]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
Sep 11 05:47:09 m6 pvescheduler[124029]: replication: cfs-lock 'file-replication_cfg' error: no quorum!
Sep 11 05:48:09 m6 pvescheduler[126045]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
Sep 11 05:48:09 m6 pvescheduler[126044]: replication: cfs-lock 'file-replication_cfg' error: no quorum!
Sep 11 05:49:09 m6 pvescheduler[131736]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
Sep 11 05:49:09 m6 pvescheduler[131735]: replication: cfs-lock 'file-replication_cfg' error: no quorum!
Sep 11 05:50:09 m6 pvescheduler[137067]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
Sep 11 05:50:09 m6 pvescheduler[137066]: replication: cfs-lock 'file-replication_cfg' error: no quorum!
-- Boot 92c26d32e36e414286ac058b0017671b --
Sep 11 05:51:32 m6 kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PR00._CPC], AE_NOT_FOUND (20210730/psargs-330)
Sep 11 05:51:32 m6 kernel: ACPI Error: Aborting method \_SB.PR01._CPC due to previous error (AE_NOT_FOUND) (20210730/psparse-529)
Sep 11 05:51:32 m6 kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PR00._CPC], AE_NOT_FOUND (20210730/psargs-330)
Sep 11 05:51:32 m6 kernel: ACPI Error: Aborting method \_SB.PR02._CPC due to previous error (AE_NOT_FOUND) (20210730/psparse-529)
Sep 11 05:51:32 m6 kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PR00._CPC], AE_NOT_FOUND (20210730/psargs-330)
Sep 11 05:51:32 m6 kernel: ACPI Error: Aborting method \_SB.PR03._CPC due to previous error (AE_NOT_FOUND) (20210730/psparse-529)
Sep 11 05:51:32 m6 kernel: usbhid 1-1:1.3: couldn't find an input interrupt endpoint
Sep 11 05:51:32 m6 kernel: Bluetooth: hci0: Failed to load Intel firmware file intel/ibt-19-0-4.sfi (-2)
Sep 11 05:51:32 m6 kernel: Bluetooth: hci0: Failed to read MSFT supported features (-56)
Sep 11 05:51:33 m6 smartd[1237]: Device: /dev/nvme0, number of Error Log entries increased from 443 to 445
Sep 11 05:51:34 m6 pmxcfs[1554]: [quorum] crit: quorum_initialize failed: 2
Sep 11 05:51:34 m6 pmxcfs[1554]: [quorum] crit: can't initialize service
Sep 11 05:51:34 m6 pmxcfs[1554]: [confdb] crit: cmap_initialize failed: 2
Sep 11 05:51:34 m6 pmxcfs[1554]: [confdb] crit: can't initialize service
Sep 11 05:51:34 m6 pmxcfs[1554]: [dcdb] crit: cpg_initialize failed: 2
Sep 11 05:51:34 m6 pmxcfs[1554]: [dcdb] crit: can't initialize service
Sep 11 05:51:34 m6 pmxcfs[1554]: [status] crit: cpg_initialize failed: 2
Sep 11 05:51:34 m6 pmxcfs[1554]: [status] crit: can't initialize service
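
A log this long is easier to read once the lines that repeat on every single boot are filtered out. A rough sketch of that idea in bash: `strip_boot_noise` is a made-up helper name, the patterns are simply the recurring messages from the log above, and the here-document holds four lines copied from it. Treat it as a starting point, not a proper triage tool.

```shell
#!/usr/bin/env bash
# Drop journal lines that recur on every boot so one-off errors stand out.
# strip_boot_noise is a hypothetical helper; patterns come from the log above.
strip_boot_noise() {
    grep -v -E \
        -e 'ACPI (BIOS )?Error' \
        -e 'Bluetooth: hci0' \
        -e "usbhid .*couldn't find an input interrupt endpoint" \
        -e '_initialize failed' \
        -e "can't initialize service"
}

# Sample input: four lines copied from the journal output above.
strip_boot_noise <<'EOF'
Sep 11 03:51:44 m6 kernel: usbhid 1-1:1.3: couldn't find an input interrupt endpoint
Sep 11 03:51:46 m6 pmxcfs[1187]: [quorum] crit: quorum_initialize failed: 2
Sep 11 04:22:37 m6 kernel: igc 0000:01:00.0:    [14] CmpltTO
Sep 11 04:22:37 m6 kernel: kernel BUG at drivers/pci/msi.c:369!
EOF
```

With the sample input only the igc and kernel BUG lines survive. On a live system the same filter could be fed from `journalctl -p err..alert`, or from `journalctl -k -b -1` to look only at kernel messages from the previous boot.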

This seems to be the smoking gun? I am not sure what to do next.

Sep 11 04:22:37 m6 kernel: igc 0000:01:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID)
Sep 11 04:22:37 m6 kernel: igc 0000:01:00.0:   device [8086:15f3] error status/mask=00004000/00000000
Sep 11 04:22:37 m6 kernel: igc 0000:01:00.0:    [14] CmpltTO
Sep 11 04:22:37 m6 kernel: genirq: Flags mismatch irq 127. 00000000 (enp1s0) vs. 00000000 (enp1s0)
Sep 11 04:22:37 m6 kernel: kernel BUG at drivers/pci/msi.c:369!
Sep 11 04:22:37 m6 kernel: kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
Sep 11 04:22:37 m6 kernel: BUG: unable to handle page fault for address: ffffb9d68054ff08
Sep 11 04:22:37 m6 kernel: #PF: supervisor instruction fetch in kernel mode
Sep 11 04:22:37 m6 kernel: #PF: error_code(0x0011) - permissions violation
Sep 11 04:22:37 m6 kernel: Fixing recursive fault but reboot is needed!
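
For what it’s worth, the `error status/mask=00004000/00000000` line can be decoded by hand: each set bit in the status word is one error flag, and the kernel prints that same bit on the next line as `[14] CmpltTO` (a completion timeout). A minimal sketch in bash, where `set_bits` is a hypothetical helper name rather than any existing tool:

```shell
#!/usr/bin/env bash
# List which bit positions are set in a 32-bit hex status word.
# set_bits is a hypothetical helper used only for illustration.
# Pass the hex digits without a 0x prefix, e.g. 00004000.
set_bits() {
    local status=$(( 16#$1 ))   # parse the hex string as base 16
    local bit out=""
    for (( bit = 0; bit < 32; bit++ )); do
        if (( (status >> bit) & 1 )); then
            out+="${out:+ }$bit"   # space-separated list of set bits
        fi
    done
    echo "$out"
}

# Status word copied from the igc AER line in the log above.
set_bits 00004000   # prints 14
```

So the only flag raised is bit 14, which matches what the kernel reported for device 01:00.0 (the igc network card) right before the BUG.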

Huh, a quick Google search suggests it has something to do with the IOMMU. Maybe start looking there?

Interesting, I found something related to the kernel’s CPPC sysfs module: Ubuntu 22.04 "ACPI BIOS Error (bug): Could not resolve symbol" errors on Asus X705UA - Ask Ubuntu

I am trying out a 5.19.x kernel now to see if the device still hangs: GitHub - fabianishere/pve-edge-kernel: Newer Linux kernels for Proxmox VE

[email protected]:~# uname -r
5.19.8-edge

I will try the IOMMU settings next if it still crashes with this 5.19 kernel.

Google suggests pci=noaer and possibly pcie_aspm=off
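
Before and after changing boot parameters it helps to confirm what the running kernel was actually booted with. A small sketch, assuming bash: `has_kparam` is a hypothetical helper and the sample command line below is made up for illustration. Note that `pci=noaer` only turns off PCIe Advanced Error Reporting, so it would hide the igc error messages rather than fix whatever causes them.

```shell
#!/usr/bin/env bash
# Succeed if a kernel parameter appears as a whole word in a command line.
# has_kparam is a hypothetical helper; the sample string below is made up.
has_kparam() {
    local param=$1 cmdline=$2 word
    for word in $cmdline; do            # unquoted on purpose: split on spaces
        [[ $word == "$param" ]] && return 0
    done
    return 1
}

sample="BOOT_IMAGE=/boot/vmlinuz root=/dev/sda1 ro quiet pci=noaer"
for p in pci=noaer pcie_aspm=off; do
    if has_kparam "$p" "$sample"; then
        echo "$p: set"
    else
        echo "$p: not set"
    fi
done
# On a real machine, pass "$(cat /proc/cmdline)" instead of the sample string.
```

Where the parameters get added depends on the bootloader. On a GRUB-based install they usually go into GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by update-grub and a reboot, but check how your own Proxmox install boots first.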

It also seems like your NVMe drive is behaving strangely. You might also want to boot a regular distro to take Proxmox out of the equation.
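
The smartd lines in the journal are what make the drive look odd: the error-log counter for /dev/nvme0 goes up on nearly every boot. A rough sketch for pulling those jumps out of saved journal text with awk. `nvme_error_deltas` is a hypothetical helper name, and the here-document holds two lines copied from the log above.

```shell
#!/usr/bin/env bash
# Print "old -> new (+delta)" for every smartd error-log line on stdin.
# nvme_error_deltas is a hypothetical helper used only for illustration.
nvme_error_deltas() {
    awk '/number of Error Log entries increased from/ {
        prev = $(NF-2)      # count before, third field from the end
        cur  = $NF          # count after, last field
        printf "%d -> %d (+%d)\n", prev, cur, cur - prev
    }'
}

# Sample input: two smartd lines copied from the journal output above.
nvme_error_deltas <<'EOF'
Sep 10 18:51:15 m6 smartd[895]: Device: /dev/nvme0, number of Error Log entries increased from 432 to 434
Sep 11 03:51:45 m6 smartd[856]: Device: /dev/nvme0, number of Error Log entries increased from 438 to 439
EOF
```

A counter that ticks up by one or two per boot is not proof of a failing drive on its own. To see what the entries actually say, `nvme error-log /dev/nvme0` (from nvme-cli) or `smartctl -a /dev/nvme0` would be the next thing to look at.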

Just wanted to report that ever since installing the newer 5.19 kernel, my system has been stable and working just fine. I didn’t have to add any pcie* boot parameters to the kernel command line (/proc/cmdline).

It seems that https://bugzilla.kernel.org/show_bug.cgi?id=213023 was the issue, the same one as in the Ubuntu link I shared earlier.

Thanks everyone for the help on this.

I don’t believe any kernel update could solve your original issue in the screenshot. I think it’s either bad memory, improper installation of DIMMs, bad IMC or a combination of the above.

This is one of those AliExpress mini PCs with soldered DDR4 RAM.

But I’m happy it’s been stable and running a Windows 10 VM without issues in Proxmox (with Wi-Fi 6 passthrough and everything).

It doesn’t matter whether the memory is soldered or on DIMMs if the problem is bad memory or poor contacts/soldering. If you haven’t done so, I would thoroughly stress test the memory subsystem of the hardware.

Let me run memtest86 to check it.

Which line of the error message I shared is the tell-tale sign that it’s a memory issue?

Looks like you’re missing drivers.
Did you write your install image in ISO mode or DD mode?
If DD mode was selected, then you would need third-party Linux drivers.

Try writing your image in ISO mode and installing again. That way most of your hardware will be auto-detected and the generic drivers installed for it (with luck).
Then, once you’re logged in, run an update/upgrade and see if more drivers get installed.

https://www.cyberciti.biz/faq/howto-display-list-of-modules-or-device-drivers-in-the-linux-kernel/

This page seems easy to follow.

Further down there’s a section on how to see which drivers are installed and diagnose errors.