Kernel watchdog: BUG: soft lockup (Proxmox host, Ubuntu VM)

I’m having issues with a Proxmox VM running an Ubuntu 20.04 LTS guest: it hits CPU soft lockups whilst trying to compile Android 14.0 (AOSP) from source.

Platform:
pve-manager/8.0.4/ (kernel: 6.2.16-19-pve)
Tyan S8040
AMD EPYC Bergamo, 112 cores
288 GB RAM

The VM has 96 GB of RAM, and I’ve tried it with both 128 and 64 virtual CPUs.

If I build with -j64, I see many soft lockups.
I just tried it with -j16, and so far there haven’t been any.
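One rough way to compare the two runs is memory per build job: 96 GB split across 64 jobs leaves about 1.5 GB each, versus 6 GB each at -j16. Whether memory pressure is actually behind the lockups is only a hypothesis worth ruling out. The helper below is hypothetical (not part of AOSP’s tooling), and its 2 GB-per-job default is an assumed figure you would need to tune.

```python
# Hypothetical sizing helper: compares memory per build job and suggests a -j
# value capped by both vCPU count and memory. GB_PER_JOB_ASSUMED is a guess,
# not an AOSP-documented requirement.

GB_PER_JOB_ASSUMED = 2.0


def memory_per_job_gb(ram_gb: float, jobs: int) -> float:
    """Average RAM available to each parallel build job."""
    return ram_gb / jobs


def suggest_jobs(ram_gb: float, vcpus: int, gb_per_job: float = GB_PER_JOB_ASSUMED) -> int:
    """Largest -j value that fits within both the vCPU count and the memory budget."""
    by_memory = int(ram_gb // gb_per_job)
    return max(1, min(vcpus, by_memory))


if __name__ == "__main__":
    ram_gb = 96  # VM memory from the post
    for jobs in (64, 16):
        print(f"-j{jobs}: {memory_per_job_gb(ram_gb, jobs):.1f} GB per job")
    print("suggested -j for 64 vCPUs:", suggest_jobs(ram_gb, vcpus=64))
```

With the assumed 2 GB per job, this suggests -j48 for 64 vCPUs; a different per-job figure changes the answer proportionally.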

Example log output:

Message from syslogd@ubuntu-2004-android-build at Apr  5 02:46:41 ...
 kernel:[  306.041804] watchdog: BUG: soft lockup - CPU#62 stuck for 54s! [soong_build:7430]

Message from syslogd@ubuntu-2004-android-build at Apr  5 02:46:41 ...
 kernel:[  306.041808] watchdog: BUG: soft lockup - CPU#0 stuck for 50s! [kworker/u128:2:534]

Message from syslogd@ubuntu-2004-android-build at Apr  5 02:46:41 ...
 kernel:[  306.041818] watchdog: BUG: soft lockup - CPU#4 stuck for 24s! [kworker/4:1:415]
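To see which CPUs and tasks are affected and how long each was stuck, a small parser over the syslog or dmesg text can help when there are many of these messages. This is a minimal sketch that assumes the exact message format shown above; the regex and names are my own, not from any existing tool.

```python
import re
from typing import List, NamedTuple

# Matches kernel lines like:
#   kernel:[  306.041804] watchdog: BUG: soft lockup - CPU#62 stuck for 54s! [soong_build:7430]
# The task name may itself contain ':' (e.g. kworker/u128:2), so the PID is
# taken from the last ':' before the closing bracket.
SOFT_LOCKUP_RE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+watchdog: BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! \[(?P<comm>.+):(?P<pid>\d+)\]"
)


class SoftLockup(NamedTuple):
    uptime: float  # seconds since boot
    cpu: int
    stuck_secs: int
    comm: str  # task name
    pid: int


def parse_soft_lockups(text: str) -> List[SoftLockup]:
    """Return every soft-lockup report found in a syslog or dmesg dump."""
    return [
        SoftLockup(float(m["uptime"]), int(m["cpu"]), int(m["secs"]), m["comm"], int(m["pid"]))
        for m in SOFT_LOCKUP_RE.finditer(text)
    ]


SAMPLE = """\
 kernel:[  306.041804] watchdog: BUG: soft lockup - CPU#62 stuck for 54s! [soong_build:7430]
 kernel:[  306.041808] watchdog: BUG: soft lockup - CPU#0 stuck for 50s! [kworker/u128:2:534]
 kernel:[  306.041818] watchdog: BUG: soft lockup - CPU#4 stuck for 24s! [kworker/4:1:415]
"""

if __name__ == "__main__":
    # Longest stalls first.
    for event in sorted(parse_soft_lockups(SAMPLE), key=lambda e: -e.stuck_secs):
        print(f"CPU {event.cpu:3d} stuck {event.stuck_secs:2d}s in {event.comm} (pid {event.pid})")
```

Feeding it a full dmesg dump from a -j64 run would show whether the lockups cluster on particular vCPUs or are spread evenly.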

Is your firmware up to date?

Strange problem. It’s as if the Proxmox kernel is spending tens of seconds waiting for… something. It’s a single CPU, so it can’t be a NUMA issue. Maybe page tables, if huge pages aren’t working properly?
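One way to check the huge-page theory is to look at what the kernel reports on both the Proxmox host and the Ubuntu guest. This sketch reads /proc/meminfo (HugePages_* for reserved hugetlbfs pages, AnonHugePages for transparent huge pages in use) and the active mode in /sys/kernel/mm/transparent_hugepage/enabled. It only reports the settings; it can’t tell you whether they cause the lockups.

```python
from pathlib import Path
from typing import Dict

THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_meminfo(text: str) -> Dict[str, int]:
    """Map /proc/meminfo keys to their numeric values (kB or page counts)."""
    info: Dict[str, int] = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def thp_mode(text: str) -> str:
    """Return the active (bracketed) choice, e.g. 'madvise' from 'always [madvise] never'."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return "unknown"


def report() -> None:
    meminfo = parse_meminfo(Path("/proc/meminfo").read_text())
    for key in ("HugePages_Total", "HugePages_Free", "Hugepagesize", "AnonHugePages"):
        print(f"{key}: {meminfo.get(key)}")
    if THP_ENABLED.exists():
        print("transparent huge pages:", thp_mode(THP_ENABLED.read_text()))


if __name__ == "__main__":
    report()
```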

You’ll need to pull out some Linux tools to figure this one out: https://www.brendangregg.com/Articles/Netflix_Linux_Perf_Analysis_60s.pdf
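Among the tools in that checklist, vmstat (the st column) and mpstat (%steal) report steal time: time a guest vCPU was ready to run while the hypervisor was running something else. High steal inside the VM during the -j64 build would point toward host-side contention rather than a problem in the guest. The sketch below computes the same figure directly from the aggregate cpu line of /proc/stat; it’s a simplified illustration, not a replacement for those tools.

```python
import time
from pathlib import Path
from typing import List

# Order of the counters on the aggregate "cpu" line of /proc/stat (see proc(5)):
# user nice system idle iowait irq softirq steal guest guest_nice
STEAL_INDEX = 7
CORE_FIELDS = 8  # guest/guest_nice are already included in user/nice


def parse_cpu_line(line: str) -> List[int]:
    """Return the numeric counters (in clock ticks) from a 'cpu ...' line."""
    return [int(x) for x in line.split()[1:]]


def steal_percent(before: List[int], after: List[int]) -> float:
    """Percentage of CPU time stolen by the hypervisor between two samples."""
    deltas = [a - b for a, b in zip(after[:CORE_FIELDS], before[:CORE_FIELDS])]
    total = sum(deltas)
    return 100.0 * deltas[STEAL_INDEX] / total if total else 0.0


def read_aggregate_cpu() -> List[int]:
    first_line = Path("/proc/stat").read_text().splitlines()[0]  # the summed "cpu" line
    return parse_cpu_line(first_line)


def sample_steal(interval: float = 1.0) -> float:
    before = read_aggregate_cpu()
    time.sleep(interval)
    return steal_percent(before, read_aggregate_cpu())


if __name__ == "__main__":
    # Run inside the guest while the -j64 build is going.
    for _ in range(5):
        print(f"steal: {sample_steal():.1f}%")
```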


Thanks for the article. I’ve only skimmed it, but it looks like a very useful resource.

The firmware was up to date when I checked a few weeks ago. I’ll check again.
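To compare against the vendor’s latest releases, the BIOS version and the CPU microcode revision can be read from /sys/class/dmi/id and /proc/cpuinfo. Run this on the Proxmox host: inside the VM, DMI describes the virtual machine’s firmware instead. A minimal sketch; which fields are populated depends on the board’s firmware.

```python
from pathlib import Path
from typing import Dict, Optional

# Identifiers exposed by the kernel's DMI sysfs interface.
DMI_FIELDS = ("board_vendor", "board_name", "bios_vendor", "bios_version", "bios_date")


def read_dmi(base: Path = Path("/sys/class/dmi/id")) -> Dict[str, Optional[str]]:
    """Read board and firmware identifiers; None means missing or unreadable."""
    info: Dict[str, Optional[str]] = {}
    for name in DMI_FIELDS:
        try:
            info[name] = (base / name).read_text().strip()
        except OSError:
            info[name] = None
    return info


def microcode_revision(cpuinfo_text: str) -> Optional[str]:
    """Return the first 'microcode' value (a hex string) from /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "microcode":
            return value.strip()
    return None


if __name__ == "__main__":
    for key, value in read_dmi().items():
        print(f"{key}: {value}")
    print("microcode:", microcode_revision(Path("/proc/cpuinfo").read_text()))
```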

It could be NUMA, so the first thing I’ll try is pinning the VM’s cores in Proxmox.
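Before pinning, it helps to know which host CPUs belong to which NUMA node. On a single-socket EPYC, the kernel may report one node or several, depending on BIOS settings such as NPS. This sketch reads /sys/devices/system/node/node*/cpulist and prints a candidate list per node for Proxmox’s affinity VM option (available in recent Proxmox VE releases; check man qm on your version). <VMID> is a placeholder, and one node per VM is just one possible layout.

```python
from pathlib import Path
from typing import Dict, List


def parse_cpulist(text: str) -> List[int]:
    """Expand a kernel cpulist such as '0-3,8,10-11' into individual CPU ids."""
    cpus: List[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def format_cpulist(cpus: List[int]) -> str:
    """Compress CPU ids back into range notation, e.g. [0, 1, 2, 5] -> '0-2,5'."""
    ordered = sorted(set(cpus))
    ranges: List[str] = []
    i = 0
    while i < len(ordered):
        start = ordered[i]
        while i + 1 < len(ordered) and ordered[i + 1] == ordered[i] + 1:
            i += 1
        end = ordered[i]
        ranges.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(ranges)


def numa_nodes(base: Path = Path("/sys/devices/system/node")) -> Dict[int, List[int]]:
    """Map each NUMA node id on this host to the CPUs the kernel assigns to it."""
    nodes: Dict[int, List[int]] = {}
    for node_dir in sorted(base.glob("node[0-9]*"), key=lambda p: int(p.name[4:])):
        nodes[int(node_dir.name[4:])] = parse_cpulist((node_dir / "cpulist").read_text())
    return nodes


if __name__ == "__main__":
    # Run on the Proxmox host. <VMID> is a placeholder for your VM's id.
    for node, cpus in numa_nodes().items():
        print(f"node{node}: {len(cpus)} CPUs -> qm set <VMID> --affinity {format_cpulist(cpus)}")
```

If the host exposes only a single node, this prints one line covering every CPU, which by itself answers part of the NUMA question.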