Hello there,
Some years back, I built a Ryzen 1700 server that suffered from stability problems. Back then I was running Ubuntu 16.04 and managed to solve it by compiling a custom kernel with the community’s help. I then used that help to write this blog post, and all was well.
Years later I’m having to deal with this server again (same hardware). This time it’s running Proxmox 7.4 on a 5.15 kernel. I’m experiencing the lockups/freezes again, so I tried just adding rcu_nocbs=0-15 to the boot parameters in /etc/default/grub before rebooting.
I would have thought that, this many years later, I would not have to re-compile the kernel to support this. By running dmesg, I can see that the parameter is being picked up when I reboot:
…
…but is there a way to check that this has actually taken effect? Is there a test I can run? I ask because the system became unresponsive again overnight after I applied that change.
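For anyone wanting to run the same check, here is a rough sketch of what I would look at. It assumes that the running kernel's parameters are visible in /proc/cmdline, and that a kernel which really offloads callbacks starts kernel threads whose names begin with rcuo (for example rcuop/0 or rcuog/0); the exact thread names are an assumption on my part and may differ between kernel versions. The helper name cmdline_has_rcu_nocbs is just something I made up for illustration.

```shell
#!/bin/sh
# Return success if the given kernel command line string contains
# an rcu_nocbs= parameter. (Hypothetical helper, not a standard tool.)
cmdline_has_rcu_nocbs() {
    case " $1 " in
        *" rcu_nocbs="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Step 1: was the parameter actually passed to the running kernel?
if [ -r /proc/cmdline ] && cmdline_has_rcu_nocbs "$(cat /proc/cmdline)"; then
    echo "rcu_nocbs is on the running kernel's command line"
else
    echo "rcu_nocbs is NOT on the running kernel's command line"
fi

# Step 2: count kernel threads whose name starts with "rcuo".
# Assumption: offloaded CPUs get threads such as rcuop/N and rcuog/N.
# A count of 0 would suggest the parameter was parsed but not honoured.
echo "rcuo* kernel threads found:"
ps -e -o comm= 2>/dev/null | grep -c '^rcuo' || true
```

If step 1 succeeds but step 2 prints 0, that would point towards the kernel having been built without callback offloading support, which is what the rest of this post tries to find out.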
It sounds like the CONFIG_RCU_NOCB_CPU option is enabled by default, but it is an option, so one would need to check whether the kernel was compiled with it disabled.
It appears that the options the kernel was compiled with can usually be found in one of the following places:
/proc/config.gz
/boot/config
/boot/config-$(uname -r)
as mentioned in this post.
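A small sketch of how those three locations could be checked in one go. The function names print_kernel_config and config_option_state are my own and purely illustrative; the script only assumes that zcat, grep and uname are available.

```shell
#!/bin/sh
# Print the first kernel config found in the usual locations.
# (Hypothetical helper; the three paths are the ones listed above.)
print_kernel_config() {
    if [ -r /proc/config.gz ]; then
        zcat /proc/config.gz
    elif [ -r "/boot/config-$(uname -r)" ]; then
        cat "/boot/config-$(uname -r)"
    elif [ -r /boot/config ]; then
        cat /boot/config
    else
        echo "no kernel config found" >&2
        return 1
    fi
}

# Read a kernel config on stdin and report the state of one option:
# either its "OPTION=value" line, its "# OPTION is not set" line,
# or a note that the option does not appear in the file at all.
config_option_state() {
    opt="$1"
    if line=$(grep -E "^(# )?${opt}(=| is not set)"); then
        printf '%s\n' "$line"
    else
        echo "${opt}: not listed"
    fi
}

print_kernel_config | config_option_state CONFIG_RCU_NOCB_CPU
```

As far as I understand it, an option whose dependencies are not met can be left out of the file entirely instead of being shown as "is not set", so a "not listed" result does not by itself mean the option is enabled.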
In Ubuntu 22.04 I found the /boot/config-6.2.0-33-generic file, but when I grep for RCU as follows:
cat /boot/config-6.2.0-33-generic | grep RCU
… then I don’t see any option along the lines of CONFIG_RCU_NOCB_CPU and just get the following:
# RCU Subsystem
CONFIG_TREE_RCU=y
CONFIG_PREEMPT_RCU=y
# CONFIG_RCU_EXPERT is not set
CONFIG_SRCU=y
CONFIG_TREE_SRCU=y
CONFIG_TASKS_RCU_GENERIC=y
CONFIG_TASKS_RCU=y
CONFIG_TASKS_RUDE_RCU=y
CONFIG_TASKS_TRACE_RCU=y
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_NEED_SEGCBLIST=y
# end of RCU Subsystem
CONFIG_MMU_GATHER_RCU_TABLE_FREE=y
# RCU Debugging
# CONFIG_RCU_SCALE_TEST is not set
# CONFIG_RCU_TORTURE_TEST is not set
# CONFIG_RCU_REF_SCALE_TEST is not set
CONFIG_RCU_CPU_STALL_TIMEOUT=60
CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=0
# CONFIG_RCU_TRACE is not set
# CONFIG_RCU_EQS_DEBUG is not set
# end of RCU Debugging
Since it has a bunch of lines saying xyz is not set, it makes me think that every option should be listed, even if it is not set. Does anyone know whether that is actually the case, or whether an option I don’t see listed should be assumed to be enabled by default?