GIGABYTE GA-AB350-GAMING 3 and Ryzen 1800x

I have been fighting an issue with my system for quite some time… I get CPU Soft Lockups and am about to give up… The box has the default Server install, no desktop and nothing else running, not even a web site…

My System:

GIGABYTE GA-AB350-GAMING-3
Ryzen 1800x
Vengeance 64GB RAM
Samsung EVO 960 Pro 1TB M.2
Corsair 650 Gold Power supply
MSI GTX 1050 Ti Video Card

I am running Ubuntu 17.10 and have tried the standard kernels up to the very latest… Nothing is overclocked, everything runs at stock speeds… The system runs fine, but after 2 weeks I get CPU soft lockups… The BIOS is up to date… When I reboot, the system works fine for another 2 weeks… I tried setting up a cron job to reboot the system every morning at 3:00 AM, but it still freezes after a couple of weeks.

Syslog:

Feb 22 17:09:06 srv1 kernel: [50911.454881] watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [kworker/5:1:4253]
Feb 22 17:09:06 srv1 kernel: [50911.454945] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat nf_conntrack br_netfilter bridge stp llc aufs snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic edac_mce_amd kvm snd_hda_intel irqbypass snd_hda_codec wmi_bmof snd_hda_core snd_hwdep snd_pcm i2c_piix4 snd_timer ccp snd soundcore shpchp tpm_infineon 8250_dw mac_hid ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear nouveau mxm_wmi video crct10dif_pclmul i2c_algo_bit crc32_pclmul
Feb 22 17:09:06 srv1 kernel: [50911.454982]  ttm ghash_clmulni_intel pcbc drm_kms_helper syscopyarea sysfillrect sysimgblt aesni_intel fb_sys_fops aes_x86_64 crypto_simd glue_helper cryptd drm r8169 nvme ahci mii libahci nvme_core wmi gpio_amdpt gpio_generic
Feb 22 17:09:06 srv1 kernel: [50911.454994] CPU: 5 PID: 4253 Comm: kworker/5:1 Not tainted 4.13.0-32-generic #35-Ubuntu
Feb 22 17:09:06 srv1 kernel: [50911.454995] Hardware name: Gigabyte Technology Co., Ltd. AB350-Gaming 3/AB350-Gaming 3-CF, BIOS F7 06/16/2017
Feb 22 17:09:06 srv1 kernel: [50911.455000] Workqueue: events netstamp_clear
Feb 22 17:09:06 srv1 kernel: [50911.455002] task: ffff934bca600000 task.stack: ffffa624916c8000
Feb 22 17:09:06 srv1 kernel: [50911.455005] RIP: 0010:smp_call_function_many+0x24a/0x270
Feb 22 17:09:06 srv1 kernel: [50911.455006] RSP: 0018:ffffa624916cbcc0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff10
Feb 22 17:09:06 srv1 kernel: [50911.455008] RAX: ffff934c1ee272d0 RBX: ffff934c1ed63a00 RCX: 0000000000000008
Feb 22 17:09:06 srv1 kernel: [50911.455009] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff934c1e828128
Feb 22 17:09:06 srv1 kernel: [50911.455010] RBP: ffffa624916cbcf8 R08: ffffffffffffff00 R09: 000000000000ffdf
Feb 22 17:09:06 srv1 kernel: [50911.455011] R10: ffffd1f47f5e5b80 R11: 0000000000000000 R12: 0000000000000010
Feb 22 17:09:06 srv1 kernel: [50911.455011] R13: 0000000000000010 R14: ffffffffb6e33210 R15: 0000000000000000
Feb 22 17:09:06 srv1 kernel: [50911.455013] FS:  0000000000000000(0000) GS:ffff934c1ed40000(0000) knlGS:0000000000000000
Feb 22 17:09:06 srv1 kernel: [50911.455014] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Feb 22 17:09:06 srv1 kernel: [50911.455014] CR2: 00007ff67c010b88 CR3: 0000000fd6002000 CR4: 00000000003406e0
Feb 22 17:09:06 srv1 kernel: [50911.455015] Call Trace:
Feb 22 17:09:06 srv1 kernel: [50911.455020]  ? netif_receive_skb_internal+0x28/0x3f0
Feb 22 17:09:06 srv1 kernel: [50911.455022]  ? arch_unregister_cpu+0x30/0x30
Feb 22 17:09:06 srv1 kernel: [50911.455024]  ? netif_receive_skb_internal+0x29/0x3f0
Feb 22 17:09:06 srv1 kernel: [50911.455025]  on_each_cpu+0x2d/0x60
Feb 22 17:09:06 srv1 kernel: [50911.455027]  ? netif_receive_skb_internal+0x28/0x3f0
Feb 22 17:09:06 srv1 kernel: [50911.455028]  text_poke_bp+0x6a/0xf0
Feb 22 17:09:06 srv1 kernel: [50911.455030]  __jump_label_transform.isra.0+0x10b/0x120
Feb 22 17:09:06 srv1 kernel: [50911.455032]  arch_jump_label_transform+0x32/0x50
Feb 22 17:09:06 srv1 kernel: [50911.455034]  __jump_label_update+0x68/0x80
Feb 22 17:09:06 srv1 kernel: [50911.455036]  jump_label_update+0xae/0xc0
Feb 22 17:09:06 srv1 kernel: [50911.455038]  __static_key_slow_dec+0x4e/0xa0
Feb 22 17:09:06 srv1 kernel: [50911.455040]  static_key_slow_dec+0x22/0x50
Feb 22 17:09:06 srv1 kernel: [50911.455041]  static_key_disable+0x21/0x30
Feb 22 17:09:06 srv1 kernel: [50911.455042]  netstamp_clear+0x34/0x40
Feb 22 17:09:06 srv1 kernel: [50911.455045]  process_one_work+0x1e7/0x410
Feb 22 17:09:06 srv1 kernel: [50911.455047]  worker_thread+0x4b/0x420
Feb 22 17:09:06 srv1 kernel: [50911.455049]  kthread+0x125/0x140
Feb 22 17:09:06 srv1 kernel: [50911.455050]  ? process_one_work+0x410/0x410
Feb 22 17:09:06 srv1 kernel: [50911.455052]  ? kthread_create_on_node+0x70/0x70
Feb 22 17:09:06 srv1 kernel: [50911.455054]  ret_from_fork+0x1f/0x30
Feb 22 17:09:06 srv1 kernel: [50911.455056] Code: 77 35 00 39 05 bc ad 34 01 89 c1 0f 8e 3d fe ff ff 48 98 48 8b 13 48 03 14 c5 e0 e3 f6 b7 48 89 d0 8b 52 18 83 e2 01 74 0a f3 90 <8b> 50 18 83 e2 01 75 f6 eb b9 48 c7 c2 20 08 26 b8 4c 89 e6 89 

Has anyone seen this issue before?

Well, that release only has 3 more months of support left. Typically on a production system you’d go for the ‘stable’ releases.

That said, the LTS is only up to kernel 4.10, and Ryzen is still buggy on pretty much anything older than the current mainline. I suppose this is a test server? TBH, Ryzen is still a bad choice for server hardware at the moment.

99% of the issues people have been having on Ryzen have been due to memory. I would suggest testing it with memtest86 before proceeding. If the memory is good and you still get lockups, check for a BIOS update, and if that doesn’t help either, hop onto the latest stable kernel (at the moment that is 4.15.15).
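
To check whether the kernel you are running is at least a given version, something like this should do. It is only a sketch: `kernel_at_least` is a made-up helper name, and it assumes GNU `sort` (for the `-V` version sort), which is what Ubuntu ships.

```shell
#!/bin/sh
# kernel_at_least RUNNING WANTED
# Succeeds if RUNNING is the same as or newer than WANTED.
# Hypothetical helper; relies on GNU sort's -V (version sort).
kernel_at_least() {
    oldest="$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)"
    [ "$oldest" = "$2" ]
}

# Compare the running kernel against the version mentioned above.
if kernel_at_least "$(uname -r)" 4.15.15; then
    echo "kernel is 4.15.15 or newer"
else
    echo "kernel is older than 4.15.15"
fi
```

With the 4.13.0-32-generic kernel from the log above, this would report the kernel as older.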


I have also seen soft lockups caused by the system’s power states misbehaving. You can try forcing a constant vcore voltage instead of leaving it on auto. I’m not sure what a good voltage to lock it at would be, though.

I edited your post to make the log more readable.

You can place large text blocks inside three backticks to format them as code blocks. This works for any other large chunk of text too.

Like this

``` text ```

Regarding the issue itself, I would like to ask you to run this tool with the -l argument and see if C6 state is enabled.


## Get the code
git clone https://github.com/r4m0n/ZenStates-Linux.git
cd ZenStates-Linux

## Load the needed modules (-a is required to load more than one module at once)
sudo modprobe -a msr cpuid

## Read out the CPU MSRs (Model-Specific Registers) relating to C and P states
sudo ./zenstates.py -l

If C6 states are enabled, I can recommend disabling the C6 processor state and seeing if you encounter the issue again. You can also check through your BIOS to see about making the change permanent with a BIOS setting if disabling C6 state helps.
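
For reference, disabling C6 with the same tool could look roughly like this. Treat it as a sketch: it assumes you are still inside the ZenStates-Linux checkout and that your copy of zenstates.py supports the `--c6-disable` flag (check `./zenstates.py -h`). `disable_c6`, `RUN` and `ZENSTATES` are just names I made up for the example.

```shell
#!/bin/sh
# Sketch: turn off the C6 state with zenstates.py, then list the result.
# ZENSTATES and RUN are illustrative variables, not part of the tool.
ZENSTATES="${ZENSTATES:-./zenstates.py}"   # path to the script in the checkout
RUN="${RUN:-sudo}"                         # set RUN=echo to preview the commands

disable_c6() {
    $RUN modprobe msr &&                   # zenstates.py needs the msr module
    $RUN "$ZENSTATES" --c6-disable &&      # assumed flag, verify with -h
    $RUN "$ZENSTATES" -l                   # confirm C6 now shows as disabled
}

# Call it once you are happy with the preview:
#   disable_c6
```

MSR changes made this way are lost on reboot, so you would have to run it again after every boot until you find the matching BIOS option.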

Some users also report that adding the following to the kernel boot parameters has helped the issue:

rcu_nocbs=0-15 

https://bugzilla.kernel.org/show_bug.cgi?id=196683

https://forum.level1techs.com/t/solved-linux-is-unstable-ever-since-i-upgraded-to-ryzen/117541/184
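
The 0-15 range above matches the 16 threads of the 1800X. If you want to build the value for a different CPU, a small helper like this should work. It is a sketch only: `rcu_param` is a hypothetical name, and the GRUB steps in the comments assume a stock Ubuntu setup.

```shell
#!/bin/sh
# Sketch: build an rcu_nocbs value that covers every logical CPU.
# rcu_param is a hypothetical helper name, not part of any tool.
rcu_param() {
    # $1 = number of logical CPUs (defaults to what nproc reports)
    n="${1:-$(nproc)}"
    echo "rcu_nocbs=0-$((n - 1))"
}

rcu_param 16    # 16 threads on the 1800X

# To make it permanent on Ubuntu (assumes the stock GRUB setup):
#   1. Add the parameter to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub
#   2. sudo update-grub
#   3. Reboot, then confirm with: cat /proc/cmdline
```

Note that the parameter only takes effect if the kernel was built with CONFIG_RCU_NOCB_CPU, so it is worth checking your kernel config before relying on it.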

If you have overclocked your processor or made other changes to it, also make sure that neither you nor your motherboard is accidentally undervolting the CPU, and that no excessive voltage drop is occurring under load.

Most CPUs can tolerate a voltage drop to as low as 1.232V under load, but this may vary.