You know, I was working on this myself and found that when I suspend and then resume the hung machine, it recovers, which I thought was interesting.
I didn’t even think of that, mostly because suspend has always broken something for me, so the first thing I do is disable it. But it does make sense, of course: the system was effectively “suspended” anyway and kicks back into gear when you resume. … Damn, should have thought of that.
Oh, also: on my machine no input worked when the freeze happened.
How did you suspend it? SSH?
The power button is set up to suspend.
Huh, the power button had no effect on my machine; it was set up to power off.
What machine was that? I’m on the Lenovo Ideapad 720s.
You might have to configure it in your distro… this is, umm, raw motherboard-ish.
Yeah, like I said, that did not work for me. It was configured to power off without any dialog or confirmation, and it did nothing. Must be device-specific, then.
I actually tried that before. It didn’t work with my Envy x360, at least, and the power button is indeed set to sleep. If I remove the rcu_nocbs and max_cstate parameters, it sleeps, wakes up, and even hibernates with no issue, but crashes randomly. With the parameters in place I lose power management and sleep, but at least the laptop works. So far everything is looking good with rcu_nocbs and processor.max_cstate=1.
Moving from an Acer Aspire S7 (great laptop, but a dual-core Haswell with 8 GB of RAM) to this has been heavenly. I’m hoping for a fix for this issue soon, but that’s all.
Linux 4.17 is now mainline.
I doubt it’s gonna fix the issue, since none of the rc kernels did, but I am compiling as we speak and will test and update everyone.
EDIT: It crashed as expected.
Today’s my birthday so I’m gonna have some fun instead of testing lol.
But when I’m done I wanna see which C-state is causing the issue by stepping max_cstate from 1 up to 6. 6 will probably fail, but I do wanna test it out. Will update as soon as I find anything.
The forum is not letting me post because I am a new user, so I can only edit.
Note that for the rcu_nocbs parameter to work, the kernel must be compiled with the corresponding option (CONFIG_RCU_NOCB_CPU) enabled.
Look here for reference.
Arch kernels have it enabled by default, AFAIK.
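If you are not sure whether your own kernel was built with that option, a quick check is to look for CONFIG_RCU_NOCB_CPU=y in the kernel's build config. Below is a rough Python sketch; it assumes the config is available either as /proc/config.gz (only present if the kernel has CONFIG_IKCONFIG_PROC) or as /boot/config-<release>, which is common on Ubuntu and Fedora but not guaranteed on every distro. The helper names are just for illustration.

```python
import gzip
import platform
from typing import Optional


def kernel_config_text() -> Optional[str]:
    """Return the running kernel's build config as text, or None if not found."""
    # /proc/config.gz needs CONFIG_IKCONFIG_PROC; /boot/config-<release> is
    # what many distros install next to the kernel image.
    candidates = ["/proc/config.gz", "/boot/config-" + platform.release()]
    for path in candidates:
        try:
            if path.endswith(".gz"):
                with gzip.open(path, "rt") as f:
                    return f.read()
            with open(path) as f:
                return f.read()
        except OSError:
            continue  # missing or unreadable; try the next location
    return None


def has_rcu_nocb(config: str) -> bool:
    """True if the config contains CONFIG_RCU_NOCB_CPU=y."""
    return any(line.strip() == "CONFIG_RCU_NOCB_CPU=y" for line in config.splitlines())


if __name__ == "__main__":
    cfg = kernel_config_text()
    if cfg is None:
        print("kernel config not found")
    else:
        print("CONFIG_RCU_NOCB_CPU enabled:", has_rcu_nocb(cfg))
```

If it prints False, rcu_nocbs on the kernel command line will simply have no effect.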
I tested processor.max_cstate values from 5 down to 1. The only stable one is 1; any other value crashes.
Add “rcu_nocbs=0-7” and “processor.max_cstate=1” to the kernel command line in GRUB.
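On Ubuntu-style setups these go into the GRUB_CMDLINE_LINUX_DEFAULT line of /etc/default/grub, followed by sudo update-grub. Here is a small Python sketch of just the merge step, assuming the line has the usual KEY="..." form. It replaces an existing parameter of the same name (say, an older processor.max_cstate=5) instead of appending a duplicate. The function name is hypothetical.

```python
from typing import List


def add_kernel_params(grub_line: str, params: List[str]) -> str:
    """Merge kernel parameters into a GRUB_CMDLINE_LINUX_DEFAULT="..." line.

    A parameter whose name (the text before '=') is already present gets
    replaced, so running this twice does not create duplicates.
    """
    key, sep, value = grub_line.partition("=")
    if not sep:
        raise ValueError("expected a KEY=\"...\" line")
    # Drop the surrounding quotes and split the existing parameters on whitespace.
    tokens = value.strip().strip('"').split()
    new_names = {p.split("=", 1)[0] for p in params}
    kept = [t for t in tokens if t.split("=", 1)[0] not in new_names]
    return key + '="' + " ".join(kept + params) + '"'


line = 'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"'
print(add_kernel_params(line, ["rcu_nocbs=0-7", "processor.max_cstate=1"]))
# GRUB_CMDLINE_LINUX_DEFAULT="quiet splash rcu_nocbs=0-7 processor.max_cstate=1"
```

Editing the file by hand works just as well; the point is only that both parameters end up inside the same quoted string.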
For me it doesn’t work =( Before, it only hung under high load; now it hangs randomly while web surfing, too.
What are your hardware and software specs exactly?
Ryzen 3 2200G on an MSI B350 motherboard
Ubuntu 18.04 (kernel 4.17.0)
dmesg gives this:
[ 3240.929502] pcieport 0000:00:01.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=000a(Transmitter ID)
[ 3240.929505] pcieport 0000:00:01.2: device [1022:15d3] error status/mask=00001000/00006000
[ 3240.929507] pcieport 0000:00:01.2:  Replay Timer Timeout
The system boots successfully only about one time in four.
Errors during boot:
[ 0.079553] ACPI BIOS Error (bug): Failure creating [_SB.SMIC], AE_ALREADY_EXISTS (20180313/dswload2-316)
[ 0.079764] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20180313/psobject-220)
[ 0.079946] ACPI Error: Method parse/execution failed , AE_ALREADY_EXISTS (20180313/psparse-516)
[ 0.080001] ACPI Error: Invalid zero thread count in method (20180313/dsmethod-760)
[ 0.080174] ACPI: Marking method ___ as Serialized because of AE_ALREADY_EXISTS error
[ 0.080175] ACPI Error: Invalid OwnerId: 0x00 (20180313/utownerid-156)
[ 0.080350] ACPI Error: AE_ALREADY_EXISTS, (SSDT: AMD PT) while loading table (20180313/tbxfload-197)
[ 0.081148] ACPI Error: 1 table load failures, 7 successful (20180313/tbxfload-215)
[ 0.697188] AMD-Vi: Unable to write to IOMMU perf counter.
[ 4.100036] usb 1-8: device not accepting address 5, error -71
[ 4.100139] usb usb1-port8: unable to enumerate USB device
[ 16.801249] kvm: disabled by bios
Thanks for any help
Huh, so this also happens on the desktop APUs? Didn’t realize that.
I am guessing you are on the latest UEFI?
Can’t you just disable C-states in the BIOS?
Oh, and since the 2200G has only 4 threads, it should be rcu_nocbs=0-3.
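The rule behind that: rcu_nocbs takes a list of CPU numbers starting at 0, so a chip with n threads needs the range 0-(n-1). Eight threads gives 0-7, the 2200G's four threads give 0-3. A tiny Python sketch of the arithmetic (the helper is hypothetical; with no argument it uses os.cpu_count(), which reports the logical CPUs of the machine it runs on):

```python
import os
from typing import Optional


def rcu_nocbs_param(threads: Optional[int] = None) -> str:
    """Build an rcu_nocbs= value that covers every logical CPU.

    CPUs are numbered from 0, so n hardware threads means the range 0-(n-1).
    """
    if threads is None:
        threads = os.cpu_count() or 1  # logical CPUs on *this* machine
    if threads < 1:
        raise ValueError("need at least one CPU")
    if threads == 1:
        return "rcu_nocbs=0"
    return "rcu_nocbs=0-" + str(threads - 1)


print(rcu_nocbs_param(4))  # Ryzen 3 2200G, 4 threads -> rcu_nocbs=0-3
print(rcu_nocbs_param(8))  # 8-thread parts -> rcu_nocbs=0-7
```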
Yes, I have the latest UEFI, and I already tried rcu_nocbs=0-3; it remains the same.
I’ll try to disable C-states in the BIOS. Thanks.
So I installed Fedora and am testing kernel 4.18-rc1.
It seems the issue has not been fixed yet. I will start testing which kernel parameters are needed for it.
In the meantime, 4.17 works perfectly with the two parameters I posted two weeks ago.
Thank you guys for investigating this issue when AMD doesn’t.
I recently bought an Acer Swift 3 with Ryzen 5 2500U and installed Linux Mint 19 on it.
I’m using the latest kernel (4.17) and Mesa, but I’m still having the same freeze issues when watching YouTube videos. I will try the suggested workaround and report on it.
I was able to work on my notebook this evening without a single freeze. It seems that in my case just adding the following to /etc/default/grub was enough:
Afterwards I ran
sudo update-grub
and rebooted. No freezes yet!
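To confirm that the parameters actually made it into the running kernel after the reboot, you can compare /proc/cmdline against what you expect. A minimal Python sketch; the WANTED list is a placeholder to adjust to whatever you put in /etc/default/grub.

```python
import os
from typing import List

# Placeholder: list the parameters you added to /etc/default/grub.
WANTED = ["processor.max_cstate=1"]


def missing_params(cmdline: str, wanted: List[str]) -> List[str]:
    """Return the entries of `wanted` that do not appear in `cmdline`."""
    present = set(cmdline.split())
    return [p for p in wanted if p not in present]


if __name__ == "__main__" and os.path.exists("/proc/cmdline"):
    with open("/proc/cmdline") as f:
        missing = missing_params(f.read(), WANTED)
    print("all parameters active" if not missing else "missing: " + " ".join(missing))
```

If something shows up as missing, the usual suspects are forgetting to run update-grub or editing the wrong GRUB_CMDLINE_* line.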
For anyone interested in my system information:
Yeah, at this point it is pretty clear that it is a C-state thing again.
Glad to hear that processor.max_cstate=1 seems to work across the board.
Ryzen 7 1700 machine check error - instant reboot
I have the same machine and I hope this will help me.
At the same time I am wondering: how did you manage to install LM on it? Mine wouldn’t play ball with LM at all. It didn’t even get into the desktop environment from the USB stick.
I am currently running Antergos Cinnamon and liking it a lot.
Thanks for the solution; I will let you know if my system holds up.
So under Kubuntu 18.04 my machine is actually freezing up again.
I can ssh into it and dmesg tells me this:
[ 42.821380] amdgpu: [powerplay] pp_dpm_get_temperature was not implemented.
[ 5488.951160] gmc_v9_0_process_interrupt: 21 callbacks suppressed
[ 5488.951167] amdgpu 0000:03:00.0: [mmhub] VMC page fault (src_id:0 ring:153 vm_id:0 pas_id:0)
[ 5488.951175] amdgpu 0000:03:00.0: at page 0x0000000600000000 from 18
[ 5488.951178] amdgpu 0000:03:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000132
[ 5680.091606] INFO: task Xorg:991 blocked for more than 120 seconds.
[ 5680.091612] Tainted: G W 4.15.0-24-generic #26-Ubuntu
[ 5680.091615] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5680.091618] Xorg D 0 991 985 0x00400004
[ 5680.091622] Call Trace:
[ 5680.091632] __schedule+0x291/0x8a0
[ 5680.091636] schedule+0x2c/0x80
[ 5680.091639] schedule_preempt_disabled+0xe/0x10
[ 5680.091641] __ww_mutex_lock.isra.3+0x204/0x670
[ 5680.091645] __ww_mutex_lock_slowpath+0x16/0x20
[ 5680.091647] ? __ww_mutex_lock_slowpath+0x16/0x20
[ 5680.091649] ww_mutex_lock+0x5a/0x70
[ 5680.091671] drm_modeset_backoff+0x47/0xc0 [drm]
[ 5680.091687] drm_mode_obj_set_property_ioctl+0x14b/0x280 [drm]
[ 5680.091704] ? drm_mode_connector_set_obj_prop+0x80/0x80 [drm]
[ 5680.091719] drm_mode_connector_property_set_ioctl+0x3f/0x60 [drm]
[ 5680.091731] drm_ioctl_kernel+0x5f/0xb0 [drm]
[ 5680.091743] drm_ioctl+0x31b/0x3d0 [drm]
[ 5680.091757] ? drm_mode_connector_set_obj_prop+0x80/0x80 [drm]
[ 5680.091798] amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
[ 5680.091803] do_vfs_ioctl+0xa8/0x630
[ 5680.091807] ? vfs_read+0x115/0x130
[ 5680.091809] SyS_ioctl+0x79/0x90
[ 5680.091813] do_syscall_64+0x73/0x130
[ 5680.091816] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 5680.091819] RIP: 0033:0x7f4fa03325d7
[ 5680.091821] RSP: 002b:00007ffe84240258 EFLAGS: 00003246 ORIG_RAX: 0000000000000010
[ 5680.091823] RAX: ffffffffffffffda RBX: 000055a8ab08d6f0 RCX: 00007f4fa03325d7
[ 5680.091824] RDX: 00007ffe84240290 RSI: 00000000c01064ab RDI: 0000000000000017
[ 5680.091826] RBP: 00007ffe84240290 R08: 0000000000000001 R09: 0000000000000000
[ 5680.091827] R10: 00007f4fa03bacc0 R11: 0000000000003246 R12: 00000000c01064ab
[ 5680.091828] R13: 0000000000000017 R14: 000055a8ab08db10 R15: 000055a8a9c88601
[ 5680.091913] INFO: task kworker/u32:4:5622 blocked for more than 120 seconds.
[ 5680.091915] Tainted: G W 4.15.0-24-generic #26-Ubuntu
[ 5680.091917] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5680.091919] kworker/u32:4 D 0 5622 2 0x80000000
[ 5680.091933] Workqueue: events_unbound commit_work [drm_kms_helper]
[ 5680.091934] Call Trace:
[ 5680.091937] __schedule+0x291/0x8a0
[ 5680.091941] schedule+0x2c/0x80
[ 5680.091943] schedule_timeout+0x1cf/0x350
[ 5680.092003] ? tgn10_get_crtc_scanoutpos+0x6b/0xa0 [amdgpu]
[ 5680.092007] dma_fence_default_wait+0x1c7/0x260
[ 5680.092009] ? dma_fence_release+0xa0/0xa0
[ 5680.092011] dma_fence_wait_timeout+0x3e/0xf0
[ 5680.092014] reservation_object_wait_timeout_rcu+0x17d/0x370
[ 5680.092072] amdgpu_dm_do_flip+0x12c/0x390 [amdgpu]
[ 5680.092126] amdgpu_dm_atomic_commit_tail+0x92c/0xa50 [amdgpu]
[ 5680.092131] ? dequeue_entity+0xe4/0x470
[ 5680.092135] ? __switch_to+0x182/0x500
[ 5680.092143] commit_tail+0x42/0x70 [drm_kms_helper]
[ 5680.092149] commit_work+0x12/0x20 [drm_kms_helper]
[ 5680.092153] process_one_work+0x1de/0x410
[ 5680.092155] worker_thread+0x32/0x410
[ 5680.092158] kthread+0x121/0x140
[ 5680.092160] ? process_one_work+0x410/0x410
[ 5680.092163] ? kthread_create_worker_on_cpu+0x70/0x70
[ 5680.092165] ? do_syscall_64+0x73/0x130
[ 5680.092168] ? SyS_exit+0x17/0x20
[ 5680.092170] ret_from_fork+0x22/0x40
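When you can still SSH in, as here, a quick way to see which tasks the kernel's hung-task detector flagged is to pull out the "INFO: task NAME:PID blocked for more than N seconds" lines. A rough Python sketch, assuming that message format (it matches the log above); the regex and helper name are my own, not a kernel tool. The task name is matched lazily so a colon inside it, as in kworker/u32:4, stays part of the name.

```python
import re
from typing import List, Tuple

# Matches e.g. "INFO: task kworker/u32:4:5622 blocked for more than 120 seconds."
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>\S+?):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def hung_tasks(dmesg_text: str) -> List[Tuple[str, int, int]]:
    """Return (task name, PID, seconds blocked) for each hung-task warning."""
    return [
        (m.group("name"), int(m.group("pid")), int(m.group("secs")))
        for m in HUNG_TASK.finditer(dmesg_text)
    ]


sample = """\
[ 5680.091606] INFO: task Xorg:991 blocked for more than 120 seconds.
[ 5680.091913] INFO: task kworker/u32:4:5622 blocked for more than 120 seconds.
"""
for name, pid, secs in hung_tasks(sample):
    print(f"{name} (PID {pid}) stuck for over {secs} s")
```

Feeding it the output of dmesg gives a short list of stuck tasks; here both are in the display path (Xorg and a drm_kms_helper commit worker).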