Ryzen/Vega laptop PCIe Bus Error

Ryzen 3 2200G on MSI B350 Motherboard
Ubuntu 18.04 (Kernel 4.17.0)

dmesg gives this:
[ 3240.929502] pcieport 0000:00:01.2: PCIe Bus Error: severity=Corrected, type=Data Link Layer, id=000a(Transmitter ID)
[ 3240.929505] pcieport 0000:00:01.2: device [1022:15d3] error status/mask=00001000/00006000
[ 3240.929507] pcieport 0000:00:01.2: [12] Replay Timer Timeout
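For anyone trying to read the status/mask words in that log: each set bit names one correctable error type. Below is a minimal sketch that decodes such a word, assuming the standard PCIe AER correctable error status bit layout. The function name decode_aer_correctable is made up for illustration and is not part of any tool.

```shell
# Decode a PCIe AER *correctable* error status word (hex digits as printed in dmesg).
# Assumption: bit positions follow the standard AER Correctable Error Status register.
decode_aer_correctable() {
    aer_status=$((0x$1))
    for aer_bit in 0 6 7 8 12 13 14 15; do
        if [ $(( (aer_status >> aer_bit) & 1 )) -eq 1 ]; then
            case $aer_bit in
                0)  aer_name="Receiver Error" ;;
                6)  aer_name="Bad TLP" ;;
                7)  aer_name="Bad DLLP" ;;
                8)  aer_name="Replay Number Rollover" ;;
                12) aer_name="Replay Timer Timeout" ;;
                13) aer_name="Advisory Non-Fatal Error" ;;
                14) aer_name="Corrected Internal Error" ;;
                15) aer_name="Header Log Overflow" ;;
            esac
            echo "bit $aer_bit: $aer_name"
        fi
    done
}

decode_aer_correctable 00001000   # the status word from the log
decode_aer_correctable 00006000   # the mask word: error types that are not reported
```

The first call prints the same bit 12 entry that the kernel already names on the third log line; the second shows which error types the mask suppresses.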

The system boots successfully only about one time in four.
Errors during boot:
[ 0.079553] ACPI BIOS Error (bug): Failure creating [_SB.SMIC], AE_ALREADY_EXISTS (20180313/dswload2-316)
[ 0.079764] ACPI Error: AE_ALREADY_EXISTS, During name lookup/catalog (20180313/psobject-220)
[ 0.079946] ACPI Error: Method parse/execution failed , AE_ALREADY_EXISTS (20180313/psparse-516)
[ 0.080001] ACPI Error: Invalid zero thread count in method (20180313/dsmethod-760)
[ 0.080174] ACPI: Marking method ___ as Serialized because of AE_ALREADY_EXISTS error
[ 0.080175] ACPI Error: Invalid OwnerId: 0x00 (20180313/utownerid-156)
[ 0.080350] ACPI Error: AE_ALREADY_EXISTS, (SSDT: AMD PT) while loading table (20180313/tbxfload-197)
[ 0.081148] ACPI Error: 1 table load failures, 7 successful (20180313/tbxfload-215)
[ 0.697188] AMD-Vi: Unable to write to IOMMU perf counter.
[ 4.100036] usb 1-8: device not accepting address 5, error -71
[ 4.100139] usb usb1-port8: unable to enumerate USB device
[ 16.801249] kvm: disabled by bios

Thanks for any help

Huh, so this also happens on the desktop APUs? Didn’t realize that.
I am guessing you are on the latest UEFI?

Can’t you just disable C-states in the BIOS?

Oh and since the 2200G has only 4 threads it must be rcu_nocbs=0-3
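If you are not sure what range to use, it can be derived from the number of logical CPUs. A small sketch, assuming you want every CPU from 0 upward covered; rcu_nocbs_param is just a helper name invented here.

```shell
# Build an rcu_nocbs= range covering CPUs 0..N-1 for a given logical CPU count.
rcu_nocbs_param() {
    echo "rcu_nocbs=0-$(( $1 - 1 ))"
}

rcu_nocbs_param 4                                   # 2200G: 4 threads
rcu_nocbs_param "$(getconf _NPROCESSORS_ONLN)"      # whatever this machine reports
```

The second call uses getconf so the value follows the machine it runs on.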

Yes, I have the latest UEFI, and I already tried rcu_nocbs=0-3. The behavior remains the same.

I’ll try to disable C-states in the BIOS. Thanks

So I installed Fedora and am testing kernel 4.18rc1.
It seems the issue has not been fixed yet. I will start testing with what kernel parameters are needed for this.
In the meantime, 4.17 works perfectly with the two parameters I posted two weeks ago.

Thank you guys for investigating this issue when AMD doesn’t.
I recently bought an Acer Swift 3 with a Ryzen 5 2500U and installed Linux Mint 19 on it.
I am using the latest 4.17 kernel and the latest Mesa version, but I’m still having the same freeze issues when watching YouTube videos. I will try the suggested workaround and report back.

I was able to work on my notebook this evening without a single freeze. In my case, it seems that just adding processor.max_cstate=1 to the kernel command line in /etc/default/grub was enough.

Afterwards I ran sudo update-grub and rebooted, and there have been no freezes yet!
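For reference, here is a rough sketch of that edit as a script. It assumes the parameter belongs in the GRUB_CMDLINE_LINUX_DEFAULT line and that the line is double-quoted, as on a stock Ubuntu or Mint install. The helper name add_grub_param and the "quiet splash" starting value are illustrative assumptions, and the demo works on a temporary copy rather than the real file.

```shell
# Append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT in the given file,
# unless it is already there. A .bak copy of the file is kept.
# Assumptions: the value is double-quoted and the parameter contains no '|' or '&'.
# The "already there" check is a plain substring match, so it is only approximate.
add_grub_param() {
    grub_file=$1
    grub_param=$2
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$grub_file" | grep -qF -- "$grub_param"; then
        return 0
    fi
    sed -i.bak "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 $grub_param\"|" "$grub_file"
}

# Demo on a throwaway file (hypothetical contents), not on /etc/default/grub.
tmp=$(mktemp)
printf 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n' > "$tmp"
add_grub_param "$tmp" processor.max_cstate=1
grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$tmp"
rm -f "$tmp" "$tmp.bak"
```

On a real system you would point it at /etc/default/grub with sudo and then run sudo update-grub, as described above.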

For anyone interested in my system information:

Yeah, at this point it is obvious that it is a C-State thing again.
Glad to hear that processor.max_cstate=1 seems to work across the board.

I have the same machine and I hope this will help me.

At the same time, I am wondering how you managed to install LM on it. Mine wouldn’t play ball at all with LM. It didn’t even get into the desktop environment from the USB.

I am currently running Antergos Cinnamon and liking it a lot.

Thanks for the solution. I will let you know if my system holds.

So under Kubuntu 18.04 my machine is actually freezing up again.
I can ssh into it and dmesg tells me this:

[   42.821380] amdgpu: [powerplay] pp_dpm_get_temperature was not implemented.
[ 5488.951160] gmc_v9_0_process_interrupt: 21 callbacks suppressed
[ 5488.951167] amdgpu 0000:03:00.0: [mmhub] VMC page fault (src_id:0 ring:153 vm_id:0 pas_id:0)
[ 5488.951175] amdgpu 0000:03:00.0:   at page 0x0000000600000000 from 18
[ 5488.951178] amdgpu 0000:03:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000132
[ 5680.091606] INFO: task Xorg:991 blocked for more than 120 seconds.
[ 5680.091612]       Tainted: G        W        4.15.0-24-generic #26-Ubuntu
[ 5680.091615] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5680.091618] Xorg            D    0   991    985 0x00400004
[ 5680.091622] Call Trace:
[ 5680.091632]  __schedule+0x291/0x8a0
[ 5680.091636]  schedule+0x2c/0x80
[ 5680.091639]  schedule_preempt_disabled+0xe/0x10
[ 5680.091641]  __ww_mutex_lock.isra.3+0x204/0x670
[ 5680.091645]  __ww_mutex_lock_slowpath+0x16/0x20
[ 5680.091647]  ? __ww_mutex_lock_slowpath+0x16/0x20
[ 5680.091649]  ww_mutex_lock+0x5a/0x70
[ 5680.091671]  drm_modeset_backoff+0x47/0xc0 [drm]
[ 5680.091687]  drm_mode_obj_set_property_ioctl+0x14b/0x280 [drm]
[ 5680.091704]  ? drm_mode_connector_set_obj_prop+0x80/0x80 [drm]
[ 5680.091719]  drm_mode_connector_property_set_ioctl+0x3f/0x60 [drm]
[ 5680.091731]  drm_ioctl_kernel+0x5f/0xb0 [drm]
[ 5680.091743]  drm_ioctl+0x31b/0x3d0 [drm]
[ 5680.091757]  ? drm_mode_connector_set_obj_prop+0x80/0x80 [drm]
[ 5680.091798]  amdgpu_drm_ioctl+0x4f/0x90 [amdgpu]
[ 5680.091803]  do_vfs_ioctl+0xa8/0x630
[ 5680.091807]  ? vfs_read+0x115/0x130
[ 5680.091809]  SyS_ioctl+0x79/0x90
[ 5680.091813]  do_syscall_64+0x73/0x130
[ 5680.091816]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 5680.091819] RIP: 0033:0x7f4fa03325d7
[ 5680.091821] RSP: 002b:00007ffe84240258 EFLAGS: 00003246 ORIG_RAX: 0000000000000010
[ 5680.091823] RAX: ffffffffffffffda RBX: 000055a8ab08d6f0 RCX: 00007f4fa03325d7
[ 5680.091824] RDX: 00007ffe84240290 RSI: 00000000c01064ab RDI: 0000000000000017
[ 5680.091826] RBP: 00007ffe84240290 R08: 0000000000000001 R09: 0000000000000000
[ 5680.091827] R10: 00007f4fa03bacc0 R11: 0000000000003246 R12: 00000000c01064ab
[ 5680.091828] R13: 0000000000000017 R14: 000055a8ab08db10 R15: 000055a8a9c88601
[ 5680.091913] INFO: task kworker/u32:4:5622 blocked for more than 120 seconds.
[ 5680.091915]       Tainted: G        W        4.15.0-24-generic #26-Ubuntu
[ 5680.091917] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 5680.091919] kworker/u32:4   D    0  5622      2 0x80000000
[ 5680.091933] Workqueue: events_unbound commit_work [drm_kms_helper]
[ 5680.091934] Call Trace:
[ 5680.091937]  __schedule+0x291/0x8a0
[ 5680.091941]  schedule+0x2c/0x80
[ 5680.091943]  schedule_timeout+0x1cf/0x350
[ 5680.092003]  ? tgn10_get_crtc_scanoutpos+0x6b/0xa0 [amdgpu]
[ 5680.092007]  dma_fence_default_wait+0x1c7/0x260
[ 5680.092009]  ? dma_fence_release+0xa0/0xa0
[ 5680.092011]  dma_fence_wait_timeout+0x3e/0xf0
[ 5680.092014]  reservation_object_wait_timeout_rcu+0x17d/0x370
[ 5680.092072]  amdgpu_dm_do_flip+0x12c/0x390 [amdgpu]
[ 5680.092126]  amdgpu_dm_atomic_commit_tail+0x92c/0xa50 [amdgpu]
[ 5680.092131]  ? dequeue_entity+0xe4/0x470
[ 5680.092135]  ? __switch_to+0x182/0x500
[ 5680.092143]  commit_tail+0x42/0x70 [drm_kms_helper]
[ 5680.092149]  commit_work+0x12/0x20 [drm_kms_helper]
[ 5680.092153]  process_one_work+0x1de/0x410
[ 5680.092155]  worker_thread+0x32/0x410
[ 5680.092158]  kthread+0x121/0x140
[ 5680.092160]  ? process_one_work+0x410/0x410
[ 5680.092163]  ? kthread_create_worker_on_cpu+0x70/0x70
[ 5680.092165]  ? do_syscall_64+0x73/0x130
[ 5680.092168]  ? SyS_exit+0x17/0x20
[ 5680.092170]  ret_from_fork+0x22/0x40

Mine has also started showing CPU#3 soft lockups again, after two weeks of working like a charm with @Deflaktor’s solution.

The GRUB configuration changed a lot after the update (the one installed after I applied the fix), and I am not sure how to tackle this.

I forgot where I found this, but appending “idle=nomwait pcie_aspm=off” seems to keep the laptop running even better than my previous solution. Can everyone please try and confirm my findings? I have not run into any crashes yet on kernel 4.18.
AFAIK this seems to be a BIOS issue.
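If you want to confirm after a reboot that the parameters actually made it onto the kernel command line, something like the following sketch should do. The helper name cmdline_has and the sample command line are made up for illustration; on a running system you would pass the contents of /proc/cmdline instead.

```shell
# Succeed only if every listed parameter appears as a whole word in the
# command-line string given as the first argument.
cmdline_has() {
    cmdline=$1
    shift
    for want in "$@"; do
        case " $cmdline " in
            *" $want "*) ;;
            *) echo "missing: $want"; return 1 ;;
        esac
    done
}

# Sample string (hypothetical); replace with "$(cat /proc/cmdline)" on a real machine.
cmdline_has "BOOT_IMAGE=/vmlinuz ro quiet idle=nomwait pcie_aspm=off" \
    idle=nomwait pcie_aspm=off && echo "all parameters present"
```

Matching on whole words avoids false positives such as pcie_aspm=off matching inside a longer token.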

I completely missed your post, @Shining_Ace. Since I still have issues with my Lenovo, I’m gonna try that out.

I am close to selling the damn thing though.
Let me know if you found anything in the meantime.

With these kernel command-line parameters, my 720s laptop has not locked up for a long time.

initrd=\amd-ucode.img initrd=\initramfs-linux.img rd.luks.name=43b24c52-4b03-4e16-a107-88883b3668fe=cryptroot rd.luks.options=discard root=/dev/mapper/cryptroot rw idle=nomwait reboot=efi mce=off pcie_aspm=off ivrs_ioapic[4]=00:14.0 ivrs_ioapic[5]=00:00.2

uname -a
Linux laptop-lenovo 4.19.12-arch1-1-ARCH #1 SMP PREEMPT Fri Dec 21 13:56:54 UTC 2018 x86_64 GNU/Linux

The C6 state is also disabled, using the ZenStates tool: https://github.com/r4m0n/ZenStates-Linux

./zenstates.py -l
P0 - Enabled - FID = 58 - DID = 8 - VID = 35 - Ratio = 22.00 - vCore = 1.21875
P1 - Enabled - FID = 66 - DID = C - VID = 60 - Ratio = 17.00 - vCore = 0.95000
P2 - Enabled - FID = 60 - DID = C - VID = 66 - Ratio = 16.00 - vCore = 0.91250
P3 - Disabled
P4 - Disabled
P5 - Disabled
P6 - Disabled
P7 - Disabled
C6 State - Package - Disabled
C6 State - Core - Disabled
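In case the FID/DID/VID columns look cryptic: they are hex fields, and the Ratio and vCore columns follow from them by simple arithmetic. The sketch below uses formulas that reproduce the numbers in this listing (ratio = 25 * FID / (12.5 * DID), vCore = 1.55 - 0.00625 * VID). I am assuming these are what the tool computes, and pstate_calc is a name invented for this example.

```shell
# Convert the hex FID/DID/VID fields from the listing into ratio and vCore.
# Assumed formulas: ratio = 25*FID / (12.5*DID), vCore = 1.55 - 0.00625*VID.
pstate_calc() {
    awk -v fid="$((0x$1))" -v did="$((0x$2))" -v vid="$((0x$3))" 'BEGIN {
        printf "Ratio = %.2f - vCore = %.5f\n", 25 * fid / (12.5 * did), 1.55 - 0.00625 * vid
    }'
}

pstate_calc 58 8 35   # P0
pstate_calc 66 C 60   # P1
pstate_calc 60 C 66   # P2
```

The three calls give the same Ratio and vCore values as the P0 to P2 rows.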

BIOS version:

dmidecode| grep -i version
	Version: 6KCN38WW
	Version: Lenovo IdeaPad 720s-13ARR
	Version: SDK0K17763 
	Version: Lenovo IdeaPad 720s-13ARR
	Version: AMD Ryzen 7 2700U with Radeon Vega Mobile Gfx  
	String: Compiler Version: VC 9.0

So, after an adventure with Lenovo support via Twitter (don’t ask, it wasn’t pretty…), I now have the latest BIOS version available for my 720s-13ARR. I installed Kubuntu 18.10, installed kernel 4.20, and set processor.max_cstate=1 as well as pcie_aspm=off, and so far it seems to be stable.

Yay! And it only took … a year.

So I have an Acer Nitro 5 with a Ryzen 5 2500U, and in order to get into Linux I have to add noapic to the GRUB boot flags. I’d try adding that to your GRUB config.

I just updated the BIOS to F.20 on my 15-inch Envy x360 with a Ryzen 5 2500U.
I booted with no extra parameters and did not disable C6, it seems to be running just fine, but only on Linux 4.19
Whenever I try booting Linux 4.20 or 5.0rc, it just freezes on boot. Did anyone try 4.20 or 5.0rc?

I’m using a ThinkPad E485 (Ryzen 5 2500U) with the linux-amd-raven kernel (based on Linux 4.20).

The kernel parameters I’m using are noapic and iommu=soft. I believe the second parameter is needed for Linux 4.20. I got this info from the Arch Wiki (https://wiki.archlinux.org/index.php/Laptop/Lenovo#E_series). This config seems to be working for me so far (2 days).

Haven’t tried 5.0rc, maybe you could try these parameters and test it?

I had the same exact problem actually. Kernel 4.20 and 5.0rc won’t boot at all for reasons I can’t comprehend because Kernel 4.19 could…

Thanks, but that didn’t work :(

I think I found the fix, it seems to be an issue with the firmware, not the kernel. Try downgrading your firmware blobs to a previous version. On Arch, it seems the October releases work just fine with 5.0rc5, but I can’t boot 4.20 regardless.