Lenovo E585 - Ryzen 2500U - Vega - Linux Hard Lock

Firstly, apologies for starting a new topic on this subject. I can’t make any more posts on the other E585 thread and I really need this computer to be stable. I do not wish to be a nuisance.

I’m still experiencing the hard locks I described in the other thread. Sometimes nothing is logged, but usually I get errors like these:

Nov 26 14:59:49 kernel: gmc_v9_0_process_interrupt: 15 callbacks suppressed
Nov 26 14:59:49 kernel: amdgpu 0000:05:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32768, for process Xorg pid 930 thread Xorg:cs0 pid 936
)
Nov 26 14:59:49 kernel: amdgpu 0000:05:00.0: at address 0x0000000104d0d000 from 27
Nov 26 14:59:49 kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00101031
Nov 26 14:59:49 kernel: amdgpu 0000:05:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32768, for process Xorg pid 930 thread Xorg:cs0 pid 936
)
Nov 26 14:59:49 kernel: amdgpu 0000:05:00.0: at address 0x0000000104d8f000 from 27
Nov 26 14:59:49 kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Nov 26 14:59:49 kernel: amdgpu 0000:05:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32768, for process Xorg pid 930 thread Xorg:cs0 pid 936
)
Nov 26 14:59:49 kernel: amdgpu 0000:05:00.0: at address 0x0000000104cd3000 from 27
Nov 26 14:59:49 kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Nov 26 14:59:49 kernel: amdgpu 0000:05:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32768, for process Xorg pid 930 thread Xorg:cs0 pid 936
)
Nov 26 14:59:49 kernel: amdgpu 0000:05:00.0: at address 0x0000000104c42000 from 27
Nov 26 14:59:49 kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Nov 26 14:59:49 kernel: amdgpu 0000:05:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32768, for process Xorg pid 930 thread Xorg:cs0 pid 936
)
Nov 26 14:59:49 kernel: amdgpu 0000:05:00.0: at address 0x0000000104d0f000 from 27
Nov 26 14:59:49 kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Nov 26 14:59:49 kernel: amdgpu 0000:05:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32768, for process Xorg pid 930 thread Xorg:cs0 pid 936
)
Nov 26 14:59:49 kernel: amdgpu 0000:05:00.0: at address 0x0000000104d91000 from 27
Nov 26 14:59:49 kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Nov 26 14:59:49 kernel: amdgpu 0000:05:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32768, for process Xorg pid 930 thread Xorg:cs0 pid 936
)
Nov 26 14:59:49 kernel: amdgpu 0000:05:00.0: at address 0x0000000104cfe000 from 27
Nov 26 14:59:49 kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Nov 26 14:59:49 kernel: amdgpu 0000:05:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32768, for process Xorg pid 930 thread Xorg:cs0 pid 936
)
Nov 26 14:59:49 kernel: amdgpu 0000:05:00.0: at address 0x0000000104cd5000 from 27
Nov 26 14:59:49 kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Nov 26 14:59:49 kernel: amdgpu 0000:05:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32768, for process Xorg pid 930 thread Xorg:cs0 pid 936
)
Nov 26 14:59:49 kernel: amdgpu 0000:05:00.0: at address 0x0000000104c44000 from 27
Nov 26 14:59:49 kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Nov 26 14:59:49 kernel: amdgpu 0000:05:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:1 pasid:32768, for process Xorg pid 930 thread Xorg:cs0 pid 936
)
Nov 26 14:59:49 mercury kernel: amdgpu 0000:05:00.0: at address 0x0000000104d14000 from 27
Nov 26 14:59:49 kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Nov 26 14:59:59 kernel: [drm:amdgpu_job_timedout [amdgpu]] ERROR ring gfx timeout, signaled seq=474663, emitted seq=474665
Nov 26 14:59:59 kernel: [drm] GPU recovery disabled.
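For anyone comparing logs, a dump like the one above can be condensed with a short script. This is a minimal Python sketch, not an amdgpu tool. The regular expressions are an assumption modeled on the exact 4.19 wording quoted above, and `summarize` is a hypothetical helper name. It counts page faults per process, reports the span of faulting addresses and any non-zero VM_L2_PROTECTION_FAULT_STATUS words, and, for each ring timeout, subtracts the signaled sequence number from the emitted one, giving the number of fences the ring had been given but not yet completed when the timeout fired.

```python
import re
import sys
from collections import Counter

# Patterns modeled on the amdgpu messages quoted above (kernel 4.19 wording);
# other kernel versions may word these lines differently.
FAULT_RE = re.compile(
    r"VMC page fault \(src_id:(?P<src_id>\d+) ring:(?P<ring>\d+) "
    r"vmid:(?P<vmid>\d+) pasid:(?P<pasid>\d+), for process (?P<process>\S+)"
)
ADDR_RE = re.compile(r"at address (?P<addr>0x[0-9a-fA-F]+) from (?P<client>\d+)")
STATUS_RE = re.compile(r"VM_L2_PROTECTION_FAULT_STATUS:(?P<status>0x[0-9a-fA-F]+)")
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, signaled seq=(?P<signaled>\d+), "
    r"emitted seq=(?P<emitted>\d+)"
)


def summarize(lines):
    """Tally amdgpu page faults, faulting addresses and ring timeouts in log lines."""
    faults_by_process = Counter()
    addresses = []
    nonzero_status = []
    timeouts = []
    for line in lines:
        if m := FAULT_RE.search(line):
            faults_by_process[m["process"]] += 1
        elif m := ADDR_RE.search(line):
            addresses.append(int(m["addr"], 16))
        elif m := STATUS_RE.search(line):
            status = int(m["status"], 16)
            # Keep only non-zero status words; the repeats in the log above read 0.
            if status:
                nonzero_status.append(status)
        elif m := TIMEOUT_RE.search(line):
            # emitted - signaled = fences submitted to the ring but not yet completed
            outstanding = int(m["emitted"]) - int(m["signaled"])
            timeouts.append((m["ring"], outstanding))
    return {
        "faults_by_process": dict(faults_by_process),
        "address_range": (min(addresses), max(addresses)) if addresses else None,
        "nonzero_status": nonzero_status,
        "timeouts": timeouts,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 amdgpu_summary.py kernel.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        print(summarize(f))
```

The kernel log of the boot that locked up can usually be saved with `journalctl -k -b -1 > kernel.log` (this needs a persistent journal) and passed as the first argument. For the excerpt above it should report Xorg as the only faulting process and a gfx timeout with 2 outstanding fences.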

I am running the latest KDE Neon with kernel 4.19.5. I have tried the padoka stable, padoka unstable, and oibaf repositories for the Mesa drivers, but none of them fix the problem. I have also tried earlier kernels. Is anyone else experiencing this issue with Linux on the E585?

I’ve found similar reports online of hard locks with Vega, but no definitive causes or solutions.

My guess is that the hardware just isn’t fully supported yet. Will it be in the future? Any hints about what the problem could be, or other things I could try, would be greatly appreciated. Many thanks.

Hate to say it, but I would (and probably will) just sell that machine and get something that works. I’m on the IdeaPad 720S, also a 2500U, and I can’t get it stable. It’s a nightmare.

My current plan is to sell the Lenovo and get something made for Linux, preferably by Linux people.

A few things to try. First, add the kernel parameter iommu=pt or iommu=off.
Second, check whether any BIOS updates are available. I remember reading about similar issues with Ryzen Mobile, and I also remember solutions being found, so I’ll update this post when I find what I’m thinking of.
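To confirm which iommu= setting the running kernel actually booted with (handy when comparing iommu=pt against iommu=off), the command line can be read back from /proc/cmdline. Below is a minimal Python sketch, assuming whitespace-separated parameters with optional shell-style quoting; `parse_cmdline` and `iommu_setting` are hypothetical helper names. On Ubuntu-based systems such as KDE Neon, a parameter is normally made permanent by adding it to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and running `sudo update-grub`.

```python
import shlex
from pathlib import Path


def parse_cmdline(text):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    # shlex handles quoted values such as foo="a b"; a repeated parameter
    # simply overwrites the earlier one in this sketch.
    for token in shlex.split(text):
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def iommu_setting(text):
    """Return the iommu= value (e.g. 'pt' or 'off'), or None if it is not set."""
    return parse_cmdline(text).get("iommu")


if __name__ == "__main__":
    cmdline = Path("/proc/cmdline")
    if cmdline.exists():
        print("iommu =", iommu_setting(cmdline.read_text()))
```

The same helper works for any other boot parameter, for example `parse_cmdline(text).get("idle")`.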

Edit: here’s the thread

Thanks, guys. I’ll take a look tomorrow. Bedtime now :)

EDIT: I have been reading through this thread:

https://forums.lenovo.com/t5/ThinkPad-11e-Windows-13-E-and/ThinkPad-E485-E585-Firmware-bug-ACPI-IVRS-table/m-p/4191484

It discusses the IOAPIC/IOMMU problem and also mentions hard locks towards the end.

That thread was linked from:

https://evilazrael.de/comment/914

That page was in turn linked from the other E585 thread.

I am also getting IOMMU errors in the log file:

kfd kfd: Failed to resume IOMMU for device 1002:15dd

I don’t know what device 15dd is (I can’t see it listed by lspci). However, this bug report https://bugs.freedesktop.org/show_bug.cgi?id=107898 and urmamasllama’s comment both point to trying iommu=pt.

The original poster in this thread: https://forum.manjaro.org/t/amd-vi-unable-to-write-to-iommu-perf-counter/56777

hypothesizes that 15dd is the VGA controller, which, if correct, could explain the Vega hard locks.
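One way to check that hypothesis: IDs like 1002:15dd are vendor:device pairs, and 1002 is AMD’s (formerly ATI’s) PCI vendor ID. Plain `lspci` prints names rather than numbers, whereas `lspci -nn` shows the numeric [vendor:device] IDs and `lspci -d 1002:15dd` filters on them. The same lookup can be done by reading sysfs directly, as in this minimal Python sketch. `find_pci_devices` is a hypothetical helper, and it assumes the standard /sys/bus/pci/devices layout, where each device directory holds `vendor` and `device` files containing hex IDs. If 15dd is indeed the GPU, it should return 0000:05:00.0, the address amdgpu uses in the log above.

```python
from pathlib import Path

# Standard Linux sysfs location; each entry is named by its PCI address.
SYSFS_PCI = "/sys/bus/pci/devices"


def read_hex(path):
    """Read a sysfs ID file (hex text such as 0x1002) and return it as an int."""
    return int(path.read_text().strip(), 16)


def find_pci_devices(vendor, device, root=SYSFS_PCI):
    """Return the PCI addresses whose vendor:device IDs match, e.g. 1002:15dd."""
    base = Path(root)
    if not base.is_dir():
        return []
    matches = []
    for dev in sorted(base.iterdir()):
        try:
            ids = (read_hex(dev / "vendor"), read_hex(dev / "device"))
        except (OSError, ValueError):
            continue  # skip entries without readable ID files
        if ids == (vendor, device):
            matches.append(dev.name)
    return matches


if __name__ == "__main__":
    # 0x1002 is AMD's PCI vendor ID; 0x15dd is the ID from the kfd error above.
    print(find_pci_devices(0x1002, 0x15dd))
```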

I’ll try iommu=pt and report back.

EDIT: Nope, that didn’t work. I just had a hard lock, with nothing in the log file. I’ll update the BIOS next.

EDIT 2 (3/12/18): Hard lock with the latest BIOS. Nothing in the log file.

You might want to try building the next 4.20-rc kernel; it may include some 2500U-specific fixes if they get merged. Also, I think AMD posted new microcode today that should get picked up soon.

Add idle=nomwait to the kernel command line.