Also having problems with the Crucial SSD connected via USB 3 using a UGreen enclosure. Sometimes it works and sometimes it doesn’t. Same symptoms as when connected internally: long POST, drive not recognized.
During my last boot of Linux (KDE Neon) with the Crucial SSD connected via USB 3, I got a bunch of errors and the boot failed. Stuff like:
Error Transfer Event … for unknown stream ring slot 1 ep 6
Buffer I/O error … lost async_page_write
fsck exited with status code 4
root filesystem requires manual fsck
Connecting the drive to another computer running Windows 10 causes the access light on the enclosure to flash continuously. The same thing happens to a FAT32 formatted USB stick (containing a live KDE Neon) that used to work fine but which now doesn’t boot on the spare computer (but does boot on the E585). It only started doing this after using it on the E585.
Going to reinstall KDE Neon (was using Kubuntu previously) on the Crucial SSD, but I will use the spare computer to perform this and then run some diagnostics on the drive, just to eliminate the possibility that there is a problem with the drive itself. I don’t think there is; I think the E585 is screwing up the drive.
EDIT: Because I’m not allowed to make another post.
Hopefully some of my ramblings are of use to someone.
I reformatted the partition on the Crucial SSD drive, ran fsck on it and all is well with it. I then reinstalled the latest KDE Neon.
Have decided to try using the USB 2 port (the one on the right hand side of the laptop) instead of a USB 3 port. So far, fingers crossed, there are no problems with the drive being recognised or any corruption.
I’ve upgraded the kernel to 4.18.12 and MESA using the padoka PPA as suggested by beer. Unfortunately I have been experiencing the following two problems:
- Random lock ups. Not sure how to diagnose this; will do some more investigating.
- Screen is dim (wasn’t like this when I first installed, but happened later for no apparent reason). Checked screen brightness, it is 100% - but is dim in Linux. Probably something to do with the limited support for the Vega graphics. Any ideas? Do the mesa drivers need to be compiled / integrated into the kernel or something?
Many thanks.
EDIT: Fixed the screen brightness just by using F6. Weird how Linux said brightness was 100%. I guess that is independent from setting it on the laptop.
Will see how the hard locks go.
EDIT: I flashed the firmware on the Crucial m4 SSD to the latest version and that seems to have fixed the problems I was experiencing. It was a bit flaky over USB 2 as well (I think that is what was causing the locks). But now it is recognised and solid (fingers crossed) even when connected internally via SATA. I know it was the obvious thing to try but I was a bit wary about trashing the SSD, but all went well. Plus I’ve been really stressed and irritable lately.
I had to add the extra commands to the kernel boot parameters as described in the blog beer linked to, otherwise I was getting a black screen on boot. Initially it worked okay with just the first boot parameter, but something must have changed and I had to add the others. In summary, I am using the following additional boot parameters:
ivrs_ioapic[32]=00:14.0 ivrs_ioapic[33]=00:00.1 spec_store_bypass_disable=seccomp
Not sure which of the second two parameters did the trick, or if both are required.
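For anyone wanting to try the same parameters, here is a minimal Python sketch of merging them into a GRUB command line without duplicating anything. It assumes a GRUB setup where the parameters live in the GRUB_CMDLINE_LINUX_DEFAULT line of /etc/default/grub (as on Ubuntu-based distros); the function name add_boot_params is just made up for illustration, and the sketch only builds the new line, it does not touch any files.

```python
# Parameters listed in the post above. Whether all three are needed is untested.
EXTRA_PARAMS = [
    "ivrs_ioapic[32]=00:14.0",
    "ivrs_ioapic[33]=00:00.1",
    "spec_store_bypass_disable=seccomp",
]


def add_boot_params(grub_line, params):
    """Return the GRUB_CMDLINE_LINUX_DEFAULT line with each param added once."""
    key, _, value = grub_line.partition("=")
    # Drop the surrounding double quotes and split into individual parameters.
    current = value.strip().strip('"').split()
    for param in params:
        if param not in current:
            current.append(param)
    joined = " ".join(current)
    return f'{key}="{joined}"'


if __name__ == "__main__":
    old = 'GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"'
    print(add_boot_params(old, EXTRA_PARAMS))
```

On such a setup the resulting line would replace the old one in /etc/default/grub, followed by running update-grub and rebooting.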
I still stand by my comments regarding the screen (backlight bleed) and touchpad buttons (poor response); they are not up to standard.
EDIT: Still experiencing hard locks
Here is the log using journalctl:
Oct 09 14:48:42 kernel: amdgpu 0000:05:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768
Oct 09 14:48:42 kernel: amdgpu 0000:05:00.0: at page 0x0000000101e00000 from 27
Oct 09 14:48:42 kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00301031
Oct 09 14:48:42 kernel: amdgpu 0000:05:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768
Oct 09 14:48:42 kernel: amdgpu 0000:05:00.0: at page 0x0000000101e04000 from 27
Oct 09 14:48:42 kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00301031
Oct 09 14:48:42 kernel: amdgpu 0000:05:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768
Oct 09 14:48:42 kernel: amdgpu 0000:05:00.0: at page 0x0000000101e02000 from 27
Oct 09 14:48:42 kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00301031
Oct 09 14:48:42 kernel: amdgpu 0000:05:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768
Oct 09 14:48:42 kernel: amdgpu 0000:05:00.0: at page 0x0000000101e00000 from 27
Oct 09 14:48:42 kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00301031
Oct 09 14:48:42 kernel: amdgpu 0000:05:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768
Oct 09 14:48:42 kernel: amdgpu 0000:05:00.0: at page 0x0000000101e02000 from 27
Oct 09 14:48:42 kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00301031
Oct 09 14:48:42 kernel: amdgpu 0000:05:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768
Oct 09 14:48:42 kernel: amdgpu 0000:05:00.0: at page 0x0000000101e04000 from 27
Oct 09 14:48:42 kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00301031
Oct 09 14:48:42 kernel: amdgpu 0000:05:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768
Oct 09 14:48:42 kernel: amdgpu 0000:05:00.0: at page 0x0000000101e01000 from 27
Oct 09 14:48:42 kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00301031
Oct 09 14:48:42 kernel: amdgpu 0000:05:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768
Oct 09 14:48:42 kernel: amdgpu 0000:05:00.0: at page 0x0000000101e02000 from 27
Oct 09 14:48:42 kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Oct 09 14:48:42 kernel: amdgpu 0000:05:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768
Oct 09 14:48:42 kernel: amdgpu 0000:05:00.0: at page 0x0000000101e04000 from 27
Oct 09 14:48:42 kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Oct 09 14:48:42 kernel: amdgpu 0000:05:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:3 pasid:32768
Oct 09 14:48:42 kernel: amdgpu 0000:05:00.0: at page 0x0000000101e00000 from 27
Oct 09 14:48:42 kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
Oct 09 14:48:52 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=59
Oct 09 14:48:52 kernel: [drm] GPU recovery disabled.
Looks like an amdgpu driver / kernel issue.
Got another hard lock, but this time the cause is different:
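The log above is repetitive, so a rough way to summarise it is to count how often each faulting page address shows up. This is only a sketch: it assumes the journalctl output has been saved as text, and the name count_fault_pages is hypothetical. The sample lines are copied from the log above.

```python
import re
from collections import Counter

# Matches the "at page 0x... from NN" lines that follow each VMC page fault.
PAGE_RE = re.compile(r"amdgpu \S+ at page (0x[0-9a-fA-F]+)")


def count_fault_pages(log_text):
    """Count occurrences of each faulting page address in amdgpu log text."""
    return Counter(m.group(1) for m in PAGE_RE.finditer(log_text))


SAMPLE = """\
Oct 09 14:48:42 kernel: amdgpu 0000:05:00.0: at page 0x0000000101e00000 from 27
Oct 09 14:48:42 kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00301031
Oct 09 14:48:42 kernel: amdgpu 0000:05:00.0: at page 0x0000000101e04000 from 27
Oct 09 14:48:42 kernel: amdgpu 0000:05:00.0: at page 0x0000000101e00000 from 27
"""

if __name__ == "__main__":
    for page, count in count_fault_pages(SAMPLE).most_common():
        print(page, count)
```

If the same handful of pages keeps coming back, as it does here, that at least suggests one failing job rather than random faults all over memory, though I can't say more than that from the log alone.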
Oct 12 14:46:49 kernel: pcieport 0000:00:01.6: AER: Corrected error received: 0000:00:01.0
Oct 12 14:46:49 kernel: pcieport 0000:00:01.6: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
Oct 12 14:46:49 kernel: pcieport 0000:00:01.6: device [1022:15d3] error status/mask=00001000/00006000
Oct 12 14:46:49 kernel: pcieport 0000:00:01.6: [12] Replay Timer Timeout
Oct 12 14:47:02 kernel: pcieport 0000:00:01.6: AER: Corrected error received: 0000:00:01.0
Oct 12 14:47:02 kernel: pcieport 0000:00:01.6: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
Oct 12 14:47:02 kernel: pcieport 0000:00:01.6: device [1022:15d3] error status/mask=00001000/00006000
Oct 12 14:47:02 kernel: pcieport 0000:00:01.6: [12] Replay Timer Timeout
Oct 12 14:47:38 kernel: pcieport 0000:00:01.6: AER: Corrected error received: 0000:00:01.0
Oct 12 14:47:38 kernel: pcieport 0000:00:01.6: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
Oct 12 14:47:38 kernel: pcieport 0000:00:01.6: device [1022:15d3] error status/mask=00001000/00006000
Oct 12 14:47:38 kernel: pcieport 0000:00:01.6: [12] Replay Timer Timeout
Oct 12 14:48:21 kernel: pcieport 0000:00:01.6: AER: Corrected error received: 0000:00:01.0
Oct 12 14:48:21 kernel: pcieport 0000:00:01.6: PCIe Bus Error: severity=Corrected, type=Data Link Layer, (Transmitter ID)
Oct 12 14:48:21 kernel: pcieport 0000:00:01.6: device [1022:15d3] error status/mask=00001000/00006000
Oct 12 14:48:21 kernel: pcieport 0000:00:01.6: [12] Replay Timer Timeout
Looks like I’m not the only person to experience the problem:
https://forum.antergos.com/topic/10372/hard-freeze-due-to-pcie-bus-error
and here:
https://forum.level1techs.com/t/ryzen-vega-laptop-pcie-bus-error/124661/60
Going to try the processor.max_cstate=1 boot option.
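For reference, the "error status/mask=00001000/00006000" field in the pcieport lines can be decoded bit by bit. The sketch below uses the bit positions of the PCIe AER Correctable Error Status register; the function name decode_aer is made up, and the input is assumed to be the "status/mask" hex pair exactly as printed in the log.

```python
# Bit positions in the PCIe AER Correctable Error Status / Mask registers.
AER_CORRECTABLE_BITS = {
    0: "Receiver Error",
    6: "Bad TLP",
    7: "Bad DLLP",
    8: "REPLAY_NUM Rollover",
    12: "Replay Timer Timeout",
    13: "Advisory Non-Fatal Error",
    14: "Corrected Internal Error",
    15: "Header Log Overflow",
}


def decode_aer(status_mask):
    """Split 'status/mask' hex text into (reported errors, masked errors)."""
    status_hex, mask_hex = status_mask.split("/")
    status, mask = int(status_hex, 16), int(mask_hex, 16)
    reported = [n for b, n in AER_CORRECTABLE_BITS.items() if (status >> b) & 1]
    masked = [n for b, n in AER_CORRECTABLE_BITS.items() if (mask >> b) & 1]
    return reported, masked


if __name__ == "__main__":
    reported, masked = decode_aer("00001000/00006000")
    print("reported:", reported)
    print("masked:", masked)
```

Status 0x1000 is bit 12, which agrees with the "[12] Replay Timer Timeout" line the kernel prints, so the only error being reported is the replay timeout on the link.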
EDIT: No more hard locks. But getting scary disk errors on the Crucial m4 SSD:
Oct 15 13:33:17 kernel: ata1.00: exception Emask 0x10 SAct 0x8000000 SErr 0x280100 action 0x6 frozen
Oct 15 13:33:17 kernel: ata1.00: irq_stat 0x08000000, interface fatal error
Oct 15 13:33:17 kernel: ata1: SError: { UnrecovData 10B8B BadCRC }
Oct 15 13:33:17 kernel: ata1.00: failed command: READ FPDMA QUEUED
Oct 15 13:33:17 kernel: ata1.00: cmd 60/00:d8:00:5f:b8/01:00:04:00:00/40 tag 27 ncq dma 131072 in
res 40/00:dc:00:5f:b8/00:00:04:00:00/40 Emask 0x10 (ATA bus error)
Oct 15 13:33:17 kernel: ata1.00: status: { DRDY }
Oct 15 13:33:17 kernel: ata1: hard resetting link
Oct 15 13:33:18 kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Oct 15 13:33:18 kernel: ata1.00: configured for UDMA/100
Oct 15 13:33:18 kernel: ata1: EH complete
Oct 15 13:33:18 kernel: ata1: limiting SATA link speed to 3.0 Gbps
Oct 15 13:33:18 kernel: ata1.00: exception Emask 0x10 SAct 0x3800000 SErr 0x280100 action 0x6 frozen
Oct 15 13:33:18 kernel: ata1.00: irq_stat 0x08000000, interface fatal error
Oct 15 13:33:18 kernel: ata1: SError: { UnrecovData 10B8B BadCRC }
Oct 15 13:33:18 kernel: ata1.00: failed command: READ FPDMA QUEUED
Oct 15 13:33:18 kernel: ata1.00: cmd 60/a8:b8:a0:15:67/00:00:00:00:00/40 tag 23 ncq dma 86016 in
res 40/00:c4:38:58:d1/00:00:03:00:00/40 Emask 0x10 (ATA bus error)
Oct 15 13:33:18 kernel: ata1.00: status: { DRDY }
Oct 15 13:33:18 kernel: ata1.00: failed command: READ FPDMA QUEUED
Oct 15 13:33:18 kernel: ata1.00: cmd 60/78:c0:38:58:d1/00:00:03:00:00/40 tag 24 ncq dma 61440 in
res 40/00:c4:38:58:d1/00:00:03:00:00/40 Emask 0x10 (ATA bus error)
Oct 15 13:33:18 kernel: ata1.00: status: { DRDY }
Oct 15 13:33:18 kernel: ata1.00: failed command: READ FPDMA QUEUED
Oct 15 13:33:18 kernel: ata1.00: cmd 60/00:c8:40:7b:76/01:00:00:00:00/40 tag 25 ncq dma 131072 in
res 40/00:c4:38:58:d1/00:00:03:00:00/40 Emask 0x10 (ATA bus error)
Oct 15 13:33:18 kernel: ata1.00: status: { DRDY }
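The SErr and SAct values in these lines are bitmasks, and decoding them helps when reading the log. A minimal sketch follows, using the SError bit names the kernel's libata code prints; the helper names decode_serror and active_tags are hypothetical.

```python
# SATA SError register bits, with the short names libata prints in the log.
SERROR_BITS = {
    0: "RecovData", 1: "RecovComm", 8: "UnrecovData", 9: "Persist",
    10: "Proto", 11: "HostInt", 16: "PHYRdyChg", 17: "PHYInt",
    18: "CommWake", 19: "10B8B", 20: "Dispar", 21: "BadCRC",
    22: "Handshk", 23: "LinkSeq", 24: "TrStaTrns", 25: "UnrecFIS",
    26: "DevExch",
}


def decode_serror(serr):
    """Return the names of all bits set in an SError value."""
    return [name for bit, name in SERROR_BITS.items() if (serr >> bit) & 1]


def active_tags(sact):
    """Return the NCQ tags (0-31) whose bits are set in an SAct value."""
    return [tag for tag in range(32) if (sact >> tag) & 1]


if __name__ == "__main__":
    print(decode_serror(0x280100))
    print(active_tags(0x8000000))
    print(active_tags(0x3800000))
```

SErr 0x280100 decodes to the same three flags shown in the log (UnrecovData, 10B8B, BadCRC), and the SAct values match the tags of the failed commands (27, then 23 to 25). As far as I understand it, 10B8B and BadCRC are link-level errors between the controller and the drive, not errors reported from the flash itself.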
SMART report is looking scary also, especially Reallocated_Sector_Ct:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 100 100 050 Pre-fail Always - 0
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 12288 (0 2)
9 Power_On_Hours 0x0032 100 100 001 Old_age Always - 25185
12 Power_Cycle_Count 0x0032 100 100 001 Old_age Always - 2641
170 Grown_Failing_Block_Ct 0x0033 100 100 010 Pre-fail Always - 35
171 Program_Fail_Count 0x0032 100 100 001 Old_age Always - 52
172 Erase_Fail_Count 0x0032 100 100 001 Old_age Always - 0
173 Wear_Leveling_Count 0x0033 098 098 010 Pre-fail Always - 71
174 Unexpect_Power_Loss_Ct 0x0032 100 100 001 Old_age Always - 8
181 Non4k_Aligned_Access 0x0022 100 100 001 Old_age Always - 141 7 133
183 SATA_Iface_Downshift 0x0032 100 100 001 Old_age Always - 1
184 End-to-End_Error 0x0033 100 100 050 Pre-fail Always - 0
187 Reported_Uncorrect 0x0032 100 100 001 Old_age Always - 0
188 Command_Timeout 0x0032 100 100 001 Old_age Always - 0
189 Factory_Bad_Block_Ct 0x000e 100 100 001 Old_age Always - 81
194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 0
195 Hardware_ECC_Recovered 0x003a 100 100 001 Old_age Always - 36
196 Reallocated_Event_Count 0x0032 100 100 001 Old_age Always - 35
197 Current_Pending_Sector 0x0032 100 100 001 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 100 001 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 100 100 001 Old_age Always - 8
202 Perc_Rated_Life_Used 0x0018 098 098 001 Old_age Offline - 2
206 Write_Error_Rate 0x000e 100 100 001 Old_age Always - 52
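To put the table in perspective, here is a small sketch that parses smartctl-style attribute rows and lists any attribute whose normalized VALUE has dropped to or below its THRESH, which is how smartctl decides an attribute has failed. The function names are made up, and the raw values are vendor specific, so the sketch does not try to interpret them. The sample rows are copied from the table above.

```python
def parse_smart_rows(text):
    """Parse smartctl attribute rows into dicts, skipping the header line."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 10 or not parts[0].isdigit():
            continue
        rows.append({
            "id": int(parts[0]),
            "name": parts[1],
            "value": int(parts[3]),
            "worst": int(parts[4]),
            "thresh": int(parts[5]),
            "raw": " ".join(parts[9:]),  # raw field may contain spaces
        })
    return rows


def below_threshold(rows):
    """Names of attributes whose normalized value is at or below threshold."""
    return [r["name"] for r in rows
            if r["thresh"] > 0 and r["value"] <= r["thresh"]]


SAMPLE = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 12288 (0 2)
173 Wear_Leveling_Count 0x0033 098 098 010 Pre-fail Always - 71
199 UDMA_CRC_Error_Count 0x0032 100 100 001 Old_age Always - 8
"""

if __name__ == "__main__":
    rows = parse_smart_rows(SAMPLE)
    print(below_threshold(rows))
```

None of the normalized values in the table are anywhere near their thresholds, so by the drive's own accounting nothing has failed yet, even though some of the raw counts look alarming.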
Errors could be due to a faulty cable or maybe a problem with compatibility between the kernel and the SATA controller:
https://askubuntu.com/questions/133946/are-these-sata-errors-dangerous
I know the drive was in perfect condition until it was placed in this laptop.
EDIT: Spoke too soon, just got another hard lock (am using kernel 4.18.14):
Oct 15 17:59:10 kernel: amdgpu 0000:05:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:4 pasid:32771)
Oct 15 17:59:10 kernel: amdgpu 0000:05:00.0: at page 0x0000000105400000 from 27
Oct 15 17:59:10 kernel: amdgpu 0000:05:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00401031