Random deadlock freezes. SSD?

Good day, community.

I wanted to ask for some advice. For several years I have been facing an issue with laptops freezing completely, with not even the cursor moving.

I would rule out a specific distro as the cause (on laptops it has so far occurred with Ubuntu… and recently Fedora).

Same goes for a specific laptop model. I think this is my 3rd laptop to have such issues.

One thing I would point out though (more detail below on WHY I single out this specific thing) is that the system has only one storage drive, and that is an SSD.

My Linux knowledge isn’t yet good enough to know how to read journals for errors. But the last few times I looked (with Google’s help) I didn’t find any record that would lead me to a cause.
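
For readers in the same position, here is a minimal Python sketch of one way to pull error-level journal entries from the boot that ended in a freeze. It assumes a systemd-based distro where the journal is kept across reboots; the helper names `journal_errors_cmd` and `read_journal_errors` are made up for illustration. Note that a hard freeze may prevent the last messages from being written at all, so an empty result does not prove much.

```python
import subprocess
from typing import List


def journal_errors_cmd(boots_back: int = 1, priority: str = "err") -> List[str]:
    """Build a journalctl command for an earlier boot.

    boots_back=1 selects the boot before the current one (journalctl -b -1),
    priority="err" limits output to errors and anything more severe.
    """
    if boots_back < 0:
        raise ValueError("boots_back must be >= 0")
    return ["journalctl", "--no-pager", "-b", f"-{boots_back}", "-p", priority]


def read_journal_errors(boots_back: int = 1) -> str:
    """Run the command and return its text output (may need extra privileges)."""
    result = subprocess.run(
        journal_errors_cmd(boots_back),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

Calling `read_journal_errors()` right after the forced reboot would show what, if anything, was logged as an error before the freeze.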

On Wednesday, 3 days ago, I decided to move to Fedora 37 Workstation. I was moving from Ubuntu 22 LTS (an upgrade from 20 LTS). That one originally also had this problem, but it seemed to have “gone away” at some point after I added a file-based swap (I remember it happening once after that… but not at the previous rate of “at least once a week”).

I can’t really tie it to a specific action. Today it happened while I had a few apps open (IntelliJ, VPN, Postman, Sublime, DBeaver, Firefox… and a Docker service running). I opened a new app, and at that moment a Firefox notification from Google Calendar appeared.

I have a GNOME top bar extension which shows CPU, memory and network load. The CPU was at 0.6, memory at 74% and network idle.

Before that, on Wednesday, I think I had a handful of similar freezes. Come to think of it, they may be related to opening applications. But on Ubuntu I do remember it sometimes dying on me when I was trying to unlock my desktop from the login screen.

I can tell that all systems (even Fedora) were installed on ext4 with separate /boot and /boot/efi partitions.

My current laptop is Asus Vivobook X512J

Now for the SSD part.
In my history, I think every SSD I have owned (Kingston, WD) has had a filesystem problem.

This started off with my first “good hardware” build, done back in the days of the i7-7700k. I bought my first SSD then. I didn’t cheap out and bought a Kingston, thinking that it was decent quality.

That build had one problem - I overlooked that my mobo arrived with bent pins (and since I didn’t notice it right away… I couldn’t return the mobo… or buy a new one. First job and yada-yada).

But the problem was 100% the same. I would turn on the PC when I came home from work, launch a browser, music and a Korean MMORPG I was playing at the time (which, as it turned out later, had an avalanche of temp files constantly being written to the drive). The drive hosted both the OS and games.

Eventually I bought a new mobo, but the problem remained. Then I thought that the bent pins had damaged the CPU. Later on I bought a new CPU… but the problem remained.

… several #$@#!# later …

After replacing absolutely everything besides the SSD (SSDs don’t fail, don’t be silly!!!) and the case, I thought I was cursed and literally had zero ideas until a friend of mine said “hm, have you tried the SSD?”.

And, of course, it was the SSD. I even managed to reproduce the issue by removing 1 random stick of RAM, loading the remaining memory up to 100% (that game + Chrome did well at that), and then opening a few more tabs so that things would spill over to the pagefile. A few clicks and “BINGO”.

I consulted with Kingston and they said it was a filesystem problem. A zero format of the whole thing did the trick. After that I believe I did it once more a few years later, and to this day my OS is on that drive.

But then I bought a 1TB WD SSD (with 2 M.2 drives in the system… and they work like a charm), which was 2018ish. It didn’t even take me a day to say “AHA!” after installing that drive and getting a 100% familiar freeze. And a zero format solved the issue once again.

(I believe I even recommended this solution to people with similar problems and every time got a “Solved it!”).

So today, after once more witnessing this problem with my laptop, I recalled “that thing with SSDs”.

P.S. I will consider doing a zero format for the whole drive (maybe even going btrfs), but this is my daily working system. Plus it’s fully configured and polished and it pains me to do a reinstall…

never experienced any issue similar to yours on any laptop ive ever owned. but i do have some amateur experience diagnosing various system crashes. the term “deadlock” isnt very informative by itself. there are things we can do to rule out certain, specific, types of freezes.

first thing i would try is to rule out a simple gpuhang. can you setup an ssh server on the problematic system, and then attempt to connect to it from any client while its frozen? will an ssh connection persist through these freezes? if so, then i’d be most curious to see your dmesg. if not, then its definitely more than a simple gpuhang.
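
A rough sketch of that check, run from a second machine on the same network: it only tests whether the frozen laptop still accepts TCP connections on the SSH port, which is a weaker test than a live SSH session surviving the freeze. The function name `port_is_reachable` and the address in the usage note are made up; port 22 is the usual sshd default, but that is an assumption about the setup.

```python
import socket


def port_is_reachable(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For example, `port_is_reachable("192.168.1.50")` with the laptop's real address substituted. If this returns True while the screen is frozen, the kernel and network stack are probably still alive and the problem is more likely on the graphics side.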

when the system is deadlocked, do the numlock and/or capslock light toggles work?

its never a bad idea to check your SMART report for hardware problems.
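
One possible way to script that check, assuming smartmontools 7.0 or newer (which added JSON output) and root privileges. The device path is a placeholder, the helper names are made up, and the `smart_status.passed` field is what I understand smartctl's JSON report to contain, so treat the parsing as an assumption to verify against real output.

```python
import json
from typing import List, Optional


def smartctl_health_cmd(device: str) -> List[str]:
    """Command for the overall health self-assessment, in JSON form."""
    return ["smartctl", "--json", "-H", device]


def overall_health(report_json: str) -> Optional[bool]:
    """Return True/False for a passed/failed health check, None if absent."""
    report = json.loads(report_json)
    status = report.get("smart_status")
    if not isinstance(status, dict) or "passed" not in status:
        return None
    return bool(status["passed"])
```

The command from `smartctl_health_cmd("/dev/sda")` could be run with `subprocess.run(..., capture_output=True, text=True)` and its stdout passed to `overall_health`. A passing result only means the drive does not report itself as failing; it does not rule the drive out.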

i 100% would NOT recommend BTRFS to a linux newcomer. the features and benefits of BTRFS are not worth the performance cost (compared to ext4) unless you have specific intentions to use those specific features.

I know, but it’s the best term I could think of. It really feels like the system was trying to get a grip on some resource and failed.

Hm. This is a possibility. Although I did try to switch to console mode via Ctrl+Alt+F1/F2/F3…

Also there are two and a half problems:

  1. It likes to happen “at just the right moment”, when I usually need the PC “right now” or similar.
    1 and 1/2. It happens really rarely.
  2. Since it’s a work laptop, I have the root user disabled and the VPN is in “block all incoming connections” mode for security reasons.

I will configure this, but it will take time.

No. An absolute ignore of anything. Numlock and caps lock presses have no effect (wired keyboard, connected directly to the laptop).

Need to google how to do this. But even in the days of the 7700k, I didn’t get any error count increase. I think I did get a hefty number of “disconnects” though (or whatever that was called).

I also don’t want to bring in new “unknown territory” when I still haven’t figured out the existing one. I thought that maybe ext4 had some problems.

well, both the gpu and USB cards are either failing, or shutting down in response to the real issue. a simple gpuhang is ruled out.

dont suppose that laptop has a serial connection you could use to access the console?

i think you’ve got a kernel panic going on. an easy way to tell would be to set your kernel to reboot on panic. how exactly to set this varies wildly by distribution.
if you then reboot after every freeze, then its definitely a panic, and you’d have to find a way to get that dmesg log.
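
On most Linux systems this setting is exposed as the `kernel.panic` sysctl, readable at `/proc/sys/kernel/panic`. A small sketch for checking what is currently configured; the function names are made up, and the interpretation of the value follows the kernel's sysctl documentation (0 means hang forever, a positive number is the delay in seconds, a negative number means reboot immediately).

```python
from pathlib import Path

PANIC_SYSCTL = Path("/proc/sys/kernel/panic")


def describe_panic_timeout(raw: str) -> str:
    """Translate the raw kernel.panic value into plain words."""
    seconds = int(raw.strip())
    if seconds == 0:
        return "no automatic reboot on panic"
    if seconds < 0:
        return "reboot immediately on panic"
    return f"reboot {seconds} s after a panic"


def current_panic_setting(path: Path = PANIC_SYSCTL) -> str:
    """Read the live value from procfs and describe it."""
    return describe_panic_timeout(path.read_text())
```

If `current_panic_setting()` reports an automatic reboot and the machine still sits frozen well past that delay, the freeze is probably not an ordinary kernel panic.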

Nah, it’s a 2016+ laptop.

THANKS! I think this is a good starting point. Will google how to do it in Fedora.

Remove the WiFi module. I know of lots of WiFi modules that cause momentary deadlocks, all of which go away when the card is physically removed and you either use wired LAN or a different, better card.

You mean “in BIOS” or an external one (if I have any)?

In the laptop itself. It needs to be physically removed.

This is a common way to improve DPC latency on laptops.

Sadly, not an option. I have a “customer’s laptop”.

BIOS killswitch will have to do then. See if it improves DPC latency in Windows.

Will give it a try once my BT dongle arrives. Thanks

Bluetooth or UK LTE? It can mean two things. Just wanted to be sure.

Bluetooth. Ordered a USB stick based on the Realtek RTL8761B (LinuxReviews).

And I am back.
I had been working without a problem for 2 weeks since my last post. Then I rebooted (installed the latest updates and changed swappiness from 60 to 100).

I’m starting to get the impression that this is somehow related to applications installed in a “non-native” way - Snap, Flatpak and so on (or to some specific ones). After the reboot I started IntelliJ (non-Flatpak), Firefox (Flatpak), Slack and Pritunl (VPN).

The moment I started Postman (Flatpak), about a minute after the previous bunch, I got a freeze. I did have kernel.panic set to 30 (seconds, I believe). I verified that it didn’t budge.

Two strange things to note. First, Postman was the first thing to go into swap (I have an extension for GNOME that shows it). The “frozen screen” was showing 256 KB of swap in use (before that I was monitoring that value and it was 0). Before (and after) the reboot, swap was used heavily and that didn’t cause any issues.
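
To have something more durable than a frozen top bar, the same numbers can be read straight from procfs and logged to a file. A minimal sketch, assuming the standard `/proc/meminfo` layout with `SwapTotal` and `SwapFree` reported in kB; the function names are made up for illustration.

```python
from pathlib import Path
from typing import Dict, Tuple


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse '/proc/meminfo'-style lines into {name: numeric value}."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def swap_used_kib(meminfo_text: str) -> int:
    """Swap in use, computed as SwapTotal minus SwapFree (both in kB)."""
    info = parse_meminfo(meminfo_text)
    return info["SwapTotal"] - info["SwapFree"]


def read_swap_state() -> Tuple[int, int]:
    """Return (swap used in kB, current vm.swappiness) from the live system."""
    used = swap_used_kib(Path("/proc/meminfo").read_text())
    swappiness = int(Path("/proc/sys/vm/swappiness").read_text())
    return used, swappiness
```

Appending the result of `read_swap_state()` to a log file every few seconds would leave a record of how swap use developed right up to the moment of the freeze.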

Second was the fact that after a minute or two of the freeze, I started hearing my laptop’s cooling kick in. My guess is that it was either the OS losing control and the fan reverting to BIOS control, OR a loop in which there was still “computation” going on.

In the end, I still had to resort to the 7-second power button press.

So the hunt continues…

For the record, I gave up on bluetooth, so all of the peripherals were in wired mode.

Do the freezes occur during periods of inactivity or when actively using the laptop?

And dumb suggestion. Have you tried turning off the power save features?

I would say “right after start up” (reboot) in 99 cases out of 100.
