New, unexpected slowness and spontaneous reboots shortly after login, while only idling on the desktop

Issue:

The PC had recently been suffering from its SSD randomly flipping into read-only mode. After I replaced the drive a couple of days ago, the machine ran fine for 2-3 days through the same workloads that had previously triggered the problem.

After I shut the machine down on Oct 25th, the first boot on the morning of Oct 26th took an inordinately long time to reach the login screen (several minutes instead of a few seconds). No kernel or package updates were applied during that shutdown cycle. During the slow boot, the USB devices switched off and on several times, as if they had lost power or been reset.

Once on the desktop, keyboard input was either ignored or stuck repeating the same key, and mouse input was either ignored or moved at (I’d estimate) 1/10th to 1/100th of its usual sensitivity. Overall responsiveness was best described as ‘Pentium 1 with the OS on an HDD’. Just sitting on the desktop, the machine would not last more than a minute before rebooting on its own.
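One way to see what the USB stack was doing during that boot is to scan the failed boot’s kernel log for disconnect and reset messages. This is a minimal sketch that assumes the systemd journal is persistent (so the bad boot’s log survived the reboot) and that the kernel messages match the regexes below; the wording can vary between kernel versions, so treat the patterns as assumptions. The script name `usb_scan.py` is arbitrary. If the machine hard-reset, the last few seconds of log may never have reached disk.

```python
import re
import sys
from collections import Counter

# Assumed kernel message patterns; exact wording can vary between kernel versions.
PATTERNS = {
    "disconnect": re.compile(r"usb (\S+): USB disconnect"),
    "reset": re.compile(r"usb (\S+): reset .+? USB device number"),
    "controller_dead": re.compile(r"xHCI host controller not responding"),
}


def scan_usb_events(lines):
    """Count USB events per (kind, port path); controller-level events use port "-"."""
    counts = Counter()
    for line in lines:
        for kind, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                # Patterns without a capture group have no port path to report.
                port = match.group(1) if match.groups() else "-"
                counts[(kind, port)] += 1
    return counts


def main(argv):
    if len(argv) != 2:
        print("usage: python3 usb_scan.py KERNEL_LOG_FILE")
        return
    with open(argv[1], encoding="utf-8", errors="replace") as log:
        counts = scan_usb_events(log)
    for (kind, port), n in sorted(counts.items()):
        print(f"{kind:16} {port:10} {n}")


if __name__ == "__main__":
    main(sys.argv)
```

`journalctl --list-boots` shows which offset matches the misbehaving boot; then, for example, `journalctl -k -b -1 --no-pager > prevboot.txt` followed by `python3 usb_scan.py prevboot.txt`. A burst of disconnects across many ports at once might point more toward power or a USB controller than toward any single device.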

Hardware and software involved:

Initial build (August of 2023-ish):

OS: Nobara 40, all updates applied.
BIOS version: 1.21 (the initial X3D-compatible BIOS, unchanged since the initial build; newer updates are available but not yet applied)

The new SSD is a WD SN850X, installed in the CPU-attached, Gen5-capable M.2 socket. The Solidigm drive is in the chipset-attached Gen4 x4 socket.

Troubleshooting so far:

  1. Reset the BIOS to default settings (this reset the fan curves and turned EXPO off for the RAM) – no change, same behavior.
  2. Tried booting from the Solidigm drive (also running Nobara 40) – no change, same behavior.
  3. Checked voltages in the BIOS hardware monitor. Readings looked fine, though load at the BIOS screen is of course different from desktop or gaming load. And since the motherboard itself is somewhat suspect, I’m not sure how trustworthy its reporting is.
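Since load at the BIOS screen isn’t representative, one option is to read the same sensors from Linux while idling and while gaming. This minimal sketch reads the kernel’s hwmon sysfs interface (`/sys/class/hwmon/hwmon*/in*_input`, reported in millivolts), which is the same data lm-sensors’ `sensors` command shows. Which chips and channels appear, and whether they carry labels, depends on the board’s sensor driver (an assumption here), and the readings still come from the same motherboard sensor, so this doesn’t settle the trust question.

```python
from pathlib import Path


def read_voltages(root="/sys/class/hwmon"):
    """Return (hwmon dir, chip name, channel label, volts) for each in*_input file."""
    readings = []
    for chip_dir in sorted(Path(root).glob("hwmon*")):
        name_file = chip_dir / "name"
        chip = name_file.read_text().strip() if name_file.exists() else "?"
        for inp in sorted(chip_dir.glob("in*_input")):
            channel = inp.name[: -len("_input")]  # e.g. "in0"
            label_file = chip_dir / f"{channel}_label"
            label = label_file.read_text().strip() if label_file.exists() else channel
            try:
                millivolts = int(inp.read_text().strip())
            except (OSError, ValueError):
                continue  # skip channels that can't be read or parsed
            readings.append((chip_dir.name, chip, label, millivolts / 1000.0))
    return readings


if __name__ == "__main__":
    rows = read_voltages()
    if not rows:
        print("no hwmon voltage inputs found (sensor driver may not be loaded)")
    for hw, chip, label, volts in rows:
        print(f"{hw:8} {chip:12} {label:14} {volts:7.3f} V")
```

Running it under `watch -n 2 python3 hwmon_volts.py` (the file name is arbitrary) while idling, then again under a game or stress load, shows how far each rail moves between the two.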

Questions:

  • Given that it’s affecting both OS installs on the machine, it doesn’t seem like a software issue, at least not one that ‘grab a live USB image and try another distro/OS’ would fix. Correct me if that’s a mistaken assumption. Both installs run Nobara 40, though the one on the Solidigm drive is a week or more behind on updates.
  • I don’t have a spare AM5 CPU, AM5 motherboard, or suitably rated PSU to swap in and test. But to me, those seem like the most likely culprits. What additional tests could I run to rule each one in or out? ’Cause otherwise, replacing those 3 parts together is basically ‘hey, wanna just build a new PC?’ :yay:

Thanks much for any insight.

This sounds like some kind of hardware problem rather than a software one. Something on the PCIe bus would be high on my list of culprits.

You could try reseating different pieces of hardware to see whether a bad connection is the cause, or even removing non-essential PCIe hardware.
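To check the PCIe theory without swapping parts, the kernel’s Advanced Error Reporting (AER) logs link errors it sees as `PCIe Bus Error` lines. A rough sketch that tallies them per device from a saved kernel log, assuming the message format shown in the comment (it varies somewhat between kernel versions); the script name `aer_scan.py` is arbitrary.

```python
import re
import sys
from collections import Counter

# Assumed AER line format, e.g.:
#   nvme 0000:01:00.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
AER_LINE = re.compile(
    r"([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): PCIe Bus Error: "
    r"severity=([^,]+), type=([^,]+)"
)


def scan_aer(lines):
    """Count AER errors per (PCI address, severity, error type)."""
    counts = Counter()
    for line in lines:
        match = AER_LINE.search(line)
        if match:
            counts[match.groups()] += 1
    return counts


def main(argv):
    if len(argv) != 2:
        print("usage: python3 aer_scan.py KERNEL_LOG_FILE")
        return
    with open(argv[1], encoding="utf-8", errors="replace") as log:
        results = scan_aer(log)
    if not results:
        print("no 'PCIe Bus Error' lines found")
    for (addr, severity, kind), n in results.most_common():
        print(f"{addr}  {severity:28} {kind:22} x{n}")


if __name__ == "__main__":
    main(sys.argv)
```

Feed it the output of `journalctl -k -b -1 --no-pager` saved to a file, then use `lspci -s <address>` to see which device each address belongs to. An empty result doesn’t clear the bus, since AER reporting to the OS may not be enabled on every platform.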

I’m curious what would happen if you threw in a live USB with another distribution on it and booted. You may have a motherboard issue or something, and if this works flawlessly…

Have you tried it without the GPU?

Finally got a chance to do more than move it over to the workbench tonight. Maddeningly, after sitting unplugged for a day… it just fired up like on any normal day and worked. :facepalm:

So what might have changed besides 24 hours without power? (Which, well… I’d already reset the BIOS, so even if it had forgotten its settings and reverted to defaults, it would have landed on the same defaults.) The differences I can think of:

  • It was still in the same room, on the same circuit breaker with all the usual load (plus a backup laptop running, so if load were the issue, this would technically have been even ‘worse’).
  • It was not on the same outlet. Previously it was on a power strip that also runs the triple-monitor setup, speakers, etc., as you’d expect. In the new spot it was plugged directly into the wall.
  • Significantly fewer USB devices. The main desk has a couple of monitors acting as USB hubs for a microphone, phone charging, and so on. The troubleshooting bench has just a monitor, keyboard, and mouse.
  • The keyboard and mouse were also plugged into different ports than at the desk. At the desk they were on the USB 2.0 ports; at the bench they were on the ‘Lightning Gaming’ ports (just one port each from two different USB controllers on the board, intended for keyboard and mouse).
  • Ethernet wasn’t plugged in.

Since then, I’ve tried other USB ports, plugging Ethernet back in, and so on. It hasn’t misbehaved again, either on the bench or back at the desk with all the other peripherals hooked up exactly as before.

I hate it when it’s a consistent, 100% repro, and then as soon as you ask for help or get support on the phone, it starts working again for inexplicable reasons :facepalm: I’ll leave the thread open for a bit in case it comes back while I’m shaking it down some more. Thanks very much for the suggestions, though! :slightly_smiling_face: