What is the best place to start investigating seemingly random Windows crashes?

I have a Windows machine that was stable up until a few weeks ago. No hardware changes at all. The machine seems to lock up (no BSOD) when I’m using it. Windows sleep mode is off. What is the best way to start investigating the cause?

Start going through Event Viewer and see what events occurred around the time of the crash. Then google those events.
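
If clicking through Event Viewer gets tedious, the same search can be done from the command line with `wevtutil`. Below is a rough Python sketch that only builds the command for "Critical and Error events in the System log within N minutes of the crash". The function name and the 10-minute default are my own assumptions, and the crash time has to come from you.

```python
from datetime import datetime, timedelta, timezone


def build_crash_window_query(crash_time: datetime, minutes: int = 10) -> list[str]:
    """Build a wevtutil command for Critical/Error System events near a crash.

    Hypothetical helper: a naive datetime is assumed to already be in UTC,
    because the event log stores SystemTime in UTC.
    """
    if crash_time.tzinfo is not None:
        crash_time = crash_time.astimezone(timezone.utc)
    start = crash_time - timedelta(minutes=minutes)
    end = crash_time + timedelta(minutes=minutes)
    fmt = "%Y-%m-%dT%H:%M:%S.000Z"
    # Level 1 = Critical, Level 2 = Error
    xpath = (
        "*[System[(Level=1 or Level=2) and "
        f"TimeCreated[@SystemTime>='{start.strftime(fmt)}' and "
        f"@SystemTime<='{end.strftime(fmt)}']]]"
    )
    return ["wevtutil", "qe", "System", f"/q:{xpath}", "/f:text"]
```

On the affected machine you would pass the returned list to `subprocess.run(...)` from an elevated prompt. Swapping `System` for `Application` should work the same way if the lockups look software-related.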

Might want to run a memory test to make sure it’s stable.

Check the Windows event log for disk errors, timeouts, etc.

Windows corrupts itself over time. DLL version incompatibilities, files getting deleted or corrupted, etc.

I run a script monthly that fixes all sorts of little issues on my work PC (Win10 Enterprise). Try this as a BAT file, run as admin.


DISM /Online /Cleanup-Image /CheckHealth
DISM /Online /Cleanup-Image /ScanHealth
DISM /Online /Cleanup-Image /RestoreHealth
SFC /SCANNOW


That makes sure your base OS is working correctly. If not, it fixes any issues. If SFC says corrupt files were found, shut down afterwards. That is, power down, unplug from power for 30-60 secs, then power back up. That clears all errors in RAM.
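
If you’d rather keep a record of how each step went instead of watching a console window, the same four commands can be wrapped in a few lines of Python. This is only a sketch: `REPAIR_STEPS` and `run_repair_steps` are names I made up, it still needs an elevated prompt on the Windows box, and I’m not assuming anything about what a particular non-zero exit code means beyond "go read the output".

```python
import subprocess
from typing import Callable, Sequence

# Same commands, same order as the BAT file above.
REPAIR_STEPS: list[list[str]] = [
    ["DISM", "/Online", "/Cleanup-Image", "/CheckHealth"],
    ["DISM", "/Online", "/Cleanup-Image", "/ScanHealth"],
    ["DISM", "/Online", "/Cleanup-Image", "/RestoreHealth"],
    ["SFC", "/SCANNOW"],
]


def run_repair_steps(
    run: Callable[[Sequence[str]], int] = subprocess.call,
) -> list[tuple[str, int]]:
    """Run every step in order and return (command line, exit code) pairs.

    The runner is injectable so the sequencing can be tried out without
    touching a real system; by default it shells out via subprocess.call.
    """
    results = []
    for step in REPAIR_STEPS:
        exit_code = run(step)
        results.append((" ".join(step), exit_code))
    return results
```

Calling `run_repair_steps()` with no arguments runs the real tools. All steps are always run, matching what the batch file does.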

If you still have issues, Event log is the place to start.

I’d start with unplugging/replugging memory and expansion cards. Doesn’t hurt to do the same with power cables. If you have an older power supply, check the power levels on it. Then do the rest :stuck_out_tongue:

Start with hardware testing.

Just because hardware was stable doesn’t mean it hasn’t died.

Whilst crappy drivers and other stuff can make Windows crash, in the 25 years I’ve been using NT-kernel-based versions it’s mostly been either hardware problems or a bad driver.

This is not true at all

I administer well over 1000 Windows Servers, and never had one just break for no reason at all

Perhaps in the old days like Win 95/98 era.

I have a machine somewhere that had Win7 installed in 2010, then was upgraded to 10 in 2021. In the meantime, the HDD was swapped for an SSD. All updates installed. Legal key. It still works without ever being reinstalled. But the system is well maintained and there is no mess… :wink:

These are not the times when Windows could get damaged when you look at it crookedly. :wink:

Agreed. Windows can get corrupted over time but it’s generally shitty software installs, power problems etc.

I have servers (VMs) that have been upgraded in place for 15-plus years without running any of that crap, and they are 100% stable. Because I limit what is installed on them.

I didn’t mention servers. No computer breaks for “no reason,” there’s always a cause. And it’s typically caused by a user (or Windows update).

Win 10 is better, but it still has issues with DLL incompatibilities and corruption over time. Take a look at any average home PC. It’s a pig’s breakfast, especially if there are kids / teens on there. I worked on a PC that was three days old and useless. Choked with malware and crap software. Two teenagers took it down quite effectively. People installing various software packages (especially on gaming PCs) with different versions of DLLs will cause issues over time.

The servers last longer because admins are knowledgeable, more disciplined and don’t install / uninstall stuff all the time. They’re a lot more static than PCs and usually only run one task or software package.

I’ve seen many weird issues on PCs vanish with a DISM / SFC check. It’s my second action after getting a complaint. The first is a power down - unplug from power for 30 secs - plug back in - boot.

Random lockups can be caused if HPET is disabled in UEFI/BIOS. As it’s a Gigabyte board, it may not be enabled by default.
Check it’s enabled.

Also, while you’re in BIOS, your tRC looks to be way out of spec…
It should be 65, matching the XMP profile you’re running.

You said you had an off where you had to reset BIOS?
Your board has a dual BIOS. It may have switched to the alt when your system crashed.
If it did, then you will need to readjust your BIOS settings and may even need to update it.

If all is well in BIOS and you’re running your preferred settings,
then boot into Windows and head to the Event Viewer.
Look for criticals around the time of the last crash.
The last crash will be date-stamped with a kernel lost power error (typically logged as Kernel-Power, Event ID 41).
If it happened today, then look in the today column and trace your events from there.
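
The "lost power" entry normally shows up in the System log as Event ID 41 from the Microsoft-Windows-Kernel-Power provider, so you can pull the most recent ones directly rather than scrolling. A minimal Python sketch that just builds the `wevtutil` command follows. The helper name and the default of five events are assumptions of mine.

```python
def build_kernel_power_query(max_events: int = 5) -> list[str]:
    """Build a wevtutil command listing the newest Kernel-Power 41 events.

    Hypothetical helper: it only assembles the argument list, it does not
    run anything.
    """
    if max_events < 1:
        raise ValueError("max_events must be at least 1")
    xpath = (
        "*[System[Provider[@Name='Microsoft-Windows-Kernel-Power'] "
        "and (EventID=41)]]"
    )
    return [
        "wevtutil", "qe", "System",
        f"/q:{xpath}",
        f"/c:{max_events}",  # cap the number of events returned
        "/rd:true",          # reverse direction: newest first
        "/f:text",
    ]
```

The timestamps on those entries give you the times to trace back from. Bear in mind the entry is written on the next boot, so the events worth reading are the ones logged just before it.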

Do you happen to live in the Northern Hemisphere?

If so, maybe relax a bit, do something else, and ride out the heatwave.

This thread is evidently full of people who haven’t used Windows in a while. It may be fine if all you do is boot once a week to check your email or something, but try to do anything more complicated than that and you will be reinstalling, or worse, troubleshooting, regularly.

I have a Windows To Go install on a USB drive that I use for overclocking and benchmarks, and I have to reimage the thing every tenth boot. Even a trace of memory instability and Windows will overwrite itself with garbage without fail. I can’t imagine trying to do this with a daily runner Windows install; it would be incredibly frustrating.

I have a fleet of several hundred servers and 1900 workstations (users do not have admin, and software is controlled via patch management and managed installs) and Windows doesn’t “just corrupt itself”.

Problems are down to (normally)

  • broken hardware or good hardware being run out of spec without sufficient stability testing
  • power issues causing filesystem damage or poor power delivery to critical system components
  • bad driver (normally display drivers)
  • bad software install
  • malware

I hate Windows as much as the next guy, but Windows itself has actually been pretty solid since Windows NT.

The problem is being disciplined enough to

  • not run hardware out of spec
  • not buy cheap shitty hardware that comes with cheap shitty drivers, and not run bleeding-edge, non-certified versions of drivers
  • while we’re at it, not run bleeding-edge brand-new hardware with teething issues
  • put your machine on a UPS/power filter and a quality power supply
  • not install random shitware you don’t need from random third parties.

I’m overclocking my memory, so what I’m doing is running good hardware out of spec. The problem isn’t crashing, I expect it to crash, it’s the system breaking itself when it crashes. Windows wouldn’t corrupt itself if it wasn’t constantly writing to system files, which begs the question why it is constantly writing to system files.
If I start software A, load file 1, and software A crashes, file 1 should be untouched save if the crash occurred during a write, and I can try again as many times as I need to. For some reason Windows constantly writes to its own binaries, inevitably corrupting the system when a crash occurs. If I get a crash on Linux, only the files I’m actively writing to will get corrupted. This is why we have journaled file systems (with redundant journals, of course, otherwise they too would be getting corrupted). I’m on a CoW file system, which is even more robust (I’ve never broken a file even by crashing while writing to it, I just get the old version of it instead). Windows really ought to be equally robust (or preferably more so, given how it’s supposed to be for consumers and consumers will do stupid things, like my overclocking misadventures), but it just isn’t.
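
To illustrate the "you just get the old version" behaviour: the same idea can be imitated at application level by writing to a temporary file and renaming it over the original only once the write has finished. This is an analogy in Python, not how a CoW file system is actually implemented, and `atomic_write` is a made-up name.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace the file at `path` with `data`, or leave it untouched on failure."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory so the final rename
    # stays on one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the new bytes to disk first
        os.replace(tmp_path, path)  # swap in the new version in one step
    except BaseException:
        # The write failed part-way: discard the partial copy, keep the old file.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

If the process dies before `os.replace`, readers still see the previous contents; the worst case is a stray temp file. Whether the bytes survive a hard power cut still depends on the drive and file system underneath.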

And it’s not a matter of hating Windows. I don’t. I’m just forced to use it, and find it far inferior to macOS or Linux.

Many of us are in that boat. Home system is Linux, work systems are Windows. Windows could be soooooooooo much better if they took all the telemetry crap out and stopped focusing on changing international standards for their OS only. The constant changes, forced updates with unnecessary and non-removable “features.”

Or maybe system admins are yet to realize the full potential of using an Xbox in a Win 10 Enterprise environment.

Yes, and DISM / SFC fixes most of the issues caused by the above.

Years 'n years ago I had a power supply that after a thunderstorm (or just age) couldn’t boot four hard drives any longer, only three. Took me some time to figure that one out.

Life hack: build your computers during the summer. Computers that are sufficiently cooled during the winter might not be during the warmer months. (Here we send a thought to a guy I knew who had to put a semi-industrial fan blowing into his computer one hot summer while living in Los Angeles.)

One of the best pieces of advice for PC builders!

I learned it the hard way after not having built a PC for 20 years. It was made harder by AMD Ryzen, which disrupted the conventional way of boosting frequency, and by AMD’s excellence in binning their chips. The better vendors are at binning their silicon, the worse life is for overclockers.