[SOLVED] GPU? issue / Sometimes fails to boot

EDIT: Solved; bad PSU

Hi — the short story is, every now and then my system hangs and will fail to reboot with the ol' long-short-short beep code (GPU error). I "fix" this by reseating the GPU, but that doesn't work 100% of the time.

Relevant specs:

ga-z68x-ud3r-b3, f7 bios (latest non-uefi/beta bios)
i5 2500k (stock)
f3-12800cl8d 8gb ram kit
sapphire 7870 oc edition

This has happened maybe 6-7 times over the 4 years I've had this build. Just now when it happened, after alt-tabbing to a 3D application, I completely removed my GPU, reseated it into the same PCIe slot, and tried again. It didn't work. At this point I had my case lying on its side, on the suspicion that gravity was eventually causing some connectivity problem*. So I turned off the power, opened the side door, gently pressed down on the card and wiggled it slightly. I tried again and it worked.

Unfortunately I was also greeted with "windows could not start correctly" and after an automatic repair (restore point) I'm back with no loss of data it seems. This has never happened before though and I think it's unrelated.

  • I wonder how likely it is, since it always seems to "crash" while in use and never overnight. Previously I'd considered bad drivers (currently Catalyst 13.12, which is the most stable for the programs I use) as the issue, but I'm not sure. It was also suggested to me that power-saving features could cause this, so I've tried turning those off in my BIOS, but I have no definitive results. I still think the GPU is the likely cause.

Why I'm posting now:

Today I've had a series of problems that are almost certainly unrelated to this one, but the weirdness of this issue has me curious, so I'd like to hear any possible causes. For the full story: today I decided to purchase more RAM, but my CPU fan blocks the unused RAM slots. So I swapped the fan to the other side of the cooler (pulling rather than pushing, which probably isn't as good, so I swapped the Cooler Master fan out for the Noctua case fan I had, just on the off chance that would help; not that temps are bad anyway). That went fine, except that Gigabyte only gives me manual control over the CPU fan, not the system fans, so on a non-regulated header the CM fan is pretty loud.

I tried downloading a utility (Energy Saver 2) from my mobo's utilities app to see if that would let me control the system fan speed through software, but no luck. As it happens, while I was messing around with that software, pressing a button caused a full system lockup. I rebooted my machine, which failed to POST (no error beep), shut itself off, turned itself back on, and repeated that until I cut the power. I waited a minute or so, turned it back on, and it worked. I removed the utility. I mention this nonsense because it makes me more suspicious that some sort of power-saving mode or feature is causing the problem. Next time it happens I might try changing NOTHING except cycling the power, to see if that alone works.

So... thoughts? What could cause such behaviour? Is there anything I can test; am I missing something obvious? And just because I found this funny, the ram I bought (not yet arrived) is the exact same kit I purchased four years ago — in those four years it has become $2 cheaper from the same retailer.

Hi - I wanted to bump this because the issue is still occurring, on a new install and with a new GPU (MSI R9 380). It seems very unlikely this new card (a few months old now) has the exact same issue as the last, so... is my motherboard possibly at fault?

Also, to amend some info in the OP: I don't have to reseat the GPU. It just takes about 10 resets (full power off / unplug) and some waiting before it will correctly boot with a working video card. It does seem to run without the GPU (I can see my USB devices initialise and hear the HDD spin up), but obviously without video it's useless. But why would another component (mobo/PSU/CPU?) sometimes cause the GPU to fail? And why does this begin with a lockup in Windows?

How can I troubleshoot this issue?

Even before I got to your second post, I'm thinking it sounds like a motherboard issue where you are getting some voltage problems, maybe some bad caps. The second post seems to confirm it, however you will need a way to test the PSU as well in order to confirm that it's supplying the correct voltage.

My suggestion is to swap the PSU and motherboard and see if the issue occurs. Since you may not have spares, you can buy a PSU volt-ohm-meter to see if it's providing the correct output, and if so, it's probably your motherboard. Your description makes me think something funky is going on with the PCIE slot, but once again you have to isolate each component to determine the problem.

Hey, thanks for the reply. While I can get a volt-ohm-meter I'm not electronically savvy enough to be comfortable testing something potentially dangerous (it seems simple enough, but I don't care to fuck around with electricity); the PSU (Antec Earthwatts 650) is about 6 or 7 years old now, so it's probably time to replace it anyway. What I might do is get a new PSU (corsair rm650x seems good), and if the problem still occurs, look into getting a new motherboard.

http://www.newegg.com/Product/Product.aspx?Item=N82E16899261023&nm_mc=KNC-GoogleAdwords-PC&cm_mmc=KNC-GoogleAdwords-PC-_-pla-_-Tools+-+Network+%2F+PC+Service+%2F+Acc.-_-N82E16899261023&gclid=CNGakuqFl84CFURZhgodmHQAIA&gclsrc=aw.ds

Get one of these, it's super easy, no need to be intimidated.

Those testers are good for two things:
1) Getting a basic voltage reading. That's fine for checking that the PSU isn't total crap, but only with no load or a minimal load, so it's not much help here. Voltages may look OK unloaded and still not be stable under load. To properly test a PSU you need a testing unit that draws actual power.
2) Checking that you connected the pins correctly when you are making custom cables.
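To make the "basic voltage reading" part concrete, here is a rough sketch of the pass/fail check such a reading amounts to. It assumes the usual ATX tolerances (±5% on +3.3V, +5V, +12V and +5VSB, ±10% on -12V). The function names and the sample readings are made up for illustration, and a reading that passes with no load says nothing about behaviour under load.

```python
# Nominal ATX rail voltages and allowed tolerance (fraction of nominal).
# Assumption: standard ATX tolerances; check your PSU's own documentation.
ATX_RAILS = {
    "+3.3V": (3.3, 0.05),
    "+5V": (5.0, 0.05),
    "+12V": (12.0, 0.05),
    "-12V": (-12.0, 0.10),
    "+5VSB": (5.0, 0.05),
}


def check_rail(name, measured):
    """Return True if a measured voltage is within tolerance for that rail."""
    nominal, tol = ATX_RAILS[name]
    margin = abs(nominal) * tol
    return nominal - margin <= measured <= nominal + margin


def check_readings(readings):
    """Map each rail name in `readings` to True (in spec) or False."""
    return {name: check_rail(name, volts) for name, volts in readings.items()}


if __name__ == "__main__":
    # Hypothetical meter readings, not taken from any real PSU.
    sample = {"+3.3V": 3.28, "+5V": 5.05, "+12V": 11.30}
    for rail, ok in check_readings(sample).items():
        print(rail, "OK" if ok else "OUT OF SPEC")
```

With the made-up numbers above, the +12V reading of 11.30 V falls below the 11.4 V lower limit, which is the kind of sag you would only expect to see while the card is actually drawing power.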

Hm, that is interesting, I wasn't aware those existed. However, if they can't test the device under load, that seems a bit useless in my particular situation, as the issue seems to only occur under heavy load. Extended periods of load aren't required, though, so it's not a temperature thing; temps are all quite low even under load. It's just the sudden increased frequency of lockups in the last week that has pushed me to look for replacement parts.

Anyway, I've ordered the new PSU, and if problems persist, I'll look into finding a 1155 mobo somewhere. At least I won't have to worry about running a ~7 year old psu on new components.

So something interesting I just noticed is that when running GPU-Z (from a possibly old .exe I had downloaded), my system hung in the same way it does randomly when under stress, though it didn't require any power cycling to reboot. Not that the regular gaming lockups always require it either, just perhaps 90% of the time. So I guess it must be the motherboard, or a very, very unlucky new GPU purchase. I'm in my Linux OS now and wondering if I can replicate this failure somehow. It has to be a hardware issue, though; why else would it sometimes not boot with working video output (and give the beep code for this)?
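One thing that might help while trying to reproduce it under Linux is logging the motherboard's voltage sensors during a load test, so the last readings before a hang end up on disk. Below is a minimal sketch that reads the kernel's hwmon sysfs files (`in*_input`, reported in millivolts). It assumes a sensor driver for the board is loaded, and that the raw values may need per-board scaling before they match the real rails. The function names and the log path are hypothetical, and software readings like these are no substitute for a proper load tester.

```python
import glob
import os
import time


def _read_text(path):
    """Return the stripped contents of a small sysfs file, or None."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def read_voltages(root="/sys/class/hwmon"):
    """Return {"chip/label": volts} for every in*_input file under root."""
    readings = {}
    for path in sorted(glob.glob(os.path.join(root, "hwmon*", "in*_input"))):
        chip_dir = os.path.dirname(path)
        channel = os.path.basename(path)[: -len("_input")]
        raw = _read_text(path)
        try:
            millivolts = int(raw)
        except (TypeError, ValueError):
            continue  # unreadable or non-numeric sensor, skip it
        chip = _read_text(os.path.join(chip_dir, "name")) or os.path.basename(chip_dir)
        label = _read_text(os.path.join(chip_dir, channel + "_label")) or channel
        readings[f"{chip}/{label}"] = millivolts / 1000.0
    return readings


def log_voltages(out_path, samples, interval_s=1.0, root="/sys/class/hwmon"):
    """Append `samples` timestamped readings to out_path, syncing each one."""
    with open(out_path, "a") as log:
        for _ in range(samples):
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            values = read_voltages(root)
            fields = ", ".join(f"{k}={v:.3f}V" for k, v in values.items())
            log.write(f"{stamp} {fields}\n")
            log.flush()
            os.fsync(log.fileno())  # make sure the line survives a hard hang
            time.sleep(interval_s)
```

Calling something like `log_voltages("voltages.log", samples=600)` in one terminal while a 3D load runs in another would give about ten minutes of readings to look through after a lockup.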