Friend's machine is goofed

Specs:

i7-4790K
Gigabyte Z97N Wifi
16GB DDR3
EVGA GTX 980 sc
Samsung 840
Corsair 650 watt
Corsair 250D

This is a machine I built for my buddy. First built it around November last year. Kicked ass at first and was rock solid after my stress testing. On his 120Hz monitor it was the best 1080p experience I've seen in a while...

However, a few issues came up about a month ago... After upgrading to the 353 driver from NVIDIA he started getting crashes now and then. So I went over and rolled back to the driver that was on there before. No biggie. Two days later he calls me up saying that it's been BSODing again. Shitty... So I went over and ran FurMark, and after a few minutes it crashes! Booted up on the integrated HD 4600 and it ran fine... So right away we back up all his data onto a 2TB WD Green and do a fresh install of Windows 8.1 in place of 7. I had to take off then but he's a "smart guy" and I trusted him to finish off the post install.

Followed up the next day and everything was peachy. But then after less than a week it was doing the same thing: locking right up and rebooting every time anything GPU accelerated comes on! I threw the GPU in a static bag and took it back with me. In my old rig I ran FurMark on Linux for about four hours and it was all good. I grabbed my Windows drive and loaded up the 353 drivers, worked fine. Tested almost every version of the driver I could find, at least 30 minutes of FurMark on each one!! Every last one of them passed.

Now here's where I get pissed off. While I had the graphics card he was running off the Intel graphics. Two days ago he calls me and says that it did it again. At this point it has to be the mobo, so I coach him through flashing the BIOS to a new version. After the update, it does it again. What the F(*&K??? There seem to be little to no reports of boards doing this on user forums, so it must be some stupid easy fix, right???

So what we've learned is that the GPU is fine, and the mobo is goofed... Correct? Are there any tests I should run before starting the dreaded RMA process? Also, is there any way I could get another board instead and just ditch Gigabyte?

Hoping things go well, because this thing is a beast and I really want it to be working for my buddy again.

Thanks, dudes!
- Noah

Check the RAM: run Memtest. Try it with one stick; if it fails, try it with the other stick. If both fail, get known-good RAM and test the system again. If it's still failing, you have now narrowed it down to mobo, CPU or HDDs. Run S.M.A.R.T. checks on all HDDs. Still fail? Unplug all drives that are not the boot drive, if applicable, and test again. Still fail? Swap in a known-good HDD and test again. Still fail? Now you are down to CPU or mobo. Swap in a known-good CPU and test; if it still fails, it's the mobo.

And it can ALWAYS be the PSU.

EDIT: What specific PSU is it? What series?
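
The elimination order in the reply above can be sketched as a small function. This is only an illustration of the reasoning, not a diagnostic tool: the part names and the `crashes_after_swap` dictionary are hypothetical labels, and it assumes each test is done by swapping in a known-good part and seeing whether the crashes continue.

```python
# Order in which known-good parts get swapped in, per the reply above.
SWAP_ORDER = ["RAM", "drives", "CPU"]


def remaining_suspects(crashes_after_swap):
    """Narrow down the faulty part from swap-test results.

    crashes_after_swap maps a part name to True if the system STILL crashed
    with a known-good replacement installed, or False if the crashes stopped.
    Parts missing from the dict are treated as not tested yet.
    """
    # The PSU stays on the list throughout: it can always be the PSU.
    suspects = ["RAM", "drives", "CPU", "motherboard", "PSU"]
    for part in SWAP_ORDER:
        still_crashes = crashes_after_swap.get(part)
        if still_crashes is None:
            break  # not tested yet, so we cannot narrow any further
        if not still_crashes:
            return [part]  # crashes stopped with a known-good part: found it
        suspects.remove(part)  # still crashing, so the original part is cleared
    return suspects


if __name__ == "__main__":
    # RAM and drives cleared, CPU not swapped yet.
    print(remaining_suspects({"RAM": True, "drives": True}))
```

With RAM and drives cleared and no spare CPU tested, what is left is CPU, motherboard and PSU, which is why the PSU question matters before starting an RMA.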

Maybe you could try a different driver, or roll back to the previous one?
Next to that, you could try a BIOS reset.

What you could also try is reseating the CPU. Sometimes when the CPU is not fully seated,
the system can suffer from strange hangups.

Make sure that you have minidumps enabled, and use a program to analyze the minidump files.

Also, check the Windows Event Log.

These should help you discern the issue.
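
As a quick way to see whether any minidumps were actually written, and when, something like the following could be used. It is a minimal sketch: the function name `list_minidumps` is made up for illustration, and it assumes the dumps land in the usual `C:\Windows\Minidump` folder, which may differ if the dump settings were changed. It only lists the files; analyzing them still needs a proper dump analysis tool.

```python
from datetime import datetime
from pathlib import Path

# Assumption: small memory dumps are written to the default Windows location.
DEFAULT_DUMP_DIR = Path(r"C:\Windows\Minidump")


def list_minidumps(dump_dir=DEFAULT_DUMP_DIR):
    """Return (file name, modified time) for each .dmp file, newest first."""
    dump_dir = Path(dump_dir)
    if not dump_dir.is_dir():
        return []  # folder missing: dumps are disabled or none were written
    dumps = sorted(
        dump_dir.glob("*.dmp"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    return [(p.name, datetime.fromtimestamp(p.stat().st_mtime)) for p in dumps]


if __name__ == "__main__":
    for name, when in list_minidumps():
        print(f"{when:%Y-%m-%d %H:%M}  {name}")
```

If the list comes back empty after a crash, minidumps are probably not enabled, or the machine is hard-resetting before Windows gets a chance to write one. Reading the folder may need an elevated prompt.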

Also, for funsies, check temperatures.

I've personally been going through hell with the 353 driver, like, BAD. Have you tried completely nuking the drivers? They like to hang around and cause mischief if you don't.

@Zibob I did test the RAM and check SMART. I ran memtest on both DIMMs at the same time, and they came up clean after three or so passes... SSD came up clean too. Also, the model of the PSU is RM650. I know the wattage is not the problem because my rig, where I stress tested the GPU, has a 550 watt PSU.

@MisteryAngel I could try reseating the CPU. The thing is, it worked for so long without problem. That seems like the kind of problem that would be evident right away... If everything else fails that could be a good one.

@Calculatron Did check the event log and as always it didn't give me much. I'll have to check the minidumps. Good idea. Temps are fine, it's an open air cooler and case and it's up on a desk with plenty of air around it. CPU tops out at 40 or so, and the GPU at around 65 give or take.

@SoulFallen Did do a clean install as prompted in the installer, but we also did a complete wipe and reload of the OS... As soon as we installed ANY driver it crapped out.

Thanks for the advice guys, really appreciate it!

Remove the CPU OC and/or redo it.