Possible FX 8350 problem?

So, after assembling my latest build (after gathering parts for about 6 months) I played a couple of games on it and did some work. All seemed fine until it randomly hung for no apparent reason and I had to hard reset it several times to get the machine to boot up. Before I continue, let me just list some specs so you can have an idea of what I'm working with here:

  • AMD FX 8350 @ 4GHz (stock settings, Family 15, Model 02, Stepping 0, Revision OR-C0 voltage 1.26v)
  • 16GB Corsair Vengeance Low-Profile DDR3 1866 Quad-Channel Kit (4x4GB modules) running its XMP profile using the D.O.C.P. feature on my motherboard (link: http://www.corsair.com/en/vengeance-low-profile-16gb-dual-channel-ddr3-memory-kit-cml16gx3m4a1866c9)
  • Asus M5A99FX Pro R2.0 (Running the latest 2301 BIOS)
  • Noctua NH-D14 cooler (the slightly older one that comes with the 120mm and 140mm 3-pin fans; I'm running both with the low-noise adaptors because my motherboard does not want to voltage control the fans and I can't stand the noise)
  • Sapphire Radeon HD 7950 Vapor-X with BOOST (the boost button is turned off)
  • 256GB OCZ Vertex 4 SSD with the latest 1.5.1 firmware installed
  • New 4TB Seagate and one oldish 1TB Seagate SATA II drive
  • 1000W Cooler Master V1000 Power Supply (I was planning on doing some overclocking and maybe running a Crossfire/SLI setup in the future, we'll see)
  • I'm running Windows 8.1 Update 1 with the latest Windows updates installed (as of today)
  • I'm also running the latest chipset and graphics drivers (14.4)

Some probably irrelevant, but possibly relevant, information:

  • I'm using a Corsair M95 mouse
  • I'm using a Das Keyboard Professional Silent (with the PS/2 connector for n-key rollover)
  • I've got a pair of monitor speakers attached to the onboard sound card and I mostly listen to music using my cheap Syba USB DAC & AMP. The onboard drivers are up to date and with the DAC I just use Windows' plug-and-play driver.
  • I've got my power supply hooked up to a generic-brand 2000VA UPS, since I don't have faith in my house's electrical wiring and we do get a lot of power outages where I live.
  • I've got an Antec P280 case with 5 Noctua NF-F12 fans attached to it (2 blowing air out on top, one exhaust fan at the back and two intakes at the front). I've set my cooling profile in such a way that the fans run between 0 RPM and 1100 RPM (since I find this acceptable from an auditory point of view).
  • I'm also running a very unconventional dual-monitor setup: my Dell U2713HM is my main display with my old Acer G235H running as a secondary display (this is hooked up to my graphics card using a DVI-HDMI adaptor).

Now I will admit, I never checked to see if the RAM kit I bought last year is compatible with my current board and I could not find it on ASUS' QVL list, so it probably isn't approved, but I've never had problems with Corsair's RAM in the past. I've also had the strange problem of the computer not waking up from sleep (or at least the screen not waking up) since I installed my HD 7950 in my old machine (FX 8120, Asus M5A97 Evo board).

I've surfed AMD's support forum for hours and this turns out to be a common problem (either caused by a driver issue or the GPU simply not getting enough power to wake up). I've avoided this problem by simply not allowing the screen or the computer to go into sleep mode. When this happened in the past, I had to physically restart the computer (occasionally I had to do this repeatedly or switch it off, wait a while and turn it on again) in order to get it to show something on the screen and POST. Just tweaking my power settings and not going into sleep mode seems to fix this.

Now, before I digress even more, let me get back to the original topic. My computer froze after I clicked on the start menu. At first I thought I'd wait a bit (since I had Firefox, Excel and SAS open as well) but after about a minute nothing had happened. I moved my mouse and the cursor stayed stationary, and I pressed Caps Lock on my keyboard (it's an old trick I use to see if I have a software or hardware problem) and sure enough the LED on the keyboard wasn't switching on or off. There also was now BSOD and I have Windows set to show the screen and save the result when this happens.

After resetting the computer, I had a similar problem to the one I had in the past where nothing would show on the screen and you have to switch it off, let it rest and switch it on again before it shows anything. My case doesn't have a PC speaker and the board's graphics card warning LED was lit (which is kind of obvious since both screens were black). Eventually, I got something to show on the screen again and the computer booted into Windows. I haven't had this problem again since (about 2 days and counting).

From my experience with computers (and from some other forums I've visited) this is caused by one of the following:

  • the motherboard has a short;
  • I've got a bad processor;
  • the graphics card is bad;
  • the RAM is bad; or
  • the power supply is bad.

So, to diagnose the problem, I thought I'd have a look at each of these components. From what I've seen by looking at the BIOS and HWMonitor's logs, the 12V, 5V and 3.3V rails are within 5% of their nominal values, as the ATX specification requires, and they don't seem to fluctuate radically or drop during certain events. In my mind that rules out the power supply. I don't have a voltage meter or anything fancy, so I can't do any more sophisticated testing than that. I also made sure that all the connectors are properly connected to the components. No problems there.
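
As a rough sketch of that check: the Python below compares rail readings against the ±5% tolerance that the ATX specification gives for the +12V, +5V and +3.3V rails. The function names and the sample readings are made up for illustration; they are not values taken from HWMonitor's logs.

```python
# Tolerance for the three positive rails discussed above (ATX allows +/-5%).
ATX_TOLERANCE = 0.05
NOMINAL_RAILS = {"+12V": 12.0, "+5V": 5.0, "+3.3V": 3.3}


def rail_limits(nominal, tolerance=ATX_TOLERANCE):
    """Return the (low, high) voltage limits for a nominal rail voltage."""
    return nominal * (1 - tolerance), nominal * (1 + tolerance)


def out_of_spec(readings):
    """Return a list of (rail, volts) pairs that fall outside the tolerance."""
    bad = []
    for rail, volts in readings.items():
        low, high = rail_limits(NOMINAL_RAILS[rail])
        if not low <= volts <= high:
            bad.append((rail, volts))
    return bad


# Hypothetical readings, typed in by hand from a monitoring tool.
sample = {"+12V": 12.096, "+5V": 5.04, "+3.3V": 3.312}
print(out_of_spec(sample))  # prints [] because every rail is within 5%
```

Keep in mind that software sensor readings are polled fairly slowly, so a check like this can only show that the rails look fine on average. It will probably not catch a very brief dip.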

My approach to seeing if the motherboard has a short wasn't very scientific: as far as I could tell there weren't extra motherboard risers or bare wires crossing. I did accidentally spill a few drops of thermal compound on the board (Cooler Master IE Essential C1, which isn't conductive as far as I know) in the past, but I immediately removed it with isopropyl alcohol and it was on one of the heatsinks. So, I wouldn't rule out a short completely, but I'm about 70% sure it's not a short.

The next step is testing the RAM, since that's reasonably easy to replace or fix. I read that you could spot bad RAM 90% of the time using 1-2 passes of Memtest86/Memtest86+ and the Windows Memory Tester. So I ran 2 passes using all the tests in these programs and the results were good. Then I thought I'd run Memtest86+ overnight, since most people seem to regard that more highly than Memtest86 and the Windows Memory Tester. It ran for about 10 hours and I got an error the next day on test #8 during pass number 6. According to the output, the problem lies somewhere in my CPU. So I thought that I would run IntelBurnTest and AMD Overdrive's Stability Test tonight. The CPU passed the standard IntelBurnTest with 10 passes. My temperature was a maximum of 52 degrees Celsius (which isn't bad, since South Africa is a pretty hot place even during "winter").

If you made it this far, thank you for bothering to read everything I wrote. What I want to know now is whether I've done enough to rule out the CPU/RAM. Should I perform more tests? Is there anything else I can do? I can't specifically tell you what causes the PC to hang or when it will hang, since it happens randomly. Even the graphics card would fail to wake randomly, about 4 out of every 9 times on average.

"There also was now BSOD"


Can you give us the BSOD error code? This might help diagnose what is wrong.


Another cause can be a hard drive going bad, and I hate to say this, but the last time I saw this it was an OCZ SSD.

Set the timings on the RAM a little looser than spec and see if that gives you stability.

This could be a lot of things, but first I have to say that I run my FX 8350 at stock speeds at 1.368V. I set the CPU voltage control from offset to manual mode in the AI Tweaker menu and left everything else on auto. Before I did that, I had some voltage fluctuations with my Asus M5A97 EVO R2.0 mobo. Maybe you could give that a try. I have also disabled Cool'n'Quiet in the CPU menu; I just don't like those voltage and clock speed fluctuations.

About your CPU fans, you can control those using the Fan Xpert software; just install the Asus AI Suite.

But those random crashes could be a lot of things. The first thing I would try is taking everything out of your case: lay your mobo on its box, install one RAM stick (with the CPU and cooler already installed), connect your HDD and install the GPU. Connect the PSU and fire it up outside the case, just to check that there is no screw causing a short circuit.

If you still have issues, then switch the RAM sticks. As for the GPU, if you don't have a spare one, you could test it in a different PC.

Like I said, it could be a lot of things. Really monitor the CPU and GPU temps during testing; maybe something is overheating.

Haha, how embarrassing, I could have sworn that I proofread that. Sorry that you had to read that and thank you for replying. What I actually meant to say is that there was no BSOD. It literally just froze.

I've actually looked into it being the SSD and I couldn't find any similar cases for my particular model. I've read about Vertex 2 and 3 drives doing this, but they used completely different controllers and firmware. Generally speaking the Vertex 4 has been pretty good, but I've thought about getting a Crucial M500 480GB since they are so cheap, relatively speaking of course.

I tried that. Right after the crash I ran a generic DDR3-1866 profile with much more conservative latencies and I ran Memtest86+ right after configuring it. That's when I got the strange error.

Well, it's going to be a pain undoing all those neatly zip tied cables, but I guess that is worth trying during the weekend. I looked at stock voltages and read that the stock voltage on this thing is 1.375V, so on auto this thing is quite under-volted on my machine. I will most definitely try what you suggested, thank you. 

Okay, so I tweaked my RAM settings a bit, as well as my CPU's voltage settings. I left Memtest86+ to run overnight again and this time it passed all tests with no errors. Ever since the one instance of the crash, the computer has never done that again. It's very difficult to diagnose such an intermittent issue. I'm at work now, so I left my computer to run Prime95 at home to see if it's stable enough. I'll let you all know how it went after about 6 hours.

Take a quick look at the System logs, just to see why it thinks it's hanging/crashing. (The entry will be marked as Critical and, if it crashed, followed by a Kernel-Power error as well.) Might be a driver issue...

I did look at that earlier. I can't recall exactly what the error was, but it was a Kernel error with ID 41. I'll post more details when I get home.

My son has an FX-8350 with the ASUS M5A99FX PRO R2.0 board and my PC has an FX-8350 with the ASUS ROG Crosshair V Formula-Z board ... we both use Radeon-based graphics cards in our homemade rigs. Yesterday, or the day before (I forget which day it was because I work the graveyard shift and lose track of the time/day regularly), my son's PC AND mine both started freezing randomly in the middle of gaming sessions... and the games couldn't be played again until we hit reset. It was weird because it wasn't just one rig or the other... BOTH of them were locking up with the in-game scene just frozen like a photo. I was confused at first as to how this could happen... until I remembered that I had installed the latest graphics card driver from AMD's or ASUS' website download page (I forget my source), so I then restored both of the systems back to the stock drivers from the disc that the manufacturer shipped with the cards. That solved the freezing problem... and we continued gaming without any further issues. Check your graphics card driver version and determine its source, and then see if any updates were applied shortly before your problem started... if you find that a new update was installed, get it out of there and go back to the stock driver... hope that helps.

Also, I encode Blu-rays losslessly overnight at least 2-4 times a month and it never crashes. I get 100%-ish CPU usage when I encode. So I'm thinking it's a GPU problem. I'm getting a second 760 and a new power supply this month, so I'll try the new power supply before I put in the second 760 to see if it's the power supply. If it isn't, it's the mobo or the graphics card. I'll try it with the new 760 by itself too, to see whether or not it's a hardware problem.

Maxxhew, I've tried several versions of the graphics drivers (including the stock ones) and I've pretty much had the sleeping / hard reset problem with the graphics card since I've had it. The 14.4 chipset drivers are the only ones from AMD actually made for Windows 8.1, so you can't really go back too much. Using Windows 7/8 drivers seems like a hack to me.

Anyway, here's the information from the event viewer:

Log Name:      System
Source:        Microsoft-Windows-Kernel-Power
Date:          2014-04-27 05:55:51 PM
Event ID:      41
Task Category: (63)
Level:         Critical
Keywords:      (2)
User:          SYSTEM
Computer:      Evert-Desktop
Description:
The system has rebooted without cleanly shutting down first. This error could be caused if the system stopped responding, crashed, or lost power unexpectedly.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-Kernel-Power" Guid="{331C3B3A-2005-44C2-AC5E-77220C37D6B4}" />
    <EventID>41</EventID>
    <Version>3</Version>
    <Level>1</Level>
    <Task>63</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8000000000000002</Keywords>
    <TimeCreated SystemTime="2014-04-27T15:55:51.661984200Z" />
    <EventRecordID>12761</EventRecordID>
    <Correlation />
    <Execution ProcessID="4" ThreadID="8" />
    <Channel>System</Channel>
    <Computer>Evert-Desktop</Computer>
    <Security UserID="S-1-5-18" />
  </System>
  <EventData>
    <Data Name="BugcheckCode">0</Data>
    <Data Name="BugcheckParameter1">0x0</Data>
    <Data Name="BugcheckParameter2">0x0</Data>
    <Data Name="BugcheckParameter3">0x0</Data>
    <Data Name="BugcheckParameter4">0x0</Data>
    <Data Name="SleepInProgress">0</Data>
    <Data Name="PowerButtonTimestamp">0</Data>
    <Data Name="BootAppStatus">0</Data>
  </EventData>
</Event>

Kernel-Power, Event ID 41, Task Category 63. If you look around the internet a lot of people blame this on a surge problem with the power supply, which is impossible since the data I've looked at clearly indicates that my PSU is fine. The voltages don't vary much at all and are within specification. Also, isn't the UPS supposed to regulate the current and fix the voltages if I have surges?
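
For what it's worth, the EventData fields in the XML above can also be read programmatically. Here is a minimal Python sketch, assuming the event XML has been copied out of Event Viewer as text. The helper names are made up for illustration, and the sample is a trimmed copy of the event above. As far as I understand Microsoft's guidance for Event ID 41, a BugcheckCode of 0 together with a PowerButtonTimestamp of 0 means that no stop error was recorded and the power button wasn't held down, which points to a hard hang or a loss of power rather than a logged crash.

```python
import xml.etree.ElementTree as ET

# Namespace used by Windows event XML (taken from the <Event> element above).
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Trimmed copy of the Kernel-Power 41 event posted above.
SAMPLE_EVENT = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <EventID>41</EventID>
  </System>
  <EventData>
    <Data Name="BugcheckCode">0</Data>
    <Data Name="BugcheckParameter1">0x0</Data>
    <Data Name="SleepInProgress">0</Data>
    <Data Name="PowerButtonTimestamp">0</Data>
  </EventData>
</Event>"""


def event_data(xml_text):
    """Return the <EventData> name/value pairs of one event as a dict."""
    root = ET.fromstring(xml_text)
    return {d.get("Name"): d.text
            for d in root.findall("e:EventData/e:Data", NS)}


def no_crash_recorded(data):
    """True when neither a stop error nor a power-button press was logged."""
    # int(..., 0) accepts both decimal ("0") and hex ("0x0") strings.
    return (int(data["BugcheckCode"], 0) == 0
            and int(data["PowerButtonTimestamp"], 0) == 0)


data = event_data(SAMPLE_EVENT)
print(data["BugcheckCode"], no_crash_recorded(data))
```

A non-zero BugcheckCode would instead be a stop error code worth looking up, so this only narrows things down; it doesn't say which component caused the hang.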

I don't know if you have a spare HDD lying around. If so, you could download a Windows 7 ISO, install that, install the latest drivers and then test again. If you also have issues on Windows 7, then you know for sure it's hardware related.

Also check your power management settings in Windows. Sometimes they are set so that an HDD goes to sleep after 30 minutes, and then the system can freeze as well.

http://www.w7forums.com/threads/official-windows-7-sp1-iso-image-downloads.12325/

Well, before reading your post I formatted my SSD and reinstalled Windows 8.1 Update (I have a friend with an MSDN subscription). I first installed the chipset drivers from AMD (for some strange reason it only detects and installs some USB filter driver; in the past I manually updated the SATA driver, but this time I opted not to), then I installed the latest drivers for my onboard NIC from Realtek and the latest HD audio drivers, also from Realtek's website. I then installed all the latest updates for Windows, downloaded and installed Firefox, and finally installed my Corsair M95's firmware and drivers. I edited my power settings and set everything not to sleep in the high performance profile. I ran Prime95 blend overnight for about 10 hours and all tests were passed.

I'm pretty convinced that the CPU is okay. I ran the more extreme P95 tests on my old Windows install for about 7 hours and it got quite hot at some points, but it passed them too. I left hardware monitor open when I was running the tests last night and all the voltage readings were good. Nothing fluctuated outside of 5% of the specified voltages. So it's definitely not the power supply. If necessary I will run P95 for 24 or even 48 hours. I'm starting to think that something wasn't playing nice with Windows or that some Windows file or driver got corrupted. I literally make it my mission to use as little stuff that requires drivers as I can. I also set the RAM's profile back to the XMP one using D.O.C.P. and I changed all the remaining timings manually. I'm going to create a system restore point and install some more software. I read somewhere that Kaspersky Internet Security can cause your PC to crash and hang, so it could be that too.

Interesting little update: as I was installing Kaspersky Internet Security 2014 (using the disc) it warned me that the software is not compatible with my current operating system. I managed to install it on Windows 8.1 prior to installing Update 1 via Windows Update. Perhaps this is what was causing all the trouble in the background. I'm going to see if I can download the installer.

Then take a different virus scanner; Kaspersky sucks anyway. Take Avast Free or Avira AntiVir.

When it passes all the bench and stress tests, then it seems like a software issue. The only way to know is to keep trying. If it doesn't crash anymore without Kaspersky, then that could be the issue.

The PSU / CPU will not be the issue.

Well, I've been using Kaspersky Internet Security for over 3 years now on my personal computer. It has never failed me or slowed my system down. In the past I alternated between using that and ESET NOD32/Smart Security. At the end of the day, the best anti-virus software is common sense. I'm running a custom Prime95 blend test, which I started about 2 hours ago, and I plan to keep this running for another 22 hours. In my mind, if the system passes that then it is stable. So far all the voltages look good, the CPU temp doesn't go above 52 degrees Celsius and I'm using about 98% of the computer's RAM. I'm starting to think the Memtest86+ error I got after 12 hours was a fluke or caused by bad timings or something. I'm not going to use my computer to land a spacecraft on Mars, and 24 hours of Prime95 blend is probably the hardest it will ever work.

And what about the PassMark test, the Unigine Valley Benchmark and AIDA64? Have you already tried all of those tests? PassMark can test your complete system: CPU, GPU, etc.

I forgot about those! It's always good practice to run multiple tests. Okay, let's summarise what tests I'm going to run:

To test the memory settings as they are now:

  • Memtest86+ (8 Hours) (I've already run one that was 13 hours and got a CPU cache error, but it was a really hot day, and I ran another one for 10 hours on RAM settings similar to the ones I have now with no errors)
  • Windows Memory Tool (10 Passes - Extended Test)
  • 24 Hours of Prime95 Blend (currently busy with this one, 2.5 hours in and no errors so far)
  • AIDA64 (8 Hours)

If memory survives all of that, do you think it'll be reasonable to assume it is stable?

For the CPU I had the following in mind:

  • Prime95 - 8 Hours of Small FFTs (I've already done 7.5 on a really hot day with no errors)
  • Prime95 - 2 Hours of In-place large FFTs (Just seems like a good exercise in damaging the CPU if you let it run at 60 degrees for more than 2 hours)
  • AIDA64 - Stress the CPU, Cache and FPU for a few hours.
  • Prime95 Custom Blend (24 Hours, busy with that now).
  • 20 Passes of IBT Extreme

I'm not going to test the graphics card, as I've already had it for about a year and apart from the screen sometimes not waking up from sleep I've never had any trouble with it.

One last question, aren't you supposed to get errors with Non-ECC memory? They're not designed to check errors or anything, so obviously they will get it wrong sometimes by design, won't they?