[Solution Found] I NEED HELP! 3960X Seems to never be really stable!

In that case, sell it as “partially defective”. Any money you get for it is more than you’d get by binning it, right? :wink:

1 Like

Oh yeah that I can do and will do, why didn’t I think of that? Dang it since this PC got so unstable, and my mental health has gone down with it, thinking seems to be not my strong suit anymore…
Thank you :slight_smile:

OK, back to the issue at hand. I found a few links, no idea if you’re already familiar with them, that offer solutions to the problems you described. As I’m not running Win-10 (I’m not running Win-OS at all :stuck_out_tongue: ) I never tried these, so YMMV:

https://www.exefiles.com/en/sys/vmx86-sys/

HTH!

1 Like

Thank you, yes I’m familiar with them and from what my hardware swapping “told” me now it seems it was the RAM, again.
I’m not encountering the NMI_HARDWARE_FAILURE anymore, fingers crossed that it stays that way.

The new BSOD, which happened once, for now that is, also seems to be very specific to VMWare, finally something that I can pinpoint without trying to swap all hardware components.

Update: That BSOD I had was something with VMWare Workstation 16; for some reason it croaks when I have more than 5 VMs running at the same time. And since I started number 5 and 6 at the same time, that seems to have caused the BSOD.

Yay software that is stupid…

I have replaced the Nvidia 1070 with my RX580 again and it seems to function very well at the moment, about 2 weeks now. Fingers crossed and wish me luck that it stays that way.

Sooo, I just got another NMI_HARDWARE_FAILURE BSOD, just as I wanted to get back into playing Apex Legends. I had just started one round and it happened right in the middle of it.

I have used the PC relatively extensively now and played some other games without issue.

@wendell would you suggest that it would be a good idea to RMA the board now? How is ASUS’ RMA?

Also may this have anything to do with this?

maybe a good idea to rma?

2 Likes

Ok I’ll get the rma process on the way then, hopefully it’s the board and the next one will work.
My first rma, ever, exciting XD
Anything which I would need to know?

Ok so I talked to ASUS and they think that it might be the DIMM slots themselves that are causing the errors; apparently that is not that uncommon? Did anyone ever hear of that?

Well I’ll be getting the Alpha version of the board as the non-Alpha variant isn’t produced anymore. Since I’m still in warranty I had to do the exchange through the store I bought it from.

4 Likes

Well the waiting begins, the Alpha is now in service.

First things I noticed:

  • It seems my SATA devices are now recognized after an OS restart.

  • Voltage management/readout is way worse, like a 0.06 V difference for DRAM: I have to set 1.26 V to get 1.2 V in software, but it is actually 1.24 V when measured with a multimeter.

  • Chipset gets hotter by about 7°C, from ~80°C to ~87°C at idle.
    I wonder, is there any room upwards at all?
    Oh, and that is with the fan running at about 4000RPM.
    May have to try redoing the thermal paste.

  • Whatever Temp3 is in HWiNFO64 is now about 74-78°C instead of 47°C.
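
For anyone who wants the DRAM readout offsets above as plain numbers, here is a small sketch in Python. It only uses the three voltages quoted in the list; the variable and function names are made up for illustration, and it assumes the multimeter is the reference reading.

```python
# DRAM voltages quoted in the list above, in volts.
# Names are illustrative only, not from any tool.
set_in_bios = 1.26
software_readout = 1.20
multimeter = 1.24


def offset_mv(a, b):
    """Difference a - b in millivolts, rounded to hide float noise."""
    return round((a - b) * 1000)


bios_vs_software = offset_mv(set_in_bios, software_readout)
software_vs_meter = offset_mv(software_readout, multimeter)
bios_vs_meter = offset_mv(set_in_bios, multimeter)

print(f"BIOS setting vs software readout: {bios_vs_software} mV")
print(f"Software readout vs multimeter:   {software_vs_meter} mV")
print(f"BIOS setting vs multimeter:       {bios_vs_meter} mV")
```

So the software readout is about 40 mV low compared with the multimeter, and the board delivers about 20 mV less than what was set.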

Welp, it’s not the motherboard then, I just had another NMI_HARDWARE_FAILURE BSOD while playing Apex for about 4 minutes.
This is so infuriating…
I played Apex for a few days now and all was well, and then suddenly it starts up again… Why is this problem so terribly inconsistent???

BUT the good(?) side is that the last two BSODs only happened under load.

Could it be the CPU?
Cus that is the only thing other than the mainboard I changed when switching to this platform.

Things I noticed:

  • The BSOD happens again at random, no real load peaks in the game or in the background or anything.

  • The memory dump cannot be fully created, it stops at 0% and the file is just 3.11MB big, this has been the case for every NMI_HARDWARE_FAILURE BSOD. BUT it does restart now, this didn’t happen before.

  • Both times it happened just a few minutes in while playing Apex Legends, then the BSOD happened, restarted and I could play the whole rest of the day without issues.

  • Temps are high but in spec:

      • CPU: somewhere in the 60s°C

      • GPU: Core: early 80s°C, HS: late 80s°C, MEM: somewhere in the 90s°C

      • Chipset: mostly steady 87°C

      • Memory: The highest is 60-61°C, all others are below

  • PSU takes about 850-900W from the wall, so should have enough for peak spikes

have you tried dram calculator for ryzen?

i had been suffering random bsods, maybe once every 2-3 months i would get a cluster of em. i took to running sfc /scannow and dism.
every time sfc would find corrupt components and id get a few more months of stability.
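
For reference, the two repair passes mentioned above can be scripted. This is only a sketch: the post does not say which DISM options were used, so the `/Online /Cleanup-Image /RestoreHealth` form here is an assumption, and `REPAIR_COMMANDS` and `run_repairs` are hypothetical names. It defaults to a dry run; actually running the commands needs an elevated prompt on Windows.

```python
import subprocess

# Assumed command lines: the post only says "sfc /scannow and dism",
# so the DISM arguments below are a guess at the usual repair form.
REPAIR_COMMANDS = [
    ["sfc", "/scannow"],
    ["DISM", "/Online", "/Cleanup-Image", "/RestoreHealth"],
]


def run_repairs(dry_run=True):
    """Return the command lines as strings; execute them only if dry_run is False."""
    issued = []
    for cmd in REPAIR_COMMANDS:
        issued.append(" ".join(cmd))
        if not dry_run:
            # check=False so a non-zero exit code does not abort the second command.
            subprocess.run(cmd, check=False)
    return issued


if __name__ == "__main__":
    # Dry run: only print what would be executed.
    for line in run_repairs(dry_run=True):
        print(line)
```

Calling `run_repairs(dry_run=False)` would execute both commands in order; the dry run is the default so nothing touches the system by accident.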

so i tried the above application.
i ran the mem bench and it told me i was getting read write errors. (i passed memtest on a 24hour test)… but the dram calc bench i failed first run. (so run that before doing any changes)

anyhoo after some googling to see what kind of die and the chip layout (single rank in my case)
reddit ram guide
try above to see if your ram is listed for the info you will need to enter.

i just ran the safe settings. took a photo and entered everything on the first page of the application output into the ram timings in the uefi.

so far ive seen an uplift in ram performance with a latency drop according to userbenchmark.
i also passed its mem test np. and now its just a case of wait and see to see if my problems are fixed.
but so far 3 weeks on, no read write errors in the o.s and no corrupt components.

2 Likes

Hello and thank you for your suggestion :slight_smile:
I’m running my RAM at JEDEC 3200 so that HAS to work, the RAM is validated by the manufacturer, in this case Samsung, to work at those timings, in this case CL22 and so on.
There is no OC going on, no XMP (which is also an OC).

But you do give me the idea that I may still need to check whether I have memory errors; I haven’t run memtest on these DIMMs yet.

technically if fully populated this is not true, 2933 is the max the threadripper will support…

3 Likes

Uhm… I only use 4, but they are DR. Would that be the case then too?

Yes

No I don’t no worries, I have 16GB DIMMs with chips on both sides, they are dual rank and I have quad channel :slight_smile:

1 Like

3200 cas 22? that’s some real loose timings mate.
typically for 3200 you’re looking at cas 14-16.
and even without xmp or a mem profile set at 2133 the cas timings wouldn’t normally be set to 22.

can you post a screenie of your jedec from cpu-z? both the memory and spd (with an active channel selected) tabs?

CAS 22 is a JEDEC specification for 3200; that is what some ECC and normal “non-tuned” DIMMs ship as.
As I suck at explaining things I will refer you to Wikipedia: check the table further down on the right, DDR4-3200AA is the standard name for it.
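
To put the CL22 versus CL14-16 comparison in absolute terms, here is a rough sketch that converts CAS latency from clock cycles to nanoseconds. It relies on the standard relation that DDR memory transfers twice per clock, so one clock period in ns is 2000 divided by the data rate in MT/s. The function name `cas_latency_ns` is just for illustration, and this only looks at CAS, not the other timings.

```python
def cas_latency_ns(data_rate_mts, cl):
    """CAS latency in nanoseconds.

    DDR transfers twice per clock, so the clock period in ns
    is 2000 / data rate (MT/s); multiply by CL cycles.
    """
    return cl * 2000 / data_rate_mts


# The CAS values discussed in the thread, all at DDR4-3200.
for cl in (22, 16, 14):
    print(f"DDR4-3200 CL{cl}: {cas_latency_ns(3200, cl):.2f} ns")
```

That works out to 13.75 ns for 3200 CL22, against 10 ns for CL16 and 8.75 ns for CL14, so the JEDEC bin is slower, but it is the validated stock setting and not an overclock.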

Mine are 4 of these. And a Geizhals link.