AMD 3960X System Random Reboots

Hello everyone, after much troubleshooting over the past several weeks I am hoping someone can help. I have a new system that I built back at the beginning of March, and after it had been running for about a month I started to have issues with random reboots. All of these reboots happen at idle or near-idle conditions; none have happened during strenuous work. My system has no issues when running the CPU-Z stress test, working with 8K video footage in Premiere, or gaming. I constantly get the infamous Windows Kernel-Power Event ID 41 with no bug check code. Here are some things I have done to troubleshoot:
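For anyone wanting to line the reboots up against idle periods, here is a minimal sketch that builds a `wevtutil` query for the most recent Event ID 41 entries in the System log. The function name `build_query_command` and the default count of 20 are illustrative assumptions, and the filter matches on the event ID only, so it assumes nothing else in the System log logs ID 41.

```python
import platform
import subprocess

# XPath filter on the event ID only (assumption: in the System log,
# ID 41 entries are the Kernel-Power "unexpected reboot" events).
EVENT_41_QUERY = "*[System[(EventID=41)]]"


def build_query_command(max_events=20):
    """Return a wevtutil command that lists the newest Event ID 41 entries."""
    return [
        "wevtutil", "qe", "System",   # query-events against the System log
        "/q:" + EVENT_41_QUERY,       # XPath filter
        "/c:" + str(max_events),      # maximum number of events to return
        "/rd:true",                   # reverse direction: newest first
        "/f:text",                    # human-readable text output
    ]


if __name__ == "__main__":
    cmd = build_query_command()
    if platform.system() == "Windows":
        # Only actually query the log when running on Windows.
        print(subprocess.run(cmd, capture_output=True, text=True).stdout)
    else:
        print("Not on Windows; the command would be:", " ".join(cmd))
```

The timestamps in the output can then be compared against what the machine was doing at the time.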

Updated to latest chipset drivers from AMD site. Installed newest motherboard BIOS. Installed all Windows updates. Updated to latest Nvidia Studio Driver (clean installed). I have also checked my RAM using MemTest86, and after running it multiple times (once for 4 passes over 25 hrs) I have no errors. I have also tried all the Windows fixes for this issue (disable fast startup, power plans, etc). I also purchased two different UPS units and verified that my system isn’t losing power from the wall. Nothing is overheating; even under full load my CPU maxes out at around 83C and stays stable there even when stress testing. Idle temp never goes beyond about 50C. I contacted AMD and RMA’d my CPU, and I have also RMA’d my motherboard through Asus; after a few days I started having the same issue with the new parts as well. I have also tested my known-good old Nvidia GTX760 GPU, the GTX1070 from my old system, and a newly purchased RTX3060, and had the same results with all of them.

Two motherboards, two processors, and three graphics cards, and I still have the same issue. I don’t have another PSU to test, but my newly purchased Corsair HX1000 powers another system with no issue (although that system isn’t as power hungry). Even though I am not convinced that the PSU is the issue, I do have another one on order to try. What am I missing here? I am not new to building systems and have built dozens in the past 15 years… I have never seen anything like this. I have never been so frustrated… this issue seems to be power delivery related, but at this point I don’t know what else to try. One last bit of info: other than enabling the XMP profile on my memory, I haven’t done much in the BIOS. I am running memory at 3200 MHz with a 1:1 FCLK ratio. I am not overclocking (other than the memory XMP profile) or using PBO… my full system specs are below. Thanks!

AMD 3960X
Asus TRX-40 Pro
Corsair Vengeance LPX 64GB 3200 (4x 16GB sticks) CMK32GX4M2B3200C16W (on mobo QVL list)
Fractal Design S36 AIO
Corsair HX1000
2x Samsung 980 Pro NVMe SSDs
2x WD Caviar Black HDDs

If your BIOS/UEFI has the option and your RAM CAS latency is set to an odd value,
try disabling Gear Down Mode.
Ryzen chips didn’t like Gear Down Mode enabled when using odd-value CAS timings.

If you’re using even timings, on the other hand, enable it for added stability.
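The rule of thumb above boils down to a parity check on the CAS latency. A tiny sketch, where `gear_down_advice` is a hypothetical helper that only encodes this poster's advice and nothing from AMD documentation:

```python
def gear_down_advice(cas_latency):
    """Return 'disable' for odd CAS latencies and 'enable' for even ones.

    This only encodes the forum rule of thumb quoted above; it is not an
    official recommendation.
    """
    if cas_latency <= 0:
        raise ValueError("CAS latency must be a positive number of cycles")
    return "disable" if cas_latency % 2 == 1 else "enable"


if __name__ == "__main__":
    for cl in (15, 16):
        print("CL%d: %s Gear Down Mode" % (cl, gear_down_advice(cl)))
```

For the CL16 kit in this thread that comes out as "enable".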

As you’re running XMP, manually set the CAS latency to 16 and the memory speed to 32 or 3200 MT/s (depending on how your BIOS displays it), rather than leaving both on auto, if you left them as such.

I’m assuming XMP increases your RAM voltage by default?
If so, leave it as set. If not, manually set the DRAM voltage to the recommended value for running XMP on that make and model.

I would also suggest adjusting your C-states, but your PSU is new enough to be compliant with the lower power states Ryzen supports.
So try enabling/disabling ErP, as that will enable/disable some low-power states for other parts of the system.

Enable IOMMU if you have SVM (virtual machine support) enabled.
If you have the option, enable HPET (High Precision Event Timer),
but it’s an Asus board so you may not have the option,
and it should already be enabled in Windows for you.
(You could try disabling it via Device Manager and CMD, but it may add more instability if you’re more of a productivity user than a gamer.)

As you have the AMD chipset drivers and power plan installed,
tweak that rather than the Windows power plan.

Good luck.

Did you get the firmware update for these (the Samsung 980 Pros)? I was having a Bad Time till the firmware update.

I updated a few weeks ago; assuming there have been no updates since then, I should be set. I will double check again… I have Samsung Magician installed.

I will double check the CAS latency; I believe XMP set it to the tested latency of this kit (16-18-18-36). I am certain that voltage is at the tested voltage of 1.35 V.

I have been reading about C-states and have noticed that many users have issues with them enabled. While this model PSU should be fine with lower power states, I wonder if my specific unit has an issue? This is why I decided to purchase a new one. This is the first time I have heard about ErP, so I will definitely check that out!

As far as the power plans go, I actually started with the Windows power plan options (Balanced and High Performance). After having issues and tweaking the Windows plans, I reset them and started using Ryzen High Performance. I had no issues for a couple of weeks and then the same issues returned.

Edit: Looks like XMP did indeed set the RAM to 16-18-18-36. I will see what happens if I change the frequency and latency to 3200 and 16 manually.
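As a quick sanity check on those numbers, here is a small sketch that converts the kit's timings from clock cycles to nanoseconds and shows the fabric clock implied by a 1:1 ratio. It assumes the usual DDR relationship (memory clock is half the data rate) and the usual tCL-tRCD-tRP-tRAS ordering for 16-18-18-36; the function names are made up for illustration.

```python
def memclk_mhz(data_rate_mts):
    """DDR transfers twice per clock, so the memory clock is half the data rate."""
    return data_rate_mts / 2.0


def cycles_to_ns(cycles, data_rate_mts):
    """Convert a timing given in memory clock cycles to nanoseconds."""
    return cycles * 1000.0 / memclk_mhz(data_rate_mts)


if __name__ == "__main__":
    data_rate = 3200  # MT/s, from the XMP profile discussed above
    # Assumed ordering of 16-18-18-36: tCL, tRCD, tRP, tRAS.
    timings = {"tCL": 16, "tRCD": 18, "tRP": 18, "tRAS": 36}
    for name, cycles in timings.items():
        print("%s: %d cycles = %.2f ns" % (name, cycles, cycles_to_ns(cycles, data_rate)))
    # With a 1:1 ratio the fabric clock matches the memory clock.
    print("FCLK at 1:1 = %.0f MHz" % memclk_mhz(data_rate))
```

So CL16 at 3200 MT/s works out to 10 ns, and a 1:1 ratio means an FCLK of 1600 MHz.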

Yep, if CAS# latency is set to auto, manually set it to 16.
Just the CAS latency, for now.
Also set the memory frequency to manual and enter 3200.
Set your voltage to 1.35 even though it is exactly the same as auto.

It should add some stability, as the BIOS isn’t looking up a database for a value that may change depending on how close your BCLK stays to its 100 MHz setting at boot.

You could, if you like, swap everything you see set to auto on that page to the manual values.
A BCLK adjustment may help, as the timing crystals on the motherboard can be +/- 0.05 over or under 100 at boot time, which could affect other system parameters that rely on it, such as RAM and PCIe (you might want to set that manually to 100 too).
Pretty much anything that can be affected by the BCLK… set it to manual and enter the default shown next to it.
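To put a number on that drift, here is a rough sketch assuming the memory data rate is simply BCLK times the ratio (32 for a 3200 setting) and using the +/- 0.05 MHz figure from the post above. `effective_rate` is an illustrative name, not a BIOS term.

```python
def effective_rate(bclk_mhz, ratio=32.0):
    """Effective memory data rate in MT/s, assuming rate = BCLK x ratio."""
    return bclk_mhz * ratio


if __name__ == "__main__":
    # Nominal 100 MHz plus the +/- 0.05 MHz drift quoted in the post.
    for bclk in (99.95, 100.00, 100.05):
        print("BCLK %.2f MHz x 32 = %.1f MT/s" % (bclk, effective_rate(bclk)))
```

Under those assumptions the memory would land anywhere between roughly 3198.4 and 3201.6 MT/s, which is small, but it shows why pinning BCLK to 100 takes one variable out of the picture.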

I found with Gigabyte X58 boards that it helped with stability if I went through all the auto timings for the RAM and set them to the defaults listed beside them, so it may help here.

It won’t make the system OC any better or worse than it already does, but the BIOS doesn’t have to look up auto settings when manual values are entered, and it did seem to increase system stability for me.

Thank you, that explanation makes sense. I am going to take all the values off auto and set them accordingly.

I have done a check disk and file system repair on my boot drive but haven’t reinstalled Windows. I have tried to avoid a Windows reinstall because of all the time it will take to get my software set up again… it doesn’t feel like a software issue but do you think a Windows reinstall is in order?

Open an elevated CMD.

Run
sfc /scannow
If there are any errors it should repair them.

Also run DISM.
DISM /Online /Cleanup-Image /CheckHealth
will quickly check whether the Windows image (component store) has been flagged as corrupted.

DISM /Online /Cleanup-Image /ScanHealth
runs a more in-depth scan.

DISM /Online /Cleanup-Image /RestoreHealth
will restore any broken files that one of the other 2 scans finds.

If it says your system is OK, then you’re pretty much good to go.
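If you would rather run the whole sequence in one go, a minimal sketch follows. It keeps the same order as above and defaults to a dry run that only lists the commands; `run_steps` and `REPAIR_STEPS` are made-up names, and actually running it (with `dry_run=False`) assumes Windows and an elevated prompt.

```python
import subprocess

# The commands from the post above, in the same order.
REPAIR_STEPS = [
    ["sfc", "/scannow"],
    ["DISM", "/Online", "/Cleanup-Image", "/CheckHealth"],
    ["DISM", "/Online", "/Cleanup-Image", "/ScanHealth"],
    ["DISM", "/Online", "/Cleanup-Image", "/RestoreHealth"],
]


def run_steps(steps, dry_run=True):
    """Run each step in order and return (command, exit code) pairs.

    With dry_run=True nothing is executed and the exit code is None.
    """
    results = []
    for step in steps:
        command = " ".join(step)
        if dry_run:
            results.append((command, None))
            continue
        completed = subprocess.run(step)  # output goes straight to the console
        results.append((command, completed.returncode))
    return results


if __name__ == "__main__":
    # Dry run by default; call run_steps(REPAIR_STEPS, dry_run=False)
    # from an elevated prompt on Windows to actually execute the steps.
    for command, code in run_steps(REPAIR_STEPS):
        status = "not run" if code is None else "exit code %d" % code
        print(command, "->", status)
```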

–
A couple of things you can do to increase Windows stability:
install the VC++ redistributable (vcredist) libraries from 2008 to present.
Same with DirectX if you’re a gamer.
Install DX9.0c, DX10 June, DX11, and use the DX12 web installer.
Once done, run dxdiag and your system should register all the changes.

Reboot and hope you’re done.

Your issue is very common on the AMD Ryzen OCN forums. It is caused by using the FMax enhancer and the curve optimizer negative offset.

You are just starving a core of voltage and causing the reboot.

You need to turn off the extra boost functions.

If your BIOS has the setting, turn off the power supply idle voltage control, or at least take it off the low setting. Under low load the power supply will actually drop phases and voltages to be more green, which is not what you want if you are trying to keep the CPU core voltages up.

Turn off DF C_states in the BIOS.

Get rid of any of the old Ryzen power profiles in Windows. They are not needed on Zen 2 or 3.

Do a search on “idle reboots” and you will find lots of suggestions to fix the issue.

What power plan do you suggest I use, Windows High Performance?

None. Just let the cpu handle its own boosting. It is smarter than Windows anyway.

I ran all the scans and restarted; no errors were found. The last option told me that the restore operation was completed successfully. Even with my DRAM DOCP profile enabled, CAS latency is set to 16 and not auto, and voltage is at 1.35. Gear Down Mode is on auto.

I can’t find an option for power supply idle voltage control in my BIOS. From what I have read about other boards, this was removed in later BIOS versions. I did find an option to disable global C-states though. I noticed that when I disabled global C-states my CPU didn’t seem to boost the way it did before. Before, Task Manager indicated speeds up to 4.2 GHz or so; with C-states disabled I seem to remain at the base 3.8 GHz frequency. I assume this is because all cores are locked at the same frequency? I understand that disabling C-states works for many users, but I would rather not affect my boost clocks if I can help it.

DF C_states is very different than cpu C_states. DF C_states is for controlling the Infinity Fabric clocks. You want to set them permanently at P0 power state and fixed clocks.

That is one of the prime reasons for idle load reboots.

Ignore the PBO crap that never worked correctly. Just use Performance Boost enabled and choose one of the Levels.

I find I can get better all-core clocks by just choosing a fixed multiplier and a set Vcore. That way I control the power used and the temps. The auto boost algorithms are great for a single core boost for a game or whatever, but if you are running a TR or Epyc, your usage is likely as server or workstation and you want to use all those cores at the highest sustainable and stable OC clocks.

Disabling cpu C-states shouldn’t have done anything to knock down your clocks. It only prevents the cpu from dropping out of P0 power state into one of the lower power levels for power saving.

If your system isn’t used at full capacity all the time, then go ahead and enable global C-states again so you get power savings. But you may run into idle power reboots again.

You don’t want to drop your Vcpu voltages too low for the idled cpu cores or you will throw a machine check exception, Event 41 and probably reboot.

Some people resolved this issue by reverting to an older BIOS that didn’t have the auto boosting improvements that cause the idle Vcore voltages to get too low.

Okay, I will take a look and see if I can find an option for DF C_states; I don’t recall seeing it when I was looking in the BIOS earlier. It might still be worth experimenting by disabling CPU C_states just to see if I still get random reboots. I still don’t fully understand why some people using this platform have to go through such changes to get things working in a stable fashion and others don’t. Perhaps it’s just the diverse variety of hardware.

I have had to fight my Asrock X399 board to get the memory to OC with any great success. The CPU was easy: pick a target clock and enough Vcpu to keep it running under 24/7 load.

The Epyc motherboard was my first foray into server hardware and I had a few missteps at first, but the fact that server CPUs and hardware are basically locked down really simplified things for me.

Set XMP on the memory, bump the cTDP and PPT to the max, and be done with it. I did play around with the NUMA options trying for the best bandwidth and latency in benchmarks, tweaked a few things like the DF C_states, and was basically done with it.

There are some really nice Epyc/Threadripper tuning guides available that AMD publishes. They explain every option in the BIOS in great detail. Just pick the use case you want to run the workstation/server in and follow the AMD recipe. Bing-bang, done deal.

You need to really dig around in the AMD CBS and AMD PBS menus to find all the not normally exposed options and parameters.

The basic menus are just for the out of the box, don’t understand anything types.

I have a question for you Wendell, I noticed in an old thread you mentioned that the power supply you were using caused instability at idle because it didn’t support the Ryzen lower power states very well. I know that my Corsair HX1000 should support these states but is it possible that it is having issues with them? Could a new PSU make a difference?
