Asking for help regarding Ryzen freezes

Okay, here it goes. Some of this information may not be relevant, but I am trying to include everything that happened in the last 7 or so days.

My PC config is as follows:

  1. AMD Ryzen 5 2600 (Wraith Stealth cooler)
  2. ASUS TUF B450m Plus Gaming
  3. Patriot Signature Line 32GB (2x16GB) 2666MHz DDR4 RAM (PSD416G2666Kh), NOT on the QVL
  4. Asus Turbo GTX1070 Ti graphics card
  5. 240GB SATA SSD (Transcend, boot drive)
  6. 1TB HDD (Toshiba)
  7. KRPW-L5-600W/80+ PSU
  8. Cooler Master MasterBox E300L

I assembled this system about a month and a half ago. I initially applied the Asus optimized setting, which sets the CPU multiplier to 38.5 and VDDCORE to 1.287V (I think). Everything ran fine. I tried different overclocking settings, like setting the multiplier to 40.00 and VDDCORE to 1.3V, and once tried raising the RAM voltage from 1.2V to 1.3V. With some of these I ran into random freezes, and I ran Cinebench to test them. After trying all that, I settled on 1.2V for the RAM (all other RAM settings on AUTO), a 40.00 multiplier for the CPU and 1.287V for the CPU. I bought this PC for gaming and deep learning. The system ran fine, with no freezes during games, nothing at all. There was just one problem: I could never overclock my graphics card in Windows. I tried Asus GPU Tweak II and MSI Afterburner, but the max clock always stayed at 1607MHz. I was okay with that. I kept my BIOS up to date. Before the nightmares started, I was running BIOS revision 601 (AGESA 1006) and it was working just fine.

Then, about a week ago, I tried training with the TensorFlow Object Detection API on Ubuntu 18.04 (kernel 4.15.0-43, CUDA 9, driver nvidia-410). This usually loads all 12 threads at 40-60% each and uses nearly all of the graphics memory. I had used TensorFlow on this system before for smaller tasks and had no problems at all. But this time, the system froze after maybe an hour. Then I dialed down my clock speed, and even went back to the Asus optimized settings and then to the basic defaults, which put everything on Auto. I frantically searched various forums for fixes. I thought it might be an issue with the non-QVL RAM, so I tried tuning the RAM timings. I saw in a video that someone replaced each Auto value with whatever value the BIOS showed for that RAM timing; the result was the same. I tried disabling the C6 state and the idle power policy, but no luck. I tried BIOS versions 409, 601 and 604, but the results were the same. Sometimes the system would run happily for hours, and sometimes it wouldn't even boot. Sometimes it would crash several times in a row, in both Windows and Linux. I have tried fresh Windows and Linux installations, and switched from Ubuntu 18.04 to 18.10 and then back to 16.04, but no luck.
So far, after running many stress tests, I have found some information that might be useful:

  1. At full load, the CPU temperature hits 100 °C, but that doesn't crash the system. The system sometimes crashes even when it's below 50 °C.
  2. Memtest86+ did two full passes without errors and then froze on the third.
  3. The AIDA64 CPU and GPU stress tests run without issues, but the RAM stress test immediately freezes the system. Other RAM stress-testing tools like TechPowerUp MemTest64 work fine. The Prime95 torture test with 25000 MB of RAM allocated runs for 4-5 minutes without any problem (I didn't run it longer than that, and it froze once).
  4. After this happened, I found that my motherboard has an 8-pin CPU power connector, but I had only attached one 4-pin connector to it, and the system had run like that for a month without issues. At first I thought my power supply had only one 4-pin connector, but I later discovered it had two. I connected both, but no luck.
  5. I monitored the rail voltages both in the BIOS and in Asus AI Suite 3. The 12V rail is always below 11.9V and drops to 11.575V under combined CPU and GPU load.
  6. I tried to play COD: Infinite Warfare tonight, but the system froze. I tried again with the same result.

Now I don’t know what went wrong, or when. You will probably need more information from me about this issue, so please let me know what to do and what to run so that you can help.

It seems like a faulty memory stick, or perhaps a misconfiguration in the BIOS.

If the RAM fails at BIOS defaults, there’s usually a dead module, and you should check with the store where you bought it. Testing each stick separately might reveal an error faster than testing the whole kit. I’ve seen this with a few Corsair kits: XMS DDR2 800, XMS DDR3 2000, LPX DDR 3000.

I had an LPX kit that wasn’t on the QVL on a GB AB350 Gaming 3 board. Until the F20 BIOS I had to run it at 2933MHz on auto settings, but after that update it ran at the full 3000MHz. I don’t find the QVL particularly useful; I’d rather stick to parts released around the same time as the board or a little later.

To keep the CPU cool I use a Noctua NH-D15 SE-AM4 and four 140mm case fans, also from Noctua. As for the power supply, I don’t know that brand…

Thanks a ton for replying. The power supply is Chinese-made, but the model name suggests it’s sold under a Japanese brand.
But is it normal for the 12V rail to drop that much (to 11.575V)?

I removed one of the sticks and ran Prime95 stress tests on Windows 10 (build 17763). Prime95 settings: 12 threads, max FFT 4095 and 8192, and RAM set to 12500MB. I tried this with each memory stick, one after another. Both lasted 3-4 minutes before freezing again. The CPU frequency was limited to 3.8GHz, so it reached 3.76GHz, and the max temperature was 70 °C before failing. The 12V monitor reported 11.772V.
I also ran FurMark on the GPU (1080p FHD, no AA), and the 12V rail went down to 11.575V. But it ran for 10 minutes without freezing, and then I turned it off.
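On the question of whether an 11.575V reading on the 12V rail is normal: the ATX12V spec allows the +12V rail to deviate by ±5% from nominal, i.e. roughly 11.40V to 12.60V. The sketch below just does that arithmetic for the readings posted in this thread. It assumes the ±5% window and takes the software-reported values at face value. Motherboard sensors (BIOS, AI Suite) are often not very precise, so a multimeter on a spare PSU connector would be a more trustworthy check. The function name `within_atx_tolerance` is made up for illustration.

```python
# Assumption: ATX12V +12V rail tolerance of ±5% around 12.0V.
NOMINAL_12V = 12.0
TOLERANCE = 0.05


def within_atx_tolerance(reading, nominal=NOMINAL_12V, tol=TOLERANCE):
    """Return (is_within_window, deviation_in_percent) for one voltage reading."""
    low = nominal * (1 - tol)
    high = nominal * (1 + tol)
    deviation = (reading - nominal) / nominal * 100
    return low <= reading <= high, deviation


# Readings reported in the thread (software sensors, not a multimeter).
readings = {
    "typical (below)": 11.9,
    "Prime95, one stick": 11.772,
    "CPU+GPU / FurMark": 11.575,
}

for label, volts in readings.items():
    ok, dev = within_atx_tolerance(volts)
    status = "within" if ok else "OUTSIDE"
    print(f"{label:20s} {volts:.3f} V  {dev:+.2f}%  {status} ±5%")
```

By this measure the lowest reading is about 3.5% under nominal, which is still inside the window. That alone doesn't clear the PSU, though: a software reading can't show brief dips or ripple under sudden load changes.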