Okay, here it goes. Some of this may not be relevant, but I am trying to put down everything that happened in the last 7 or so days.
My PC config is as follows:
- AMD Ryzen 5 2600 (Wraith Stealth cooler)
- ASUS TUF B450m Plus Gaming
- Patriot Signature Line 32GB (16x2) 2666MHz DDR4 RAM (PSD416G2666Kh) (NOT in QVL RAM list)
- Asus Turbo GTX1070 Ti graphics card
- 240 GB SATA SSD (Transcend) (boot drive)
- 1TB HDD (Toshiba)
- KRPW-L5-600W/80+ PSU
- Cooler Master Masterbox E300L.
I assembled this system about a month and a half ago. I initially applied the Asus optimized setting, which sets the CPU multiplier to 38.5 and vddcore to 1.287V (I think). It was running fine. I then tried different overclocking settings, like setting the multiplier to 40.00 and vddcore to 1.3V. I once tried raising the RAM voltage from 1.2V to 1.3V, got random freezes, and tried running Cinebench. After trying such things, I settled on 1.2V for the RAM (all RAM settings on AUTO), a 40.00 multiplier and 1.287V for the CPU.
My purpose in buying this PC is gaming plus deep learning. The system was running fine, with no freezes during games or anything else. There was just one problem: I could never overclock my graphics card in Windows. I tried Asus GPU Tweak II and MSI Afterburner, but the max clock would always be 1607MHz. I was okay with that. I kept my BIOS up to date. Before the nightmares started, I was running BIOS revision 601 (AGESA 1006) and it was working just fine.
Then, about a week ago, I tried TensorFlow Object Detection API training in Ubuntu 18.04 (kernel 4.15.0-43, CUDA 9, driver nvidia-410). This usually uses all 12 threads at 40-60% load each and takes up nearly all of the graphics memory. I had used TensorFlow on this system before for smaller tasks and faced no problems at all. But this time the system froze after maybe an hour. I then dialed down my clock speed, and even went back to the Asus optimized setting and to the basic settings, which put pretty much everything on Auto. The freezes continued.
I searched like crazy through various forums for fixes. I thought it might be a non-QVL RAM issue, so I tried tuning the RAM timings. I once saw a video where a guy replaced each Auto RAM timing with the value the BIOS was already showing, so I tried that too. The result was the same. I tried disabling the C6 state and the Idle power policy, but no luck. I tried BIOS versions 409, 601 and 604, but the results are the same. The CPU would sometimes run happily for hours, but sometimes the system won't even boot. Sometimes multiple crashes occur in both Windows and Linux. I have tried fresh Windows and Linux installations, and switched from Ubuntu 18.04 to 18.10 and then back to 16.04, but no luck.
So far, after running many stress tests and such, I have found some info that might be useful.
- At full load the CPU temperature hits 100 C, but that by itself doesn’t crash the system. The system sometimes crashes even below 50 C.
- Memtest86+ did two full passes without errors and then froze on the third.
- Aida64 CPU and GPU stress tests run without issues, but the RAM stress test freezes the system immediately. Other RAM stress testing tools like TechPowerUp MemTest64 work fine. Prime95 torture test with 25000 MB of RAM allocated runs for 4-5 minutes without any problem (I did not run it longer than that; faced one freeze).
- After this happened, I found that my motherboard has an 8-pin CPU power connector but I had only attached one 4-pin connector to it. That ran the system for a month without issues. I first thought my power supply had only one 4-pin connector, but later discovered it has two. I connected the second one as well, but no luck.
- I monitored the rail voltages both from the BIOS and from Asus AI Suite 3. The 12V rail is always below 11.9V and drops to 11.575V under combined CPU and GPU load.
- I tried to play COD Infinite Warfare tonight but the system froze. I tried again with the same result.
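To put the 12V readings above in context: as far as I know, the ATX spec allows +/-5% on the +12V rail, so below is a quick sketch of that arithmetic. The function name `rail_status` is just something I made up for illustration, and voltages reported by BIOS/software sensors may not be accurate, so treat this as a rough sanity check and not a real measurement.

```python
# Rough check of 12 V rail readings against a +/-5 % tolerance band.
# Assumption: the +/-5 % figure is the ATX tolerance for the +12 V rail.
NOMINAL_V = 12.0
TOLERANCE = 0.05


def rail_status(reading_v, nominal_v=NOMINAL_V, tolerance=TOLERANCE):
    """Return the allowed band, the deviation in percent, and an in-spec flag."""
    low = nominal_v * (1 - tolerance)
    high = nominal_v * (1 + tolerance)
    deviation_pct = (reading_v - nominal_v) / nominal_v * 100
    return {
        "low": low,
        "high": high,
        "deviation_pct": deviation_pct,
        "in_spec": low <= reading_v <= high,
    }


if __name__ == "__main__":
    # The two readings mentioned above: idle ceiling and combined-load minimum.
    for reading in (11.9, 11.575):
        status = rail_status(reading)
        print(
            f"{reading:.3f} V: {status['deviation_pct']:+.2f} % from nominal, "
            f"in spec: {status['in_spec']}"
        )
```

By this arithmetic the 11.575V reading is about 3.5% below nominal, which is low but still inside the 11.4V to 12.6V band, assuming the sensor reading can be trusted.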
Now I don’t know what went wrong and when. You will probably need more information from me about this issue. Please let me know what to do and what to run so that you can help me.