Build:
AMD TR 3970X
Asus Zenith II Extreme Alpha
G.Skill F4-3200C16Q2-256GTRS 8x32GB kit
Corsair HX1200i PSU
ASUS 2080 Ti (ROG-STRIX-RTX-2080TI-O11G)
I’ve had stability issues with this build that I’ve been trying to figure out for the last 2-1/2 weeks. I’m calling uncle. I’m looking for some help. I’m not sure if it’s the RAM, CPU, PSU, or motherboard that is causing my issues. I’m fairly confident I’ve got a couple of bad sticks of RAM, but I think there is something else going on too.
I’m game to try any test that anyone thinks might determine what is causing the issues. I don’t have compatible spare parts lying around, but I might buy some RAM or a different PSU to try.
Here is what I’ve tried so far:
Memtest86+ V5.01 @ 3200 MHz 16-18-18-38 1.35 V (MB setting). Resulted in ~8 errors in Test 9 over 2 passes.
Memtest86 V8.3 single-core mode @ 3200 MHz 16-18-18-38 1.35 V (MB setting). Resulted in ~64 errors: 1 in Test 5, 63 in Test 7. Did not complete an entire pass (report attached).
Re-seated the memory and blew out the sockets with air, then re-tried the above test. Got errors in Test 7 and stopped the test.
I saw the RAM voltage was reading low compared to the set-point. A setting of 1.35 V yielded ~1.325 V on Channel AB and ~1.315 V on Channel CD according to the motherboard. The motherboard has test points for the RAM voltages, so I used a calibrated DMM to read them: ~1.32 V for each channel. Set BIOS to 1.38 V and re-checked the readings on the DMM. AB: ~1.355 V, CD: ~1.345 V.
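For anyone following along, the DMM readings above boil down to a set-point versus measured offset. A minimal sketch of that arithmetic, using only the numbers from this post (the `droop_mv` name is just for illustration):

```python
# DMM readings from the post: BIOS set-point (V) -> measured volts per channel pair.
readings = {
    1.35: {"AB": 1.32, "CD": 1.32},
    1.38: {"AB": 1.355, "CD": 1.345},
}

def droop_mv(set_point, measured):
    """Return how far the measured voltage sits below the set-point, in mV."""
    return round((set_point - measured) * 1000, 1)

for set_point, channels in readings.items():
    for channel, measured in channels.items():
        print(f"set {set_point:.2f} V, {channel}: {measured:.3f} V "
              f"({droop_mv(set_point, measured)} mV low)")
```

That works out to roughly 25 to 35 mV below the set-point at the test points, which is why a 1.38 V setting lands near the kit's rated 1.35 V.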
Memtest86 V8.3 multi-core mode @ 3200 MHz 16-18-18-38, voltage as above (via DMM). Resulted in errors in Test 7 and stopped the test.
Memtest86 V8.3 multi-core mode @ 2666 MHz 20-19-19-43 (SPD), voltage AB: 1.176 V, CD: 1.168 V (motherboard reading). First pass OK, second pass 8 errors in Test 7. Stopped the pass during Test 13. All errors were on core 26.
At this point, I had been in contact with G.Skill since it was looking like a RAM issue. They recommended the obvious: check each stick, make sure you are running the latest BIOS, etc. Apparently I needed to hear the obvious because I hadn’t even checked if there was a newer BIOS. Armed with the new BIOS (0807) I went back to testing…
Memtest86 V8.3 Multi-core. RAM at DOCP settings (3200 MHz 16-18-18-38, 1.35 V set-point). Ran just Test 7 for 19 passes, got 68 errors at 2 different memory locations.
Memtest86 V8.3 Multi-core. RAM at DOCP settings, testing each stick individually with 32 passes of Test 7 (sticks are referred to by the last 2 digits of the serial number). Sticks 13 and 19 failed; sticks 14, 15, 16, 17, 18, and 20 passed.
Sticks 14, 15, 16, 17, 18, and 20 tested with Memtest86 V8.3 Multi-core. RAM at DOCP settings, set for all tests, 4 passes. Locked up after several hours of run time (found it that way several hours after it locked), no errors.
Sticks 14, 17, 18, and 20 tested with Memtest86 V8.3 Multi-core. RAM at DOCP settings, set for all tests, 4 passes. Error in Test 5 on the 2nd pass. Ran Prime95 (blend) and it dropped 2 of the workers after a few hours (don’t remember how long).
Sticks 17 and 20, Memtest86 V8.3 Multi-core. RAM at DOCP settings. Test 5, 32 passes, no errors. Set for all tests and it locks up after ~6 hours of run time, no errors. Prime95 (blend) passed OK running for ~20 hours. Windows extended memory test locks up as well (sits at 21% for 10 hours??). Ran stressapptest and it locks up at the “resume work threads for power spike”. Tried it in Ubuntu 18.04 and the latest Mint, same result.
Repeated tests on stick 19 to check if errors persisted. Errors in Test 7 again. Increased RAM voltage so the DMM read 1.35 V (motherboard set-point 1.38 V). Errors in Test 7 again (in about the same amount of time).
Sticks 14 and 18, Memtest86 V8.3 Multi-core. RAM at DOCP settings. Set for all tests and it locks up. Tried it several times, and it would lock up at different tests. Ran Prime95 with small FFTs (4k-192k) and it will drop a worker in ~1 hr.
Somewhere in the last few tests I put an oscilloscope on the 12V and 5V lines to look at noise and ripple. I don’t really have the right probes for high-frequency work, but I saw a pretty good amount of noise on both lines, and it would get noisier under load. I can pull some images from the scope if someone wants.
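If it helps put scope numbers in context, here is a rough sketch of checking a peak-to-peak reading against the ripple limits in the ATX12V design guide (120 mV on 12V, 50 mV on 5V and 3.3V, as I understand it, measured at 20 MHz bandwidth with the specified capacitors at the probe point). The readings in the example are hypothetical placeholders, not measurements from this build, and a probe setup without the proper bandwidth limit and grounding will tend to read high.

```python
# Max peak-to-peak ripple/noise per rail in mV (ATX12V design guide values;
# assumes a 20 MHz bandwidth-limited measurement with the specified load caps).
ATX_RIPPLE_LIMIT_MV = {"12V": 120, "5V": 50, "3.3V": 50}

def ripple_within_spec(rail, measured_mv_pp):
    """True if a peak-to-peak ripple reading (mV) is within the ATX limit."""
    return measured_mv_pp <= ATX_RIPPLE_LIMIT_MV[rail]

# Hypothetical scope readings for illustration only, NOT from this build.
example_readings = {"12V": 95, "5V": 62}
for rail, mv in example_readings.items():
    verdict = "within" if ripple_within_spec(rail, mv) else "over"
    print(f"{rail}: {mv} mV p-p is {verdict} the {ATX_RIPPLE_LIMIT_MV[rail]} mV limit")
```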
So here I sit, wondering what to RMA. Should I send back the CPU? Or is it a power delivery issue (how can you tell?). I could try re-seating the CPU, because that is quick and free. Maybe the RAM issues are a result of an underlying CPU or power delivery issue?
IMO you must try it. Several YT videos mention the need to re-seat the CPU. There is also a reason we get the special screwdriver with the CPU. While doing that, make sure you follow the screw order.
I got memtest working without locking up. The log file had gotten pretty long from all the tests I’ve run, so I cleared it and now memtest works? Odd, but I’ll take it.
I did check the individual sticks, that took a looong time haha! I think I’ve gotten some of my issues figured out now. All for different reasons.
Updated stressapptest to the latest on GitHub (1.0.9?) and now it runs fine. Ran 4 sticks with that for 8 hrs with no problems.
Clearing the log on Memtest86 allowed it to run without instantly locking up, but it would still lock up after a while. The log showed that it was spending time waiting for a response from some of the threads and that it recommended running single-threaded because of it (maybe this is related to the P95 issue?). Ran Memtest86 in single-thread mode for ~30 hours on the 6 sticks that tested out good individually.
Prime95… I was the one with the Asus board that posted in your thread. I seem to have only one core drop (both workers on it). I need to do some more testing to confirm that is the case, run it longer with those workers stopped. I did run it for 4 or 5 hours and only had that one core drop. Maybe a bad core? I also want to try the re-seat just for the heck of it.
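One way to narrow down a suspect core outside of Prime95 is to pin the same deterministic workload to each logical CPU in turn and compare the results. The sketch below is only an illustration of that idea under some assumptions: it is Linux-only (`os.sched_setaffinity`), the function names are made up for this example, and the hashing workload is far lighter than Prime95, so a marginal core could easily pass it. A mismatch would be meaningful; a clean run proves little.

```python
import hashlib
import os
from collections import Counter

def workload(rounds=20000):
    """Deterministic busy work: repeatedly hash a seed value."""
    digest = b"seed"
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest

def find_suspect_cpus(rounds=20000):
    """Run the workload pinned to each allowed logical CPU; return outliers."""
    original = os.sched_getaffinity(0)  # CPUs this process may run on
    results = {}
    try:
        for cpu in sorted(original):
            os.sched_setaffinity(0, {cpu})  # pin this process to one CPU
            results[cpu] = workload(rounds)
    finally:
        os.sched_setaffinity(0, original)  # restore the original affinity
    # Treat the most common answer as correct and flag any CPU that disagrees.
    majority, _ = Counter(results.values()).most_common(1)[0]
    return [cpu for cpu, digest in results.items() if digest != majority]

if __name__ == "__main__":
    print("suspect logical CPUs:", find_suspect_cpus())
```

Keep in mind that with SMT each physical core shows up as two logical CPUs, so a single bad core would be expected to show up as a pair, much like both Prime95 workers dropping on it.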
Definitely try reseating. That said, I did have a 3970X with a single defective core that I eventually RMA’d. As for MemTest86 locking up, I had that too on another rig (MSI X99 motherboard). This is apparently due to a bug in some BIOSes. As long as it’s stable on a single thread then your sticks are fine.
@FranzB Do you know what AGESA version got the Prime95 fix? The Asus board shows “Update AGESA BIOS code to the latest PI 1.0.0.3 patch A” for the latest BIOS.
I think AGESA 1.0.0.3 B is indeed the one that fixes Prime95. However, as far as I understand (@DerAlbi please correct me if I’m wrong) that version doesn’t fix some VRM configuration issues (presumably) that affect the Aorus Master. They did at some point release a BIOS version that would fix that problem, but a BIOS version that fixes both issues has yet to be released.