Back to Basics - how do you nail down intermittent instability

tl;dr What are the best options, or your preferences, for stress testing a system?

Relevant Specs:
i7-6700
GTX 1070
12GB DDR4-2400
Windows 10 or any live Linux environment

I built this computer in 2017. It started as an AIO and later got a full custom watercooling loop. I’ve since upgraded (got the deal of a lifetime on a second-hand computer) but I still use this system on my workbench, and I’ve been facing some intermittent issues that I’ve never been able to replicate reliably enough to address.

Two games that I’ve played have exhibited these problems:

  • FFXIV - severe slowdown (single-digit framerate) after extended gameplay sessions. Adding a fan to cool the motherboard mostly alleviated this problem. Game was stable on the new computer.

  • Remnant: From the Ashes - crashing during multiplayer that takes down me and, occasionally, everyone else in the party. Work in progress. Game is stable on the new computer.

Where I’m stuck is I’ve never been able to replicate these issues in any of the artificial stress tests that I’m familiar with, which are Prime95, FurMark, and OCCT. Hours and hours with one or two of them running, and temps are frosty with nary an error in sight.

My best guess is that there isn’t enough airflow over the chipset, and some games hit that harder than artificial benchmarks do. This computer doesn’t report chipset temps though so I can’t confirm that.

I’d love to hear about your processes for stress testing a system that exhibits intermittent stability issues. I’d really like to be able to reproduce these errors before I make any more hardware changes.

Heat is an issue here, but probably not on a monitored component.
When bench testing a PC, software only monitors the components that have temp sensors.
FLIR cameras can pinpoint the hot spots.

But if I were to hazard a quick guess, I would use a laser thermometer and check the memory while the system is acting up.

Given you’ve got 12GB of RAM and the 6700 platform has only 4 slots, it is a mismatched set. I’d start with memory stress tests.

You may be seeing errors only when the last 4GB (or 8GB) is in use.

Memtest finished 4 passes with 0 errors. I might need to find a way to test the memory while the rest of the system is under load.
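One rough way to sketch "memory test while the rest of the system is under load" is below. This is only an illustration of the idea, not a substitute for Memtest: a Python script cannot choose which physical addresses it touches, so it may never land on a marginal stick. The function names (`burn_cpu`, `check_chunk`, `memory_test_under_load`) and all defaults are made up for this example.

```python
import multiprocessing as mp
import time

PATTERNS = (0x55, 0xAA)  # alternating bit patterns: 01010101 and 10101010


def burn_cpu(deadline):
    """Spin on integer math until the wall-clock deadline to keep one core busy."""
    x = 1
    while time.time() < deadline:
        x = (x * 1103515245 + 12345) % 2**31


def check_chunk(size_bytes, pattern):
    """Fill a buffer with one byte pattern and count bytes that read back wrong."""
    buf = bytearray([pattern]) * size_bytes
    return size_bytes - buf.count(pattern)


def memory_test_under_load(chunk_mb=64, chunks=4, seconds=5.0, workers=2):
    """Write and verify patterns in RAM while worker processes load the CPU.

    Returns the total number of mismatched bytes (0 means no error seen).
    """
    deadline = time.time() + seconds
    procs = [mp.Process(target=burn_cpu, args=(deadline,)) for _ in range(workers)]
    for p in procs:
        p.start()
    size = chunk_mb * 1024 * 1024
    bad = 0
    try:
        while True:
            for pattern in PATTERNS:
                # Hold every chunk at once so the whole footprint is allocated.
                buffers = [bytearray([pattern]) * size for _ in range(chunks)]
                bad += sum(size - buf.count(pattern) for buf in buffers)
            if time.time() >= deadline:
                break
    finally:
        for p in procs:
            p.join()
    return bad
```

To use it, call `memory_test_under_load(...)` from under an `if __name__ == "__main__":` guard (needed on Windows), with `chunk_mb * chunks` sized close to the free RAM and `workers` set to the core count. Running a GPU load alongside it would be a separate tool.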

Try running the Super PI 1M and 32M tests.
If your RAM is stable, the 1M test will pass in about 10 seconds, and the 32M test in, well, however long 32M takes.
If it locks the system up, your RAM is the issue.

Super PI mod will test all cores, while 1.1 is for testing on a single core.
Run them both if you want, but either will do.

This is a quick and dirty test, so take it as is.

Somehow missed that, after seeing the 12 GB of RAM.

What sort of cooling do you actually have; how’s your case ventilation, and how is it configured?

You may find that if your 1070 is dumping a heap of heat into the case (unless it’s a blower cooler), it is causing the other components to throttle because your case ventilation is insufficient.

I just resolved a cooling problem in my PC yesterday (too lazy to do it earlier).

I wasn’t seeing severe throttling (I’ve got too many cooling fans for that), but I was getting a lot of noise from my AIO fans. You may be seeing some sort of similar but more extreme example of my issue.

My specific setup (pre-change to fix it):

  • 6900XT Red Devil on air
  • R7-2700X on water (Corsair 280mm AIO)
  • Define R6 case with 2x 140mm front fans and a 140mm exhaust in back

My CPU rad was top mounted and acting as exhaust.

What was happening? The 6900XT was dumping heat into the case with the super efficient cooler the Red Devil has. This then caused the radiator for the CPU (acting as top exhaust) to heat-soak and run its fans hard.

Resolution?

Stick the CPU radiator in the front as intake, relocate the front fans to the top for exhaust.

CPU barely heats the 280mm AIO so it spins quiet, which means the GPU is free to dump as much heat into the case as it likes behind it, which is then exhausted by 2x 140mm top fans and a 140mm rear fan.

Same hardware, same case, vastly different results. Went from noisy while gaming and some throttling to virtually silent (apart from coil whine, :rofl:)

Original setup was totally fine with my previous Vega 64 reference blower cooler, as that was dumping heat OUT the rear of the case, not into it, but the 6900XT is a triple fan setup that doesn’t exhaust its heat by itself.

What are the PCH and CPU MOSFET temps like? If cooling the motherboard seemed to help, then those are the first things I would look at. The specific motherboard you have would also be extremely relevant.

Chances are good that I’m in a very similar situation. This computer originally used a large laptop-style blower to cool the CPU, which would have resulted in a ton of air being drawn over the motherboard.

Here are some pictures I have of what’s going on cooling-wise. The 40mm Noctuas next to the GPU are the fans I mentioned adding.

It’s a very proprietary motherboard made by Lenovo. No temp sensors for the chipset or MOSFETs, unfortunately.

A big part of why I’m looking for a synthetic load that can reliably reproduce my instability is so I can verify that it’s fixed after any hardware changes. Identifying the specific component causing the issue will be challenging.
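For the "verify it's fixed after a hardware change" part, a small harness that repeats a test and logs each outcome can turn "it seemed fine" into a failure rate you can compare before and after. The sketch below is an assumption-laden example: `repeat_run`, `failure_rate`, and the log format are hypothetical, and the command you pass in is whatever stress tool you actually use. It appends to the log before and after every run, so the log still shows where things stopped if the whole machine locks up.

```python
import subprocess
import sys
import time


def repeat_run(cmd, runs=5, timeout_s=600, log_path=None):
    """Run cmd repeatedly; return a list of (run_index, exit_code, seconds).

    exit_code is None when the run hit the timeout (treated as a failure).
    """
    results = []
    for i in range(runs):
        _log(log_path, f"run {i} started")
        start = time.time()
        try:
            code = subprocess.run(cmd, timeout=timeout_s).returncode
        except subprocess.TimeoutExpired:
            code = None
        elapsed = round(time.time() - start, 1)
        _log(log_path, f"run {i} finished code={code} after {elapsed}s")
        results.append((i, code, elapsed))
    return results


def failure_rate(results):
    """Fraction of runs that did not exit cleanly with code 0."""
    if not results:
        return 0.0
    failed = sum(1 for _, code, _ in results if code != 0)
    return failed / len(results)


def _log(log_path, message):
    """Append a timestamped line to the log file, if one was requested."""
    if log_path is None:
        return
    with open(log_path, "a") as fh:
        fh.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')} {message}\n")


if __name__ == "__main__":
    # Placeholder command: replace with the real stress test invocation.
    demo = repeat_run([sys.executable, "-c", "pass"], runs=2)
    print(f"failure rate: {failure_rate(demo):.0%}")
```

Run the same number of iterations before and after each hardware change and compare the failure rates; a test that fails half the time needs many clean runs before a fix is believable.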

This is perfect! The 32M test crashes around the halfway point with FurMark running simultaneously.

Ran a series of tests this morning with different RAM configurations. It’s currently working correctly with the sticks swapped. I ended up struggling a bit with getting the RAM properly seated due to limited accessibility. I need to spend some time playing the offending games (for science), but I see two possibilities:

  • I installed the RAM in an incompatible configuration the last time I had this apart, which was years ago at this point.

  • The memory just needed to be reseated and was fine as it was.

Thanks everyone!
