[SOLVED] PC unstable, freezing, unknown cause

The fifth night was also a success.

I guess the slightly lower FCLK with the default SOC voltage did something, plus running the RAM at 1.35V now.
With an uptime of 50+ hours and more than 11 hours of continuous encoding tasks, it seems stable.

So, a fun thing happened: increasing SOC voltage actually makes the system less stable. I tried two things to find that out.

  • Increase LLC level
    or
  • Increase voltage to 1.15V

In the case of LLC (Level 2) it actually freezes right at the start of Windows, when the desktop is shown.
In the case of the higher SOC voltage it now freezes during daytime use.

Can it kinda be that I got a bum CPU?
Because it froze even at stock, btw; that’s why I even started this whole thing.

Voltage takes priority over LLC, because more LLC means more current. Too much current will cause electromigration issues on the IO die.

There’s a chance Auto voltages may have been too high at one point, degrading the CPU. If SoC isn’t manually set, and you have a buggy BIOS, this can most definitely happen.

I didn’t try both at the same time, only one at a time.
Or am I misunderstanding something?

You don’t want to have too much current on the SoC rail, so increasing LLC could overvolt slightly in the worst case scenario. You want a tiny bit of droop for SoC.

Ofc that’s why the highest I went with was level 2 and not level 1.
And since that instantly froze at the start of showing the desktop I defaulted it back again, the voltage was afaik never above 1.12 before that.
I updated the BIOS to the newest version as soon as I got it.

But here is the wonderful part I kinda don’t understand: if it is degraded, why does a lower voltage increase stability? Shouldn’t it need even more voltage at that point?

SoC voltages are funny. They like specific LLC profiles and specific amounts of voltage droop. There’s no “one chip does this so the next chip does the same” thing. I had a poor IMC on my 4960X, so I knew I had to raise the voltage while loosening the LLC (adding more vdroop).

It’s not that straightforward, but you must stay within voltage limits or you risk severe degradation.

That is certainly interesting. So LLC is almost as fickle as RAM on high OCs.
So if I want to find the right SOC voltage, I also need to find the specific LLC level that goes with it. A SOC voltage of 1.15V with default LLC is obviously still unstable in my case.
I’ve gone back to auto voltage and auto LLC. For now that seems the most stable of all the scenarios I’ve had, let’s see how it goes.
Kinda hope that’s it though and I now have a stable system.

Experiment with it, just don’t have an LLC that INCREASES voltage, only droops.
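
For anyone following along, the "only droops" rule can be put into numbers with the usual load-line model, V_load = V_set - I x R_LL. This is just a rough sketch: the function names, the current, and the load-line resistances are made-up illustration values, not measurements from any board and not the LLC levels of any specific BIOS.

```python
def load_voltage(v_set, current_a, r_ll_mohm):
    """Voltage at the load for a simple load-line model.

    v_set      -- voltage set in the BIOS, in volts
    current_a  -- current drawn by the rail, in amps (hypothetical)
    r_ll_mohm  -- effective load-line resistance in milliohms (hypothetical);
                  stronger LLC means a smaller value, and a negative value
                  models an LLC setting that overcompensates
    """
    return v_set - current_a * (r_ll_mohm / 1000.0)


def stays_at_or_below_set(v_set, current_a, r_ll_mohm):
    """True if the rail never rises above the set voltage under load."""
    return load_voltage(v_set, current_a, r_ll_mohm) <= v_set


if __name__ == "__main__":
    v_set = 1.10      # volts, example SoC setting
    current = 15.0    # amps, made-up load
    for r_ll in (1.0, 0.5, 0.0, -0.5):   # milliohms, made-up LLC steps
        v = load_voltage(v_set, current, r_ll)
        ok = stays_at_or_below_set(v_set, current, r_ll)
        verdict = "ok" if ok else "overshoots"
        print(f"R_LL={r_ll:+.1f} mOhm -> {v:.4f} V under load ({verdict})")
```

In this model, any setting that gives a negative effective load-line resistance pushes the rail above what was set, which is the case the advice above says to avoid.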

I hope I don’t have to, as it takes quite a bit of time now for it to freeze up.

So after quite a bit of fiddling I found something out.
Memory got unstable as soon as a heavier load than gaming was applied to the CPU, in my case encoding HEVC videos.
That happens on every possible setting for memory from 2133 to 3200.
HCI memtest would throw an error as early as ~10% coverage.

So I remembered something I did on my old Threadripper that I never did here: set the CPU to a specific frequency and apply a manual voltage.

And suddenly there are no more errors (for now up to ~80%) while running an encoding task + HCI memtest (stress testing).

Whatever happens when the CPU is set to fully auto cripples everything on my system it seems.
I will now run the same thing overnight, letting memtest go to 2000% alongside the encoding task, hoping that this really was the cause.
Is it normal now on Zen 2 that the Core VID doesn’t go above 1.1V when set to manual VCore? It seems to run at 4.2GHz just fine, I see no real performance decrease.

It will just be a fixed all core OC, and Zen 2 likes dynamic voltages for better single core performance.

Try something else before resorting to an all core OC.

I tried pretty much everything I could think of before starting this thread, then tried everything mentioned here.
I know Zen 2 likes that, but I never got over 4.2GHz anyway, which is why I settled on a fixed clock and am testing with that for now.

I daily my RAM at 1.4V. I don’t think damage occurs until past 1.5V.

How close are the power cables to the RAM? Wendell or Jay once mentioned instability when a 24 pin cable was too close to the RAM.

I’d say the nearest power connector is about 3cm away and no power cable in the immediate vicinity. Could you find that video maybe? Would be interesting to know the context.
Yeah, but normally you wouldn’t need more than what XMP specifies. I did however run a test at 1.38V, but it still errored.

That was with extremely high currents like X299 and 18 cores or 28 cores like the super extreme LGA 3678 platform.

yeah that would make sense and the guy’s problems are at idle with the least amount of amps pulled

@Hako try aiming a Stock AMD fan at the ram with the side off and see if it still does it

If it does, then we eliminate B-die being temp picky.

Ehm… no, my problem is not when the system is idle. It happens under load, just at times when I’m not sitting at it and/or doing stuff myself.

As for the fan, I wish I could do that, but the Be Quiet Cooler is actually so big and the case small enough that it is impossible to aim a fan there, and there are RAM sticks on both sides of the CPU so I would even need to do that with two fans.

Well no dice at all, seems this system will be unstable no matter what I try.

  • I tried just the XMP again with nothing else touched (Stock settings).
  • I tried leaving timings all on auto and only changing the frequencies (2866 - 3200).
  • Tried different RAM voltages from 1.28V-1.38V.
  • Tried static all core CPU clock (4050-4200MHz) with decent voltage (1.275V-1.375V).
  • Tried different drive strengths that are recommended.
  • Tried different SoC voltages (1.05V-1.2V).
  • Tried C-States on or off
  • Tried different LLCs for SoC and/or CPU
  • Tried with FCLK in 1:1 mode or 2:1
  • Probably forgot other things I tried, also all the combinations I tried would be ridiculous to list.
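
To give a feeling for how fast the combinations pile up, here is a small sketch that multiplies the settings from the list out. The ranges come from the list, but the individual steps, the LLC levels other than level 2, and all the names are assumptions purely for illustration; this is not the exact set that was tested.

```python
from itertools import product

# Ranges are taken from the list above; the individual steps are assumed.
settings = {
    "mem_speed": [2866, 2933, 3000, 3066, 3133, 3200],
    "dram_volt": [1.28, 1.30, 1.32, 1.35, 1.38],
    "cpu_clock": ["auto", 4050, 4100, 4150, 4200],
    "soc_volt":  [1.05, 1.10, 1.15, 1.20],
    "c_states":  ["on", "off"],
    "soc_llc":   ["auto", "level 2", "level 3"],  # assumed levels
    "fclk_mode": ["1:1", "2:1"],
}


def all_combinations(options):
    """Yield every combination of settings as a dict."""
    keys = list(options)
    for values in product(*(options[k] for k in keys)):
        yield dict(zip(keys, values))


def count_combinations(options):
    """Number of combinations without generating them."""
    total = 1
    for values in options.values():
        total *= len(values)
    return total


if __name__ == "__main__":
    total = count_combinations(settings)
    hours_per_run = 8  # assumed: one overnight test per combination
    print(f"{total} combinations")
    print(f"about {total * hours_per_run / 24:.0f} days at one overnight run each")
```

Even with coarse steps like these, a full sweep is out of reach, so changing one setting at a time and keeping notes is the more realistic approach.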

I did that now, thrice, and it is still unstable.

So funnily enough, at one point (before starting this thread) I could get a clean 1000% memtest with very tight timings, but the system froze when encoding for a long time, which is why I started this thread.

Now the highest I get, with the same settings, is about ~130%.
I don’t even know if it is the CPU or the RAM now.
I took the RAM with me from my previous Threadripper build and it ran flawlessly there on 3200MHz, but it only got to about ~55°C under full load there. So it is either really the heat or I just got the bummest CPU there is.