Prime95 stable, but x264 (FFmpeg) on Linux crashes consistently. (4960X)

So I got a Prime95 stable 4960X OC and everything else is stable… except x264 (in FFmpeg) on Linux.

Somehow, Prime isn’t as stressful as x264 and, oddly, using the default BCLK makes stability worse: 10 seconds to a crash with the default BCLK, 6 minutes to a crash with a modified BCLK.

The only way to stabilize it is to “AVX offset” the multiplier by -1. I’m stumped… Prime is the most stressful program I know, and I even ran Linpack. Why the heck is Ivy Bridge-E being uncooperative with Linux x264? I literally tried everything: upping the VTT and VCCSA, upping to the max safe Vcore. Nothing worked.

I’m at 1.35V at 44 x 100.1 MHz. 43 x 100 MHz at the same voltage was stable for x264, but I’m stumped on why x264, and why it specifically…
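For reference, the gap between those two settings can be worked out directly. This is only a small sketch of the arithmetic; the 3600 MHz base clock is the 4960X's stock spec rather than a number from this thread, and the function name is made up for illustration.

```python
def core_clock_mhz(multiplier: int, bclk_mhz: float) -> float:
    """Core clock is the CPU multiplier times the base clock (BCLK)."""
    return multiplier * bclk_mhz


# Assumption: stock base clock of the i7-4960X (not stated in the thread).
STOCK_BASE_MHZ = 3600.0

crashing = core_clock_mhz(44, 100.1)  # the setting that crashes in x264
stable = core_clock_mhz(43, 100.0)    # the setting that survives x264

for label, mhz in (("44 x 100.1", crashing), ("43 x 100.0", stable)):
    over_stock = (mhz / STOCK_BASE_MHZ - 1.0) * 100.0
    print(f"{label}: {mhz:.1f} MHz, {over_stock:.1f}% over stock base")

print(f"difference: {crashing - stable:.1f} MHz")
```

So the jump from the stable setting to the crashing one is only about 104 MHz of core clock, plus whatever the 0.1 MHz BCLK change does to everything else derived from BCLK.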


Edit:

A TEMPORARY ANSWER:

Vcore was low and the 100.1 MHz BCLK was causing instability. In fact, never change the BCLK from 100 MHz for mission-critical work on Ivy Bridge-E.

Unfortunately, after a month, it still starts crashing.

Have you tried running just ‘stress’ to see if it crashes?
Maybe the I/O scheduler also plays a role here.
What’s the niceness of the ffmpeg processes? Default 0, I guess, but worth checking out.

"Selecting the Instruction Set Extension
By default, Prime95 automatically selects the newest instruction set extension, such as AVX, AVX2, or even AVX-512. In order to change this behavior, Prime95 needs to be started and completely shut down again once. This will create the local.txt file. In it, exclusions are assigned a value of 0, whereas the code path that’s to be tested is assigned a value of 1. If you aren’t clear as to which SSE version is supported by your CPU, both can be set to 1. Prime95 will choose the correct fallback.

CpuSupportsRDTSC=0 or 1
CpuSupportsCMOV=0 or 1
CpuSupportsPrefetch=0 or 1
CpuSupportsSSE=0 or 1
CpuSupportsSSE2=0 or 1
CpuSupports3DNow=0 or 1
CpuSupportsAVX=0 or 1
CpuSupportsFMA3=0 or 1
CpuSupportsFMA4=0 or 1
CpuSupportsAVX2=0 or 1
CpuSupportsAVX512F=0 or 1

"

Try running Prime with the same AVX level FFmpeg is using (I’m guessing the original AVX).

source: https://www.tomshardware.com/reviews/stress-test-cpu-pc-guide,5461-2.html
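A minimal sketch of flipping those flags, assuming local.txt is a plain list of key=value lines as the quoted article describes. The helper name set_prime95_flags is made up for illustration, and the flag choice below (AVX on, FMA3/AVX2/AVX-512 off) is just one way to pin the plain-AVX code path on a chip like the 4960X.

```python
def set_prime95_flags(local_txt: str, flags: dict) -> str:
    """Return local.txt content with the given CpuSupports* flags set to 0 or 1.

    Existing keys are rewritten in place. Missing keys are added at the top,
    on the assumption that global options belong before any other content.
    """
    remaining = dict(flags)
    body = []
    for line in local_txt.splitlines():
        key = line.split("=", 1)[0].strip()
        if "=" in line and key in remaining:
            body.append(f"{key}={remaining.pop(key)}")
        else:
            body.append(line)  # keep unrelated lines untouched
    added = [f"{key}={value}" for key, value in remaining.items()]
    return "\n".join(added + body) + "\n"


# Hypothetical starting content; a real local.txt will contain more lines.
example = "CpuSupportsAVX=1\nCpuSupportsFMA3=1\n"
wanted = {
    "CpuSupportsAVX": 1,
    "CpuSupportsFMA3": 0,
    "CpuSupportsAVX2": 0,
    "CpuSupportsAVX512F": 0,
}
print(set_prime95_flags(example, wanted))
```

Prime95 should be fully closed before editing the file, since per the article it only creates local.txt after being started and shut down once.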

It is running the same AVX FFmpeg is using, but under the Windows scheduler rather than the Linux scheduler. The kernel was 4.17.3 when it crashed.

Linux install also has updated microcode. Windows install doesn’t have new microcode.

Also forgot to mention the Prime version: 28.10.

I also have the ability to run Prime in OSX, if it matters, but the OSX install also predates the new microcode.

Well…

stress -c 12 -i 6 -m 1 --vm-bytes 24GB -t 15m

PASSED.

So then why is it SPECIFICALLY FFmpeg x264?!?

So running x264 again, I checked niceness with htop… 0 and 10…
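The same check can be scripted instead of eyeballed in htop. This is a rough sketch, assuming a Linux /proc filesystem; the function name is made up. On Linux, os.getpriority with PRIO_PROCESS accepts a thread ID, so it can report a nice value per thread, which is one way a single process could show both 0 and 10.

```python
import os
from pathlib import Path


def thread_niceness(process_name: str) -> dict:
    """Map thread ID -> nice value for every thread of matching processes.

    Matches on /proc/<pid>/comm, which the kernel truncates to 15 characters.
    Returns an empty dict when /proc is unavailable or nothing matches.
    """
    proc = Path("/proc")
    result = {}
    if not proc.is_dir():
        return result
    for entry in proc.iterdir():
        if not entry.name.isdigit():
            continue
        try:
            if (entry / "comm").read_text().strip() != process_name:
                continue
            for task in (entry / "task").iterdir():
                tid = int(task.name)
                result[tid] = os.getpriority(os.PRIO_PROCESS, tid)
        except OSError:
            # The process or thread exited while we were looking at it.
            continue
    return result


for tid, nice in sorted(thread_niceness("ffmpeg").items()):
    print(f"tid {tid}: nice {nice}")
```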

I even ran i7z, and it’s not boosting VID that much either… watch -n 1 sensors also didn’t show abnormalities.

Finally caught the error. Machine Check Exception Kernel Panic, context corrupt:

I know the new patched microcodes do use the PCID feature of the CPU a lot more, but what part of the CPU does that affect voltage-wise?

Edit: it’s always the same context corrupt error.
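For anyone decoding a panic like this by hand: "context corrupt" corresponds to the PCC bit (bit 57) of the machine check bank's IA32_MCi_STATUS register. Below is a small sketch that decodes the architectural flag bits of a status value. The sample value is made up for illustration and is not the one from this crash.

```python
# Architectural flag bits of IA32_MCi_STATUS, highest bit first.
STATUS_FLAGS = {
    63: "VAL",    # register contains valid error information
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected by hardware
    60: "EN",     # error reporting was enabled for this bank
    59: "MISCV",  # IA32_MCi_MISC holds additional information
    58: "ADDRV",  # IA32_MCi_ADDR holds a valid address
    57: "PCC",    # processor context corrupt
}


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit MCi_STATUS value into flags and error codes."""
    return {
        "flags": [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1],
        "mca_error_code": status & 0xFFFF,              # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
    }


# Hypothetical status value, chosen only so that PCC is set.
sample = 0xBE00000000800400
decoded = decode_mci_status(sample)
print(decoded["flags"])
print(hex(decoded["mca_error_code"]), hex(decoded["model_specific_code"]))
```

With PCC set, the kernel cannot trust the interrupted state enough to resume, which is why this shows up as a panic rather than a logged, corrected error.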

https://forums.intel.com/s/question/0D50P0000490ApiSAE/linux-machine-check-exception-is-it-the-cpu?language=en_US

The general idea is memory, and the general consensus is to increase VCCSA. That, however, made it crash faster.

Consistent at crashing, but inconsistent on what voltage increases stability.

Random possibly unhelpful musings:

x264 is quite easy on memory when at high presets/settings but very very very memory intensive with the faster ones.

You are asking for a near 20% overclock; as with all OCing, running out of spec causes issues.

Yeah, I get it, it’s memory, and I’m using slow preset so it isn’t as fast as it actually seems. Yet it’s the only thing that crashes. Anything faster than medium puts insufficient stress on the CPU.
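For anyone who wants to reproduce this kind of load without a source file, here is a rough sketch that encodes FFmpeg's synthetic testsrc2 pattern with libx264 at the slow preset and discards the output. The resolution, frame rate, and duration are arbitrary assumptions, not the settings used in this thread, and the function names are made up.

```python
import subprocess


def x264_stress_command(seconds: int = 600, preset: str = "slow") -> list:
    """Build an ffmpeg command that encodes a synthetic clip and throws it away."""
    return [
        "ffmpeg", "-hide_banner",
        "-f", "lavfi", "-i", "testsrc2=size=1920x1080:rate=30",  # generated input
        "-t", str(seconds),                                      # stop after N seconds
        "-c:v", "libx264", "-preset", preset,
        "-f", "null", "-",                                       # discard the output
    ]


def run_x264_stress(passes: int = 3, seconds: int = 600) -> bool:
    """Run the encode repeatedly; return False as soon as one pass fails."""
    for number in range(1, passes + 1):
        code = subprocess.run(x264_stress_command(seconds)).returncode
        print(f"pass {number}: exit code {code}")
        if code != 0:
            return False
    return True
```

Calling run_x264_stress() requires an ffmpeg build with libx264 on the PATH. A synthetic pattern may not load the CPU the same way real footage does, so treat a pass here as weaker evidence than a pass on the actual encode job.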

A PLL increase didn’t help either. Again, crashing faster than before.

This is uncharted territory, as the people who diagnose these kernel panics all do it at stock speeds. Nobody has done it overclocked. Blaming the overclock isn’t helpful, as it ensures the issue will never be solved. It’s about WHY.

The reason for overclocking instability is usually defined as the fact you have overclocked. That is your ‘why’.

The issue is that there are so many variables at play here that unless you wish to spend the next days/weeks/months/years trying to find the answer which you may never find, most will just stick to your 4.3 stable and be happy.

Note that when diagnosing overclocking you are working within both the software AND hardware domains, in multiple dimensions.

This is a terrible answer. You’re basically saying “Just give up, it’s not going to get any better.”

This is the “OVERCLOCKING” section, you’re basically coming in and saying “You guys should give up before you even try, you’re all stupid for voiding your own warranties.”

1.4V Vcore with slight Vdroop was the answer. Maxing out current protections on the VTT and VCCSA while leaving the voltages at stock for those 2 specifically helped too.

If you get that Kernel Panic, that is equivalent to a 0x124 BSOD on Windows on Ivy Bridge-E.

I literally don’t care anymore if that’s a bit excessive, the droop will take care of the rest and my temps are fully in check.

Bump your memory voltage if there is room ?

Already at 1.55V memory. The BCLK change mandated that.

No, the answer to all but vanity overclocking is to get something safe and stable.

You’re still pushing for the stupidity factor of people and I don’t appreciate that. If you’re in the camp of “Make one mistake = you’re an idiot,” you’re part of the toxicity I want to avoid.

I’ve had enough of people telling me “You can’t.” It’s insidious.

Let people learn from mistakes on their own rather than berate them.

I wonder if this is just a loading issue, like the amount of load. I know it will likely be pegged at 100% on both OSes, but is 100% equal across them? Does Windows going easier, shuffling work around, Windows “optimisations” and so on mean that the AVX load is not as high as on Linux?

Or, to put it another way, is Linux too good at what it does, pushing too hard where Windows does not?

Well, you’re absolutely right, you CAN ignore any suggestions you feel are insulting. However, at the end of the day, WE don’t have to use your machine. You do.

THANK YOU. This was the point I was trying to make to abaxas.