Do games even use the instruction those core-to-core latency charts were testing (the “fast” compare-and-swap that is normally used for lock-free data structures and other exotic things)?
It’s kind of misleading to say that it is as slow as going to memory, because the point of the instruction is that it is atomic: it either modifies the memory location completely or it does nothing at all. It seems the regression was unintended, and that it only impacts the 9950X. But in an all-core workload you won’t notice it, because the speculative hardware will hide that latency. I doubt most games are bottlenecking in a way that makes that latency under a light load matter.
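For context, here’s a minimal sketch of the primitive in question, reached through `std::atomic` (on x86 this compiles down to a locked compare-and-swap). The names `try_acquire`/`release` are illustrative, not from any particular game engine:

```cpp
#include <atomic>

// The operation either swaps in the new value atomically or leaves memory
// untouched -- there is no partially-written state another core can observe.
std::atomic<int> lock_word{0};

bool try_acquire() {
    int expected = 0;  // we expect the lock to be free
    // Atomically: if lock_word == 0, set it to 1 and return true;
    // otherwise return false (and load the current value into `expected`).
    return lock_word.compare_exchange_strong(expected, 1);
}

void release() {
    lock_word.store(0, std::memory_order_release);
}
```

When two cores ping-pong a cache line through operations like this, the cache line bounces between them, which is what those core-to-core latency charts are measuring.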
Similarly, I don’t know why people put so much stock in Cinebench. What is it actually benchmarking? What is the correlation between scores in that benchmark and an actual workload people care about?
It seems to me that two things happened with Zen 5. First, games are memory-bandwidth intensive, and the best you could ever do is read every memory location exactly once and write it exactly once. A 7800X3D is probably already pretty close to that, so Zen 5 has less potential headroom.
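The ceiling being described is the one a STREAM-style kernel hits. A rough sketch (function name and setup are mine, not from any benchmark suite):

```cpp
#include <cstddef>
#include <vector>

// STREAM-style triad: every element of a and b is read exactly once and
// every element of c is written exactly once. For arrays much larger than
// the last-level cache, runtime is set by DRAM bandwidth, not by how wide
// or deep the core is -- a faster core just waits on memory sooner.
void triad(std::vector<double>& c,
           const std::vector<double>& a,
           const std::vector<double>& b,
           double scalar) {
    for (std::size_t i = 0; i < c.size(); ++i)
        c[i] = a[i] + scalar * b[i];
}
```

If a workload looks like this, a bigger core buys little; a bigger cache (or more memory bandwidth) buys a lot.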
That’s not to say it gets nothing: it has more cache bandwidth, better branch predictors, and a deeper instruction window, all of which let it send out those memory requests sooner. And it has three full multiplier units to do address calculations fast enough to take advantage of the larger instruction window.
But even with more sophisticated prefetchers and the rest, in most cases you are ultimately limited by memory bandwidth more than by anything the CPU is doing. Maybe with the X3D chips, capturing more of the working set will be enough to get a bigger-than-proportional boost over the 7800X3D. But even then, there are limits.
And there are a few unexpected limitations too. The latency of some single-cycle vector instructions is now 2 cycles because of an unexpected pipeline hazard. This shouldn’t matter if you are saturating the vector units rather than running long chains of dependent instructions. But maybe games are sensitive to this; I don’t know how they use vector code.
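Why dependent chains are the case that hurts: a sketch with plain doubles (the same reasoning applies to vector registers; function names are mine):

```cpp
#include <cstddef>

// Each add depends on the previous one, so every iteration waits out the
// full instruction latency. Going from 1 to 2 cycles halves this loop's speed.
double sum_serial(const double* x, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        s += x[i];
    return s;
}

// Four independent accumulator chains: adds from different chains overlap
// in the pipeline, so throughput, not latency, sets the speed, and the
// extra cycle of latency is largely hidden.
double sum_unrolled(const double* x, std::size_t n) {
    double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i]; s1 += x[i + 1]; s2 += x[i + 2]; s3 += x[i + 3];
    }
    for (; i < n; ++i) s0 += x[i];  // leftover elements
    return s0 + s1 + s2 + s3;
}
```

Well-optimized vector code usually looks like the second loop, which is why the extra cycle mostly shows up in naive or latency-sensitive code.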
Similarly, it seems impossible in practice to sustain more than 5 integer instructions per cycle despite there being 6 units, perhaps due to port or forwarding-network limitations.
And the uOp cache is slightly smaller and slightly slower to fill. So that too could be hampering the uplift.
All of this could be stuff they found out late and that pulled the performance down from what they expected when they finished the high level design ~18 months ago.
But, more fundamentally, if you aren’t using AVX-512, you are leaving 50% of the performance on the table, both computationally and in terms of cache bandwidth utilization. Zen 5’s vector performance is “use it or lose it.” And no game that I’m aware of uses it, because there has never been a good hardware implementation from both Intel and AMD at the same time. Hell, most games don’t even use 256-bit AVX2.
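This is the kind of loop where that 50% shows up. The source is ordinary C++; what changes is how it’s compiled — with `-O3 -mavx512f` (or a `-march=znver4`/`znver5` target on recent GCC/Clang), compilers auto-vectorize it to 512-bit operations, processing 16 floats per instruction instead of 8 with AVX2. Same code, built generically, runs at the narrower width:

```cpp
#include <cstddef>
#include <vector>

// Classic saxpy: y = a*x + y. Trivially auto-vectorizable, so the vector
// width the compiler is allowed to target directly sets per-instruction
// throughput -- the "use it or lose it" half of Zen 5's vector hardware.
void saxpy(std::vector<float>& y, const std::vector<float>& x, float a) {
    for (std::size_t i = 0; i < y.size(); ++i)
        y[i] = a * x[i] + y[i];
}
```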
If you are playing Minecraft and using a JVM with support for AVX-512, you probably see an uplift. But I doubt that was a serious bottleneck for that game.
Corner cases like that aside, to really get the benefits of the chip, it needs developer support in the libraries and tools that consumer applications rely on.
So, in a sense, Zen 5’s performance is aspirational. If and when software is written or recompiled to target it, there is substantial room for uplift. You can see that now on Linux in the difference between Clear Linux and Fedora.
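Much of that distro gap comes down to build targets. A rough sketch of the difference (flags are illustrative, not either distro’s actual spec files; the `znver5` target needs a recent GCC):

```shell
# Typical distro baseline: generic x86-64, conservative optimization,
# no AVX at all in most packages
gcc -O2 -march=x86-64 -o app app.c

# A Zen 5-targeted rebuild: AVX-512 enabled, scheduling tuned for the core
gcc -O3 -march=znver5 -flto -o app app.c
```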
But AMD is not well regarded when it comes to software support, and even if they change that, by the time most consumer applications can take advantage of it, Zen 6 will be out and we’ll be having a very different discussion. (I.e., if they get AVX-512 into Unreal today, you’ll see the benefit in 2-3 years, when games built on the newer version of Unreal start hitting the market.)
If people are running something that benefits from Zen 5, like statistics work, the kinds of workloads SPECworkstation tests, or even just lots of code compilation, then it outperforms Zen 4 by enough to justify the cost. And given the bandwidth and power needs of the vector units, the HEDT Threadripper might even be a more compelling value than it was last gen.
But if you are just running normal Windows apps and gaming, I don’t see a benefit, and I’m legitimately confused as to why AMD wanted to market huge gains. Yes, they closed their performance gap with Intel on most older titles, and they are often matching 14th Gen despite having fewer speculative resources and lower clock speeds.
But there’s nothing in the hardware that fundamentally changes what you need for gaming: a very large cache.
And they should have realized this well before the various issues that new architectures always have started to surface.