Zen 5 waiting lounge - I plan to upgrade to Ryzen 9000 (zen5).. join & chat if you too

jxdking · October 12, 2024, 1:47pm

Yet, you still need xbox game bar to park 2nd ccd to play the game.
I don’t think it makes sense.

For 16 core x3D, I think it is still on one of the CCD. Another option is to use Zen 5C die, which is 16-Core CCD on 3nm TSMC node. Thus, only a single CCD is needed.

Vishal_Rao · October 12, 2024, 2:10pm

Zen 5 Threadrippers anyone? Me want!

Exard3k · October 12, 2024, 3:00pm

have fun with your 3.7GHz gaming CPU

ubergarm · October 12, 2024, 3:34pm

The recent AGESA PI 1.2.0.2 update improved core-to-core-latency across chiplets a lot on my 9950X e.g. ~190ns down to ~80ns. So that may help things a bit for some workloads.

And good point, regardless of one or two v-cache, you can pin a process to a single CCD with something like:

# psure first 8 threads are on a single chiplet, 8-15 are on the second?
$ taskset --all-tasks --cpu-list 0-7 /usr/bin/steam

Not sure if you start /usr/bin/steam like that if processes it spawns will inherit settings? Or do you have to get creative or pin the PIDs afterwards or use a script? (launch options seems to be additional arguments…)

Anyway, regardless of the 9950X3D v-cache design, definitely some Linux tools to wrangle processes onto the best cores…

Pixo · October 12, 2024, 4:28pm

Last time I checked taskset was inherited.
You could set it for steam, start a game and check with the tool if affinities stay.

ubergarm · October 12, 2024, 4:54pm

Nice, yeah just did a quick test and updated my post to include --all-tasks to ensure spawned threads inherit affinity.

Seems to be working when I force valheim onto threads 0-3 in this screenshot (game is not visible, but running on workspace 3).

igormp · October 12, 2024, 5:06pm

I don’t think this will be needed. It was not needed on the dual CCD 5000 and 7000 CPUs, and only on the 9000 due to that weird latency that they fixed.
The scheduler should be aware of pinning processes to a the higher clocking CCD in case both CCDs are equal (the previous issue was that the higher clocking one was the one without the V-cache), so I don’t think anything extra will be required.

I do find dual v-cache kinda useless tho, I don’t see games scaling to the extra cores, which makes the second v-cache ccd kinda moot for non-CFD scenarios.

xwraith · October 12, 2024, 5:51pm

Yes, but I imagine not until Q2 2025 at the earliest.

tyreal · October 12, 2024, 8:09pm

milan 7773X has better latency than 7763, reflecting that the larger L3 cache helps reducing the inter ccd latency. It should do the same on ryzen with larger cache on both ccds.

MilzyBee · October 13, 2024, 10:38am

Bill_Killernic · December 7, 2024, 12:07am

Sorry for digging this up but since it is not that old and since I don’t think my question warants a dedicated topic here I am ^^

So I heard that the new x870 mobos may be optimized for CUDIMMs

At least that is what MSI says about theirs.

A quick googling didn’t yield any resurts in terms of testing the stability and capability of high frequencies using CUDIMMs (aka dimms with a frequency controller chip -Client Clock Driver (CKD) - on the ram module it self)

Does anybody here have testing any such thing or know of a review/test or youtube video that did?

Also is there any difference and/or compatibility issue between the various different dimm names I found for modules that have CKD? like UDIMM vs RDIMM and CUDIMM ? Are they completely the same?

If not what’s the difference and are they all compatible with ryzen 9000 series?

Well I believe it depends on the scheduler and on if the game needs more than 16 threads and how you park the cores (in case the scheduler doesnt do a good job) given though the potentially better binned dies at least for the main ccd and higher clocks as well as higher l3 cache (even if only 1 ccd will have 3D cache the 16 core model should have more than the 8 core at least that was the case with the 7000 series)

Unlike the 7000 series (which most test though were done with older schedulers now I think things have gotten better both on linux especially with the latest kernel and on windows + 9000 have better branch prediction which I think will help in dedicating resources from one CCD to a specific repetitive task such as a PC game is)

twin_savage · December 7, 2024, 12:42am

There isn’t a very big push for CUDIMM compatibility with 9000 series because their memory controller is too weak to take advantage of the benefits CUDIMMs can offer.

For a frame of reference, Intel’s current CPUs can hit over 12.5GHz effective memory speeds, while Ryzen 9000 can only hit about 7.5GHz effective speeds right now<- this example is extreme overclocking but is indicative of the platform’s potentials. This speed disparity is due to how AMD implemented memory access in their chiplet based processors and is very unlikely to change until they go to a tile (by tile I just mean non-organic interposer) or monolithic architecture.

An interesting fact is that AMD does actually make some non-chiplet based processors that are monolithic and they are hitting over over 10.5GHz effective memory speeds (8700G) (again, extreme overclocking example).

RDIMMs buffer the data lines and the clock lines, UDIMMs buffer nothing and LRDIMMs buffer the data lines, the clock lines and the command lines. CUDIMMS are a subset of UDIMM as far as compatibility is concerned (voltage and notching) and only buffer the clock lines, while RDIMMs are not a subset and have no compatibility with consumer Ryzen CPUs.

Bill_Killernic · December 7, 2024, 2:17am

You mean MT ? also, by using U/CUDIMMs at 1:1 ratio?

twin_savage · December 7, 2024, 2:28am

Yes MT, I’m saying effective speed to try to reduce confusion, since the actual clock of the memory is half that speed.

Edit:

no, it seems like most all the OC examples are using 1:2 ratio.

Bill_Killernic · December 7, 2024, 4:03am

I find it hard to believe that this is case since CKD modules supposedly reduce the noise in signals and 870e mobos supposed to be optimized for cudimms unless instead of optimized mobos we have the exact opposite …

Because as is, with zen 5 (but even with zen 4 ) and older mobos 8000 MT/s @2:1 is quite possible maybe not 100% of the time but with a good mobo and a good kit it is highly likely you are going to be able to run this so I doubt that on an optimized mobo with CUDIMMS one would get even less than 8000 MT/s…

Unless you can give some references with tests showcasing this issue which would be much appreciated.

=========

EDIT: I googled a lil bit further and turns out that the CUDIMM compatibility MSI was talking about for x870e mobos was a big nothing burger

Since it “supports” it by disabling its main feature aka by choosing an option in bios called “bypass clock” which will bypass the CKD of the modules so it will be the same as you had regular DDR5 modules…

In other words it’s a mere skeleton compatibility and not actual support.

It is like saying you can choose any colour you want as long as it is black lol

source

lemma · December 7, 2024, 12:43pm

Yup. But…

…I’m skeptical here.

AM5 and LGA1851 both use organic package substrates, main difference being Ryzen’s DDR routing is direct where Arrow’s passes through the Foveros base.
Granite Ridge gear 1 is mostly more solid than Arrow gear 2, supporting DDR5-5600 1DPC 2R 2SPC and, at 8000+ versus 8400+, there isn’t that much different in gear 2 1R overclocks.
The Arrow gear 4 headlines use LN₂ and aren’t attached to perf or stability data. I’ve also failed to find perf data for Arrow’s gear 4 DDR5-6400 1R CUDIMM support.

Arrow gear 4 comes with a 27% latency penalty, suggesting that last bit is maybe because it’s slower at DDR5-6400 than gear 2 5600.

So, while AMD’s lack of gear 4’s a marketing disadvantage, I’m finding it hard to tell if there are functional limitations attached to it for day to day use. I do think you could end up being right in that if the Zen IO die gets reved to add gear 4 support that release might coincide with use of CoWoS.

twin_savage · December 7, 2024, 10:34pm

The CKDs only help with signal integrity on the clock lines which are a fraction of the problem; the data and command lines are more chaotic since they aren’t as periodic as the clock and have more harmonics that would benefit from redrivers.

Corsair has an okay-ish article on what CUDIMMs are doing and they include some eye diagrams:

I’d expect CUDIMMs to give more benefit to dual ranked DIMMs since it is harder to drive their clock than single ranked DIMMs, which make me wonder why I only see single ranked CUDIMMs on the market.

My memory OC data was a little stale in the last last post, right now hwbot has the the following records for each platform:

Arrow Lake: 12.666GHz effective
Raptor Lake Refresh: 11.648GHz effective
Phoenix: 11.862GHz effective
Raphael: 9.37GHz effective
Granite Ridge: 8.790GHz effective

I’m medium-sure that all but the Arrow Lake example are running on gear 2.

The reason I reference this over what manufactures claim their platforms can hit is because of differences in the amount each manufacture leave in “reserve performance”. The trend of chiplet based processors clocking 1-2GHz less than their monolithic/tile counterparts tracks to general user outcomes as well. All that being said, this is still a very imperfect proxy for memory performance.

This is true, but only the chiplet based processors are using the organic package between the memory controller and CPU. While its probably theoretically possible to make an IO die that can run fast memory while still being “far” away from the CPU, AMD’s design libraries don’t seem to support this.

I know this is making some generalizations, but Phoenix clocking ~2-3GHz higher than the chiplet based processors is telling me AMD’s libraries are fully capable of hitting good memory speeds, just not when forced into the chiplet architecture.
You could make the argument that this is 4nm vs 6nm memory controller, but Arrow Lake is using effectively the same process as Granite Ridge for their memory controller and they are also getting that 2-3GHz speed bump.

You make a good point about the different Gear ratios complicating comparisons; The only thing I can think of to account for that outside of making sure everyone uses gear 2 (because theres no way the crazy overclockers are going to revert to using gear 1) is looking at bandwidth numbers, although even this can disadvantage AMD due to their IF being a bottleneck.

lemma · December 8, 2024, 1:37am

A corollary of Raphael, Granite Ridge, and Arrow’s IO dies all being TSMC N6 is AMD and Intel are starting from the same design libraries. While I’m not seeing a way to test them without inside information us forum folk lack access to, the null hypotheses I’m working from are

Arrow’s edge of 8400ish rather than 8000ish in non-LN2 situations is mainly a gear 4 thing rather than a library or base tile thing. Because same process and my signal integrity experience suggests it’s unlikely Foveros carries much significance to socket nets.
Phoenix and Hawk highmarking at like 9000 or 9100 in non-extreme situations, versus 8800ish Granite Ridge highmarks, suggests monolithic N4 doesn’t confer much gear 2 advantage over chiplet N6.

Off package IO drivers are big, so have been decoupled from node sizes for some decades. Really the last major change I’d flag here is Intel pulling termination resistors on die with Katmai in 1999, which was a 250 nm process.

I think your point about bandwidth’s fair but Granite’s latency advantage and apparently greater bandwidth efficiency leaves me uncertain as to how to make a good comparison with Arrow on this basis. The bench data available suggests to me greater divergence between the two than with previous gens, which might reflect workload affinity to lower latency (Granite) or higher bandwidth (Arrow). But that’s very likely confounded by bus, cache, core, and scheduler differences.

Kind of my takeaway is Granite doesn’t match some of Arrow’s headline numbers but it also doesn’t seem like it needs to. So that’s probably ok? And AMD plausibly knew this and allocated engineering effort accordingly?

I’m interested in comparing our in house workloads’ behavior on 265K versus 9900X and 285K versus 9950X at their DDR5-5600 1DPC 2R 2SPC support limit with 2x48GB. But that’s not happening because we don’t have budget.

MisteryAngel · December 8, 2024, 12:18pm

EU prices of the 9800X3D needs to come down.
But i guess they wont for a while.

MilzyBee · December 8, 2024, 1:03pm

EU and US are kinda lucky, they have a short window, okay maybe super short before things get bought up by scalpers… am thinking 2-3 months after release for the 9950X3D to make it here in very very low stock numbers, and a lot of other things like the seasonic - noctua PSU collab as well as the Godlike mobo, they won’t be here until february earliest and once they’re out of stock, that’s pretty much it…

meanwhile, scalpers are already selling that mobo for $400+ USD above RRP I feel $3000 for a motherboard is a bit much…that’s more than a threadripper pro mobo

we need a Microcenter here so bad