What kind of implications does a Hybrid CPU design have for performance?

I’m of course talking about Intel’s setup, or big/little as it’s called. Specifically, I’m curious about how tasks get distributed between the P cores and E cores. This isn’t restricted to just Intel, though; general big/little discussion is welcome as well.

If the task scheduler is smart enough to distribute workloads, what could that mean for performance? In my mind, in a perfect scenario, I always picture half the E cores working alongside the P cores on main tasks, while the other half handle background things like Discord, antivirus, etc.

But somehow I’m getting the impression that it hasn’t reached that level of intelligence yet? I reckon it’s as simple as: P cores = all main tasks, E cores = all secondary/background tasks. Considering Alder Lake’s E cores are what, 30-40% faster than Skylake cores, they would provide a lot of relief if they worked in tandem (some of them, anyway) with the P cores, no?

1 Like

As it stands, Intel has a long way to go in getting this fully realized in mainstream desktop operating systems. The implementation is better on the MS Windows side than on the Unix-like OSes (no fault of the OSes). Unlike ARM’s big.LITTLE implementation, Intel’s has a lot of hardware intelligence to help with some things, but it still needs OS integration to do a lot of the heavy lifting.

I have been away from the CPU architecture world for a while, but one of the features of ARM’s implementation was that long-running, low-usage tasks would automagically get downgraded to the weaker cores after X amount of time. As the Android kernel has been incrementally updated over time, there is now some process execution context information that can automatically assign processes to the big or little cores at will.
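On Linux you can approximate that by hand with CPU affinity. This is not Android’s actual EAS/cpuset machinery, just a minimal sketch, and the core IDs 0-3 are an assumption — on a real SoC you would first check which logical CPUs are the small ones (e.g. via the cpu_capacity entries in sysfs on many ARM kernels):

```c
// Minimal sketch, not Android's actual mechanism: manually restricting a
// long-running, low-usage process to the little/E-core cluster with CPU
// affinity. CPUs 0-3 are an assumption; check which logical CPUs are the
// small ones on a given SoC (e.g. /sys/devices/system/cpu/cpu*/cpu_capacity).
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <pid>\n", argv[0]);
        return 1;
    }
    pid_t pid = (pid_t)atoi(argv[1]);

    cpu_set_t little;
    CPU_ZERO(&little);
    for (int cpu = 0; cpu <= 3; cpu++)      /* assumed little-core cluster */
        CPU_SET(cpu, &little);

    /* From here on, the kernel may only schedule this PID on CPUs 0-3. */
    if (sched_setaffinity(pid, sizeof(little), &little) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    return 0;
}
```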

2 Likes

That’s the problem I see. The OS is not fully aware. The split into P and E cores is a kludge to work around their awful power usage. All P cores at full chat is rather scary looking, and the constant beating over the head from AMD about being much more efficient while performing just as well was not a good look. So they put E cores in there in the hope that the P cores are not running all the time.

A proper implementation would be either nearly 100% hardware controlled or nearly 100% software controlled. As it is right now, neither the CPU nor the OS manages it correctly, and both seem to be going through constant improvements.

It could be very good but right now, I don’t think it is actually useful in a realistic sense.

Beyond the release, I never saw any info about them or how well it actually works. From my admittedly very light look at it, it seems that in real-world use everything goes to the P cores until they are busy, and then it fails over to the E cores as a backup, effectively solving nothing and only increasing power usage, since you now have all the P cores and the E cores running too.

1 Like

Yeah, that’s how I figured it would work. Sharing some of the thread workload alongside the P cores would, I reckon, make for excellent improvements in performance. But I wonder if it would only make things more complex in terms of instruction/execution. I don’t know the damn terminology and don’t pretend to, either. :laughing: Let’s say an 8+8 design of P/E cores: 8x P cores all being utilized, 4x E cores there to supplement the workload and add those additional threads where necessary, and the remaining 4 simply chilling in the background for the mundane things. Or possibly there could be a wide amount of granularity in how the workloads and cores are distributed: 8P+6E when needed, while the remaining 2x E cores handle the rest of the compute.

Considering Alder Lake’s E cores are around 40% faster than Skylake… they’re essentially just old desktop cores with mobile power draw. Even a count of 2 E cores is more than sufficient for background processes.

I don’t know… Zen 5 may be crazy. What I’m really interested in are the Nova Lake processors on the 18 Angstrom node. I have a gut feeling that will slay.

And on a random note, why the hell aren’t we seeing 8 core desktop CPUs at a base frequency of 4.0GHz?! Come on, already!

1 Like

Power draw and power delivery.

In layperson’s terms (it’s more complex in actuality), this would be akin to how the Cell BE partially functions. You would have a central processing unit that acts as the PPE. The difference is that instead of one kind of SPE/SPU, you have multiple types of cores, P cores and E cores, working on things.

With Cell BE, it was a deliberate decision whether the work needed to live on the PPE or whether the PPE needed to schedule it to run on the SPUs. The SPUs function as DSPs that work on highly parallelized data; they cannot work (efficiently) on general, unprepared data.
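To make that concrete, here is a generic sketch in plain pthreads — deliberately not libspe2 or real Cell SDK code — of the kind of decision the PPE had to make: handle small work itself, or slice a large, already-prepared buffer into chunks and hand them to worker “SPU-like” threads that only ever see their own chunk (build with -pthread):

```c
// Generic pthread sketch (not actual Cell SDK code) of the scheduling decision
// described above: the control thread ("PPE") either runs a small job itself,
// or slices a large, prepared buffer into chunks and hands them to worker
// threads ("SPU-like" units) that each see only their own chunk.
#include <pthread.h>
#include <stdio.h>

#define N_WORKERS 4
#define SMALL_JOB 1024          /* below this, dispatching is not worth it */

struct chunk { const float *data; int len; float sum; };

static void *worker(void *arg)  /* "SPU": crunches one prepared chunk */
{
    struct chunk *c = arg;
    c->sum = 0.0f;
    for (int i = 0; i < c->len; i++)
        c->sum += c->data[i];
    return NULL;
}

static float process(const float *data, int len)
{
    if (len < SMALL_JOB) {      /* small job: the "PPE" just does it */
        float s = 0.0f;
        for (int i = 0; i < len; i++)
            s += data[i];
        return s;
    }

    /* Large and data-parallel: slice it up and dispatch to the workers. */
    pthread_t tid[N_WORKERS];
    struct chunk ch[N_WORKERS];
    int per = len / N_WORKERS;
    for (int w = 0; w < N_WORKERS; w++) {
        ch[w].data = data + w * per;
        ch[w].len  = (w == N_WORKERS - 1) ? len - w * per : per;
        pthread_create(&tid[w], NULL, worker, &ch[w]);
    }
    float total = 0.0f;
    for (int w = 0; w < N_WORKERS; w++) {
        pthread_join(tid[w], NULL);
        total += ch[w].sum;
    }
    return total;
}

int main(void)
{
    static float data[1 << 16];
    for (int i = 0; i < (1 << 16); i++)
        data[i] = 1.0f;
    printf("sum = %f\n", process(data, 1 << 16));
    return 0;
}
```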

2 Likes

:frowning: The Cell was impressive, but it was also so, so bad at the same time. Had Sony not adopted such an alien microarchitecture, they could have created an emulator for the PS1 and PS2 much more easily, not to mention brought that emulator forward with new consoles, i.e. given the similarities in architecture between the PS4 and PS5. An emulator would then become an IP of its own, and a consistent feature set of a video game console.

1 Like

To be fair, multi-core systems only existed in the server space at the start of development, and those were all RISC-based systems (SPARC, POWER, MIPS). The biggest hit against the Cell BE and MS’s Xenon PPC architectures was that they were 64-bit, in-order execution units. They had huge pipelines, but as a result of having to do some basic general-purpose processing, they would STALL ALL OF THE TIME. Luckily for the Xbox 360, DirectX acted as an abstraction layer and allowed you to run less-optimized code, since it would do that translation at compile time.

For Sony, you were on your own, and even worse, since the SPEs/SPUs did all of the heavy lifting, you had to manage your own memory and pipeline while also writing highly parallelized code, something that no one was doing in the video game world and barely anyone was doing in the PC software world. Cell BE introduced a lot of PhDs into the video game architecture world and gave rise to the middleware companies.

With the Zen CPU design, we are starting to get to where the Cell BE was headed. The big difference is that the additional cores cannot function like true DSPs. 8-core/16-thread systems can now emulate PS3 games almost perfectly, but they still struggle with particle effects that were usually relegated to one SPE with direct access to the framebuffer.

I would love to see what a modern POWER system with at least 4-cores 16-threads (SMT4) could do to emulate PS3.

3 Likes

The issue is the Windows scheduler. It’s brain-dead.

macOS, with its hybrid architecture, intelligently schedules between P and E cores based on task priority and other factors.
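The hint it acts on is exposed to applications as QoS classes. A minimal sketch — the class names are Apple’s, but the “background work tends to land on E cores” behavior on Apple Silicon is the usual observation, not a guarantee:

```c
// Minimal sketch: on macOS a thread can declare its QoS class; on Apple
// Silicon the scheduler uses that (among other factors) to steer
// background/utility work toward E cores and user-interactive work toward
// P cores. It is a hint, not a pin -- the scheduler still decides.
#include <pthread.h>
#include <pthread/qos.h>
#include <stdio.h>

int main(void)
{
    /* Mark the calling thread as background-class work. */
    int err = pthread_set_qos_class_self_np(QOS_CLASS_BACKGROUND, 0);
    if (err != 0) {
        fprintf(stderr, "pthread_set_qos_class_self_np failed: %d\n", err);
        return 1;
    }

    /* ... indexing, syncing, or other deferrable work would go here ... */
    printf("running as QOS_CLASS_BACKGROUND\n");
    return 0;
}
```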

2 Likes

Keep in mind that big.LITTLE architecture is where the industry is headed regardless. The current (substantial) rumor is that Zen 5 is going to have P and E cores, with the little cores being Zen 4 based. These kinds of things are set in stone at least 2-3 years before release, which is why Intel did so poorly with their 11th gen but still released it anyway. Whether Microsoft can properly utilize the setup is a different story, but the industry seems to be betting on them either way.
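To be fair to Microsoft, Windows 11 does already expose an application-side hint for this: a thread can tag itself as EcoQoS through the power-throttling API, which the scheduler treats as “prefer running this efficiently”. A minimal sketch (a hint, not a pin — the scheduler still decides):

```c
// Minimal sketch, assuming Windows 11's EcoQoS hint: a thread tags itself via
// the power-throttling API, and on hybrid CPUs the scheduler treats that as
// "prefer an E core / run efficiently" for this thread. A hint, not a pin.
#include <windows.h>
#include <stdio.h>

int main(void)
{
    THREAD_POWER_THROTTLING_STATE state;
    ZeroMemory(&state, sizeof(state));
    state.Version     = THREAD_POWER_THROTTLING_CURRENT_VERSION;
    state.ControlMask = THREAD_POWER_THROTTLING_EXECUTION_SPEED;
    state.StateMask   = THREAD_POWER_THROTTLING_EXECUTION_SPEED;  /* enable = EcoQoS */

    if (!SetThreadInformation(GetCurrentThread(), ThreadPowerThrottling,
                              &state, sizeof(state))) {
        printf("SetThreadInformation failed: %lu\n", GetLastError());
        return 1;
    }

    /* ... background work (updaters, scanners, chat clients) would go here ... */
    return 0;
}
```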

1 Like

I think this is a big reason for their push for Azure, and I would not be surprised if they run a GNU/Linux VDI that points to an Azure instance in the cloud to help overcome this problem, so that they can let people adopt a more iterative version of MS Windows. MS Windows subscriptions incoming in 15 years?

1 Like

Hey, it is getting better, but still has a long way to go. Like ReFS. They should have just stuck with WinFS. It was better in every way.

1 Like

Now you’re just proving what scumbags Sony are. :laughing: Someone pointed this out on Reddit, I think. Emulation at this point is a matter of willpower. But of course, why do that when folks can repurchase their games from 20 years ago? So, so shitty.

3 Likes

What does the base frequency have to do with anything? Some AMD chips have been able to all-core turbo (across more than 8 cores) beyond 4.0GHz for a long time.

1 Like

big.LITTLE started with ARM processors, in some flagship Android smartphones. The main selling point was choosing between power efficiency and full send.

Calling the Windows scheduler brain-dead is putting it mildly…
They [still] haven’t been able to properly handle AMD’s Threadripper (TR) series.

2 Likes

Because it works for Nintendo, and they have been doing that since the 90s. I don’t agree with it, but MS has a lot more disposable income to spend on getting BC from the first Xbox working on the current console.

Sony instead tends to take the value-added/remaster approach when it is lucrative for them to do so. This is why I have two BC PS3s. There was a time when Sony used to support this in hardware, but now it can be done in software. We all know that Sony is good at hardware but has always been subpar on the software front.

2 Likes

I think the Wintel alliance is back, since Apple left Intel high and dry when Intel needed them most. Remember, MS has been trying to push ARM on MS Windows but has been incredibly restrictive about what the RT devices can do, so as not to totally alienate Intel. Now that Apple is having success, I can very well see MS going full send on ARM and dropping x86 for their own custom silicon.

1 Like

The whole idea of ‘efficiency’ cores seems like a huge waste to me.
Normal cores don’t take much power at all when they are idling, and if you need to do something, why not use the most powerful cores you can?

The only purpose I can see for “E” cores is on a device that has limited battery power to draw from.

But on a desktop? Why would you risk crippling yourself by getting anything like that?

Just give me all full power cores and call it good. imo.

3 Likes

The tables have turned and now Intel is having issues with power budget and AMD is the power efficiency king of x86.

It makes sense on mobile and for real-time systems that have critical processes that need to always run. It would be more power budget conscious to run those processes on a low power core if they are not resource intensive.

The logic with big.LITTLE was that for bursty processes, running on the big core could be detrimental considering the spin-up time, while longer-running processes might benefit from running full tilt on the big core, since the total execution time would be lower there. Then the logic got flipped: you can use a big core to split up the initial work and hand the long-running, highly parallel parts to the little cores.
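As a toy illustration of that heuristic — every threshold below is made up, nothing a real scheduler uses:

```c
// Toy illustration of the placement heuristic described above (all numbers are
// made up): short, bursty work stays on a little core to avoid paying the big
// core's ramp-up; long, parallel work fans out to little cores; long, serial
// work is promoted to a big core because it finishes sooner there.
#include <stdio.h>
#include <stdbool.h>

enum core_type { LITTLE_CORE, BIG_CORE };

/* expected_ms: rough runtime estimate; parallel: can the work be split up? */
static enum core_type place_task(double expected_ms, bool parallel)
{
    const double ramp_up_ms = 2.0;   /* assumed cost of waking/boosting a big core */

    if (expected_ms < 10.0 * ramp_up_ms)
        return LITTLE_CORE;          /* bursty: ramp-up would dominate */
    if (parallel)
        return LITTLE_CORE;          /* long but parallel: hand off to little cores */
    return BIG_CORE;                 /* long and serial: big core wins on total time */
}

int main(void)
{
    printf("1 ms timer tick   -> %s\n", place_task(1.0, false)   == BIG_CORE ? "big" : "little");
    printf("500 ms encode job -> %s\n", place_task(500.0, true)  == BIG_CORE ? "big" : "little");
    printf("300 ms compile    -> %s\n", place_task(300.0, false) == BIG_CORE ? "big" : "little");
    return 0;
}
```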

Now there is a whole field of study around efficiency cores versus raw-power cores: how you pass a task back and forth to get the most efficiency, and how to encode that information in the instructions so that less management is needed from external systems.

1 Like

E and P cores in real-time systems sound like a CSB investigation waiting to happen.
Imagine having a process on an E-core check temperature every so often. When that process then needs a lot of compute, it may get re-scheduled to the P-core and poof goes the reaction vessel.
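Which is why, on Linux at least, you would normally pin a monitor like that to one fixed core and give it a real-time priority, so the scheduler can never migrate or starve it mid-reaction. A minimal sketch, with core 2 being an arbitrary, assumed E-core ID:

```c
// Minimal Linux sketch: pin a safety-critical monitoring thread to one fixed
// core and give it a real-time policy so the scheduler can never migrate it to
// a different (P) core or starve it. Core 2 is an arbitrary, assumed E-core ID.
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

static void *monitor_temperature(void *arg)
{
    (void)arg;
    for (;;) {
        /* ... read the sensor and act on it here ... */
        usleep(100 * 1000);   /* 100 ms sampling period */
    }
    return NULL;
}

int main(void)
{
    pthread_t tid;
    pthread_create(&tid, NULL, monitor_temperature, NULL);

    /* Pin the monitor to a single, fixed core (assumed E core, ID 2). */
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(2, &mask);
    pthread_setaffinity_np(tid, sizeof(mask), &mask);

    /* Give it a real-time FIFO priority so it always preempts normal work. */
    struct sched_param sp = { .sched_priority = 50 };
    if (pthread_setschedparam(tid, SCHED_FIFO, &sp) != 0)
        fprintf(stderr, "SCHED_FIFO needs CAP_SYS_NICE/root\n");

    pthread_join(tid, NULL);   /* monitor runs until the process is killed */
    return 0;
}
```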

3 Likes