Heterogeneous core architectures and machine learning

The recently introduced Intel Alder Lake processors (and, less relevantly to this application, Apple Silicon) take a new approach to processor design by providing two types of cores on the same CPU die: on one side, full-sized high-performance cores, optimized for single-core performance, high clock rates, and full SMT support; on the other, smaller, more energy-efficient cores that trade some per-core performance for better throughput on highly parallel tasks.

For the machine learning community, these (Intel) processors are really only relevant for prototyping workstations equipped with one or two GPUs; they don't scale to large production servers. For someone intending to build a machine like this, the Intel 12900K and AMD 5950X may be the most interesting contenders for processor choice, as they deliver high performance without stepping up to the much more expensive workstation-class Xeon and Threadripper processors. Intel has a slight edge in single-core performance on most synthetic benchmarks, but multi-core performance is less clear-cut: no clear winner emerges.

This is somewhat of a digression, but there is also a discussion to be had about the new forward-looking features of the Intel platform (PCIe 5.0 and DDR5 support) — a set of upsides that comes with early-adopter instability and price inflation — versus the stable, affordable, yet not-forward-compatible status of AMD's AM4 platform.

As far as I am aware, not much discussion has taken place about how a heterogeneous architecture might translate to training and inference workloads. Most of the press covering these new processors focuses on gaming.

I have to admit I am myself not very well versed in the ways CPU-side parallelism is leveraged in these tasks, so the impact this might have on performance is not obvious to me. Correct scheduling (assigning performance cores to the threads that actually need them) is likely part of this — meaning operating systems matter here. I vaguely know that certain models and architectures (RNNs, RL) and data-related tasks (augmentation and pre-loading) are more strongly tied to CPU performance.
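To make the scheduling point concrete, on Linux one can at least pin a process to a chosen set of logical CPUs by hand. The sketch below assumes a hypothetical 12900K-like layout where the P-cores expose logical CPUs 0–15 (two SMT threads each) and the E-cores follow; the actual numbering varies by machine, so this is illustrative only:

```python
import os

# Assumed topology (check `lscpu` or /sys/devices/system/cpu on a real
# machine): logical CPUs 0-15 map to the eight P-cores, 16-23 to the
# E-cores. This mapping is a guess for illustration, not a guarantee.
P_CORE_CPUS = set(range(16))

def pin_to_p_cores(pid: int = 0) -> set:
    """Restrict a process (0 = the current one) to the assumed P-core
    CPUs, falling back to the full available set if there is no overlap
    (e.g. on a machine with a different topology)."""
    available = os.sched_getaffinity(pid)
    target = (P_CORE_CPUS & available) or available
    os.sched_setaffinity(pid, target)
    return os.sched_getaffinity(pid)

pinned = pin_to_p_cores()
print(f"now running on CPUs: {sorted(pinned)}")
```

This only covers manual pinning of one process; whether a data-loading worker pool would actually benefit from being steered onto P-cores (or off them, to leave them free for the main training thread) is exactly the kind of question I would like answered.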

I would be curious to hear people's perspectives and insights on this.

I happen to be a machine learning researcher (mostly working with GANs, CLIP-based models, and RL), and just managed to get my hands on an RTX A5000, with plans to possibly get a second one later on. I have somehow never owned a desktop, so I need to build something from scratch now. I'd love to hear people's take on the 5950X vs. the 12900K.