Optimal SIMD/Vector array width

Current State of Vector/SIMD

ISA        Architecture   SIMD width   SIMD ISA
Intel 64   Skylake-X      512 bit      AVX-512
AMD64      Zen            128 bit      AVX
AMD64      Zen 2          256 bit      ?
Power      POWER9         128 bit      VSX
ARM        Cortex-A8      128 bit      NEON

According to @Methylzero, AVX on Zen is internally 128 bits wide (a 256-bit AVX instruction is split into two 128-bit operations).
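To put the widths in the table in concrete terms, here is a rough sketch of how many elements fit in one register at each width. The helper names `lanes` and `micro_ops` are made up for illustration, and `micro_ops` is only a simplified model that assumes a wide operation is executed as several passes over a narrower datapath.

```python
# Register widths taken from the table above (bits).
WIDTHS_BITS = {
    "Skylake-X (AVX-512)": 512,
    "Zen (AVX, internal datapath)": 128,
    "Zen 2": 256,
    "POWER9 (VSX)": 128,
    "Cortex-A8 (NEON)": 128,
}


def lanes(width_bits: int, element_bits: int) -> int:
    """Number of elements of element_bits that fit in one register."""
    if width_bits % element_bits != 0:
        raise ValueError("element size must divide the register width")
    return width_bits // element_bits


def micro_ops(op_bits: int, datapath_bits: int) -> int:
    """Passes needed for one op_bits-wide operation (hypothetical model).

    Uses ceiling division: a 256-bit op on a 128-bit datapath takes 2 passes.
    """
    return -(-op_bits // datapath_bits)


if __name__ == "__main__":
    for name, width in WIDTHS_BITS.items():
        print(f"{name}: {lanes(width, 32)} x float32, {lanes(width, 64)} x float64")
```

So a 512-bit register holds 16 single-precision floats where a 128-bit one holds 4, which is the raw per-instruction advantage the rest of the thread weighs against cost and complexity.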

Question

What is the theoretically optimal SIMD width for CPUs?
I would guess this varies based on use case, but I’d be curious what people’s thoughts on this are.

Considerations

  • Use case (duh, not everything on CPU is SIMD heavy)
  • At some point, moving the data to a GPGPU or ASIC probably becomes more efficient
  • Large difference in power draw compared with scalar (SISD) operation, as with Intel’s AVX-512
  • CISC/RISC ISA making a difference?
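The use-case and power considerations above can be roughed out with an Amdahl's-law-style estimate. This is only a sketch under stated assumptions: the function name `simd_speedup` and the `clock_ratio` parameter are hypothetical, and applying one clock ratio to the whole run ignores memory bandwidth and the details of how a real chip changes frequency.

```python
def simd_speedup(vector_fraction: float, lanes: int, clock_ratio: float = 1.0) -> float:
    """Estimated speedup over scalar code (Amdahl-style model).

    vector_fraction: share of the scalar run time that can be vectorized (0..1).
    lanes:           elements processed per SIMD instruction.
    clock_ratio:     hypothetical clock under vector load divided by the scalar
                     clock; 1.0 means no slowdown. Applied to the whole run.
    """
    if not 0.0 <= vector_fraction <= 1.0:
        raise ValueError("vector_fraction must be between 0 and 1")
    if lanes < 1 or clock_ratio <= 0.0:
        raise ValueError("lanes must be >= 1 and clock_ratio must be positive")
    # Scalar part is unchanged, vectorized part shrinks by the lane count.
    relative_time = (1.0 - vector_fraction) + vector_fraction / lanes
    return clock_ratio / relative_time


if __name__ == "__main__":
    for fraction in (0.2, 0.5, 0.9):
        for n in (4, 8, 16):
            print(f"fraction={fraction}, lanes={n}: {simd_speedup(fraction, n):.2f}x")
```

In this model, code that is only lightly vectorized gains very little from going wider, and any clock penalty for the wider unit (the 0.85 below is an invented number) can cancel the gain entirely: `simd_speedup(0.2, 16, clock_ratio=0.85)` comes out lower than `simd_speedup(0.2, 8)`.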

Why are you asking about this?

Partly based on a February Lounge post by @wendell, in response to my asking about the Talos II board he is testing; maybe the SIMD width of 128 bits on POWER9 is a limitation? But Zen’s SIMD is also 128 bits wide…

It is also based on information I’ve been thinking about from these two threads:


In the FMA thread, I asked this question specifically about Power, but knowledge on that is a bit more scarce, so I’m curious what information/discussion a more general question will turn up.


New data point: Fujitsu has (or is planning) a special supercomputer-targeted ARM chip that uses 512-bit SIMD.

the A64fx processor developed by Fujitsu for the Japanese path to exascale computing

https://www.nsf.gov/awardsearch/showAward?AWD_ID=1927880

Looks like A64fx was announced at last year’s Hot Chips, but I only heard about it now, thanks to Ogawa, Tadashi on Twitter.

So this uses ARM’s new Scalable Vector Extension (SVE), but the spec itself doesn’t specify a width, and SVE isn’t meant to replace NEON.

SVE is a complementary extension that does not replace NEON, and was developed specifically for vectorization of HPC scientific workloads.

Rather than specifying a specific vector length, SVE allows CPU designers to choose the most appropriate vector length for their application and market, from 128 bits up to 2048 bits per vector register.
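A rough way to picture what "no fixed vector length" means for software: the same loop runs at any width, with a per-lane predicate switching off the lanes that fall past the end of the data (the role WHILELT-style predicates play in SVE). The sketch below is plain Python, not SVE code; `vla_add` and its parameters are hypothetical names, and `vector_bits` simply stands in for whatever width the hardware designer picked.

```python
def vla_add(a, b, vector_bits, element_bits=32):
    """Add two equal-length lists the way a vector-length-agnostic loop would.

    Returns the result list and the number of vector iterations used.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    if not 128 <= vector_bits <= 2048 or vector_bits % element_bits != 0:
        raise ValueError("vector_bits must be in 128..2048 and hold whole elements")

    lanes = vector_bits // element_bits
    out = [0] * len(a)
    iterations = 0
    for base in range(0, len(a), lanes):
        # Predicate: lane i is active only while base + i is still in range,
        # so the final partial vector needs no separate scalar tail loop.
        predicate = [base + i < len(a) for i in range(lanes)]
        for i, active in enumerate(predicate):
            if active:
                out[base + i] = a[base + i] + b[base + i]
        iterations += 1
    return out, iterations


if __name__ == "__main__":
    data = list(range(10))
    ones = [1] * 10
    for bits in (128, 512, 2048):
        result, steps = vla_add(data, ones, bits)
        print(f"{bits}-bit vectors: {steps} iteration(s), result {result}")
```

The result is identical at every width; only the iteration count changes, which is why one binary can in principle run on implementations with different vector lengths.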


So, Intel isn’t the only one going for 512-bit SIMD, but in both cases (AVX-512 and now SVE) it has started out as a supercomputer technology.

From @methylzero, in another thread:

So it sounds like AMD is putting their money on 256-bit as the optimal CPU width. Maybe 512-bit really is only worthwhile for supercomputer use… and for those uses AMD will probably rely on Radeon Instinct silicon connected via Infinity Fabric, much like how IBM offloads SIMD to GPUs via NVLink/OpenCAPI.

So in terms of supercomputing tech, the focus looks like it’s:

Vendor   internal SIMD      external SIMD
ARM      SVE (Ex: A64fx)    -
AMD      -                  Infinity Fabric
IBM      -                  NVLink/OpenCAPI
Intel    AVX-512            -

Which really makes it weird that Intel is choosing now to go into full-on external GPUs, but they make chips for much more than just supercomputers, so I probably shouldn’t read too much into it.

I guess it’s also worth thinking about how Gen-Z and CCIX might fit into the picture; Intel will probably make its own protocol if it decides to focus on external SIMD, but ARM (or maybe even RISC-V eventually) might choose to use Gen-Z, CCIX, or maybe even OpenCAPI to connect to off-chip SIMD engines.

To be clear, I would treat AMD’s plan to shun AVX-512 as a rumor for now. I think I have read it somewhere as a statement from an AMD rep, but I cannot recall where.
It does make sense though: AVX-512 is not just a straightforward widening of AVX/AVX2 instructions, it adds a LOT of extra complexity. IIRC it has all sorts of masking options and loads of extra registers.
x86 instruction decoder frontends are already large, power hungry, slow and no doubt a nightmare to design.
I can fully understand why AMD decided to go NOPE when faced with this, and decided to spend all the transistors saved on the decoder, registers, datapaths and execution units on cramming in more cores instead.


Just wondering, where would AVX-512 become useful for you in ways that AVX or AVX2 cannot cover?

I haven’t needed AVX for anything outside of a requirement of some webcam software suite for Windows. Though I have been on Linux for a while now.


Whenever you need to do some heavy number crunching.
Video encode/decode, image/video editing, anything involving linear algebra.
