What is the theoretically optimal SIMD width for CPUs?
I would guess this varies based on use case, but I’d be curious what people’s thoughts on this are.
Considerations:

- Use case (duh, not everything on CPU is SIMD heavy)
- At some point, moving the data to a GPGPU or ASIC probably becomes more efficient (rough back-of-envelope sketch after this list)
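On that second point, here's the kind of back-of-envelope math I have in mind. All the numbers (1 GB working set, ~16 GB/s over PCIe 3.0 x16) are illustrative assumptions, not measurements:

```c
#include <stdio.h>

/* Very rough break-even sketch for offloading SIMD work to a GPGPU.
 * Assumes ~16 GB/s each way over PCIe 3.0 x16 and ignores latency and
 * copy/compute overlap; the point is only the order of magnitude. */
int main(void) {
    double bytes     = 1e9;                     /* hypothetical 1 GB working set   */
    double pcie_bps  = 16e9;                    /* assumed PCIe 3.0 x16 throughput */
    double roundtrip = 2.0 * bytes / pcie_bps;  /* host -> device -> host          */
    printf("round-trip transfer alone: ~%.0f ms\n", roundtrip * 1e3);
    return 0;
}
```

That comes out to roughly 125 ms just for the transfer; a streaming pass over the same data in main memory usually finishes sooner than that, so offload only pays when the data gets reused a lot on the device, or when the link is much fatter (which is where things like Infinity Fabric and NVLink come in below).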
Partly based on a February Lounge post by @wendell, in response to me asking about the Talos II board he is testing; maybe the 128-bit SIMD width on POWER9 is a limitation? But Zen's SIMD units are 128 bits wide too…
…and also on information I've been thinking about from these two threads:
In the FMA thread, I asked this question specifically about Power, but knowledge on that is a bit more scarce, so I'm curious what information/discussion a more general question will turn up.
So this uses ARM’s new Scalable Vector Extension (SVE), but the spec itself doesn’t specify a width, and SVE isn’t meant to replace NEON.
> SVE is a complementary extension that does not replace NEON, and was developed specifically for vectorization of HPC scientific workloads.
>
> Rather than specifying a specific vector length, SVE allows CPU designers to choose the most appropriate vector length for their application and market, from 128 bits up to 2048 bits per vector register.
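The vector-length-agnostic part is what makes this relevant to the width question: SVE code is written against whatever width the hardware provides, so the same binary runs on a 128-bit and a 512-bit implementation. Roughly like this sketch using the ACLE intrinsics from `arm_sve.h` (untested on my part, compile with something like `-march=armv8-a+sve`):

```c
#include <arm_sve.h>
#include <stdint.h>

/* Vector-length-agnostic SAXPY (y = a*x + y). svcntw() reports how many
 * 32-bit lanes the hardware actually has, and the whilelt predicate masks
 * off the loop tail, so nothing in the source assumes a fixed width. */
void saxpy_sve(float a, const float *x, float *y, int64_t n) {
    for (int64_t i = 0; i < n; i += svcntw()) {
        svbool_t    pg = svwhilelt_b32(i, n);   /* disable lanes past n     */
        svfloat32_t vx = svld1_f32(pg, &x[i]);
        svfloat32_t vy = svld1_f32(pg, &y[i]);
        vy = svmla_n_f32_m(pg, vy, vx, a);      /* vy += vx * a, per lane   */
        svst1_f32(pg, &y[i], vy);
    }
}
```

So the spec deliberately dodges the "optimal width" question and leaves it to whoever implements the core.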
So, Intel isn’t the only one going for 512-bit SIMD, but in both cases (AVX-512 and now SVE) it has started out as a supercomputer technology.
So it sounds like AMD is putting their money on 256-bit as the optimal CPU width. Maybe 512-bit really is only worthwhile for supercomputer use… and for those uses AMD will probably rely on Radeon Instinct silicon connected via Infinity Fabric, much like how IBM offloads SIMD to GPUs via NVLink/OpenCAPI.
So in terms of supercomputing tech, the focus looks like it’s:
| | internal SIMD | external SIMD |
| --- | --- | --- |
| ARM | SVE (ex: A64fx) | |
| AMD | | Infinity Fabric |
| IBM | | NVLink/OpenCAPI |
| Intel | AVX-512 | |
Which really makes it weird that Intel is choosing now to go into full-on external GPUs, but they make chips for much more than just supercomputers, so I probably shouldn’t read too much into it.
I guess it’s also worth thinking about how Gen-Z and CCIX might fit into the picture; Intel will probably make its own protocol if it decides to focus on external SIMD, but ARM (or maybe even RISC-V eventually) might choose to use Gen-Z, CCIX, or maybe even OpenCAPI to connect to off-chip SIMD engines.
To be clear, I would treat AMD's plan to shun AVX-512 as a rumor for now. I think I read it somewhere as a statement from an AMD rep, but I cannot recall where.
It does make sense though: AVX-512 is not just a straightforward widening of the AVX/AVX2 instructions, it adds a LOT of extra complexity. IIRC it has per-lane masking on just about everything and loads of extra registers (32 ZMM registers plus 8 dedicated opmask registers).
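For a feel of what that masking looks like from the software side, here's a tiny contrived sketch with the AVX-512F intrinsics (not benchmarked, just to show the mechanism):

```c
#include <immintrin.h>

/* Add b into a, but only in lanes where a is positive; masked-off lanes
 * pass a through unchanged. The predicate lives in a dedicated k-register.
 * On AVX2 you'd emulate this with a compare plus a blend, since there are
 * no mask registers at all. */
static __m512 add_where_positive(__m512 a, __m512 b) {
    __mmask16 k = _mm512_cmp_ps_mask(a, _mm512_setzero_ps(), _CMP_GT_OQ);
    return _mm512_mask_add_ps(a, k, a, b);
}
```

Nearly every AVX-512 load/store/ALU instruction comes in masked variants like that, which is a lot of extra encodings and datapath plumbing on top of plain AVX2.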
x86 instruction decoder frontends are already large, power-hungry, slow, and no doubt a nightmare to design.
I can fully understand why AMD went NOPE when faced with this and chose to spend the transistors saved on the decoders, registers, datapaths, and execution units on cramming in more cores instead.