Supercomputer trains ChatGPT-sized model

The Frontier supercomputer is equipped with 9,472 AMD Epyc 7A53 CPUs and 37,888 Instinct MI250X GPUs.

However, the team used only 3,072 of those GPUs to train an LLM with one trillion parameters.

The paper also notes that a key challenge in training such a large LLM is the amount of memory required: at least 14 terabytes.
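For anyone curious where a number like 14 TB comes from, here is a rough back-of-the-envelope sketch in Python. It assumes a standard mixed-precision Adam setup (fp16 weights and gradients plus fp32 optimizer state, about 14 bytes per parameter); that breakdown is my assumption, not something stated in the paper:

```python
# Rough model-state estimate for a 1T-parameter model, assuming
# mixed-precision Adam (an assumption, not the paper's stated recipe):
# fp16 weights (2 B) + fp16 gradients (2 B)
# + fp32 master weights, momentum, variance (12 B) ≈ 14 B per parameter.

params = 1e12                  # one trillion parameters
bytes_per_param = 2 + 2 + 12   # weights + gradients + Adam state (assumed split)

total_tb = params * bytes_per_param / 1e12
gpus = 3072
per_gpu_gb = params * bytes_per_param / gpus / 1e9

print(f"Total model state: ~{total_tb:.0f} TB")             # ~14 TB
print(f"Sharded across {gpus} GPUs: ~{per_gpu_gb:.1f} GB")  # ~4.6 GB each, before activations
```

Spread across 3,072 GPUs that works out to only a few gigabytes of model state per GPU, which is why sharding the weights and optimizer state makes a trillion-parameter run feasible at all.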


The article's author copy-pasted a bit too much.

Fixed. Another model trained and released by a supercomputer group was BLOOM, which used the French supercomputer Jean Zay with a compute grant worth an estimated €3M from the French research agencies CNRS and GENCI.