MI210 AI Benchmark Request

Hey, I saw y'all benchmark BERT and A1111 (AUTOMATIC1111's Stable Diffusion web UI) on an AMD MI210 in a video.

Thing is… those benchmarks are super unoptimized. At the end of the video, y'all said to post better benchmarks in the forums, so here I am!


Facebook has a hyper-optimized version of Stable Diffusion in a framework (AITemplate) that explicitly supports AMD and Nvidia server GPUs. It's 2x-3x as fast as A1111, maybe more on big GPUs like the MI210 and A100.

Go to this GitHub repo to try the demo:
/facebookincubator/AITemplate/tree/main/examples/05_stable_diffusion
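
From what I remember of that example's README, the workflow is roughly: download the weights, compile the kernels once, then run the demo. The script names and flags below are from memory and may have changed, so check the README first:

```python
# Rough sketch of the AITemplate Stable Diffusion demo workflow; script
# names/flags are assumptions based on the repo README, not verified here.
import subprocess

# 1. Fetch the Stable Diffusion weights (needs a Hugging Face access token)
subprocess.run(
    ["python3", "scripts/download_pipeline.py", "--token", "hf_your_token"],
    check=True,
)
# 2. Compile the AITemplate kernels for your GPU (slow, but a one-time cost)
subprocess.run(["python3", "scripts/compile.py"], check=True)
# 3. Generate an image
subprocess.run(
    ["python3", "scripts/demo.py", "--prompt", "a photo of an astronaut"],
    check=True,
)
```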

If a raw Python demo isn't your cup of tea, a Stable Diffusion UI has AITemplate implemented here: VoltaML/voltaML-fast-stable-diffusion


Meanwhile, BERT is an old and janky language model. The hot new thing in AI land is the ChatGPT-like Llama model, coincidentally also from Facebook.

There are a few “optimized” backends, but the easy one I would suggest is KoboldCPP (or llama.cpp) compiled with OpenCL support:

LostRuins/koboldcpp

It's a nice contrast to AITemplate Stable Diffusion, as it's designed to run on tiny gaming GPUs (offloading most of the model to the CPU if needed). ROCm and Vulkan backends are a work in progress.

I would suggest running the brand-new Llama 2 models, which you might see in the news later today. Search for “TheBloke Llama-2 ggml” on the Hugging Face website.
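
If you'd rather grab a model programmatically, something like this should work. The repo and file names below just follow TheBloke's usual naming scheme, so double-check them on the site:

```python
# Hedged sketch: download one GGML quant with the huggingface_hub client.
# repo_id/filename follow TheBloke's usual naming but are not verified here.
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="TheBloke/Llama-2-13B-chat-GGML",       # example repo name
    filename="llama-2-13b-chat.ggmlv3.q4_K_M.bin",  # example q4_K_M quant
)
print(model_path)  # local path you can point KoboldCPP at
```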


…Sorry, I just registered and can't post the GitHub/Hugging Face links.

Do I need to ping the YT video makers?

K, is there a step-by-step guide for running this on these systems, beyond just the GitHub links I'm about to go poke at?

Sorry for the late reply.

VoltaML has specific instructions to install AITemplate.

The raw Python version from Facebook works like Hugging Face diffusers.
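
For reference, this is what the plain diffusers version of that workflow looks like; the AITemplate demo follows the same load-a-pipeline-then-prompt shape (model ID below is the standard SD 1.5 checkpoint):

```python
# A minimal diffusers baseline for comparison; AITemplate's demo mirrors
# this pipeline-style API.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # standard SD 1.5 weights
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # ROCm builds of PyTorch also use the "cuda" device name

image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```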

KoboldCPP is pretty turnkey. Compile the OpenCL version from source, download a big GGML model, then give it a prompt!
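
Something like this is the rough shape of it. The build flag and launch flags are from memory of KoboldCPP's --help, so treat them as assumptions:

```python
# Hedged sketch: build the CLBlast (OpenCL) backend, then launch KoboldCPP
# on a GGML model. Flags and filenames are assumptions; check --help.
import subprocess

# One-time build with the OpenCL backend enabled
subprocess.run(["make", "LLAMA_CLBLAST=1"], check=True)

# Launch against a downloaded quant; KoboldCPP serves a local web UI
subprocess.run([
    "python", "koboldcpp.py",
    "llama-2-13b-chat.ggmlv3.q4_K_M.bin",  # example model file
    "--useclblast", "0", "0",              # OpenCL platform ID, device ID
    "--gpulayers", "40",                   # layers to offload to the GPU
], check=True)
```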
