Hey, I saw y'all benchmark BERT and A1111 on an AMD MI210 in a video.
Thing is… those benchmarks are super unoptimized. At the end of the video, y'all said to post better benchmarks in the forums, so here I am!
Facebook has a hyper-optimized Stable Diffusion implementation in its AITemplate framework, which explicitly supports both AMD and Nvidia server GPUs. It's 2x-3x as fast as A1111, maybe more on big GPUs like the MI210 and A100.
Go to this GitHub repo to try the demo:
/facebookincubator/AITemplate/tree/main/examples/05_stable_diffusion
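For an apples-to-apples comparison, it helps to time the same prompt and step count on an unoptimized baseline first. Here's a minimal sketch using vanilla diffusers (assuming a ROCm or CUDA PyTorch install; the checkpoint name and prompt are just examples, not from the AITemplate demo):

```python
import time

import torch
from diffusers import StableDiffusionPipeline

# Unoptimized baseline: plain diffusers, no AITemplate. On AMD ROCm builds of
# PyTorch the device string is still "cuda" (HIP is exposed via the CUDA API).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

start = time.perf_counter()
image = pipe("an astronaut riding a horse", num_inference_steps=50).images[0]
print(f"baseline: {time.perf_counter() - start:.2f}s for 50 steps")
image.save("baseline.png")
```

Run the AITemplate demo with the same prompt and step count, then compare wall-clock times.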
If a raw Python demo isn't your cup of tea, there's a Stable Diffusion UI with AITemplate support here: VoltaML/voltaML-fast-stable-diffusion
Meanwhile, BERT is an old and janky language model. The hot new thing in AI land is the ChatGPT-like Llama model, coincidentally also from Facebook.
There are a few “optimized” backends, but the easy one I would suggest is KoboldCPP (or llama.cpp) compiled for OpenCL support (see the build/run sketch below):
LostRuins/koboldcpp
It's a nice contrast to AITemplate Stable Diffusion, as it's designed to run on tiny gaming GPUs (offloading most of the model to CPU, if needed). ROCm and Vulkan backends are a work in progress.
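Since I can't link the docs, here's the build-and-run flow as a hedged sketch. The make flag and CLI options match the koboldcpp README as of this writing, but double-check your checkout; the model filename assumes the GGML download mentioned below:

```python
import subprocess

# Build koboldcpp with the OpenCL (CLBlast) backend. Assumes the repo is
# cloned to ./koboldcpp and the CLBlast/OpenCL dev packages are installed.
subprocess.run(["make", "LLAMA_CLBLAST=1"], cwd="koboldcpp", check=True)

# Launch with partial GPU offload: --useclblast takes the OpenCL platform and
# device ids, and --gpulayers sets how many layers leave the CPU (tune it to
# fit your VRAM; the rest of the model stays in system RAM).
subprocess.run(
    [
        "python3", "koboldcpp.py", "llama-2-7b.ggmlv3.q4_0.bin",
        "--useclblast", "0", "0",
        "--gpulayers", "32",
    ],
    cwd="koboldcpp",
    check=True,
)
```

Once it's running, koboldcpp serves a web UI (and a KoboldAI-compatible API) on localhost:5001 by default.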
I would suggest running the brand new Llama 2 models, which you might see in the news later today. Search for “TheBloke Llama-2 ggml” on the Hugging Face website.
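If you'd rather script the download than click around, something like this should work. The repo id and quant filename are my guesses at TheBloke's naming scheme, so verify them on the model card:

```python
from huggingface_hub import hf_hub_download

# Assumed repo id and filename; check the model card for the real quant list.
model_path = hf_hub_download(
    repo_id="TheBloke/Llama-2-7B-GGML",
    filename="llama-2-7b.ggmlv3.q4_0.bin",
)
print(model_path)  # hand this path to koboldcpp / llama.cpp
```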
…Sorry, I just registered and can't post the GitHub/Hugging Face links.
Do I need to ping the YT video makers?