Don’t want to hijack another thread so I’m creating this one.
It seems SlyEcho’s ROCm fork of llama.cpp is about to get merged into the main project. Update: it has been approved by ggerganov and others and was merged a minute ago!
I’ve been using his fork for a while, along with some forks of koboldcpp that make use of it. It’s said to still have some rough edges, but even then it’s better than what we had with OpenCL.
Llama.cpp already has OpenCL support for matrix operations that works on AMD cards, but it’s not as fast as CUDA. With this change AMD cards should be able to achieve competitive performance. It might not be rubbing shoulders with Nvidia for now, but hey, a 7900 costs about half as much as a 4090, right?
You’ll probably need to point the CC and CXX environment variables at the LLVM compilers that ship with ROCm and run make with LLAMA_HIPBLAS=1.
Something like this:
export CC=/opt/rocm/llvm/bin/clang
export CXX=/opt/rocm/llvm/bin/clang++
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make LLAMA_HIPBLAS=1 -j
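Once it builds, you can offload layers to the GPU with -ngl. The model path below is just a placeholder, swap in whichever quantized model you actually have:

# offload 32 layers to the GPU; lower -ngl if it doesn't fit in VRAM
# (whatever doesn't fit stays in normal RAM)
./main -m ./models/your-model-q4_0.bin -ngl 32 -p "Hello"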
Merging into the main project will make it easier to use ROCm in derivative projects like ooba’s webui or langchain, which rely on the Python bindings. Also great timing, since Zuck just released his code-focused Llama 2 model.
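For the Python side, llama-cpp-python builds llama.cpp through CMake, so in principle you should be able to pass the same flag through CMAKE_ARGS when reinstalling. Something like this (untested on my end, and the exact option name may change as the merge settles):

# rebuild the Python bindings against hipBLAS, using ROCm's clang as before
CMAKE_ARGS="-DLLAMA_HIPBLAS=on" CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ \
    pip install --force-reinstall --no-cache-dir llama-cpp-python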
God damn, I’m jelly. I make do with a 6800M with 12GB of VRAM, sometimes offloading to normal RAM, which I have plenty of. Never attempted to train anything.
In any case, things are improving very fast. We can now get CUDA-like features and speed for LLMs, and just today I tried Stable Diffusion with fp16 support.
I can’t say I used that much RAM to create the Shakespeare model. It’ll take more than 10 minutes without hardware matrix acceleration, but give it a go and see how it fares.
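In case it helps, by the Shakespeare model I mean the train-text-from-scratch example that ships with llama.cpp. Roughly something like this, though the flags are from memory, so double-check the example’s README for your checkout:

# train a tiny model on shakespeare.txt with the bundled example
# (exact flag names and defaults may differ between llama.cpp versions)
./train-text-from-scratch \
    --vocab-model ./models/ggml-vocab.bin \
    --train-data shakespeare.txt \
    --checkpoint-in chk-shakespeare.bin \
    --checkpoint-out chk-shakespeare.bin \
    --model-out ggml-shakespeare-f32.bin \
    --ctx 64 --embd 256 --head 8 --layer 16 \
    -t 6 -b 16 --adam-iter 256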