Best old and/or cheap graphics card for AI - llama Vicuna Alpaca

I have a Ryzen 5 2400G, a B450M Bazooka V2 motherboard and 16 GB of RAM. I would like to run AI systems like llama.cpp, Vicuna and Alpaca in their 4-bit versions on my computer.

I am looking for old graphics cards with a lot of memory (16 GB minimum) that are cheap, like the Tesla P40, Tesla M40, or Radeon MI25.

  • Do you have any cards to advise for my configuration?
  • Do you have an idea of the performance with the AI programs I mentioned?
    Thanks

4-bit quantisation support requires tensor hardware or CUDA compute capability 7.5 to run, meaning an RTX 2000-series (Turing) card as a minimum. Those went only up to 11 GB on consumer hardware and may still not fit your budget. Older GPUs will require 32-bit weights, and therefore 8× the RAM; even Voltas are of little use.
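To make the "8× the RAM" point concrete, here is a back-of-the-envelope sketch (weights only; it ignores activations and the KV cache, which add several GB more in practice):

```python
def model_size_gb(n_params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight memory in GiB: parameter count times bits per weight."""
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1024**3

# Rough sizes for LLaMA-class parameter counts at 4-bit vs full fp32 weights.
for params in (7, 13, 30):
    q4 = model_size_gb(params, 4)
    fp32 = model_size_gb(params, 32)
    print(f"{params}B: ~{q4:.1f} GiB at 4-bit, ~{fp32:.1f} GiB at fp32")
```

A 13B model at 4 bits is roughly 6 GiB of weights, while the same model at fp32 is roughly 48 GiB, which is exactly the 8× factor and why full-precision weights are out of reach for old consumer cards.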

Llama & friends don't need a lot of CPU power when they do most of their work on the GPU. Still, that 16 GB of RAM and CPU of yours may be cheaper to upgrade and would let you run the smallest models on the CPU itself, but without 4-bit support there you'd also need 32-bit weights, or 8× the RAM that 4-bit weights consume.

I see a bigger chance for 4-bit or even 3-bit quantisations in CPU variants of "tiny" LLMs, because essentially they suck so much that the masking and shifting for the low-precision weights won't be much of an overhead. It's the reason we're starting to see even mobile implementations.

That type of logic is general purpose poison on GPU cores, so older ones won’t see this type of retrofit code.

What are you looking to do exactly?

For inference, llama.cpp allows you to use the CPU with good performance. If you want to fine-tune, that is a different story, but I'm not sure your base system will be up to scratch either.

AMD MI25, never look back

Thanks for your answer.

I want to do inference with 13B and 30B models and maybe fine-tune.
I want to be able to use an LLM smoothly enough to then code or use plugins locally (browsing the internet, analysing my documents, creating pictures…).
I have bought a Tesla P40 on eBay; I'm waiting for the delivery.

What is blocking or limiting my computer?

What do you mean? What are the advantages compared to a P40?

Mostly just a lot of compute from Vega. There are dual-package cards too. Stable Diffusion users are currently hoarding MI25s.

