I'm looking to run Llama 3.3 70B. Would the A5000 be best, or would some other GPU be better?
What are your requirements? How big are the LLMs you're looking to run?
Llama 3.3-70B*
70B is pretty big for one GPU. You may have to quantize it.
Unless you give a budget, I'll say H100.
So, what’s your budget and how many PCIe slots do you have? How much power can your PSU deliver or are you willing to upgrade it?
Llama 70B needs about 43 GB of VRAM when quantized to 4 bits; the full-precision (FP32) model needs 8x that. So you'd be looking at at least two 24 GB cards (3090 / 4090 / A5000 or better) or a single 48 GB A6000. Choose from those depending on your available slots and budget.
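If you want to sanity-check card choices yourself, the back-of-the-envelope math is just parameter count times bytes per weight. A rough sketch (the 15% overhead factor for KV cache and activations is my assumption; real usage varies with context length):

```python
# Rough VRAM estimate: params * bytes_per_param, plus ~15% overhead
# for KV cache and activations (assumed; grows with context length).
def vram_gb(params_b: float, bits_per_weight: int, overhead: float = 1.15) -> float:
    bytes_total = params_b * 1e9 * bits_per_weight / 8
    return bytes_total * overhead / 1e9

for bits in (4, 16, 32):
    print(f"70B @ {bits:>2}-bit: ~{vram_gb(70, bits):.0f} GB")
# 70B @  4-bit: ~40 GB   -> 2x 24 GB cards or 1x 48 GB card
# 70B @ 16-bit: ~161 GB
# 70B @ 32-bit: ~322 GB
```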
Looks like I'll switch to the Llama 3.1 8B model then (3.3 only comes in 70B).
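For what it's worth, the 8B fits comfortably on a single 24 GB card like the A5000 (~16 GB of weights in BF16). A minimal sketch using the standard transformers flow, assuming you have access approved for the gated meta-llama repo:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # gated repo; requires approval
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # ~16 GB of weights, fits a 24 GB A5000
    device_map="auto",           # requires the accelerate package
)
inputs = tok("The A5000 is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=50)
print(tok.decode(out[0], skip_special_tokens=True))
```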
You can buy 2x used P40s on eBay and run it on those. They're an older generation, but they can still run the 70B at 6-7 tokens per second, if you're OK with that.
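If you go that route, something like this should work with llama-cpp-python built with CUDA support. A sketch, not a turnkey setup: the GGUF filename is a placeholder, and the 50/50 split assumes two matched cards:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="llama-3.3-70b-instruct.Q4_K_M.gguf",  # placeholder path to your GGUF
    n_gpu_layers=-1,          # offload all layers to GPU
    tensor_split=[0.5, 0.5],  # split the weights evenly across the two P40s
    n_ctx=4096,
)
out = llm("Q: Why two GPUs for a 70B model? A:", max_tokens=64)
print(out["choices"][0]["text"])
```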