GPU for local AI LLMs?

Hey, I am interested in buying an NVIDIA GPU to use for generating stuff with local LLMs. Would a 16 GB GPU be enough to generate 3D models and/or 2D animations / 2D images?
Cheers

I have a 2080 Ti with 11 GB of memory, and I have used local AI to generate images and also run some local LLM models. There are some settings you can tweak, but the size of the model matters. I can't use the newest and largest image-generation models anymore with that amount of memory. There is never enough memory.

Software recommendations:

  • ComfyUI for image generation
  • LM Studio for local LLM (see the example below)
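
For reference, LM Studio can expose whatever model you have loaded through an OpenAI-compatible local server (port 1234 by default), so you can script against it. A minimal sketch, assuming that server is running and a model is loaded; the model name and API key below are placeholders, not real identifiers:

```python
# Minimal sketch: talk to a model loaded in LM Studio via its OpenAI-compatible
# local server. Assumes the server is enabled on its default port (1234); the
# model name and API key are placeholders.
from openai import OpenAI  # pip install openai

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="local-model",  # placeholder: use the identifier LM Studio shows for your loaded model
    messages=[{"role": "user", "content": "Suggest three prompt ideas for a 2D fantasy landscape."}],
)
print(response.choices[0].message.content)
```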

Cheers dude, but damn, is 16 GB really not enough?

There are always some models that you can't run, but 16 GB is enough for many of them. Realistically, that is the range you can expect to have at home, and there's a lot you can do with it.

There are some insane models that aren't meant to be run on home PCs and need a lot of RAM, for example the Llama 3.1 70B model:

Estimated RAM: Around 350 GB to 500 GB of GPU memory is typically required for running Llama 3.1 70B on a single GPU, and the associated system RAM could also be in the range of 64 GB to 128 GB. This is highly dependent on the batch size and model implementation specifics.

It's more than nothing, but look at the file sizes of models at ollama.com: if a model is bigger than your GPU's VRAM, it'll have to page in from main RAM or slower storage. Some models are very effective without being tens of gigabytes (and some can be quantized into smaller representations without losing much capability).
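
As a back-of-the-envelope check before downloading, you can estimate the size of a model's weights from its parameter count and quantization level: roughly parameters × bits-per-weight ÷ 8. A rough sketch of that arithmetic; the 1.2 overhead factor is just a guess for KV cache and runtime buffers, not a measured value:

```python
# Back-of-the-envelope check: do a model's weights roughly fit in VRAM?
# weight_gb ≈ parameters (billions) × bits per weight / 8; the overhead factor
# is a rough allowance for the KV cache and runtime buffers.
def fits_in_vram(params_billion: float, bits_per_weight: int,
                 vram_gb: float, overhead: float = 1.2) -> bool:
    weight_gb = params_billion * bits_per_weight / 8  # e.g. 8B at 4-bit ≈ 4 GB
    return weight_gb * overhead <= vram_gb

for params, bits in [(8, 8), (13, 4), (34, 4), (70, 4)]:
    verdict = "fits" if fits_in_vram(params, bits, 16) else "spills past 16 GB"
    print(f"{params}B at {bits}-bit: ~{params * bits / 8:.0f} GB of weights, {verdict}")
```

If it spills, runtimes like Ollama or llama.cpp can still run the model by keeping part of it in system RAM, just much more slowly.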

If you can accept it being slow, go for the biggest and most reliable model you can download. Otherwise, it's the classic trade-off between speed and capability.

K3n.

Thanks dudes, you've taught me something today.

The above was explained very well, but to put it in more practical terms: 16 GB should allow you to run Stable Diffusion without issues. For Flux you'll need to run a quantized model, but that should work without issues as well.

For LLMs you can run ~25B-ish models at 8-bit quants, or ~50B ones with 4-bit quants.
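
If you want to try one of those quantized models, here's a minimal sketch using Ollama's local REST API, assuming the Ollama server is running on its default port (11434). The model tag is only an example; check ollama.com for the tags and quantizations actually available and their file sizes:

```python
# Minimal sketch: ask a locally served, quantized model for a completion via
# Ollama's REST API. Assumes the Ollama server is running on the default port
# 11434 and the example tag below has already been pulled; swap in any tag.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1:8b",  # example tag; a quantized 8B build fits comfortably on a 16 GB card
        "prompt": "Suggest a colour palette for a 2D animation set at dusk.",
        "stream": False,  # return a single JSON object instead of a token stream
    },
    timeout=300,
)
print(resp.json()["response"])
```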