I've been looking at so many posts now, but did Wendel make a how-to for beginners on things like running Llama locally? If so, does anybody have a link?
I'm asking because so many consultants who promote AI also claim that cloud-based solutions run as "private" are safe. I don't agree, and I want to see how fast and easy it is to build a local version myself, and whether it's of any use at all.
Hope I found the right way to post this; sorry if not, it's my first post and I'm new to the tech, so don't bite.
Welcome! I get where you’re coming from—there’s definitely a lot of discussion around cloud vs. local solutions. I haven’t seen a specific beginner’s guide from Wendel on setting up Llama locally, but it might be worth checking out the general resources or tutorials section of the forum.
If you don’t find what you need, you could also try searching for community-created guides or threads discussing local installations. It can be really eye-opening to build it yourself!
Don’t hesitate to ask more questions as you dive in; we’re all here to help each other out!
Welcome! You have a great question. I run many of the open-weight local LLMs on my desktop at home. Even with a modest GPU you can get some of the smaller 8B models running at home.
How much VRAM do you have? 8GB is kind of entry level, 16GB is okay, 24 GB will get you a taste of the bigger models, dual+ GPU is the dream lol
How much RAM do you have? If you can’t fit the whole model into your VRAM, many inference engines allow you to do partial offload. The speed will be slower though.
I agree with @MonstrousMicrobe that koboldcpp is a great way to get started. If you are a software dev and have experience with python virtual environments, managing dependency hell, and compiling C code, then I'd go straight to llama.cpp, as many of the other projects use it under the hood.
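If you do end up on the llama.cpp route, the partial offload mentioned above is basically one parameter. Here's a minimal sketch using the llama-cpp-python bindings, assuming you've pip-installed it and already have a GGUF file somewhere (the path and layer count are just placeholders to tune for your hardware):

```python
# Minimal sketch, assuming llama-cpp-python is installed (pip install llama-cpp-python)
# and you already have a GGUF file on disk. Path and layer count are placeholders;
# tune n_gpu_layers to whatever fits your VRAM.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/some-model-Q4_K_M.gguf",  # placeholder path to your GGUF
    n_gpu_layers=20,  # layers pushed to VRAM; -1 offloads everything, 0 stays on CPU
    n_ctx=4096,       # context window; larger windows cost more memory
)

result = llm("Explain partial offload in one sentence.", max_tokens=100)
print(result["choices"][0]["text"])
```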
Once you have koboldcpp, llama.cpp, ollama, lmstudio, or whatever you choose running on your computer, head over to Hugging Face and type GGUF in the search bar. A good starter model I recommend that fits in under 6GB of VRAM is bartowski/Meta-Llama-3.1-8B-Instruct-GGUF; download the Q4_K_M quant.
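If you'd rather script the download than click around the website, something like this with the huggingface_hub library should work (the exact filename is my guess at bartowski's usual naming, so check the repo's file list first):

```python
# Small sketch using the huggingface_hub library (pip install huggingface_hub).
# The filename is assumed from bartowski's usual naming for the Q4_K_M quant;
# double-check it on the repo's file list before running.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="bartowski/Meta-Llama-3.1-8B-Instruct-GGUF",
    filename="Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf",
)
print("GGUF saved to:", path)  # point koboldcpp / llama.cpp / LM Studio at this file
```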
Start out small and build on your successes to keep motivated. There are a bunch of models, with new ones coming out every month, and certain ones have better use cases, e.g. “creative” writing, math, certain programming languages, etc… I keep an eye on r/localllama for the latest releases and benchmarks.
After a while you might want to play with different quants (quantization is a way to compact the model weights so they fit into RAM/VRAM), other inference engines (there's more than just GGUF lol), or advanced stuff like distributed inferencing hah…
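If you're curious what quantization actually does under the hood, here's a toy sketch of the idea in plain numpy. Real GGUF quants are block-wise and much cleverer, so treat it purely as an illustration of the memory-vs-precision trade-off:

```python
# Toy illustration of quantization: store weights as int8 plus a scale factor
# instead of float32. Real GGUF quants (Q4_K_M and friends) use smarter block-wise
# schemes; this only shows the basic idea.
import numpy as np

weights = np.random.randn(4096).astype(np.float32)     # pretend weight tensor

scale = np.abs(weights).max() / 127.0                   # map the float range onto int8
quantized = np.round(weights / scale).astype(np.int8)   # 1 byte per weight instead of 4
restored = quantized.astype(np.float32) * scale         # what inference actually uses

print("fp32 size:", weights.nbytes, "bytes")
print("int8 size:", quantized.nbytes, "bytes")
print("max rounding error:", np.abs(weights - restored).max())
```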
Enjoy the journey; what you learn along the way might be more useful than anything the bots tell you!
Found this also, might be of value to some. Put this after youtube in your address bar: /watch?v=DYhC7nFRL5I. It runs and manages a number of models under Docker on both Linux and Windows.
My GPU is 24GB, so that will work fine. I did find this from "Dave"; I can't post the full link, but I'm sure most will know what to put after youtube: /watch?v=DYhC7nFRL5I. From playing around, I think the way to go is a local system, for safety and for protecting firms' data.
Nice, with 24GB locally you can try out many open LLMs at home without any of your data going through a 3rd party off-site system. Do be careful if you play with ComfyUI as some of the modules are very sketchy hah…
Some other folks are discussing this in another thread on here too.
Yes, there are lots of YouTubers and resources available. Follow whatever methods and apps look interesting and motivate you to keep plugging along!
Dave, I like him; he's direct and seems to do this for fun, with lots of fun projects. He's a bit light on documentation at times, but that forces you to learn something, so it's fine.