I’ll give my very opinionated answer and assume you’re interested in more recent GenAI and LLM stuff. I’m a devops/sysadmin/robotics/software engineer generalist who is funemployed right now, and this is how I kick the tires on the new coolness and learn a lot along the way.
Setup
- Do your development on Linux (or Windows Subsystem for Linux if you must). Running this stuff mostly comes down to basic Python venv dependency management, compiling C programs, OS package management, and a little concurrent networking fun.
- You’ll likely need 24GB of VRAM to try out many of the local models (smaller models are possible on less).
- If you only want to consume 3rd-party APIs (e.g. Claude/OpenAI/Grok, or Groq, or whatever), you could do that too. I’m aware they exist, but I almost exclusively run models locally, hosting for myself and friends.
Image Diffusion
- Check out ComfyUI, a self-hosted Python server with a visual web app for wiring many models and tools together into complex node-based workflows (rough install sketch after this list).
- The best model to run at home currently is FLUX.1-dev, available on civit.ai. You can also run older Stable Diffusion models, and there are others still coming out. There are a million bizarre LoRAs too for any niche interests lmao…
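Getting ComfyUI going is the usual venv dance; a rough sketch assuming an NVIDIA card and the default host/port (install a torch build matching your CUDA version first, per the README):
git clone https://github.com/comfyanonymous/ComfyUI.git && cd ComfyUI
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
# drop model weights under models/ (exact subfolder depends on the model/workflow)
python main.py --listen 127.0.0.1 --port 8188
# then open http://127.0.0.1:8188 in a browser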
LLMs
- The old standby is llama.cpp’s llama-server serving a GGUF quant, e.g.:
# 30~35 tok/sec or so generation speed, pretty good
# more advanced options for running parallel batched inferencing too
./llama-server \
--model "../models/bartowski/Qwen2.5-32B-Instruct-GGUF/Qwen2.5-32B-Instruct-Q3_K_M.gguf" \
--n-gpu-layers 65 \
--ctx-size 8192 \
--cache-type-k f16 \
--cache-type-v f16 \
--threads 16 \
--flash-attn \
--mlock \
--n-predict -1 \
--host 127.0.0.1 \
--port 8080
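Once that’s up, llama-server exposes an OpenAI-compatible API, so you can sanity check it with curl (the prompt is just filler, and the model field mostly doesn’t matter since only one model is loaded):
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "local",
        "messages": [{"role": "user", "content": "Give me one fun fact about llamas."}],
        "max_tokens": 64
      }'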
- I just discovered aphrodite yesterday and have it running 4-bit AWQ quants with ~5 concurrent requests batched, hitting over 60 tok/sec in aggregate out of the box. It’s fairly easy to install and try out, e.g.:
mkdir aphrodite && cd aphrodite
# set up a virtual environment
# if you hit errors, try an older Python version, e.g. python3.10
python -m venv ./venv
source ./venv/bin/activate
# optional: use uv pip instead
pip install -U aphrodite-engine hf_transfer
export HF_HUB_ENABLE_HF_TRANSFER=1
# it auto-downloads models to ~/.cache/huggingface/
aphrodite run Qwen/Qwen2.5-32B-Instruct-AWQ \
--enforce-eager \
--gpu-memory-utilization 0.95 \
--max-model-len 4096 \
--dtype float16 \
--host 127.0.0.1 \
--port 8080
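aphrodite serves the same OpenAI-compatible API, so an easy way to see the aggregate batched throughput is to fire several requests at it concurrently; a throwaway sketch (the topics are obviously just filler):
# launch a handful of requests in parallel and wait for them all
for topic in coffee linux robots rain chess; do
  curl -s http://127.0.0.1:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"Qwen/Qwen2.5-32B-Instruct-AWQ\", \"messages\": [{\"role\": \"user\", \"content\": \"Write a haiku about $topic\"}], \"max_tokens\": 64}" &
done
wait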
If you want LLMs on “easy mode”, check out LMStudio, koboldcpp, and other pre-built binary inference apps. They also provide an OpenAI-compatible API endpoint you can plug into anything that already consumes ChatGPT etc.
Misc
There are also impressive vision models, like Qwen2-VL-7B-Instruct, that can OCR handwritten images or describe a photo fairly well.
You can quite accurately transcribe podcasts/YouTube videos using OpenAI’s Whisper.
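For example, with the reference openai-whisper CLI (yt-dlp for grabbing the audio is my own addition here; faster-whisper or whisper.cpp also work and are quicker):
pip install -U openai-whisper yt-dlp
# grab just the audio track (URL is a placeholder)
yt-dlp -x --audio-format mp3 -o "episode.%(ext)s" "https://www.youtube.com/watch?v=..."
# transcribe it; the medium model is a decent speed/accuracy tradeoff
whisper episode.mp3 --model medium --output_format txt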
There are some open text-to-speech models like parler-tts with okay-ish voices, but you have to be careful how you batch your generations or it goes off the rails lol… Then you can take your TTS output and run it through an RVC v2 model to improve the quality…
Conclusion
That’s enough for now, but you get the picture.
Just create a new project folder or git repo and a Python virtual environment for each toy you want to play with. Then go through the README and see if you can get it to run.
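The pattern is almost always the same; a throwaway sketch (the repo name is just a placeholder):
git clone https://github.com/someone/shiny-new-toy && cd shiny-new-toy
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt   # or whatever the README actually says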
Once you have a decent LLM going you can ask it questions, but honestly the bots are not “there yet” imo; they’re occasionally useful, similar to a web search.
Have fun on your journey of exploration, and keep us posted what you decide to try out!