I’m sure there are many of us here after watching Wendell’s video on local LLMs. My question is: is there any sort of pathway for a beginner to host DeepSeek-R1 locally? Currently, my own experience hasn’t gone further than downloading Ollama and pulling the 1.5B distilled DeepSeek model on an old work laptop (MacBook Pro). What would you guys recommend as the next steps to actually fit these large models onto consumer hardware?
The quick, easy answer is to run the Ollama docker container and download one of the DeepSeek models. The Ollama docker container automatically pulls the model when you run it, so really it’s as easy as:
- Install Docker
- Run the Ollama docker container, telling it to run a version of DeepSeek (example commands below).
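For reference, something like this should be all it takes (a minimal sketch assuming the standard ollama/ollama image; deepseek-r1:8b is one of the distilled tags, swap in whatever size fits your RAM):

docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
docker exec -it ollama ollama run deepseek-r1:8b

The first command starts the Ollama server in the background, the second drops you into an interactive chat and pulls the model on first run.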
You don’t need a GPU to do any of this; assuming you have enough system memory to hold the model you’re trying to run, it will run on your CPU, albeit slowly. You can give the Ollama container access to a GPU if you have one, however for the model to run on the GPU it must have enough VRAM to hold the model, or it will default back to the CPU.
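If you’re on an NVIDIA card with the NVIDIA Container Toolkit installed, giving the container GPU access is just one extra flag (again, a sketch reusing the container name from above; AMD cards use the separate ollama/ollama:rocm image instead):

docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama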
In my experience the 8B parameter models are a good starting point for testing on smaller systems; they take about 5.5 GB of memory, so they fit on a lot of consumer hardware.
If you wanna get fancy with the same docker-container ease, look at Open WebUI, which uses Ollama to run the models.
That’s an amazing guide if you don’t wanna use docker; I just find docker super simple and easy. Especially when you decide you don’t like something, telling docker to trash the container means you don’t have to manually uninstall anything or hunt down leftover files.
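Cleanup really is just a couple of commands (assuming the container and volume names used above; removing the volume also deletes the downloaded models):

docker stop ollama && docker rm ollama
docker volume rm ollama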
I would also like to promote Ollama and any of the open-source models. The installation is very easy; I’ve done it on my Windows PC with a 7900 XTX, but I use it mostly on my M3 Mac, paired with VS Code as a coding AI agent. The installation is similar across these systems and docker just works.
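Whatever editor integration you use, under the hood it’s just hitting Ollama’s local HTTP API, so it’s easy to sanity-check that everything is up, e.g. (assuming you’ve already pulled the 8B distill):

curl http://localhost:11434/api/generate -d '{"model": "deepseek-r1:8b", "prompt": "Write a hello world in Python", "stream": false}'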
A small gotcha: on the Mac you need to run Ollama standalone instead of using the docker container if you want GPU access; this is a limitation of docker not having access to Apple’s Metal API (used for GPU access). So you can simply install Ollama natively and then run the Open WebUI container with
docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
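On the native side, that’s roughly (a sketch assuming Homebrew; you can also just grab the app from ollama.com):

brew install ollama
ollama serve
ollama pull deepseek-r1:8b

The --add-host flag in the Open WebUI command above is what lets the container reach the native Ollama instance on port 11434 via host.docker.internal.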
If you want an easy, friendly UI, LM Studio is the way to go.
It has a user interface and model browser which you can use to figure out which models are best for your given hardware configuration.
Just be careful if you use it for work, as the public version is for personal use.
Otherwise, Ollama with Open WebUI per Sean’s suggestion.
Wendell wrote a Linux guide. For Mac, you just install the app, start it up, and download a model.
Now if you want to build a server that can run the full model, check out this guide.
This. LM Studio is excellent for selecting an LLM based on the amount of GPU memory you have. It will tell you if you can fully load the model (recommended), partially offload it (slower responses, but lets you run a larger, more accurate model), or not run it at all.
Then use Open WebUI (I’ve also used privateGPT) for running your LLM locally with no Internet connection required. Do monitor your network to make sure there is no leakage. Trust, but verify.
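One quick spot check on macOS/Linux (the default Ollama setup should only be listening locally on port 11434):

sudo lsof -i -P -n | grep ollama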
I now have several models for coding, writing, chatbots, and language translation running locally; flipping between them is very easy.
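From the command line, switching is just a matter of asking Ollama what you’ve pulled and running a different one (Open WebUI exposes the same thing as a model dropdown):

ollama list
ollama run deepseek-r1:8b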