The 4070 does not have more VRAM than the 9070XT, so that should not matter much.
You should be able to use any Linux distribution you'd like. Compile llama.cpp with the Vulkan backend yourself and you're ready to go. As a front end, I would start an OpenWebUI container locally that uses the llama.cpp server as its backend. Roughly what that looks like is sketched below.
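A minimal sketch, assuming your distro's Vulkan driver/SDK packages are already installed; the model path, ports, and volume name are placeholders you'd adjust:

```bash
# Build llama.cpp with the Vulkan backend (flag per the llama.cpp build docs)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j

# Serve a GGUF model over llama.cpp's OpenAI-compatible HTTP API
# (model path is a placeholder; -ngl 99 offloads all layers to the GPU)
./build/bin/llama-server -m /path/to/model.gguf --port 8080 -ngl 99

# Point an Open WebUI container at that server as its OpenAI-compatible backend
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -e OPENAI_API_BASE_URL=http://host.docker.internal:8080/v1 \
  -v open-webui:/app/backend/data \
  --name open-webui ghcr.io/open-webui/open-webui:main
```

Open WebUI should then be reachable at http://localhost:3000 and pick up whatever model the llama.cpp server exposes.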
You are a tad limited in which models you can run. @ubergarm here has models on Hugging Face that are suited to run split across GPU + CPU, but they usually target GPUs with 24GB of VRAM, so you will need to check which ones fit in 16GB. You could reference the other thread.
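If a quant doesn't quite fit in 16GB, llama.cpp can still split the work between GPU and CPU at some speed cost; a hedged example, with the layer count purely illustrative:

```bash
# Offload only ~30 layers to the GPU and keep the rest in system RAM
# (tune -ngl until VRAM usage stays under 16GB)
./build/bin/llama-server -m /path/to/model.gguf --port 8080 -ngl 30
```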