DeepSeek Deep Dive R1 at Home!

adman-c · February 14, 2025, 7:57pm

Using that cache-flush method, I get 2-4 t/s with 64 threads on Rome (EPYC 7002) at an 8k context, which is very usable. Now I'm curious whether my faster CPU will make any difference, or whether I'm already capped by memory bandwidth at this point.

./build/bin/llama-cli \
    --model ./models/unsloth/DeepSeek-R1-GGUF/DeepSeek-R1-UD-Q2_K_XL-00001-of-00005.gguf \
    --threads 64 \
    --numa distribute \
    --interactive \
    --color \
    --ctx-size 8192
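One way to judge whether 2-4 t/s is bandwidth-bound is to estimate a ceiling: memory bandwidth divided by the bytes of weights read per generated token. The sketch below rests on several assumptions. It uses 8-channel DDR4-3200, the maximum Rome supports (the actual DIMM population may differ). It takes roughly 37B active parameters per token for DeepSeek-R1, and a guessed average of about 2.5 bits per weight for this quant. `max_tps` is a hypothetical helper, not part of llama.cpp. It ignores KV-cache traffic and the gap between theoretical and sustained bandwidth, so the result is only a rough upper bound.

```shell
#!/usr/bin/env bash
# Back-of-envelope decode-speed ceiling for a memory-bandwidth-bound MoE model.
# All inputs are assumptions; substitute your own platform and model numbers.

# usage: max_tps BANDWIDTH_GBPS ACTIVE_PARAMS_BILLIONS BITS_PER_WEIGHT
max_tps() {
  awk -v bw="$1" -v active="$2" -v bpw="$3" 'BEGIN {
    gb_per_token = active * bpw / 8   # GB of weights streamed per generated token
    printf "%.1f\n", bw / gb_per_token
  }'
}

# Assumed theoretical peak of 8-channel DDR4-3200:
# 8 channels * 3200 MT/s * 8 bytes per transfer, converted to GB/s.
peak_gbps=$(awk 'BEGIN { printf "%.1f", 8 * 3200 * 8 / 1000 }')
echo "peak bandwidth: ${peak_gbps} GB/s"

# Assumed: ~37B active parameters per token, ~2.5 bits/weight (a rough guess for this quant).
echo "ceiling: $(max_tps "$peak_gbps" 37 2.5) t/s"
```

Under these assumptions the script reports a ceiling of about 17.7 t/s. If measured throughput sits far below a figure like that, something besides raw peak bandwidth may be the limit, for example sustained bandwidth, NUMA placement, or compute.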