Ollama and llama.cpp show similar behavior on NVIDIA cards as well. I ran a series of benchmarks with various parameter settings to see what performance I could get out of it, and posted the results in another thread here: DeepSeek Deep Dive R1 at Home! - #153 by eousphoros