OK, final edit: I think I have a pretty good max-performance baseline now.
For max tokens/s on prompt processing: parallelism = 1, batch = 8192, u_batch = 2048, threads = 1
For max tokens/s on model processing (token generation): parallelism = 4, batch = 2048, u_batch = 512, threads = 64
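If it helps anyone reproduce this, here's a rough sketch of how those two profiles would translate to command-line flags, assuming these knobs map to llama.cpp's llama-server options (--parallel/-np, --batch-size/-b, --ubatch-size/-ub, --threads/-t); the binary name and model path are placeholders:

```bash
# Sketch only: assumes llama.cpp's llama-server; model path is a placeholder.

# Settings tuned for max prompt-processing tokens/s:
./llama-server -m model.gguf --parallel 1 --batch-size 8192 --ubatch-size 2048 --threads 1

# Settings tuned for max model-processing (generation) tokens/s:
./llama-server -m model.gguf --parallel 4 --batch-size 2048 --ubatch-size 512 --threads 64
```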
Unexpected result for me: the settings that are best for model processing give the worst prompt processing. Something to think about, I guess.

