DeepSeek Deep Dive R1 at Home!

Thank you, I have 1024GB of main memory, so that is not the issue at the moment. However, because of the poor availability of new GPUs, I am still stuck with a 16GB Nvidia card. Hopefully I can get one or more Intel B60s at the end of the year, but until then I need to make do.


Ironically, the big DeepSeek models are more efficient with context than most smaller models that use GQA: MLA (multi-head latent attention) caches only a small compressed latent per token, so KV-cache memory grows far more gently with context length. So yeah, I recommend my IQ2_K_R4 for your setup. In my limited testing it still gives better perplexity than the slightly larger UD-Q2_K_XL unsloth quant.
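To put rough numbers on that, here is a back-of-envelope sketch, not a measurement: the GQA shape below is a generic Llama-70B-style configuration chosen purely as a comparison point, while the MLA dimensions come from the published DeepSeek-V3 config.

```python
# Back-of-envelope KV-cache sizing. Dimensions are from the published model
# configs, but treat the results as illustrative, not exact.

def kv_bytes_per_token_gqa(n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    # GQA caches full K and V vectors for every KV head in every layer.
    return n_layers * n_kv_heads * head_dim * 2 * bytes_per_elem

def kv_bytes_per_token_mla(n_layers, kv_lora_rank, rope_dim, bytes_per_elem=2):
    # MLA caches one compressed latent vector plus the RoPE key part per layer.
    return n_layers * (kv_lora_rank + rope_dim) * bytes_per_elem

# Llama-70B-style GQA: 80 layers, 8 KV heads, head_dim 128 (comparison point)
gqa = kv_bytes_per_token_gqa(80, 8, 128)
# DeepSeek-V3/R1 MLA: 61 layers, kv_lora_rank 512, qk_rope_head_dim 64
mla = kv_bytes_per_token_mla(61, 512, 64)

for ctx in (8192, 65536, 131072):
    print(f"{ctx:>7} tokens: GQA ~{gqa * ctx / 2**30:6.1f} GiB,"
          f" MLA ~{mla * ctx / 2**30:5.1f} GiB")
```

Both caches grow linearly with context; MLA just has a much smaller per-token footprint (roughly 69 KiB vs 320 KiB at f16 here), which is why long contexts stay affordable on the big model.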

I'm excited at the prospect of cheap VRAM, but unsure of immediate compatibility, as I know most llama.cpp forks focus on CUDA-specific kernel implementations.

I really hope this translates easily onto Intel GPUs somehow!


Yes, these concerns are valid. To me it is obvious that Intel is pushing these cards as AI workstation cards, so I hope they'll put in the effort to either send patches for common software themselves or at least make it easy for developers to adapt their software. That's exactly where AMD still falls short, even though they teased a new 32GB workstation card as well.


I am running the full deepseek-v3:671b-q8_0 at the moment on my CPU with 24 cores / 48 threads, and it is slow as hell, but I am amazed at how good the answers it produces are. I need to speed this up to make it more usable, though; that's why I asked about your quants for the GPU earlier.
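For a rough sense of why q8_0 on CPU is so slow and how much a smaller quant can help: token generation is largely memory-bandwidth bound, so the decode ceiling is roughly bandwidth divided by the bytes of active weights streamed per token. A sketch follows; the bandwidth figure and bits-per-weight values are assumptions to plug your own numbers into, not measurements.

```python
# Why q8_0 decode on CPU is slow: generation is roughly memory-bandwidth
# bound, since each token streams the active expert weights from RAM.
# MEM_BW_GBS is an ASSUMPTION; substitute your own measured bandwidth.

ACTIVE_PARAMS = 37e9   # DeepSeek-V3/R1: ~37B of 671B params active per token (MoE)
MEM_BW_GBS = 200.0     # assumed sustained RAM bandwidth in GB/s

def ceiling_tok_per_sec(bits_per_weight):
    bytes_per_token = ACTIVE_PARAMS * bits_per_weight / 8
    return MEM_BW_GBS * 1e9 / bytes_per_token

for name, bpw in [("q8_0", 8.5), ("~4-bit", 4.5), ("~2.4-bit", 2.4)]:
    print(f"{name:>8}: ~{ceiling_tok_per_sec(bpw):.1f} tok/s ceiling")
```

Halving the bits per weight roughly doubles the decode ceiling, which is the main appeal of the 2-bit-class quants for CPU-plus-small-GPU setups.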


Fixed it, probably by installing gcc and g++ 14. That's the only thing I remember trying the next day (apart from a "git pull"). Maybe that's what ik_llama.cpp needs to run normally (i.e., not stopping before the response is complete). Even with the previous compiler, llama.cpp ran as expected, so it was hard to imagine that the compiler version or a missing g++ could affect ik_llama.cpp.

Liking ik_llama.cpp for its performance over llama.cpp! Will try out your quants and unsloth’s more.


Not sure if it's llama.cpp or CUDA 12.9.1, but I got a free 10-15% performance boost this morning. Thanks to whoever contributed to that gain.
