Running DeepSeek R1 Locally
As I’m both unable and unwilling to procure an RTX 5090, I ran the DeepSeek R1 distills on my humble 4090 using koboldcpp. Specifically:
- DeepSeek-R1-Distill-Llama-8B-Q8_0.gguf; and
- DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf
Both ran with 20k context. On the 32B side, that meant about 25% of the model’s ~19GB was offloaded to CPU RAM.
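For reference, the launch looked roughly like the sketch below, wrapped in Python purely for illustration. The flag names come from koboldcpp’s CLI, but the gpulayers value is illustrative rather than the exact split I used.

```python
# Rough sketch of the koboldcpp invocation for the 32B run (Python wrapper
# purely for illustration; the --gpulayers value is illustrative, not exact).
import subprocess

subprocess.run([
    "python", "koboldcpp.py",
    "--model", "DeepSeek-R1-Distill-Qwen-32B-Q4_K_M.gguf",
    "--contextsize", "20480",   # ~20k context
    "--usecublas",              # CUDA acceleration on the 4090
    "--gpulayers", "48",        # illustrative split; the rest spills to CPU RAM
    "--port", "5001",
])
```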
The 32B’s responses looked totally legit, but at ~5 t/s, whereas the 8B was notably worse but ran at ~50 t/s. Even at the faster speed, the lack of function calling is a huge impediment: it means a bunch of copy/pasting to and from an IDE, which, combined with trying to prompt the model correctly, gets frustrating very quickly. DeepSeek does support function calling, but with the implication that you’ll need to integrate it yourself - API plumbing, retry-on-error and so on - a rather large effort compared to the luxury Claude Desktop offers with MCP.
Ultimately this means DeepSeek is a fraction of the price, but you then pay in effort to get something close to MCP working. It’s a great cost saver if you already have an application that just needs an OpenAI-like API with brains attached, but if you want to yolo something from nothing, expect to put in a bunch of effort.
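To make that concrete, the glue you end up writing looks something like the sketch below: prompt the model to emit JSON tool calls over koboldcpp’s OpenAI-compatible endpoint, parse, execute, retry on failure. The endpoint URL, JSON convention and run_tool helper are all assumptions for illustration, not an official DeepSeek or koboldcpp API beyond the chat completions call.

```python
# Hypothetical sketch of DIY "function calling" against a local OpenAI-like API.
# Assumes koboldcpp is serving its OpenAI-compatible endpoint on localhost:5001;
# the JSON prompt convention, retry policy, and run_tool() are all made up here.
import json
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5001/v1", api_key="not-needed")

SYSTEM = (
    "You can call tools. To call one, reply with ONLY a JSON object like "
    '{"tool": "read_file", "args": {"path": "..."}}. Otherwise answer normally.'
)

def run_tool(name: str, args: dict) -> str:
    # Placeholder: dispatch to whatever local tools you actually wire up.
    if name == "read_file":
        return open(args["path"]).read()
    raise ValueError(f"unknown tool {name}")

def ask(user_msg: str, retries: int = 3) -> str:
    messages = [{"role": "system", "content": SYSTEM},
                {"role": "user", "content": user_msg}]
    for _ in range(retries):
        reply = client.chat.completions.create(
            model="local", messages=messages).choices[0].message.content
        reply = reply.split("</think>")[-1].strip()  # drop R1's reasoning block
        try:
            call = json.loads(reply)          # did the model emit a tool call?
        except json.JSONDecodeError:
            return reply                      # plain answer, we're done
        messages.append({"role": "assistant", "content": reply})
        try:
            result = run_tool(call["tool"], call["args"])
            messages.append({"role": "user", "content": f"Tool result: {result}"})
        except Exception as e:                # retry-on-error, crudely
            messages.append({"role": "user", "content": f"Tool error: {e}. Try again."})
    return "gave up after retries"
```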
MCP really is the killer feature here - the speed with which you can integrate anything else into your workflow. Considering Anthropic made it open, it should only be a matter of time before the community (if not the labs) closes the effort gap on integrating model X with MCP server Y. I wondered whether this movement has already begun, and there are promising claims of MCP support in the open-source VS Code extension Cline and its popular fork, Roo Code.
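For contrast, the per-tool effort on the MCP side is already tiny. A minimal server sketch using the official Python SDK’s FastMCP helper (the server and tool names here are just placeholders):

```python
# Minimal MCP server sketch using the official Python SDK's FastMCP helper.
# Server/tool names are placeholders; point an MCP client (Claude Desktop,
# or Cline / Roo Code per their claims) at it over stdio.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("scratchpad")

@mcp.tool()
def word_count(text: str) -> int:
    """Count the words in a piece of text."""
    return len(text.split())

if __name__ == "__main__":
    mcp.run()  # defaults to the stdio transport
```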
How are you yolo’ing tool use into your workflow?