Running DeepSeek R1 Locally on Mac
“The future is already here — it’s just not evenly distributed.” — William Gibson
In this guide, I’ll walk you through downloading and running DeepSeek R1 Distilled locally on your Mac. Whether you’re using an M1, M2, M3, or the latest M4 chip, I’ll cover which distilled version works best for your hardware and memory configuration.
For more information about DeepSeek R1, check out the official GitHub repository.
📝 Overview
Running large language models like DeepSeek R1 locally keeps your data on your own machine and reduces reliance on external APIs. In this post, we’ll cover:
- System Requirements: What you need to get started.
- Installation Steps: Setting up your environment using Ollama.
- Model Selection: Which distilled version is best for your Mac’s hardware.
- Running Inference: Testing the model with sample input.
- Enhancing the GUI: Improving the user interface with Page Assist.
- Troubleshooting: Common issues and how to resolve them.
🚀 System Requirements
Before diving into the installation, ensure your Mac meets these requirements:
- macOS: Version 12.3 or later
- Chip: Apple Silicon (M1 through M4, including Pro/Max/Ultra variants)
- RAM: Minimum 8 GB (16 GB recommended for larger models)
- Ollama: Required to run DeepSeek models locally
- Google Chrome (Optional): For enhanced UI with Page Assist
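Not sure what hardware you have? You can check the macOS version, chip, and installed RAM straight from the Terminal with built-in macOS tools:

```bash
# Print the macOS version (should be 12.3 or later)
sw_vers -productVersion

# Print the chip and installed memory (e.g., "Chip: Apple M2", "Memory: 16 GB")
system_profiler SPHardwareDataType | grep -E "Chip|Memory"
```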
📥 Installation Steps
1. Download and Install Ollama
Ollama is a lightweight tool that simplifies running large language models locally. Download it from the official website.
Once downloaded, follow the installation instructions to set it up on your Mac.
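Once the app is installed, you can confirm that the `ollama` command-line tool is available:

```bash
# Print the installed Ollama version to verify the setup
ollama --version
```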
2. Download DeepSeek R1 Models
Note: These are not the full R1 model. They are smaller distilled models built on Llama or Qwen bases and fine-tuned on R1’s outputs.
After installing Ollama, open your Terminal and use `ollama pull` to download the distilled version of DeepSeek R1 you want (available sizes: 1.5B, 7B, 8B, 14B, 32B, and 70B). For example:
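```bash
# Download the 7B distilled model without starting a chat session
# Tags on the Ollama library: 1.5b, 7b, 8b, 14b, 32b, 70b
ollama pull deepseek-r1:7b
```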
3. Running Inference
Once the model is downloaded, you can start interacting with it directly from the Terminal:
```bash
ollama run deepseek-r1:1.5b
```
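If you’d rather script against the model than use the interactive prompt, Ollama also exposes a local HTTP API (port 11434 by default). A minimal sketch, assuming the 1.5B model is already downloaded:

```bash
# Send one prompt to the local Ollama server and print the complete JSON response
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:1.5b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```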
4. 🖥️ Enhancing the GUI with Page Assist
For a better user experience, especially if you prefer a graphical interface, I recommend installing Page Assist, a Chrome extension designed to work seamlessly with Ollama.
- Install Page Assist from the Chrome Web Store.
- Open the extension and check that it detects your local Ollama server.
- Enjoy a clean, user-friendly interface for interacting with DeepSeek R1.
⚡ Model Selection Based on Apple Chip & Memory
Choosing the right model version ensures optimal performance. Here’s what we recommend:
Apple Chip | RAM | Recommended Model | Ollama Command | Notes |
---|---|---|---|---|
M1 (8-core GPU) | 8 GB | DeepSeek R1-Distilled Small (1.5B) | `ollama run deepseek-r1:1.5b` | Lightweight, optimized for smooth usage |
M2/M3 | 16 GB | DeepSeek R1-Distilled Base (7B) | `ollama run deepseek-r1:7b` | Balanced performance |
M4 | 16 GB | DeepSeek R1-Distilled Base (8B) | `ollama run deepseek-r1:8b` | Efficient, supports faster inference |
M1/M2/M3/M4 Pro | 16–32 GB | DeepSeek R1-Distilled Large (14B) | `ollama run deepseek-r1:14b` | High throughput for complex tasks |
M1/M2/M3/M4 Max | 32–64 GB | DeepSeek R1-Distilled 32B | `ollama run deepseek-r1:32b` | Best for research and heavy workloads |
M1/M2 Ultra | 64–192 GB | DeepSeek R1-Distilled 70B | `ollama run deepseek-r1:70b` | Best for top-tier workloads |
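A quick way to sanity-check your choice is to watch the model’s actual memory footprint while it is loaded:

```bash
# Show currently loaded models, their size, and whether they run on GPU or CPU
ollama ps
```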
📦 Available DeepSeek R1 Distilled Models
Here are the available DeepSeek R1 distilled models along with their base models and download links via Hugging Face:
Model | Base Model | Download |
---|---|---|
DeepSeek-R1-Distill-Qwen-1.5B | Qwen2.5-Math-1.5B | 🤗 Hugging Face |
DeepSeek-R1-Distill-Qwen-7B | Qwen2.5-Math-7B | 🤗 Hugging Face |
DeepSeek-R1-Distill-Llama-8B | Llama-3.1-8B | 🤗 Hugging Face |
DeepSeek-R1-Distill-Qwen-14B | Qwen2.5-14B | 🤗 Hugging Face |
DeepSeek-R1-Distill-Qwen-32B | Qwen2.5-32B | 🤗 Hugging Face |
DeepSeek-R1-Distill-Llama-70B | Llama-3.3-70B-Instruct | 🤗 Hugging Face |
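If you’d rather run a specific quantization from Hugging Face instead of the tags in the Ollama library, Ollama can also run GGUF repositories directly. A sketch of the syntax only; the repo path below is illustrative, so substitute a GGUF repo you have verified:

```bash
# Run a GGUF model straight from Hugging Face (replace the path with a real GGUF repo)
ollama run hf.co/username/DeepSeek-R1-Distill-Qwen-1.5B-GGUF
```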
📌 Note: To ensure that the model engages in thorough reasoning, DeepSeek recommends enforcing the model to initiate its response with `<think>\n` at the beginning of every output.
Distilled Model Evaluation
Model | AIME 2024 pass@1 | AIME 2024 cons@64 | MATH-500 pass@1 | GPQA Diamond pass@1 | LiveCodeBench pass@1 | CodeForces Rating |
---|---|---|---|---|---|---|
GPT-4o-0513 | 9.3 | 13.4 | 74.6 | 49.9 | 32.9 | 759 |
Claude-3.5-Sonnet-1022 | 16.0 | 26.7 | 78.3 | 65.0 | 38.9 | 717 |
o1-mini | 63.6 | 80.0 | 90.0 | 60.0 | 53.8 | 1820 |
QwQ-32B-Preview | 44.0 | 60.0 | 90.6 | 54.5 | 41.9 | 1316 |
DeepSeek-R1-Distill-Qwen-1.5B | 28.9 | 52.7 | 83.9 | 33.8 | 16.9 | 954 |
DeepSeek-R1-Distill-Qwen-7B | 55.5 | 83.3 | 92.8 | 49.1 | 37.6 | 1189 |
DeepSeek-R1-Distill-Qwen-14B | 69.7 | 80.0 | 93.9 | 59.1 | 53.1 | 1481 |
DeepSeek-R1-Distill-Qwen-32B | 72.6 | 83.3 | 94.3 | 62.1 | 57.2 | 1691 |
DeepSeek-R1-Distill-Llama-8B | 50.4 | 80.0 | 89.1 | 49.0 | 39.6 | 1205 |
DeepSeek-R1-Distill-Llama-70B | 70.0 | 86.7 | 94.5 | 65.2 | 57.5 | 1633 |
🛠️ Troubleshooting Common Issues
Ollama Installation Issues:
- Ensure you’re running macOS 12.3 or later.
- Check that Rosetta 2 is installed for compatibility with certain apps.

Memory Errors:
- If you experience crashes, switch to a smaller model version (e.g., from 32B down to 14B or 7B).
- Close unnecessary applications to free up RAM.

Performance Issues:
- Use Activity Monitor to track CPU and memory usage.
- If inference is slow, drop to the model size recommended for your chip and RAM in the table above.

Page Assist Connection Problems:
- Make sure Ollama is running before launching Page Assist.
- Refresh the extension if it doesn’t detect your local server.
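If Page Assist (or any other client) can’t find the server, confirm from the Terminal that Ollama is actually listening on its default port:

```bash
# The server replies "Ollama is running" when it is up (default port 11434)
curl http://localhost:11434

# List the models you have downloaded locally
ollama list
```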