Running DeepSeek R1 Locally on Mac

“The future is already here — it’s just not evenly distributed.” — William Gibson

In this guide, I’ll walk you through downloading and running DeepSeek R1 Distilled locally on your Mac. Whether you’re using an M1, M2, M3, or the latest M4 chip, I’ll cover which distilled version works best for your hardware and memory configuration.

For more information about DeepSeek R1, check out the official GitHub repository.

📝 Overview

Running large language models like DeepSeek R1 locally keeps your data on your own machine and reduces reliance on external APIs. In this post, we’ll cover:

  1. System Requirements: What you need to get started.
  2. Installation Steps: Setting up your environment using Ollama.
  3. Model Selection: Which distilled version is best for your Mac’s hardware.
  4. Running Inference: Testing the model with sample input.
  5. Enhancing the GUI: Improving the user interface with Page Assist.
  6. Troubleshooting: Common issues and how to resolve them.

🚀 System Requirements

Before diving into the installation, ensure your Mac meets these requirements (a quick Terminal check follows the list):

  • macOS: Version 12.3 or later
  • Chip: Apple Silicon (M1, M1 Pro/Max/Ultra, …, M4, M4 Pro/Max)
  • RAM: Minimum 8 GB (16 GB recommended for larger models)
  • Ollama: Required to run DeepSeek models locally
  • Google Chrome (Optional): For enhanced UI with Page Assist
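
All three can be verified from Terminal. Here is a minimal check using utilities that ship with macOS (sw_vers, system_profiler, and sysctl); no extra tooling is assumed:

# macOS version (should report 12.3 or later)
sw_vers -productVersion

# Chip and installed memory
system_profiler SPHardwareDataType | grep -E "Chip|Memory"

# RAM in GB, computed from the raw byte count
echo "$(($(sysctl -n hw.memsize) / 1073741824)) GB"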

📥 Installation Steps

1. Download and Install Ollama

Ollama is a lightweight tool that simplifies running large language models locally. Download it from the official website.

Once downloaded, follow the installation instructions to set it up on your Mac.
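
If you prefer the command line, Ollama can also be installed with Homebrew (this assumes Homebrew is already set up; the download from the website works just as well). Either way, confirm the CLI is available afterwards:

# Optional: install the Ollama CLI and server via Homebrew
brew install ollama

# Confirm the installation
ollama --version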

2. Download DeepSeek R1 Models

Note: These are not the full R1 model. They are distilled versions built on Llama or Qwen base models.

After installing Ollama, open your Terminal and pull the DeepSeek R1 distilled variant you want (1.5B, 7B, 8B, 14B, 32B, or 70B). A sample download command follows, and the full set of run commands is listed in the next step.
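
A minimal example, assuming the 7B variant is a good fit for your RAM (the tag names match those in the Ollama model library):

# Download the 7B distilled model without starting a chat session
ollama pull deepseek-r1:7b

# List installed models to confirm the download
ollama list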

3. Running Inference

Once the model is downloaded, you can start interacting with it directly from the Terminal (ollama run will also download the model automatically if you skipped the pull step):

ollama run deepseek-r1:1.5b
ollama run deepseek-r1:7b
ollama run deepseek-r1:8b
ollama run deepseek-r1:14b
ollama run deepseek-r1:32b
ollama run deepseek-r1:70b
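
Each command above opens an interactive chat (type /bye to exit). You can also pass a one-off prompt directly, or call the local REST API that Ollama serves on port 11434. A small sketch, using the 7B tag and a made-up prompt:

# One-off prompt without entering the interactive session
ollama run deepseek-r1:7b "Explain the difference between a stack and a queue."

# The same request over Ollama's local REST API
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:7b",
  "prompt": "Explain the difference between a stack and a queue.",
  "stream": false
}'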

4. 🖥️ Enhancing the GUI with Page Assist

For a better user experience, especially if you prefer a graphical interface, I recommend installing Page Assist, a Chrome extension designed to work seamlessly with Ollama.

Download the Page Assist extension from the Chrome Web Store.

Enjoy a clean, user-friendly interface for interacting with DeepSeek R1.
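
Page Assist talks to the local Ollama server on its default port, 11434. If the extension cannot find your models, a quick sanity check from Terminal is:

# Should print "Ollama is running" if the server is up
curl http://localhost:11434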

⚡ Model Selection Based on Apple Chip & Memory

Choosing the right model version ensures optimal performance. Here’s what we recommend:

| Apple Chip | RAM | Recommended Model | Ollama Command | Notes |
|---|---|---|---|---|
| M1 (8-core GPU) | 8 GB | DeepSeek R1 Distilled 1.5B | ollama run deepseek-r1:1.5b | Lightweight, optimized for smooth usage |
| M2/M3 | 16 GB | DeepSeek R1 Distilled 7B | ollama run deepseek-r1:7b | Balanced performance |
| M4 | 16 GB | DeepSeek R1 Distilled 8B | ollama run deepseek-r1:8b | Efficient, supports faster inference |
| M1/M2/M3/M4 Pro | 16–32 GB | DeepSeek R1 Distilled 14B | ollama run deepseek-r1:14b | High throughput for complex tasks |
| M1/M2/M3/M4 Max | 32–64 GB | DeepSeek R1 Distilled 32B | ollama run deepseek-r1:32b | Best for research and heavy workloads |
| M1/M2/M4 Ultra | 64–192 GB | DeepSeek R1 Distilled 70B | ollama run deepseek-r1:70b | Best for top-tier workloads |
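
Model weights take up significant disk space (the 32B and 70B downloads run to tens of gigabytes), so if you move between sizes it is worth removing tags you no longer use. A small housekeeping sketch with standard Ollama commands (the 70B tag here is just an example):

# Show installed models and their size on disk
ollama list

# Remove a variant you no longer need
ollama rm deepseek-r1:70b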

📦 Available DeepSeek R1 Distilled Models

Here are the available DeepSeek R1 distilled models along with their base models and download links via Hugging Face:

| Model | Base Model | Download |
|---|---|---|
| DeepSeek-R1-Distill-Qwen-1.5B | Qwen2.5-Math-1.5B | 🤗 Hugging Face |
| DeepSeek-R1-Distill-Qwen-7B | Qwen2.5-Math-7B | 🤗 Hugging Face |
| DeepSeek-R1-Distill-Llama-8B | Llama-3.1-8B | 🤗 Hugging Face |
| DeepSeek-R1-Distill-Qwen-14B | Qwen2.5-14B | 🤗 Hugging Face |
| DeepSeek-R1-Distill-Qwen-32B | Qwen2.5-32B | 🤗 Hugging Face |
| DeepSeek-R1-Distill-Llama-70B | Llama-3.3-70B-Instruct | 🤗 Hugging Face |

📌 Note: To ensure the model engages in thorough reasoning, the DeepSeek team recommends forcing it to begin every response with “<think>\n”.


Distilled Model Evaluation

| Model | AIME 2024 pass@1 | AIME 2024 cons@64 | MATH-500 pass@1 | GPQA Diamond pass@1 | LiveCodeBench pass@1 | CodeForces Rating |
|---|---|---|---|---|---|---|
| GPT-4o-0513 | 9.3 | 13.4 | 74.6 | 49.9 | 32.9 | 759 |
| Claude-3.5-Sonnet-1022 | 16.0 | 26.7 | 78.3 | 65.0 | 38.9 | 717 |
| o1-mini | 63.6 | 80.0 | 90.0 | 60.0 | 53.8 | 1820 |
| QwQ-32B-Preview | 44.0 | 60.0 | 90.6 | 54.5 | 41.9 | 1316 |
| DeepSeek-R1-Distill-Qwen-1.5B | 28.9 | 52.7 | 83.9 | 33.8 | 16.9 | 954 |
| DeepSeek-R1-Distill-Qwen-7B | 55.5 | 83.3 | 92.8 | 49.1 | 37.6 | 1189 |
| DeepSeek-R1-Distill-Qwen-14B | 69.7 | 80.0 | 93.9 | 59.1 | 53.1 | 1481 |
| DeepSeek-R1-Distill-Qwen-32B | 72.6 | 83.3 | 94.3 | 62.1 | 57.2 | 1691 |
| DeepSeek-R1-Distill-Llama-8B | 50.4 | 80.0 | 89.1 | 49.0 | 39.6 | 1205 |
| DeepSeek-R1-Distill-Llama-70B | 70.0 | 86.7 | 94.5 | 65.2 | 57.5 | 1633 |

🛠️ Troubleshooting Common Issues

Ollama Installation Issues:

  • Ensure you’re running macOS 12.3 or later.
  • Check that Rosetta 2 is installed for compatibility with certain apps.

Memory Errors:

  • If you experience crashes, switch to a smaller model version (e.g., from 32B down to 14B or 7B).
  • Close unnecessary applications to free up RAM.

Performance Issues:

  • Use Activity Monitor to track CPU and memory usage.
  • Consider switching to a model size better matched to your chip and RAM (see the table above).

Page Assist Connection Problems:

  • Make sure Ollama is running before launching Page Assist.
  • Refresh the extension if it doesn’t detect your local server.
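
When the cause isn’t obvious, a few quick checks from Terminal usually narrow it down. A minimal diagnostic pass using built-in macOS tools and the Ollama CLI:

# Confirm the Ollama server process is running
pgrep -lf ollama

# See which models are currently loaded and how much memory they hold
ollama ps

# Rough view of memory statistics if you suspect you are out of RAM
vm_stat | head -n 5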