โ† All playbooks

Run DeepSeek V3 on a Mac Studio

The most practical single-machine path to running the 671B-parameter DeepSeek V3 quietly at home.

  1. Step 1

Get an M2 Ultra 192 GB or M3 Ultra 512 GB

DeepSeek V3 at Q4 is roughly 340-400 GB depending on the quant, and by default macOS lets the GPU wire only about 75% of unified memory (144 GB on a 192 GB machine). Q4 therefore needs the M3 Ultra with 512 GB. On a 192 GB M2 Ultra you are limited to very aggressive ~2-bit quantization, and you will still want to raise the GPU wired-memory limit.
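On Apple silicon the 75% cap can be raised with the `iogpu.wired_limit_mb` sysctl. A minimal sketch for a 192 GB machine; the 172 GB figure is an assumption, leave enough headroom for the OS:

```shell
# Raise the GPU-wired memory cap from the default ~75% of unified memory.
# 176128 MB = 172 GB on a 192 GB Mac Studio -- adjust for your config.
# Resets on reboot; setting it too high can make the OS unstable.
sudo sysctl iogpu.wired_limit_mb=176128
```

To make it persistent, add the same key to `/etc/sysctl.conf`.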

  2. Step 2

    Use llama.cpp Metal

MLX is improving, but llama.cpp's Metal backend still has the most polished MoE inference path on macOS. Pull a GGUF at Q3_K_M, or IQ2_M if memory is tight.
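A build-and-download sketch. The llama.cpp repo URL and the `-hf` download flag are real; the exact Hugging Face repo and quant tag below are assumptions, so check for a current DeepSeek V3 GGUF before pulling:

```shell
# Build llama.cpp with the Metal backend (on by default on Apple silicon).
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_METAL=ON
cmake --build build --config Release -j

# llama.cpp can fetch GGUFs straight from Hugging Face; multi-part files
# are stitched together automatically. Repo/quant names are assumptions.
./build/bin/llama-cli -hf unsloth/DeepSeek-V3-GGUF:Q3_K_M -p "Hello" -n 32
```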

  3. Step 3

    Expect ~10-15 tok/s

Even with 819 GB/s of memory bandwidth on the M3 Ultra, the ~37B active parameters per token put you in the 10-15 tok/s range in practice. Quiet and useful, but not GPT-4 fast.
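The back-of-envelope ceiling: decode is memory-bound, so each token must stream all active weights from memory once. A sketch with assumed round numbers (~0.6 bytes/param at Q4-class quantization, 819 GB/s bandwidth):

```shell
# Theoretical decode ceiling = bandwidth / bytes read per token.
ACTIVE_PARAMS=37          # DeepSeek V3 active params per token, billions
BYTES_PER_PARAM_X10=6     # ~0.6 bytes/param at ~4.5-bit quant (scaled by 10)
BANDWIDTH=819             # assumed M3 Ultra memory bandwidth, GB/s

GB_PER_TOKEN=$(( ACTIVE_PARAMS * BYTES_PER_PARAM_X10 / 10 ))   # ~22 GB/token
echo "theoretical ceiling: $(( BANDWIDTH / GB_PER_TOKEN )) tok/s"
```

MoE routing overhead, KV-cache traffic, and kernel inefficiency eat roughly half to two-thirds of that ceiling, which is how you land at 10-15 tok/s.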

  4. Step 4

    Plug into a real client

    llama.cpp exposes an OpenAI-compatible HTTP server. Point Open WebUI, Cursor, or your code agents at localhost:8080. Now you have a frontier-tier model running on a desk machine.
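A minimal serve-and-query sketch. `llama-server` and the `/v1/chat/completions` route are real llama.cpp features; the GGUF filename is a placeholder for whatever quant you downloaded:

```shell
# Serve over an OpenAI-compatible HTTP API; -ngl 99 offloads all layers
# to Metal. Model path is a placeholder.
./build/bin/llama-server -m deepseek-v3-Q3_K_M.gguf --port 8080 -ngl 99 -c 8192

# From another shell, any OpenAI-style client works:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hi in five words."}]}'
```

Open WebUI and Cursor both accept a custom OpenAI base URL, so point them at `http://localhost:8080/v1`.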