DeepSeek-R1: An Open Model Matches a Closed Reasoner
DeepSeek released R1 in January 2025: a mixture-of-experts model with 671B total and 37B active parameters, built on DeepSeek-V3-Base, with a 128K context window and an MIT license that permits commercial use and derivatives. Two results made it land hard.

The first is methodological. A variant called R1-Zero showed that reasoning behavior can be induced purely through reinforcement learning, with no supervised fine-tuning step first; the full R1 then adds a small amount of cold-start data before RL to clean up the output.

The second is the scoreboard. DeepSeek reports R1 at 79.8 percent on AIME 2024 against o1's 79.2, and 97.3 on MATH-500 against 96.4, while trailing o1 on Codeforces rating (2029 versus 2061) and GPQA Diamond (71.5 versus 75.7). Roughly, it matched a closed frontier reasoner on math and stayed close elsewhere. DeepSeek also shipped six distilled dense models, from 1.5B to 70B parameters, with the 32B distill beating o1-mini.
Why it matters
An openly licensed model reaching o1 level, plus evidence that pure RL can produce reasoning, reset expectations about how much money and secrecy a frontier reasoner requires. If you deploy models, the distilled 32B is the practical takeaway: strong reasoning you can run yourself.
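One practical detail if you do run a distill: R1-family models emit their chain of thought inside `<think>…</think>` tags before the final answer, so deployments typically separate the reasoning trace from the reply. A minimal post-processing sketch, assuming that standard tag format (the helper name `split_reasoning` is my own, not from DeepSeek's tooling):

```python
import re

# Matches an optional leading <think>...</think> block, then the answer.
_THINK_RE = re.compile(r"\s*<think>(.*?)</think>\s*(.*)", re.DOTALL)

def split_reasoning(text: str) -> tuple[str, str]:
    """Split a model completion into (reasoning, answer).

    If no <think> block is present, reasoning is returned empty
    and the whole completion is treated as the answer.
    """
    m = _THINK_RE.match(text)
    if m:
        return m.group(1).strip(), m.group(2).strip()
    return "", text.strip()
```

In a serving setup you would log or hide the first element and show the user only the second; the regex approach keeps this independent of any particular inference stack.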