OLMo 2 32B: Fully Open Catches Up to Closed
The Allen Institute for AI released OLMo 2 32B in March 2025 with a specific claim: it is the first fully open model to outperform GPT-3.5 Turbo and GPT-4o mini on academic benchmarks. Fully open is a stricter standard than open weights. Ai2 released the weights, the training data, the code, and the recipe, so the result can be reproduced rather than just used.

The model was trained on about 6 trillion tokens, with pretraining drawn from a roughly 3.9-trillion-token corpus followed by an 843-billion-token mid-training stage on a curated mix. The number worth noting is cost. Ai2 says OLMo 2 32B reaches performance comparable to Qwen 2.5 32B while using about one-third of the training compute, and reports throughput above 1,800 tokens per second per GPU at around 38 percent model FLOPs utilization on its training cluster. It also approaches the larger Qwen 2.5 72B and Llama 3.1 70B.
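A quick back-of-envelope check makes those figures concrete. The sketch below uses the standard C ≈ 6ND rule of thumb for training FLOPs and assumes, since the announcement does not spell it out here, H100-class GPUs with roughly 990 TFLOP/s of dense BF16 peak; the printed numbers are rough estimates, not Ai2's own accounting.

```python
# Back-of-envelope check of the reported training-compute and throughput figures,
# using the standard C ~= 6 * N * D approximation for training FLOPs.
# Assumptions (not stated in the announcement): H100-class GPUs at ~990 TFLOP/s
# dense BF16 peak, and a flat 32e9 parameter count.

PARAMS = 32e9          # model parameters (approximate)
TOKENS = 6e12          # total training tokens (~6T)
TOKENS_PER_SEC = 1800  # reported per-GPU throughput
PEAK_FLOPS = 990e12    # assumed BF16 peak of one H100-class GPU

# Total training compute under the 6*N*D rule of thumb.
total_flops = 6 * PARAMS * TOKENS
print(f"approx. training compute: {total_flops:.2e} FLOPs")  # ~1.15e+24

# Model FLOPs utilization implied by the reported per-GPU throughput.
achieved_flops_per_gpu = 6 * PARAMS * TOKENS_PER_SEC
mfu = achieved_flops_per_gpu / PEAK_FLOPS
print(f"implied MFU: {mfu:.0%}")  # ~35%
```

Under these assumptions the implied utilization lands in the mid-30s percent, in the same ballpark as the reported 38 percent, and the total compute comes out on the order of 10^24 FLOPs.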
Why it matters
If you care about reproducible AI rather than just downloadable weights, this is the point where fully open science reached a genuinely usable tier, and did so cheaply. For researchers, it means you can study a competitive model end to end, data included, instead of probing a black box.