
Ai2's open robotics model beats a proprietary baseline

AI · 13 hours ago · source (allenai.org)

The Allen Institute for AI has released MolmoAct 2, a robotics foundation model, with the weights, training data, and code all open. The pitch is that it reasons about a scene in 3D before it acts, and it handles single-arm and two-arm manipulation without per-task fine-tuning. The vision-language core, Molmo 2-ER, was trained on about 3 million extra embodied-reasoning examples and averages 63.8 out of 100 across 13 spatial-reasoning benchmarks, ahead of GPT-5, Gemini 2.5 Pro, and Qwen3-VL-8B on that set.

The numbers Ai2 reports are specific. On real-world zero-shot tests with a Franka arm, MolmoAct 2 hits 87.1 percent success, against 45.2 percent for Physical Intelligence's proprietary π0.5. In simulation on the MolmoBot household benchmark it roughly doubles π0.5's score, 20.6 percent to 10.3 percent. It also runs much faster: about 180 milliseconds per action versus 6,700 for the original MolmoAct, which Ai2 frames as up to 37 times quicker. A depth-token mechanism runs the heavier 3D reasoning only when it expects a payoff, cutting inference cost by 17 percent compared with always predicting depth.
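The gating idea is simple to sketch. The following is a hypothetical illustration, not Ai2's actual implementation: the function names (`gate`, `cheap_policy`, `depth_policy`) and the threshold are invented for the example, and the real model presumably makes this decision inside the network rather than with an explicit if-statement.

```python
# Hypothetical sketch of adaptive depth-token gating: a per-step gate
# score decides whether to run the expensive 3D (depth) branch before
# emitting an action, or take the cheaper 2D-only fast path.

def act(observation, gate, cheap_policy, depth_policy, threshold=0.5):
    """Run the heavy depth branch only when the gate expects a payoff."""
    if gate(observation) > threshold:
        return depth_policy(observation), True   # depth tokens predicted
    return cheap_policy(observation), False      # fast path, no depth

# Toy demo: a gate that asks for depth only on "cluttered" scenes.
observations = [{"clutter": c} for c in (0.1, 0.9, 0.3, 0.8, 0.2)]
gate = lambda obs: obs["clutter"]
cheap = lambda obs: "act_2d"
deep = lambda obs: "act_3d"

used_depth = [act(o, gate, cheap, deep)[1] for o in observations]
print(sum(used_depth), "of", len(observations), "steps used depth")
# → 2 of 5 steps used depth
```

In this toy run the heavy branch fires on 2 of 5 steps; the reported 17 percent saving would correspond to how often the real gate skips depth prediction without hurting task success.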

Ai2 also shipped the MolmoAct 2-Bimanual YAM dataset, which it calls the largest open bimanual tabletop manipulation set so far, with more than 720 hours of demonstrations. That, plus the open weights and the training recipe, is the part worth noting for anyone who cannot train a robot policy from scratch.

Why it matters

If you work on robot manipulation, an open model that beats a strong proprietary baseline by a wide margin, and ships its data and training recipe, is something you can build on directly instead of collecting hundreds of hours of demonstrations yourself.

Open Models · Allen Institute · Robotics