Tag: Evaluation
- Testing AI in the open world, not just on benchmarks (normaltech.ai) · AI · May 17, 2026
- How you pick benchmarks decides whether open models are far behind (interconnects.ai) · AI · May 16, 2026
- One benchmark number hides which jobs a model is actually good at (interconnects.ai) · AI · April 20, 2026
- AI · March 17, 2026
- Crash Testing GPT-4: The First Dangerous-Capability Eval (asteriskmag.com) · AI · June 1, 2023