Tag: Safety
- DeepMind built a way to measure when AI manipulates people (deepmind.google) AI · March 26, 2026
- Reward Hacking: Why Better Models Game You More (lilianweng.github.io) AI · November 28, 2024
- Crash Testing GPT-4: The First Dangerous-Capability Eval (asteriskmag.com) AI · June 1, 2023
- A Field Guide to the AI Safety Camps (asteriskmag.com) AI · June 1, 2023