Tag: Reinforcement Learning

One transformer layer can match full RL fine-tuning (arxiv.org)

AI · 1 week ago · July 3, 2026
How frontier post-training got complicated, and why distillation now sits at the center (interconnects.ai)

AI · 3 weeks ago · June 20, 2026
OpenEnv tries to give the open agent stack a common environment ABI (huggingface.co)

AI · 1 month ago · June 9, 2026
Async RL gets cheap to ship when 99% of weights do not change (huggingface.co)

AI · 1 month ago · May 27, 2026
Fix the inference engine before you patch the RL objective (huggingface.co)

Engineering · 2 months ago · May 6, 2026
Reward Hacking: Why Better Models Game You More (lilianweng.github.io)

AI · 1 year ago · November 28, 2024