Tag: LLM Architecture

Where a hybrid model beats a transformer, token by token (allenai.org)

AI · 2 weeks ago · June 28, 2026
A new paper gives language models a sleep phase (arxiv.org)

AI · 1 month ago · May 26, 2026
How 2026 open models buy long-context efficiency without shrinking (magazine.sebastianraschka.com)

Engineering · 2 months ago · May 16, 2026
Ai2's EMO trains a mixture of experts you can run at one-eighth size (huggingface.co)

AI · 2 months ago · May 8, 2026
DeepSeek-V4 spends most of its design budget making long context usable (huggingface.co)

AI · 2 months ago · April 24, 2026
A field guide to the attention variants modern LLMs actually use (magazine.sebastianraschka.com)

Engineering · 3 months ago · March 22, 2026
Sebastian Raschka's map of inference-time scaling for reasoning (magazine.sebastianraschka.com)

Engineering · 5 months ago · January 24, 2026