
Ethan Mollick on GPT-5.5: strong where work is verifiable, weak where taste is the point

AI · 3 weeks ago · source (oneusefulthing.org)

Ethan Mollick's take on GPT-5.5 is useful because he tests it on real tasks and reports the misses as plainly as the hits. His frame is that progress now comes from three things moving together: the models themselves, the apps built around them (such as a desktop Codex), and the harnesses that give the model tools to use.

The concrete examples carry the point. GPT-5.5 Pro built a procedurally generated 3D town that evolves from 3000 BCE to 3000 CE with genuine town modeling, and finished a hard simulation task in 20 minutes versus 33 minutes for the previous version. The new image model drew "otters on planes using wifi" in the styles of Klimt, Rothko, and Matisse with readable labels, which earlier models could not do. Given four prompts and his own crowdfunding dataset, it produced a 101-page paper with a real literature review and statistics that Mollick said he would accept from a second-year PhD project. It also generated a full tabletop RPG with rules, tables, and simulated playtests.

He is equally specific about the ceiling. Long-form fiction is still weak, with ornate sentences and characters who all talk the same way, and the auto-generated research hypothesis was statistically sound but dull to an expert eye. Read the full post on One Useful Thing.

Why it matters

If you are deciding where to trust this class of model, Mollick's split is the practical guide: strong on structured, verifiable production work, still unreliable where taste and originality are the product. That line is where you should keep a human.

OpenAI · Models