A Build Order for a Production GenAI Platform

AI · July 25, 2024 · source (huyenchip.com)

Chip Huyen's piece is a reference architecture for a production generative AI platform, and its value is the order, not just the box diagram. She builds it up one layer at a time. Start with context construction: retrieval, text-to-SQL, query rewriting, the part that decides what the model actually sees. Add guardrails next, both input checks for things like PII and prompt injection and output checks for format, toxicity, and hallucination, with fallback logic. Then a model router and gateway: intent classifiers that send requests to the right model behind one API, with cost and load control. Then caching, where she notes prompt caching can cut cost substantially, alongside exact caches that match on the identical query and semantic caches that match on embedding similarity. Only after that come complex agentic logic and write actions, the riskiest layer, last on purpose. Observability with metrics, logs, and traces runs across all of it, and orchestration is deliberately deferred until the pieces exist.
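To make the difference between the exact and semantic caches concrete, here is a minimal sketch in Python. It is not code from the article: the `TwoTierCache` class, the `toy_embed` function (a letter-count stand-in for a real embedding model), and the 0.9 similarity threshold are all assumptions chosen for illustration.

```python
import math
from typing import Callable, Optional

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TwoTierCache:
    """Hypothetical cache: exact-match lookup first, then embedding similarity."""

    def __init__(self, embed: Callable[[str], Vector], threshold: float = 0.9):
        self.embed = embed            # assumed embedding function
        self.threshold = threshold    # assumed cutoff; tune per application
        self.exact: dict[str, str] = {}
        self.semantic: list[tuple[Vector, str]] = []

    def get(self, query: str) -> Optional[str]:
        # Exact cache: hit only when the query string is identical.
        if query in self.exact:
            return self.exact[query]
        # Semantic cache: linear scan for the most similar stored query.
        query_vec = self.embed(query)
        best_score, best_answer = 0.0, None
        for vec, answer in self.semantic:
            score = cosine(query_vec, vec)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def put(self, query: str, answer: str) -> None:
        self.exact[query] = answer
        self.semantic.append((self.embed(query), answer))


def toy_embed(text: str) -> Vector:
    """Stand-in for an embedding model: counts of each letter a-z."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts
```

With `toy_embed`, a query that differs only in case or punctuation misses the exact cache but hits the semantic one. A real deployment would likely swap the linear scan for a vector index, and a threshold set too low risks returning an answer to a different question.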

Why it matters

If you are standing up LLM infrastructure, this gives you a sequence to follow instead of bolting on parts in a panic. The explicit ordering, context and guardrails before agents and write actions, is the practical advice: the dangerous, expensive layers come last for a reason.
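As a rough illustration of why the early layers come first, the sketch below wraps a model call in an input guardrail, an output check, and a fallback, with no agentic logic or write actions involved. Everything here is hypothetical: the `handle` function, the email-only PII regex, the empty-response check, and the retry count are simplified assumptions, not the article's implementation.

```python
import re
from typing import Callable

# Assumed PII pattern: email addresses only, for illustration.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
FALLBACK = "Sorry, I can't help with that request."


def mask_pii(text: str) -> str:
    """Input guardrail: mask email addresses before the model sees them."""
    return EMAIL.sub("[EMAIL]", text)


def output_ok(text: str) -> bool:
    """Output guardrail placeholder: reject empty or whitespace-only responses."""
    return bool(text.strip())


def handle(query: str, context: str, model: Callable[[str], str], retries: int = 1) -> str:
    """Build the prompt from context, call the model, and fall back on failure."""
    prompt = f"Context: {context}\nQuestion: {mask_pii(query)}"
    for _ in range(retries + 1):
        answer = model(prompt)
        if output_ok(answer):
            return answer
    # All attempts failed the output check, so return a safe default.
    return FALLBACK
```

Here `model` is any callable that takes a prompt and returns text, so a router or gateway could later slot in behind the same signature without touching the guardrails.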

Engineering LLMs