Code Execution Cuts MCP Agent Token Costs

AI · November 4, 2025 · source (anthropic.com)

When an agent connects to many tools through the Model Context Protocol (MCP), the naive approach loads every tool definition into context up front and routes every intermediate result back through the model. Anthropic's engineering team argues that this is the expensive mistake. Their alternative is to expose MCP servers as a filesystem of code, for example one file per tool that the agent reads on demand, and to let the agent write code that calls those tools, keeping bulk data inside the execution sandbox instead of passing it through the prompt. The worked example is a Google Drive to Salesforce transfer: the direct tool-calling version moves about 150,000 tokens through context and the code-execution version about 2,000, which they put at a 98.7 percent reduction, mostly by never passing a large transcript through the model twice. The post is candid about the cost: you now need a secure sandbox with resource limits and monitoring, operational overhead that plain tool calls do not have.

Why it matters

For tool-heavy agents, context bloat is a real bill and a real latency source. This is a concrete pattern, with a measured before and after, for cutting both, as long as you can run code safely. The tradeoff is sandboxing work you should price in.

Anthropic Agents