Anyone working with agents runs into this. MCP was a great first step as a protocol, but there's a trade-off: every tool definition loads upfront, consuming tokens before the model starts any real work. Context windows fill up quickly, and performance takes a hit.
Opus 4.5 addresses this with three upgrades:
- Tool Search: The model pulls tool definitions on demand rather than loading everything at once. Context windows stay cleaner, responses are faster, and you're not wasting tokens on unused tools. (By the way, it’s very meta — the word, not the company — that Anthropic gave us a tool to search/discover other tools).
- Programmatic Tool Calling: Rather than routing every intermediate step back through the model's context, Claude writes code to chain tools together directly. Less like an assistant needing constant updates, more like a developer who understands how to work with APIs. I think a bunch of people were already having this kind of idea (including Anthropic themselves). LLMs are great, but sometimes, introducing layers of abstraction that require using a LLM to make a function call is just not worth it. Let things that can be deterministic (and fast), be deterministic (and fast).
- Tool Use Examples: You can demonstrate how to call a tool properly, not just list available parameters. The model follows your examples instead of interpreting your schema in its own creative way.
the Gemini 3 capability that stands out to me as the big differentiator is how well it works with large context windows (it supports up to 1M tokens like the prior version). So, this could be really useful for agentic coding use cases.
In sum, where Gemini 3 clearly wins is context windows, multimodal capabilities, and generative UI. If you need a model that handles video, images, and text together while reasoning across all of it -- and generates interactive experiences in real-time -- Gemini 3 is compelling.
At a high level, here’s where the AI models stand after these releases:
• Claude Opus 4.5 is focusing on better agent orchestration and tool use -- making complex agentic workflows more reliable. • Gemini 3 is the broad winner, innovating on speed and multimodal understanding -- making AI fast enough for real-time use across text, images, and video. • GPT-5.1 Pro is best for deep reasoning -- making AI capable of handling problems that require genuine expertise.
