As context windows have exploded from 4K to over 1M tokens in the last two years, something more powerful than prompt engineering has emerged: Context Engineering.
The shift may seem simple, but the framing is completely different: instead of optimizing how you ask, you're now optimizing what the AI has access to when it thinks.
**What is a Context Window?**
Think of it as a big sheet of paper that you pass to the LLM: everything the model can consider for a given request has to fit on that one sheet.
Context windows have a limit, usually expressed as a number of "tokens." A token is roughly ¾ of an English word or about four characters. "ChatGPT" is two tokens: "Chat" and "GPT." This matters because billing, latency, and memory all scale with token count.
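If you want to see tokenization in action, OpenAI's open-source tiktoken library splits text the same way their models do. A minimal sketch, assuming the cl100k_base encoding (different models use different encodings, so exact counts vary):

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # one common OpenAI encoding

for text in ("ChatGPT", "Context engineering is fun"):
    tokens = enc.encode(text)
    pieces = [enc.decode([t]) for t in tokens]
    print(f"{text!r} → {len(tokens)} tokens: {pieces}")
```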
Here's a key limitation of LLMs: they only know what they were trained on and what's provided to them in the context window.
Here’s a quirky but delightful detail about how tool calling works: when we let the LLM know what tools it has available, it doesn’t actually call the tools itself. The core LLM isn’t built to do that; it just takes context in and produces output. We describe the available tools in the context, and in its output the model tells the application (like ChatGPT) which tool it would like to invoke. The app then invokes the tool on the LLM’s behalf, puts the results in the context window, and sends it all back. It’s a clever way to work around the fact that LLMs just work with a context window.
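Here's what that round trip can look like in code. A minimal sketch using the OpenAI Python SDK's chat-completions tool-calling format; the `get_weather` function is a made-up tool for illustration, and the app (not the model) is the one that runs it:

```python
import json
from openai import OpenAI

client = OpenAI()

# A hypothetical local tool that the application, not the LLM, will run.
def get_weather(city: str) -> str:
    return f"Sunny and 22°C in {city}"  # stubbed result for illustration

# Describe the tool to the model; it can only *request* that this be called.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Oslo?"}]
response = client.chat.completions.create(
    model="gpt-4o", messages=messages, tools=tools
)
msg = response.choices[0].message

# The model's output is just a request to invoke a tool, expressed as text.
if msg.tool_calls:
    call = msg.tool_calls[0]
    args = json.loads(call.function.arguments)
    result = get_weather(**args)   # the app runs the tool on the LLM's behalf...
    messages.append(msg)           # ...keeps the model's request in the history...
    messages.append({              # ...and puts the result back in the context.
        "role": "tool",
        "tool_call_id": call.id,
        "content": result,
    })
    final = client.chat.completions.create(model="gpt-4o", messages=messages)
    print(final.choices[0].message.content)
```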
Context Engineering is essentially a higher-level version of Prompt Engineering.
Prompt engineering was like learning to ask really good questions. Context engineering is like being a librarian who decides what books someone has access to before they even start reading.
**What Context Engineers Actually Do:**
• Curate: Decide which documents, memories, or APIs matter for each specific task
• Structure: Layer system messages → tools → retrieved data → user prompt in optimal order (see the sketch after this list)
• Compress: Summarize or chunk information to stay under token limits while preserving what matters
• Evaluate: Measure accuracy and watch for "context dilution," where irrelevant info distracts the model
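Here's what the Structure and Compress steps can look like together. A minimal sketch, where `count_tokens` is a rough stand-in for a real tokenizer and `retrieved_chunks` is assumed to arrive pre-ranked by relevance from a retriever; neither name is a standard API:

```python
def count_tokens(text: str) -> int:
    # Rough heuristic from above: ~4 characters per token. A real system
    # would use an actual tokenizer such as tiktoken.
    return max(1, len(text) // 4)

def build_context(system_prompt: str, retrieved_chunks: list[str],
                  user_prompt: str, budget: int = 8000) -> list[dict]:
    """Layer system message → retrieved data → user prompt, dropping the
    lowest-ranked chunks once the token budget is exhausted."""
    messages = [{"role": "system", "content": system_prompt}]
    used = count_tokens(system_prompt) + count_tokens(user_prompt)
    for chunk in retrieved_chunks:        # assumed sorted, most relevant first
        cost = count_tokens(chunk)
        if used + cost > budget:
            break                         # drop (or summarize) rather than overflow
        messages.append({"role": "system", "content": f"Reference:\n{chunk}"})
        used += cost
    messages.append({"role": "user", "content": user_prompt})
    return messages
```

Cutting from the bottom of a relevance-ranked list is the simplest compression strategy; summarizing the dropped chunks instead is a common refinement.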
Keep in mind: more context means richer documents and longer conversations, but cost and latency rise roughly linearly with window length.
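To make that scaling concrete, a back-of-the-envelope calculation; the $5 per million input tokens is an assumed, illustrative price, not any provider's real rate:

```python
# Illustrative only: assumes $5 per 1M input tokens (not a real price quote).
PRICE_PER_TOKEN = 5 / 1_000_000

for window in (4_000, 100_000, 1_000_000):
    print(f"{window:>9,}-token context → ~${window * PRICE_PER_TOKEN:.2f} per request")
```

A full 1M-token window costs roughly 250× what a 4K window does, every single request, which is exactly why the Compress step above earns its keep.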
I'm constantly amazed by what becomes possible when you stop thinking about AI as a chatbot and start thinking about it as a reasoning engine with access to the right context and tools.