What is a Context Window
A context window is the fixed-size buffer of tokens that an AI model can process at once. It includes everything the model sees — the system prompt, conversation history, tool results, and attached data. Anything outside the window does not exist to the model. There is no memory beyond this buffer.
How it works
Every input to an AI model is converted into tokens — each roughly 3/4 of an English word. The model has a maximum token count it can process in a single request, and the input and the model's generated output share this limit:
| Model | Context window |
|---|---|
| GPT-4o | 128K tokens |
| Claude 3.5 Sonnet | 200K tokens |
| Claude Opus | 200K tokens |
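A quick way to reason about these limits is to estimate token counts from text length. The sketch below uses the common ~4 characters per token heuristic; real counts come from the model's own tokenizer (e.g. `tiktoken` for OpenAI models), and the `window` and `reserve` values are illustrative.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters (about 3/4 of a word) per token.
    Only for quick budgeting; a real tokenizer gives exact counts."""
    return max(1, len(text) // 4)

def fits_in_window(text: str, window: int = 128_000, reserve: int = 4_000) -> bool:
    """Check whether text fits, leaving `reserve` tokens for the model's reply."""
    return estimate_tokens(text) <= window - reserve
```

Reserving headroom for the reply matters because the output tokens count against the same window as the input.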
The context window contains everything: your system prompt, the conversation so far, any documents you attached, tool call results, and the model's own previous responses. As a conversation grows, older content either stays (consuming space) or gets dropped. Once the window is full, something must go.
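The simplest "something must go" policy is to drop the oldest turns while always keeping the system prompt. A minimal sketch, assuming messages are dicts with a `content` string and the system prompt is first (the message shape and the ~4 chars/token estimate are assumptions, not any particular API):

```python
def tokens(msg: dict) -> int:
    # Crude estimate: ~4 characters per token.
    return max(1, len(msg["content"]) // 4)

def prune_history(messages: list[dict], budget: int) -> list[dict]:
    """Drop the oldest non-system messages until the history fits `budget`.
    Assumes messages[0] is the system prompt, which is always kept."""
    system, rest = messages[0], messages[1:]
    used = tokens(system)
    kept: list[dict] = []
    for msg in reversed(rest):          # walk newest-first
        cost = tokens(msg)
        if used + cost > budget:
            break                       # everything older gets dropped
        kept.append(msg)
        used += cost
    return [system] + kept[::-1]        # restore chronological order
```

Production frameworks often summarize dropped turns instead of discarding them outright, but the budget arithmetic is the same.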
For an AI agent running a multi-step task, context management is critical. Each tool call adds input and output to the window. A 20-step debugging session can consume the entire window, leaving no room for the actual fix.
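One common defense in agent loops is capping each tool result so no single step can eat the remaining budget. A hedged sketch (the cap values and truncation marker are illustrative, and token math again uses the ~4 chars/token heuristic):

```python
def truncate_tool_result(result: str, remaining_tokens: int,
                         per_call_cap: int = 2_000) -> str:
    """Cap a tool result so a multi-step agent keeps room for later steps.
    Each result gets at most `per_call_cap` tokens, and never more than
    what remains in the window."""
    cap = min(per_call_cap, remaining_tokens)
    max_chars = cap * 4                 # invert the ~4 chars/token estimate
    if len(result) <= max_chars:
        return result
    return result[:max_chars] + "\n[... output truncated ...]"
```

Without a cap like this, a single verbose tool call (a full test-suite log, a large file read) can crowd out the context needed for the actual fix.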
Why it matters
The context window is the fundamental constraint of AI engineering. It determines how much information the model can reason about simultaneously. Too little context and the model makes uninformed decisions. Too much irrelevant context and the model gets distracted or hits the limit.
This is why RAG exists — to select only the most relevant information and inject it into the limited window. It is why MCP resources matter — they give applications structured control over what goes into the context. And it is why agent frameworks need summarization and pruning strategies for long-running tasks.
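The RAG idea above can be sketched as a selection problem: score candidate documents against the query and pack the best ones into a token budget. Real systems score with embedding similarity; the word-overlap scoring here is a deliberately toy stand-in.

```python
def select_context(query: str, docs: list[str], budget: int) -> list[str]:
    """Toy RAG selection: rank docs by word overlap with the query, then
    greedily pack the highest-scoring ones into the token budget."""
    q = set(query.lower().split())
    ranked = sorted(docs, key=lambda d: -len(q & set(d.lower().split())))
    chosen, used = [], 0
    for doc in ranked:
        cost = max(1, len(doc) // 4)    # ~4 characters per token
        if used + cost > budget:
            continue                    # skip docs that would overflow
        chosen.append(doc)
        used += cost
    return chosen
```

The key property is that selection happens *before* the model sees anything: only the winners spend window space.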
See How Context Management Works for strategies to work within the context window effectively.