How AI Agents Work — Tool Use, Reasoning, and the Action Loop

2026-03-24

A language model generates text. An AI agent generates text AND takes actions. It reads a task, decides which tools to use, executes them, observes the results, and decides what to do next. It loops until the task is complete.

Claude Code is an agent. Cursor's AI is an agent. GitHub Copilot Workspace is an agent. They don't just answer questions — they write files, run commands, search codebases, create pull requests. The LLM is the brain. MCP tools are the hands.

The Agent Loop

Every agent follows the same pattern:

Think → Act → Observe → Done? (loop until the task is complete)
  1. Think — the model reads the conversation (task + previous results) and reasons about what to do next
  2. Act — the model generates a tool call (or multiple tool calls)
  3. Observe — the tool result comes back and is added to the conversation
  4. Decide — is the task done? If yes, respond to the user. If no, loop back to Think.

This is the ReAct pattern (Reasoning + Acting). The model interleaves reasoning with action. It's not pre-planned — the model decides each step based on what it observes.
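The loop above can be sketched in a few lines of Python. Here `call_model` and `execute_tool` are hypothetical stubs standing in for a real LLM API and MCP tool dispatch; the toy model policy answers a single question by calling one tool and then reading its result.

```python
# Hypothetical stand-ins: a real harness would call an LLM API here
# and route tool calls to MCP servers.
def call_model(messages, tools):
    # Toy policy: list tables once, then answer from the observed result.
    tool_results = [m["content"] for m in messages if m["role"] == "tool"]
    if not tool_results:
        return {"role": "assistant", "content": None,
                "tool_calls": [{"id": "1", "name": "db_list_tables", "args": {}}]}
    return {"role": "assistant", "content": f"Tables: {tool_results[-1]}",
            "tool_calls": None}

def execute_tool(name, args):
    return "users, orders" if name == "db_list_tables" else f"unknown tool: {name}"

def agent_loop(task, tools, max_steps=20):
    """Think -> Act -> Observe -> Decide, until the model stops calling tools."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(messages, tools)              # Think + Act
        messages.append(reply)
        if not reply["tool_calls"]:
            return reply["content"]                      # Decide: done
        for call in reply["tool_calls"]:                 # Observe
            result = execute_tool(call["name"], call["args"])
            messages.append({"role": "tool", "content": result})
    return "Step limit reached without completing the task."
```

The `max_steps` cap matters in practice: without it, a confused model can loop forever.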

What Makes This Different from Chat?

A chatbot generates one response per message. An agent generates multiple steps per task:

Chat: User asks "What tables are in my database?" → Model responds with text (guessing or using pre-loaded context).

Agent: User asks "What tables are in my database?" → Model calls db_list_tables tool → observes the result → generates a formatted response based on actual data.

Agent (multi-step): User asks "Add an email column to the users table" → Model calls db_get_schema to understand current structure → calls db_run_migration to add the column → calls db_get_schema again to verify → responds confirming the change.

The agent loop is what makes AI actually useful for real work. Without it, the model is limited to what's in its training data and conversation context.
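Concretely, the multi-step migration example above leaves behind a message history like this (a hypothetical trace; tool names follow the example, and the exact message shape varies by API). Each tool call and its result become messages the model reads on the next step:

```python
# Hypothetical message history for the multi-step example above.
history = [
    {"role": "user",      "content": "Add an email column to the users table"},
    {"role": "assistant", "tool_call": {"name": "db_get_schema", "args": {}}},
    {"role": "tool",      "content": "users(id INTEGER, name TEXT)"},
    {"role": "assistant", "tool_call": {"name": "db_run_migration",
        "args": {"sql": "ALTER TABLE users ADD COLUMN email TEXT"}}},
    {"role": "tool",      "content": "OK"},
    {"role": "assistant", "tool_call": {"name": "db_get_schema", "args": {}}},
    {"role": "tool",      "content": "users(id INTEGER, name TEXT, email TEXT)"},
    {"role": "assistant", "content": "Added the email column and verified the schema."},
]
```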

Tool Selection

The model has access to a registry of available tools (discovered via MCP). For each step, it must decide:

  1. Which tool to call — based on tool names and descriptions
  2. What arguments to pass — based on the input schema and conversation context
  3. Whether to call a tool at all — sometimes the answer is already known

Tool selection is the model's most important decision. A wrong tool or wrong arguments wastes a step; calling a tool when the answer is already available wastes time and tokens.

Good tool descriptions are the primary lever for improving tool selection. The model reads the description field of every available tool and picks the one that best matches the current need.
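As a sketch, a well-described tool might look like this in MCP's tool-definition shape (`name`, `description`, `inputSchema`); the tool itself is hypothetical. Note that the description says what the tool does, when to use it, and when not to:

```json
{
  "name": "db_run_migration",
  "description": "Run a SQL migration against the application database. Use for schema changes (ALTER/CREATE/DROP). For read-only queries, use db_query instead.",
  "inputSchema": {
    "type": "object",
    "properties": {
      "sql": { "type": "string", "description": "The migration SQL to execute" }
    },
    "required": ["sql"]
  }
}
```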

Context Window

The agent's memory is the context window — the conversation history including all previous messages, tool calls, and results. Everything the agent knows is in this window.

The problem: context windows are finite. Every tool call adds to the conversation (the request + the result). After many steps, the context fills up. When it's full, earlier information is lost.

This is the fundamental limitation of current agents. They work well for tasks that complete in 10-20 tool calls. Tasks requiring hundreds of steps — or tasks that span multiple sessions — exceed the context window.

Context management is the engineering discipline of making agents effective despite this limitation.
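A minimal sketch of one context-management strategy, assuming messages are plain dicts: keep the original task message and drop the oldest tool steps first. Real harnesses budget by tokens rather than characters, and often summarize dropped steps instead of discarding them:

```python
def trim_context(messages, max_chars=20000):
    """Keep the first (task) message; drop the oldest steps until we fit.
    A naive sketch -- real harnesses count tokens and summarize."""
    def size(msgs):
        return sum(len(str(m.get("content") or "")) for m in msgs)
    head, rest = messages[:1], messages[1:]
    while rest and size(head + rest) > max_chars:
        rest = rest[1:]  # oldest step goes first
    return head + rest
```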

Error Recovery

Agents fail. Tools return errors. API calls time out. The model generates wrong arguments. The difference between a useful agent and a frustrating one is error recovery.

A good agent:

  • Reads the error message and adjusts its approach (different arguments, different tool)
  • Retries with backoff for transient failures (network timeouts, rate limits)
  • Falls back to alternative approaches when a tool consistently fails
  • Reports honestly when it can't complete the task, explaining what went wrong

A bad agent:

  • Retries the same failing call in a loop
  • Ignores errors and continues with incorrect assumptions
  • Claims success when the task failed

Error recovery is emergent from the model's reasoning ability. Better models recover better. But the tool descriptions and error messages are what give the model the information it needs to recover.
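The retry-with-backoff half of this can be sketched as a wrapper the harness puts around tool execution (a hypothetical helper, not any particular harness's API): transient failures are retried with exponential backoff, while other errors are returned as text so the model can read the message and adjust.

```python
import time

def call_with_retry(tool_fn, args, retries=3, base_delay=1.0,
                    transient=(TimeoutError, ConnectionError)):
    """Retry transient failures with exponential backoff; surface other
    errors to the model as text instead of crashing the loop."""
    for attempt in range(retries):
        try:
            return tool_fn(**args)
        except transient:
            if attempt == retries - 1:
                raise  # exhausted retries; let the harness report it
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
        except Exception as e:
            return f"Tool error: {e}"  # the model reads this and adapts
```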

Multi-Tool Coordination

Complex tasks require multiple tools working together:

Sequential: Read file → modify → write file → run tests → check results. Each step depends on the previous.

Parallel: Search for a pattern in three directories simultaneously. Independent operations that don't depend on each other.

Conditional: Try approach A. If it fails, try approach B. The next step depends on the result of the current step.

MCP supports all of these. The model decides the execution strategy based on its understanding of the task. Some host applications support parallel tool calls (multiple tools/call requests in flight simultaneously). Others execute sequentially.
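The parallel case can be sketched with `asyncio.gather`; `search_dir` here is a hypothetical stub standing in for a real `tools/call` request. The point is that independent calls overlap, so total latency is roughly one call, not one per directory:

```python
import asyncio

async def search_dir(path, pattern):
    # Hypothetical tool stub; a real harness would send a tools/call
    # request to an MCP server here.
    await asyncio.sleep(0.1)  # simulate I/O latency
    return f"matches for {pattern!r} in {path}"

async def parallel_search(pattern, dirs):
    """Fan out independent tool calls and gather results in order."""
    return await asyncio.gather(*(search_dir(d, pattern) for d in dirs))

results = asyncio.run(parallel_search("TODO", ["src", "tests", "docs"]))
```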

Agent Harness vs Agent Model

The agent model is the LLM — Claude, GPT-4, Gemini. It does the thinking and tool selection.

The agent harness is the software around the model — the host application that manages the loop, routes tool calls to MCP servers, manages the context window, handles errors, and presents results to the user. Claude Code, Cursor, VS Code Copilot — these are harnesses.

The harness determines:

  • Which tools are available (MCP server configuration)
  • How tool calls are confirmed (automatic vs user approval)
  • How context is managed (summarization, truncation)
  • How errors are handled (retry policies, timeouts)
  • How results are presented (streaming, formatting)

A good harness makes any model more effective. A bad harness wastes a good model's capabilities.
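Tool availability, in particular, is usually just harness configuration. A sketch in the style of an `mcpServers` config block (the `database` server and its package are illustrative; `@modelcontextprotocol/server-filesystem` is a real reference server):

```json
{
  "mcpServers": {
    "database": {
      "command": "npx",
      "args": ["-y", "@example/mcp-server-db", "--db", "./app.sqlite"]
    },
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "./project"]
    }
  }
}
```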

The Security Model

Agents act in the real world. They modify files, send messages, create infrastructure. The security model:

Human in the loop — the user should be able to see and approve what the agent does. MCP is designed for this: the host shows tool calls, the user confirms.

Least privilege — the agent should only have access to tools it needs for the current task. A writing assistant doesn't need rm -rf access.

Audit trail — every tool call should be logged. What was called, when, with what arguments, what result. This is the event stream of the agent's actions.
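An audit trail can be as simple as one JSON line per tool call. A minimal sketch, with illustrative field names:

```python
import json
import time

def log_tool_call(log_file, name, args, result):
    """Append one JSON line per tool call: what was called, when,
    with which arguments, and what came back."""
    entry = {
        "ts": time.time(),
        "tool": name,
        "args": args,
        "result": str(result)[:500],  # truncate large results
    }
    with open(log_file, "a") as f:
        f.write(json.dumps(entry) + "\n")
```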

Reversibility — prefer reversible actions. Create a branch instead of committing to main. Create a draft PR instead of merging. The agent should default to safe operations.
