AI Engineering FAQ

What is MCP and why does it matter?

MCP (Model Context Protocol) is the open standard for connecting AI applications to external tools, data sources, and workflows. Before MCP, every AI integration was custom — connecting to GitHub required GitHub-specific code, connecting to Slack required different code, and none of it was reusable across AI applications.

MCP solves this the way USB solved peripheral connectivity. Build one MCP server, and every compatible AI client can use it. Build one MCP client, and it can use every compatible server. The protocol uses JSON-RPC messages over stdio or HTTP transports, and defines three primitives: tools (actions), resources (data), and prompts (templates).
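The wire format is easy to inspect. Below is a sketch of what a tools/call exchange looks like as JSON-RPC messages; the envelope fields follow JSON-RPC 2.0, but the tool name and arguments are invented for illustration:

```python
import json

# A hypothetical tools/call request. The jsonrpc/id/method envelope is
# standard JSON-RPC 2.0; "search_issues" and its arguments are invented.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search_issues",
        "arguments": {"query": "login timeout"},
    },
}

# The server replies with a result carrying the same id.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "3 issues found"}]},
}

wire = json.dumps(request)  # what actually travels over stdio or HTTP
```

The same message shape is used regardless of transport, which is what makes servers and clients interchangeable.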

See How MCP Works for the full architecture.

What is the difference between MCP tools and resources?

Tools are model-controlled — the AI decides when to call them and what arguments to pass. Tools perform actions: running queries, creating files, sending messages. Resources are application-controlled — the host application decides which resources to attach to a conversation. Resources provide context: file contents, database schemas, documentation.

The distinction matters for security and predictability. Tools require the AI to reason about when to act. Resources let the application pre-load relevant context before the AI even starts reasoning. In practice, an MCP server typically exposes both — tools for actions the AI should take and resources for data the application should provide upfront.
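The control split can be shown in miniature. This is not the MCP SDK, just a toy sketch of the idea, with invented names: tools sit behind a call the model initiates, while resources are content the application attaches up front.

```python
# Tools: the model reads these descriptions and decides when to call one.
tools = {
    "run_query": {
        "description": "Run a read-only SQL query against the orders DB",
        "handler": lambda args: f"rows for: {args['sql']}",
    },
}

# Resources: the host application decides which of these to attach
# to the conversation before the model starts reasoning.
resources = {
    "schema://orders": "CREATE TABLE orders (id INT, total REAL);",
}

def build_context(attached_uris):
    """Application-controlled: pre-load resource content into the prompt."""
    return "\n".join(resources[uri] for uri in attached_uris)

def model_calls_tool(name, args):
    """Model-controlled: the AI picked the tool and the arguments."""
    return tools[name]["handler"](args)

context = build_context(["schema://orders"])
result = model_calls_tool("run_query", {"sql": "SELECT total FROM orders"})
```

Notice the asymmetry: build_context runs before the model sees anything, while model_calls_tool runs only if and when the model decides to act.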

See How MCP Tools Work and How MCP Resources Work for the detailed lifecycle of each.

How does an AI agent decide which tool to use?

An AI agent receives tool descriptions and input schemas as part of its context. When the user asks a question, the agent reads the available tools, matches them against the task, and selects the most relevant one. This is tool use — the model outputs a structured call with a function name and arguments.

The decision is based on the tool's description (what it does), the input schema (what arguments it accepts), and the current conversation state (what has already been tried). If the first tool call does not fully answer the question, the agent loops — observing the result, reasoning about what to try next, and calling another tool. Good tool descriptions are critical. Vague descriptions lead to wrong tool choices.
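A stripped-down sketch of the loop, with the model's reasoning replaced by simple word overlap against tool descriptions (a real agent delegates this choice to the LLM, which emits a structured call):

```python
tools = {
    "get_weather": {"description": "Look up current weather for a city",
                    "run": lambda city: f"Sunny in {city}"},
    "get_time":    {"description": "Look up the local time in a city",
                    "run": lambda city: f"09:00 in {city}"},
}

def pick_tool(task):
    # Stand-in for the model matching the task against tool descriptions.
    task_words = set(task.lower().split())
    for name, tool in tools.items():
        if task_words & set(tool["description"].lower().split()):
            return name
    return None

def agent(task, max_steps=5):
    transcript = []
    for _ in range(max_steps):
        name = pick_tool(task)
        if name is None:
            return transcript
        observation = tools[name]["run"]("Berlin")
        transcript.append((name, observation))
        # A real agent feeds the observation back to the model, which
        # decides whether to call another tool or answer; the stub stops.
        break
    return transcript

steps = agent("what is the weather right now")
```

The fragility of keyword matching here is exactly why the description field matters so much in practice: it is the only signal the model has about when a tool applies.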

See How AI Agents Work for the full agent loop and error recovery strategies.

What is the context window and why does it matter?

The context window is the fixed-size buffer of tokens the AI can process at once. It includes the system prompt, conversation history, attached documents, tool results, and the model's own responses. Context windows on current models commonly range from 128K to 200K tokens, with some models offering substantially more. Everything outside the window is invisible to the model — it has no memory beyond what fits.

For AI agents running multi-step tasks, context management is the central engineering challenge. Each tool call adds input and output. A 20-step debugging session can consume the entire window. Strategies include summarization (compressing older exchanges), selective injection (RAG to retrieve only what is relevant), and MCP resources for structured context loading.
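Two of those moves can be sketched in a few lines: a crude token estimate and a keep-newest trimming policy. This is a simplified sketch; real systems use the model's actual tokenizer, and production code usually summarizes old turns rather than dropping them outright.

```python
def estimate_tokens(text):
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

def fit_to_window(system_prompt, messages, budget=1000):
    """Keep the system prompt, then the newest messages that fit."""
    used = estimate_tokens(system_prompt)
    kept = []
    for msg in reversed(messages):          # newest first
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break                           # older messages fall out
        kept.append(msg)
        used += cost
    return [system_prompt] + list(reversed(kept))

history = [f"step {i}: " + "tool output " * 30 for i in range(40)]
window = fit_to_window("You are a debugging assistant.", history)
```

The system prompt is pinned and recency wins, which matches how most agent frameworks prioritize: instructions and the latest observations are the last things you can afford to lose.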

See How Context Management Works for practical strategies.

What is RAG?

RAG (Retrieval-Augmented Generation) searches external data at query time and injects the results into the AI's context window before the model generates a response. This lets the AI answer questions about private data, recent events, or anything not in its training set.

The process: take the user's question, search a knowledge base (using BM25, semantic search, or hybrid search), retrieve the most relevant documents, inject them into the prompt, and let the model reason over the actual data. RAG is cheaper and more maintainable than fine-tuning — update the search index, and new information is available instantly.
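The steps above can be sketched end to end. The scoring function here is plain word overlap standing in for BM25 or embedding similarity, and the documents are invented:

```python
docs = [
    "Refunds are processed within 5 business days.",
    "The API rate limit is 100 requests per minute.",
    "Support is available Monday through Friday.",
]

def score(query, doc):
    # Toy relevance: count shared words (BM25/embeddings in real systems).
    q = set(query.lower().split())
    d = set(doc.lower().rstrip(".").split())
    return len(q & d)

def retrieve(query, k=1):
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(query):
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

prompt = build_prompt("what is the api rate limit")
```

Only the retrieved document reaches the prompt; the rest of the knowledge base never consumes context-window tokens, which is the whole point.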

See How Context Management Works for how RAG fits into broader context strategies.

What is the difference between stdio and HTTP transport in MCP?

MCP transports carry JSON-RPC messages between client and server. stdio launches the MCP server as a local child process and communicates through stdin/stdout pipes. No network, no ports, no authentication needed. Streamable HTTP connects over the network using POST requests and Server-Sent Events for streaming.

Use stdio for local development tools — file system access, local databases, CLI wrappers. Use HTTP for shared infrastructure — team tools, cloud services, servers that multiple clients access concurrently. The transport is invisible to the protocol layer — the same tools and resources work identically over either transport. You can develop locally on stdio and deploy remotely on HTTP without changing any logic.
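The stdio framing is simple enough to sketch directly: each JSON-RPC message travels as one line of JSON on the pipe. A StringIO stands in for the stdin/stdout pair here; the same JSON payload is what an HTTP transport would carry in a POST body.

```python
import io
import json

def write_message(stream, message):
    # One JSON-RPC message per line (newline-delimited JSON).
    stream.write(json.dumps(message) + "\n")

def read_message(stream):
    return json.loads(stream.readline())

pipe = io.StringIO()  # stands in for the child process's stdin/stdout
write_message(pipe, {"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
pipe.seek(0)
msg = read_message(pipe)
```

Because only write_message and read_message know about the pipe, swapping in an HTTP transport touches nothing above this layer.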

See How MCP Transports Work for message framing details and deployment patterns.

How do I build an MCP server?

Pick an SDK for your language — official SDKs exist for TypeScript, Python, Java, Kotlin, and C#, with community SDKs for Rust, Go, Ruby, and others. Define tools as handler functions with a name, description, input schema, and implementation. Define resources as URI-addressable data sources. Choose a transport (stdio for local use, HTTP for remote).

A minimal MCP server with one tool can be under 50 lines of code. The SDK handles JSON-RPC framing, capability negotiation, and message routing. You write the tool logic — the SDK handles the protocol.
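To make that concrete, here is a sketch of the part you would write by hand if the SDK did not exist: a tool registry plus a dispatcher for tools/list and tools/call. This is not the MCP SDK, which additionally handles framing, capability negotiation, and routing; the "add" tool is invented.

```python
TOOLS = {
    "add": {
        "description": "Add two integers",
        "inputSchema": {
            "type": "object",
            "properties": {"a": {"type": "integer"},
                           "b": {"type": "integer"}},
            "required": ["a", "b"],
        },
        "handler": lambda args: args["a"] + args["b"],
    },
}

def handle_request(request):
    """Dispatch one JSON-RPC request to the registered tools."""
    if request["method"] == "tools/list":
        listing = [{"name": n, "description": t["description"],
                    "inputSchema": t["inputSchema"]}
                   for n, t in TOOLS.items()]
        return {"jsonrpc": "2.0", "id": request["id"],
                "result": {"tools": listing}}
    if request["method"] == "tools/call":
        tool = TOOLS[request["params"]["name"]]
        value = tool["handler"](request["params"]["arguments"])
        return {"jsonrpc": "2.0", "id": request["id"],
                "result": {"content": [{"type": "text", "text": str(value)}]}}
    return {"jsonrpc": "2.0", "id": request["id"],
            "error": {"code": -32601, "message": "method not found"}}

reply = handle_request({"jsonrpc": "2.0", "id": 2, "method": "tools/call",
                        "params": {"name": "add",
                                   "arguments": {"a": 2, "b": 3}}})
```

With a real SDK, the registry and dispatcher collapse into a decorator on the handler function, which is how a one-tool server stays under 50 lines.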

See How MCP Servers Work for the full server lifecycle from initialization through request handling.

Can I use MCP with any AI model?

Yes. MCP is model-agnostic. The protocol defines how tools, resources, and prompts are exposed — not which model consumes them. Any model that supports tool use (function calling) can work with MCP through a compatible client. This includes Claude, GPT-4, Gemini, Llama, Mistral, and other models.

The MCP client translates between the model's native tool-use format and MCP's standardized protocol. An MCP server does not need to know which model is calling it. It receives JSON-RPC requests, executes them, and returns results. The model-specific translation happens entirely in the client layer.
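A sketch of that translation step, with an invented shape for the model's native output (each provider's actual tool-call format differs; the mapping to JSON-RPC is the client's job):

```python
def to_mcp_request(model_tool_call, request_id):
    """Map a model's native tool call onto an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": model_tool_call["function"],
            "arguments": model_tool_call["arguments"],
        },
    }

# Invented stand-in for a model's structured tool-use output:
model_output = {"function": "create_issue",
                "arguments": {"title": "Fix login timeout"}}

mcp_request = to_mcp_request(model_output, request_id=7)
```

Swapping models means swapping this adapter, not the server: the JSON-RPC request that reaches the server is identical either way.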

See How MCP Clients Work for how clients bridge the gap between models and MCP servers.