A six-stage learning sequence. From the model in isolation to agents, context design, and the pattern language that waits at the end of the road.
This document is a foundation course, not a survey. It is written for practitioners who are already working with AI tools and who want to understand what they are actually working with — precisely, from the ground up. Much of the vocabulary currently in circulation is imprecise. Imprecise vocabulary produces imprecise thinking. Imprecise thinking produces systems that do not do what was intended. The foundations described here are stable; some specifics — particularly around MCP and tooling protocols — have moved since this was written. Verify current documentation before building.
The sequence has six stages. Each stage has a clear boundary: what you know going in, and what you know coming out. The stages build on each other. Skipping stages is possible but not recommended. The gaps show up later in ways that are difficult to diagnose.
The final stage points toward a longer project: a pattern language for semantic AI systems. That is the practitioner's destination. The stages before it are the road.
Estimated time per stage: two to four hours of focused study and hands-on work. Total sequence: one to three weeks at practitioner pace.
The single most common barrier to clear thinking about AI systems is the conflation of terms. Model, tool use, MCP, context window, retrieval, agent: these terms are frequently used interchangeably in practice. They are not interchangeable. Each names a distinct thing.
An LLM produces output one token at a time. Each token is selected probabilistically based on the input and all previously generated tokens. There is no reasoning happening in the way humans reason. There is sophisticated pattern completion happening at a scale that produces outputs that resemble reasoning. These are not the same thing, and the difference matters for system design.
The model has no persistent memory. Between calls, nothing is retained. Everything the model knows during a given interaction is present in the context window. This is both the fundamental constraint and the fundamental design surface. Managing what is in the context window — and what is not — is most of the work.
The model does not produce a single answer. It produces a probability distribution over possible next tokens and samples from it. Temperature controls how spread out that distribution is. High temperature: creative, variable, less reliable. Low temperature: conservative, consistent, occasionally rigid. This is a dial, not a switch.
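The mechanics of the dial can be made concrete. A minimal sketch of temperature sampling, illustrative only; production inference stacks add top-k/top-p filtering and other refinements:

```python
import numpy as np

def sample_next_token(logits: np.ndarray, temperature: float = 1.0) -> int:
    """Sample one token id from a logits vector, with temperature scaling."""
    scaled = logits / max(temperature, 1e-6)  # low T sharpens, high T flattens
    probs = np.exp(scaled - scaled.max())     # numerically stable softmax
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))
```

Near zero temperature this collapses toward greedy decoding; at high temperature the distribution approaches uniform.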
Without external tools: it cannot retrieve current information, it cannot perform precise arithmetic reliably, it cannot take actions in the world, and it cannot remember previous conversations. Any system that seems to do these things is using tool use or retrieval — not the model alone.
Tool use is the capability that transforms a language model from a sophisticated text predictor into something that can interact with the world. Understanding how it works — mechanically — is essential before building any system that uses it.
The model is given a list of available tools, each described in natural language with a defined schema for inputs and outputs. When the model determines that a tool call is appropriate, it outputs a structured message — typically JSON — specifying which tool to call and with what arguments. The host system receives this output, executes the tool call, and returns the result to the model as a new context entry. The model then continues.
One cycle of tool use: the model receives input and tool definitions in context; outputs a tool call; the host system executes the call and captures the result; the result is appended to the context; the model receives the updated context and continues. An agent is this loop repeated until a stopping condition is met. Each iteration consumes context space. Long agentic tasks eventually run out of context — this is a hard constraint, not a bug to be fixed.
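A host-side sketch of that loop. Everything named here is a hypothetical stand-in: `call_model` for a provider API, the `tools` mapping for your own dispatch code. Real hosts add error handling and context trimming:

```python
def run_agent(task: str, tool_defs: list, tools: dict, call_model,
              max_steps: int = 20) -> str:
    """One tool-use cycle, repeated until the model stops calling tools
    or the step budget runs out. `call_model` and `tools` are stand-ins."""
    context = [{"role": "user", "content": task}]
    for _ in range(max_steps):                    # stopping condition
        reply = call_model(context, tool_defs)    # model sees the full context
        if reply["type"] == "text":               # no tool call: task is done
            return reply["content"]
        result = tools[reply["tool_name"]](**reply["arguments"])  # host executes
        context.append({"role": "tool",           # result re-enters the context
                        "name": reply["tool_name"],
                        "content": str(result)})
    raise RuntimeError("Step budget exhausted before the task completed.")
```

Note that the context list only grows inside the loop; that growth is the consumption described above.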
Tools should be designed the same way good API endpoints are designed. One tool, one concern. Clear input schemas. Predictable outputs. Idempotent where possible. Well-described in natural language — because the model reads the description to decide whether and how to use the tool.
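Concretely, a single well-scoped tool definition might take this generic shape; exact field names vary by provider:

```python
current_time_tool = {
    "name": "current_time",
    "description": (
        "Return the current time in a given IANA timezone as an ISO 8601 "
        "string. Use when the user asks about the current date or time."
    ),
    "input_schema": {                  # JSON Schema for the arguments
        "type": "object",
        "properties": {
            "timezone": {
                "type": "string",
                "description": "IANA timezone name, e.g. 'Europe/Berlin'",
            },
        },
        "required": ["timezone"],
    },
}
```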
MCP is a protocol specification. It standardizes the interface between an AI client and the servers that expose tools, resources, and prompts. It is not a framework, not a platform, and not a paradigm. It is plumbing — carefully designed plumbing, but plumbing. The protocol has continued to develop since this document was written; verify current specification details at modelcontextprotocol.io before building.
Before MCP, every AI application had to define its own bespoke interface for tool integration. MCP replaces that with a standard. The value of the standard is composability: an MCP server built for one client works with any compliant client.
MCP is not the same as tool use. Tool use is a model capability that predates MCP. MCP is one way to implement tool use at scale. A system can use tool use without using MCP. A system can use MCP without using agents. These are orthogonal concepts that happen to compose well.
Reading about MCP is not sufficient. The understanding that comes from building a server is qualitatively different from the understanding that comes from using one. This stage is hands-on throughout.
The first server should do exactly one thing. Not three things. Not a "useful" thing. One thing, implemented correctly, with proper error handling and a clear description that the model can read and understand. Good first-server candidates: return the current time in a specified timezone; fetch the contents of a URL and return clean text; read a file from a specified path and return its contents.
The description on each tool function is what the model reads to decide whether and how to use the tool. Write it for the model. Be explicit about what the tool does, what inputs it expects, and what it returns. Vague descriptions produce incorrect tool calls.
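Putting those rules together, a complete first server might look like the following sketch, written against the official Python MCP SDK's FastMCP interface. Verify names and signatures against current documentation before relying on them; the protocol and SDK have continued to move:

```python
from datetime import datetime
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("time-server")

@mcp.tool()
def current_time(timezone: str) -> str:
    """Return the current time in the given IANA timezone (e.g.
    'Europe/Berlin') as an ISO 8601 string. Errors name the bad input."""
    try:
        return datetime.now(ZoneInfo(timezone)).isoformat()
    except ZoneInfoNotFoundError:
        return f"Unknown timezone: {timezone!r}. Expected an IANA name."

if __name__ == "__main__":
    mcp.run()  # stdio transport by default
```

In FastMCP the function docstring becomes the tool description the model reads, which is why it is written for the model rather than for other programmers.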
After the first server works, build a second one that connects to a real external system — a REST API, a database, a file system. This is where the design decisions become non-trivial. What do you expose and what do you hide? What errors does the external system produce and how do you surface them? What does the model need to know to use this tool correctly?
The context window is the design surface. What goes in it, when, and in what form determines what the model can do. This is not configuration — it is architecture. Context design is the discipline of managing that surface deliberately.
Three categories of content compete for context space: instructions (system prompts, tool descriptions, task definitions); state (conversation history, tool results, retrieved documents); and data (content the model must reason over to complete the task). In a long agentic task, state accumulates. Old tool results that are no longer relevant still occupy space. At some point, the context fills. Systems that do not manage this fail in ways that are difficult to diagnose because the model does not announce that it has forgotten something.
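One mitigation, sketched here in hypothetical form: give the state category a rolling token budget and drop the oldest entries first. Real systems usually summarize rather than drop, but the budgeting logic is the same; `count_tokens` stands in for a real tokenizer:

```python
def trim_state(state: list[dict], budget: int, count_tokens) -> list[dict]:
    """Keep the newest state entries that fit within a token budget.
    Instructions live outside this list and are never trimmed."""
    kept, used = [], 0
    for entry in reversed(state):            # walk newest to oldest
        cost = count_tokens(entry["content"])
        if used + cost > budget:
            break                            # everything older falls off
        kept.append(entry)
        used += cost
    return list(reversed(kept))              # restore chronological order
```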
Retrieval-Augmented Generation is, at its core, a context management strategy. Instead of putting an entire knowledge base in context (impossible) or relying on training data (stale), you retrieve only the relevant fragments and place those in context. The retrieval mechanism is separate from the model and requires its own design — and the choice of mechanism is a substantive decision, not a configuration detail.
Three retrieval approaches are in common use, and they are not interchangeable. Vector search encodes documents and queries as dense numerical embeddings and returns semantically similar matches — effective when the vocabulary of the query differs from the vocabulary of the documents but the meaning aligns. It requires an embedding model, an index, and a choice of similarity metric. Keyword search (typically BM25 or a variant) matches on terms — faster, more interpretable, and more precise when query terms are likely to appear verbatim in the documents. Production systems often run both in parallel and merge the results; hybrid retrieval is more robust than either alone. Graph traversal follows explicit relationships between entities — the right approach when the relevant context is not a document but a chain of connections: who owns what, which component depends on which, what event preceded which decision. When relationships are first-class, graph traversal retrieves what vector and keyword search cannot.
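How that parallel merge might work, sketched with reciprocal rank fusion, one common merge rule. Here `vector_search` and `keyword_search` are hypothetical stand-ins, each returning a ranked list of document ids:

```python
def hybrid_search(query: str, vector_search, keyword_search,
                  k: int = 10) -> list[str]:
    """Merge two ranked result lists with reciprocal rank fusion (RRF).
    A document scores higher the nearer the top it appears in either list."""
    scores: dict[str, float] = {}
    for ranked in (vector_search(query, k), keyword_search(query, k)):
        for rank, doc_id in enumerate(ranked):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (60 + rank)  # 60 is the customary RRF constant
    return sorted(scores, key=scores.get, reverse=True)[:k]
```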
The retrieval mechanism is one part of a retrieval system. The system also includes chunking strategy — how documents are divided before indexing — embedding model selection, index design, and the scoring and deduplication logic that determines what enters the context window when results compete for space. Each is a design decision. None is automatic. Errors in retrieval architecture show up as model errors, which makes them difficult to diagnose without understanding what the model was actually given to reason over.
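Chunking alone shows how much is a decision. The simplest strategy, fixed-size windows with overlap, fits in a few lines; production systems more often split on structural boundaries such as headings or paragraphs:

```python
def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Fixed-size character chunks with overlap, so content at a boundary
    appears in two chunks. Size and overlap are tuning decisions;
    assumes overlap < size."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]
```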
Between sessions, the context window is empty. Nothing persists. If a system needs continuity across sessions — and most production systems do — that continuity must be engineered explicitly: written to external storage at the end of a session, retrieved and placed in context at the start of the next one. This is not automatic. It is a design decision that must be made and implemented.
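A minimal version of that engineering, with the storage path and the summary format as hypothetical choices:

```python
import json
from pathlib import Path

MEMORY_FILE = Path("session_memory.json")   # hypothetical storage location

def end_session(summary: str) -> None:
    """Write what the next session needs to know. Nothing else survives."""
    MEMORY_FILE.write_text(json.dumps({"summary": summary}))

def start_session() -> str:
    """Retrieve prior continuity for injection into the new context window."""
    if not MEMORY_FILE.exists():
        return ""
    return json.loads(MEMORY_FILE.read_text())["summary"]
```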
An agent is a model running in a loop. Each iteration: observe the current state, decide the next action, execute the action via tool call, observe the result, repeat. The loop terminates when the goal is achieved or a stopping condition is met. Agents are powerful and they fail in specific, predictable ways — and nearly all of those failure modes are, at bottom, context failures: information that was absent, that accumulated until it crowded out what mattered, or that was retrieved at the wrong moment. Stage 4 is not background for this stage. It is the mechanism of it.
For most production systems, fully autonomous agents are not appropriate. The cost of an error is too high. The right pattern is a spectrum: from the model that drafts and the human that approves, through the model that acts autonomously on low-risk decisions and asks for confirmation on high-risk ones, to the fully autonomous agent that operates within a narrowly constrained domain where errors are cheap and reversible. The decision about where on this spectrum a system sits is a product decision, not a technical one.
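The middle of that spectrum reduces to a gate in the tool-execution step. A sketch, with the risk classification and the `confirm` channel as product-specific stand-ins:

```python
HIGH_RISK = {"send_email", "delete_record", "issue_refund"}  # illustrative names

def execute_gated(tool_name: str, args: dict, tools: dict, confirm) -> str:
    """Run low-risk tool calls directly; route high-risk ones through a
    human approval channel. `confirm` returns True only on explicit approval."""
    if tool_name in HIGH_RISK and not confirm(tool_name, args):
        return f"Call to {tool_name} rejected by reviewer."
    return str(tools[tool_name](**args))
```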
Single-shot interactions do not need agents. Retrieval tasks where the retrieval logic is deterministic do not need agents. Agents add complexity, cost, and failure surface. Use them when the task genuinely requires sequential decision-making across multiple uncertain steps. Not before.
Christopher Alexander observed that good architecture is not the result of following rules. It is the result of applying patterns — solutions to recurring problems in specific contexts — at every scale simultaneously, from the region to the room to the doorknob. The patterns form a language. You compose them. Each pattern you apply creates the conditions in which other patterns can be applied.
The same structure applies to AI systems. There are recurring problems at every scale — organizational, session, prompt, tool, data. Each has proven solutions. Those solutions compose. The practitioner who can name them, apply them deliberately, and teach them to others is doing the work Alexander described.
These definitions are the vocabulary of this sequence. They are not exhaustive. They are precise. Use them as written until you have a reason to refine them.