AI Agent Glossary: Every Term Explained
Core Concepts
AI Agent: A software system that autonomously pursues goals by perceiving its environment, reasoning about actions, executing those actions through external tools, and evaluating results. Distinguished from simpler AI by its autonomy, tool use, and goal-directed behavior.
Foundation Model: A large AI model (like Claude, GPT, or Gemini) trained on broad data that serves as the reasoning engine for an agent. The model provides language understanding, generation, and reasoning capabilities that the agent relies on for all decision-making.
Large Language Model (LLM): A specific type of foundation model trained primarily on text data. LLMs power most current AI agents because they can understand natural language instructions, reason about complex situations, and generate structured outputs like tool calls and plans.
Context Window: The maximum amount of text (measured in tokens) that a language model can process at once. Larger context windows allow agents to consider more information when making decisions. Current production models range from 128,000 to over 1 million tokens.
Token: The basic unit of text that language models process. Roughly equivalent to 3/4 of a word in English. Input tokens (what the model reads) and output tokens (what it generates) are typically priced differently by API providers.
Tool Use and Integration
Tool Calling (Function Calling): The mechanism by which an agent invokes external functionality during its reasoning process. The model generates a structured request specifying which tool to use and what parameters to pass, the runtime executes the tool, and the result feeds back into the model's context.
Model Context Protocol (MCP): An open standard introduced by Anthropic that defines a universal interface between AI agents and external tools. MCP servers expose capabilities (tools, resources, prompts) in a standardized format that any compatible agent can discover and use.
API (Application Programming Interface): A structured interface that allows software systems to communicate with each other. Agents use APIs to access external services, databases, and tools.
Memory and Knowledge
Retrieval-Augmented Generation (RAG): A technique that extends agent knowledge by searching a document index for relevant information and incorporating it into the model's context before generating a response. RAG lets agents access specific, up-to-date, or proprietary information not contained in the model's training data.
Vector Database: A database optimized for storing and searching high-dimensional vectors (numerical representations of text). Used in RAG systems to find documents semantically similar to a query, even when they do not share exact keywords.
Embedding: A numerical representation of text that captures its semantic meaning. Similar texts produce similar embeddings, enabling semantic search in vector databases.
Short-term Memory: Information the agent retains during a single session, typically the conversation history and accumulated context. Limited by the model's context window.
Long-term Memory: Information persisted across sessions, allowing the agent to recall past interactions, learned preferences, and accumulated knowledge. Implemented through databases, file systems, or specialized memory services.
Architecture and Design
Agentic AI: AI systems designed to operate autonomously, make decisions, and take actions without continuous human direction. Contrasted with passive AI that only responds to direct queries.
Multi-Agent System: An architecture where multiple specialized agents coordinate to accomplish complex tasks. Each agent handles a specific aspect of the workflow (research, writing, review), and an orchestrator manages task assignment and communication between them.
Orchestration: The logic that controls how an agent plans, sequences actions, handles errors, and coordinates with other agents or human operators.
Human-in-the-Loop (HITL): A design pattern where certain agent actions require human approval before execution. Used for high-stakes decisions, sensitive operations, or situations where agent confidence is low.
Prompt Injection: A security attack where malicious input manipulates an agent into performing unauthorized actions by embedding instructions that override the agent's original directives.
Frameworks and Platforms
LangChain / LangGraph: The most widely adopted open-source framework for building AI agents, with explicit state management and support for complex control flows. 34.5 million monthly downloads.
CrewAI: An open-source multi-agent framework that uses role-based abstractions to define teams of collaborating agents.
AutoGen: Microsoft's open-source framework for multi-agent systems with tight Azure integration.
Constitutional AI: An alignment approach developed by Anthropic that trains models to follow a set of principles (a "constitution"), producing outputs that are helpful, harmless, and honest without relying solely on human feedback for each scenario.
Evaluation and Benchmarking Terms
SWE-bench: A benchmark that evaluates AI agents on their ability to resolve real GitHub issues from open-source Python projects. SWE-bench Verified is the curated version with human-validated test cases. Scores above 50% are considered strong, and the current state-of-the-art exceeds 80%.
MMLU (Massive Multitask Language Understanding): A benchmark measuring model knowledge and reasoning across 57 academic subjects. While not agent-specific, MMLU scores correlate with agent reasoning quality because agents rely on their model's general knowledge and analytical capabilities.
Token Efficiency: The ratio of useful work accomplished to tokens consumed. More efficient agents complete tasks using fewer model invocations and shorter prompts, reducing both cost and latency. Claude Code achieves 5.5x better token efficiency than competing coding agents on identical benchmarks.
Hallucination Rate: The frequency at which an agent generates factually incorrect information presented as fact. Lower hallucination rates indicate more reliable agents, and the rate varies significantly by task domain, model, and grounding strategy.
Deployment and Operations Terms
Sandboxing: Running an agent in an isolated environment that restricts its access to the broader system. Sandboxed agents cannot access files, network resources, or processes outside their designated environment, limiting the impact of errors or security breaches.
Rate Limiting: Controlling how frequently an agent can call external services to prevent overwhelming those services or incurring excessive costs. Rate limits can be imposed per-tool, per-agent, or per-organization.
Circuit Breaker: A pattern that automatically disables a tool or service when its failure rate exceeds a threshold, preventing cascading failures. When the circuit breaker trips, the agent receives an error indicating the service is temporarily unavailable and must use alternative approaches.
Drift Detection: Monitoring agent performance over time to identify gradual degradation. Model updates, data distribution changes, and evolving user expectations can all cause agent performance to drift from its initial baseline. Continuous evaluation against benchmark tasks catches drift before it affects users.
Blue-Green Deployment: Running two versions of an agent system simultaneously (blue for current production, green for the new version) and gradually shifting traffic from blue to green. This allows rolling back instantly if the new version performs poorly.
Guardrails: Constraints placed on agent behavior to prevent undesirable actions. Guardrails can be implemented at the model level (constitutional AI), the framework level (tool permission systems), or the application level (output validation rules). Effective agent deployments use guardrails at multiple levels simultaneously.
Industry and Business Terms
Agentic Workflow: A business process that incorporates one or more AI agents as participants alongside human workers. The term emphasizes that agents are embedded in broader workflows rather than operating in isolation.
Agent-as-a-Service (AaaS): A deployment model where agent capabilities are offered as a cloud service, similar to SaaS. Users access agent functionality through APIs or web interfaces without managing the underlying infrastructure.
Token Budget: The maximum number of tokens (both input and output) allocated to an agent for completing a single task. Token budgets prevent runaway costs by capping the computational resources any individual task can consume.
Agent Handoff: The process of transferring a task or conversation from one agent to another, or from an agent to a human operator. Effective handoffs preserve context so the receiving party does not need to start from scratch.
Evaluation Suite: A curated set of test tasks with known-correct outcomes used to measure agent performance. Evaluation suites are the agent equivalent of test suites in traditional software, providing repeatable quality metrics.
System Prompt: The initial set of instructions given to an agent that defines its identity, capabilities, constraints, and behavioral guidelines. The system prompt is the primary mechanism for controlling agent behavior and is typically not visible to end users.
Temperature: A parameter controlling the randomness of language model outputs. Lower temperature produces more deterministic, focused responses. Higher temperature produces more varied, creative responses. Agent applications typically use lower temperatures for reliability, while creative applications use higher temperatures for diversity.
Understanding agent terminology is essential for evaluating platforms, communicating with technical teams, and making informed decisions about agent adoption. The most important terms to know are foundation model, tool calling, MCP, RAG, multi-agent systems, and human-in-the-loop.