How to Build an Agent with the Claude Agent SDK

Updated May 2026
This guide walks you through building a functional AI agent using the Claude Agent SDK, from initial project setup through production deployment. You will create an agent that can read files, execute commands, connect to external services through MCP, maintain persistent sessions, and report on its actions through lifecycle hooks. By the end, you will have a production-ready agent architecture that you can adapt for your specific use case.

The Claude Agent SDK provides a fast path from zero to a working agent because its built-in tools handle the most common operations out of the box. This tutorial focuses on Python; the same concepts apply to the TypeScript SDK, with differences in syntax.

Step 1: Set Up Your Project

Start by creating a new Python project and installing the SDK. The Claude Agent SDK for Python is distributed through PyPI as the claude-agent-sdk package, which is separate from the anthropic client library. Create a virtual environment, install the package with pip install claude-agent-sdk, and set your Anthropic API key as an environment variable. The SDK reads the ANTHROPIC_API_KEY environment variable automatically, so no explicit key configuration is needed in your code.
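Before running anything, a short preflight check can confirm the setup described above. The sketch below uses only the standard library; the function name check_environment is illustrative and not part of the SDK.

```python
import os
import sys


def check_environment(env=None):
    """Return a list of human-readable setup problems (empty if none)."""
    env = os.environ if env is None else env
    problems = []
    # The SDK reads this variable automatically, so it only needs to exist.
    if not env.get("ANTHROPIC_API_KEY", "").strip():
        problems.append("ANTHROPIC_API_KEY is not set")
    # Inside a virtual environment, sys.prefix differs from sys.base_prefix.
    if sys.prefix == sys.base_prefix:
        problems.append("no virtual environment is active")
    return problems
```

Call check_environment() at the top of your main script and exit early if the returned list is not empty.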

Create a project directory with a main script file and a configuration directory for MCP server definitions. The configuration directory is optional but recommended for organizing MCP connections as your agent grows more capable.

Step 2: Create Your First Agent

The minimal agent requires just a few lines of code. Import the SDK's agent module, create an agent instance with a system prompt that describes the agent's role, and send it a task. The SDK handles the agent loop automatically: it sends your task to Claude, routes any tool calls the model makes, passes the results back to Claude, and repeats until the task is complete.

The system prompt is the most important configuration for agent quality. Write clear, specific instructions that define the agent's role, its capabilities, its constraints, and the quality standards it should meet. A well-written system prompt dramatically improves agent reliability compared to a generic one.

At this point, your agent already has access to built-in tools for reading files, writing files, editing code, running shell commands, and searching codebases. Test it with a simple task like "read the README.md file and summarize it" or "list all Python files in this directory."
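A minimal sketch of this first agent follows. It assumes the Python package is importable as claude_agent_sdk and exposes query and ClaudeAgentOptions; those names, the option fields, and the tool names may differ in your installed version. The system prompt and the run_task and main helpers are illustrative.

```python
import asyncio

SYSTEM_PROMPT = (
    "You are a repository assistant. Read files before answering, "
    "never modify files unless asked, and keep summaries under 150 words."
)


async def run_task(task: str) -> str:
    """Send one task to the agent and return its final result text."""
    # Assumption: import name and these two symbols match your SDK version.
    from claude_agent_sdk import ClaudeAgentOptions, query

    options = ClaudeAgentOptions(
        system_prompt=SYSTEM_PROMPT,
        allowed_tools=["Read", "Glob", "Grep"],  # read-only built-in tools
    )
    final_text = ""
    # query() drives the agent loop and yields messages as they arrive.
    async for message in query(prompt=task, options=options):
        # Assumption: only the closing result message has a `result` field.
        result = getattr(message, "result", None)
        if result:
            final_text = result
    return final_text


def main() -> None:
    # Requires the SDK to be installed and ANTHROPIC_API_KEY to be set.
    print(asyncio.run(run_task("Read the README.md file and summarize it.")))
```

Restricting allowed_tools to read-only tools is a deliberate choice for a first test: the agent can explore the project but cannot change it.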

Step 3: Add MCP Tool Servers

MCP servers extend your agent's capabilities beyond built-in tools. To add an MCP server, specify the server command and transport in your configuration. For example, adding the GitHub MCP server lets your agent create pull requests, read repository contents, and manage issues. Adding the Playwright MCP server enables browser automation.

Each MCP server is defined with a command (the executable to run), arguments (command-line options), and optionally environment variables (for API keys and configuration). The SDK starts the server process, connects via the specified transport, discovers available tools, and makes them available to the model alongside built-in tools.
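The command, arguments, and environment variables described above map naturally onto a small dictionary per server. The sketch below builds such a configuration. The npm package names, the GITHUB_TOKEN and GITHUB_PERSONAL_ACCESS_TOKEN variable names, and the "type": "stdio" key are assumptions for illustration; check each server's documentation and your SDK version for the exact shape.

```python
import os


def stdio_server(command, args, env=None):
    """Describe one MCP server launched as a subprocess over stdio."""
    config = {"type": "stdio", "command": command, "args": list(args)}
    if env:
        config["env"] = dict(env)
    return config


def build_mcp_servers(environ=None):
    """Return a name -> server definition mapping for the agent options."""
    environ = os.environ if environ is None else environ
    servers = {
        # Assumption: npm package name for a Playwright MCP server.
        "playwright": stdio_server("npx", ["-y", "@playwright/mcp@latest"]),
    }
    token = environ.get("GITHUB_TOKEN")
    if token:
        # Assumption: package name and the variable the server expects.
        servers["github"] = stdio_server(
            "npx",
            ["-y", "@modelcontextprotocol/server-github"],
            env={"GITHUB_PERSONAL_ACCESS_TOKEN": token},
        )
    return servers
```

The GitHub server is only added when a token is present, so the same script runs on machines that have no GitHub credentials configured.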

Start with one or two MCP servers that match your use case. Test each integration individually before combining them. This makes it easier to diagnose issues if a tool call fails or returns unexpected results.

Step 4: Implement Session Persistence

For tasks that span multiple interactions, session persistence lets you pause and resume the agent without losing context. The SDK assigns a session ID when you create a new session. Store this ID (in a database, file, or environment variable) and pass it when resuming the session.
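Storing the session ID can be as simple as a JSON file keyed by task name. The sketch below shows one way to do it with the standard library. SessionStore is a hypothetical helper, not part of the SDK, and how you obtain the ID and pass it back on resume depends on your SDK version.

```python
import json
from pathlib import Path


class SessionStore:
    """Persist session IDs in a small JSON file, keyed by task name."""

    def __init__(self, path):
        self.path = Path(path)

    def _load(self):
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text(encoding="utf-8"))

    def save(self, task_name, session_id):
        """Remember the session ID for a task, overwriting any earlier one."""
        data = self._load()
        data[task_name] = session_id
        self.path.write_text(json.dumps(data, indent=2), encoding="utf-8")

    def get(self, task_name):
        """Return the saved session ID, or None if the task is new."""
        return self._load().get(task_name)
```

On the first run, save the ID the SDK reports for the new session. On later runs, look it up with get() and pass it to the SDK's resume mechanism, falling back to a fresh session when get() returns None.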

Session persistence is valuable for interactive applications where a user asks follow-up questions, long-running automation tasks that run in batches, and agent systems that need to checkpoint their progress. The SDK handles context management automatically, compacting older conversation turns when the context window fills up.

Test session persistence by starting a task, stopping the script, and resuming with the saved session ID. Verify that the agent remembers what it did in the previous session and can continue where it left off.

Step 5: Add Lifecycle Hooks

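Hooks let you observe and control the agent's behavior at key points in its execution. Register callback functions for the events you care about. At the time of writing, the Python SDK's hook events include PreToolUse (fires before a tool runs, which suits audit logging and blocking calls), PostToolUse (fires after a tool returns, which suits timing), and Stop (fires when the agent finishes). Token usage and cost can be read from the result message at the end of a run, and failures surface as exceptions that your own code handles.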

For production systems, implement at minimum: a logging hook that records all tool calls and model responses, a cost tracking hook that accumulates token usage, and an error hook that alerts your monitoring system when the agent encounters failures. These hooks provide the observability needed to operate agents reliably in production.
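The logging and timing hooks can share one small recorder. The sketch below assumes hook callbacks are async functions that receive the event payload, a tool-use ID, and a context object, and that the payload carries tool_name and tool_input keys. Treat that signature and those key names as assumptions to verify against your SDK version. ToolAudit is a hypothetical helper.

```python
import asyncio
import time


class ToolAudit:
    """Record every tool call and how long it took."""

    def __init__(self):
        self.started = {}  # tool_use_id -> monotonic start time
        self.records = []  # one dict per finished tool call

    async def on_tool_start(self, input_data, tool_use_id, context):
        """Register for the event fired before a tool call."""
        self.started[tool_use_id] = time.monotonic()
        return {}  # empty result: do not alter the call

    async def on_tool_end(self, input_data, tool_use_id, context):
        """Register for the event fired after a tool call."""
        start = self.started.pop(tool_use_id, None)
        elapsed = None if start is None else time.monotonic() - start
        self.records.append({
            "tool": input_data.get("tool_name"),
            "input": input_data.get("tool_input"),
            "seconds": elapsed,
        })
        return {}
```

In production, replace the in-memory records list with a call to your logging or tracing client so the audit trail survives a crash.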

For sensitive operations, implement approval hooks that pause execution and require human confirmation before proceeding. This is essential for agents that modify production data, send communications, or make financial transactions.
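An approval hook can be sketched as a callback that inspects the pending tool call and asks a human before letting it through. The prefix list below is a hypothetical policy, ask_human is a function you supply, and the shape of the deny response is an assumption about what the SDK expects from a pre-tool hook; confirm it against your SDK version.

```python
import asyncio

SENSITIVE_PREFIXES = ("rm ", "git push", "curl ")  # hypothetical policy


def needs_approval(tool_name, tool_input):
    """Flag shell commands that start with a sensitive prefix."""
    if tool_name != "Bash":
        return False
    command = str(tool_input.get("command", "")).strip()
    return command.startswith(SENSITIVE_PREFIXES)


def make_approval_hook(ask_human):
    """Build a pre-tool callback that blocks sensitive calls unless approved."""

    async def approval_hook(input_data, tool_use_id, context):
        name = input_data.get("tool_name", "")
        tool_input = input_data.get("tool_input", {})
        if not needs_approval(name, tool_input):
            return {}  # nothing sensitive: let the call proceed
        # Run the (possibly blocking) prompt off the event loop.
        approved = await asyncio.to_thread(ask_human, name, tool_input)
        if approved:
            return {}
        # Assumption: this is the deny shape the SDK expects from a hook.
        return {
            "hookSpecificOutput": {
                "hookEventName": "PreToolUse",
                "permissionDecision": "deny",
                "permissionDecisionReason": "Rejected by human reviewer",
            }
        }

    return approval_hook
```

A prefix match is deliberately crude. For agents that touch production data, an allowlist of known-safe commands is usually a safer default than a blocklist of dangerous ones.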

Step 6: Deploy to Production

Production deployment requires several additional considerations beyond the development setup. Configure error handling to catch and recover from API failures, rate limits, and network issues. The SDK provides retry logic for transient errors, but you should implement application-level recovery for persistent failures.
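Application-level recovery often starts with a retry wrapper around the whole agent run. The sketch below retries with exponential backoff and jitter. TransientError is a hypothetical marker exception; in practice you would map the SDK's rate-limit and network errors onto it.

```python
import asyncio
import random


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (rate limits, timeouts)."""


async def run_with_retry(operation, max_attempts=4, base_delay=1.0,
                         sleep=asyncio.sleep):
    """Call an async operation, retrying transient failures with backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # persistent failure: let the caller recover or alert
            # Exponential backoff with jitter: about 1s, 2s, 4s, ...
            delay = base_delay * 2 ** (attempt - 1)
            await sleep(delay + random.uniform(0, delay / 2))
```

The sleep parameter exists so tests can inject a fake sleep and run instantly. When the final attempt fails, the exception propagates so the caller can checkpoint the session or raise an alert.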

Set up monitoring that tracks agent success rates, average token costs, execution times, and error frequencies. Use the lifecycle hooks to feed this data into your existing monitoring infrastructure (Datadog, Prometheus, CloudWatch, or similar).
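The four metrics named above can be accumulated in a small in-process object before being exported. The sketch below is illustrative; AgentMetrics is a hypothetical helper, and the export step to your monitoring system is left out.

```python
from dataclasses import dataclass


@dataclass
class AgentMetrics:
    """Running totals for agent runs, updated once per finished run."""

    runs: int = 0
    failures: int = 0
    total_cost_usd: float = 0.0
    total_seconds: float = 0.0

    def record(self, succeeded, cost_usd, seconds):
        """Add one finished run to the totals."""
        self.runs += 1
        self.failures += 0 if succeeded else 1
        self.total_cost_usd += cost_usd
        self.total_seconds += seconds

    def snapshot(self):
        """Return the derived metrics; safe to call before any run."""
        runs = max(self.runs, 1)  # avoid dividing by zero
        return {
            "success_rate": (self.runs - self.failures) / runs,
            "avg_cost_usd": self.total_cost_usd / runs,
            "avg_seconds": self.total_seconds / runs,
            "error_count": self.failures,
        }
```

Call record() when a run finishes, then push snapshot() to your metrics backend on a timer or at process exit.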

Configure resource limits to prevent runaway agent loops. Set maximum turn limits (how many model turns, each of which may include tool calls, the agent may take before it must stop), token budgets (maximum tokens per session), and time limits (maximum wall-clock time per task). These limits protect against edge cases where the agent gets stuck in a loop or generates excessive output.
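The three limits can be enforced together in one guard object that your message loop charges after each turn. The sketch below is an application-level guard and does not use any SDK feature; RunBudget and LimitExceeded are hypothetical names, and the default values are placeholders. Your SDK version may also offer a built-in turn limit that you can set alongside it.

```python
import time
from dataclasses import dataclass, field


class LimitExceeded(RuntimeError):
    """Raised when a run goes over one of its configured limits."""


@dataclass
class RunBudget:
    max_turns: int = 30
    max_tokens: int = 200_000
    max_seconds: float = 600.0
    turns: int = 0
    tokens: int = 0
    started_at: float = field(default_factory=time.monotonic)

    def charge(self, tokens_used, now=None):
        """Record one turn and raise if any limit is now exceeded."""
        self.turns += 1
        self.tokens += tokens_used
        elapsed = (time.monotonic() if now is None else now) - self.started_at
        if self.turns > self.max_turns:
            raise LimitExceeded(f"turn limit of {self.max_turns} exceeded")
        if self.tokens > self.max_tokens:
            raise LimitExceeded(f"token budget of {self.max_tokens} exceeded")
        if elapsed > self.max_seconds:
            raise LimitExceeded(f"time limit of {self.max_seconds}s exceeded")
```

Catch LimitExceeded at the top of the task, save the session ID, and report the partial result so a human can decide whether to resume with a larger budget.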

Package your agent as a Docker container or serverless function for consistent deployment. Use environment variables for all configuration (API keys, MCP server paths, resource limits) to keep your deployment portable across environments.
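Reading all configuration from environment variables can be centralized in one place. In the sketch below, the AGENT_* variable names and their defaults are hypothetical; only ANTHROPIC_API_KEY comes from this guide.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentConfig:
    mcp_config_dir: str
    max_turns: int
    max_tokens: int
    max_seconds: float

    @classmethod
    def from_env(cls, env=None):
        """Build the config from environment variables, failing fast."""
        env = os.environ if env is None else env
        # The SDK reads the key itself; here we only verify it is present.
        if not env.get("ANTHROPIC_API_KEY"):
            raise ValueError("ANTHROPIC_API_KEY is required")
        return cls(
            mcp_config_dir=env.get("AGENT_MCP_CONFIG_DIR", "./config"),
            max_turns=int(env.get("AGENT_MAX_TURNS", "30")),
            max_tokens=int(env.get("AGENT_MAX_TOKENS", "200000")),
            max_seconds=float(env.get("AGENT_MAX_SECONDS", "600")),
        )
```

Failing at startup when a required variable is missing is easier to diagnose in a container than a failure on the first API call.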

Next Steps After Your First Agent

Once your basic agent is working, there are several directions to expand. Multi-agent coordination lets you create specialized agents that handle different aspects of a complex task. A research agent can gather information, a coding agent can implement changes, and a review agent can check quality. The Claude Agent SDK supports subagents for this pattern: a main agent delegates work to each subagent, which runs in its own context and returns its result to the main agent.

Custom tool development lets you extend beyond built-in tools and MCP servers. Define Python functions that perform domain-specific operations, wrap them with the SDK's tool interface, and make them available to the agent. Custom tools are the mechanism for integrating your agent with your organization's specific APIs, databases, and workflows.
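A custom tool is, at its core, a function that takes structured arguments and returns content the model can read. The sketch below shows such a handler on its own. It assumes the SDK's tool interface accepts an async function that receives a dict of arguments and returns MCP-style text content; the wrapping step (a decorator or registration call) is omitted because its exact form depends on your SDK version. ORDERS and lookup_order are hypothetical.

```python
import asyncio

# Hypothetical in-memory data standing in for an internal API or database.
ORDERS = {"A-1001": "shipped", "A-1002": "processing"}


async def lookup_order(args):
    """Tool handler: return the status of one order as text content."""
    order_id = str(args.get("order_id", "")).strip()
    status = ORDERS.get(order_id)
    if status is None:
        text = f"Order {order_id!r} not found"
    else:
        text = f"Order {order_id} is {status}"
    # Assumption: the tool interface expects a list of typed content blocks.
    return {"content": [{"type": "text", "text": text}]}
```

Returning a readable "not found" message instead of raising lets the model recover on its own, for example by asking the user to confirm the order number.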

Refining the system prompt is an ongoing process. As you observe your agent in production, you will identify patterns where it makes suboptimal decisions. Track the common failure modes and add explicit guidance to the prompt for handling each one. The most effective system prompts evolve over time based on real usage data rather than being written once and left unchanged.

Cost optimization is critical for production agents. Enable prompt caching to reduce the cost of repeated input tokens by up to 90%, since cached reads are billed at a fraction of the base input price. Use the Batch API for non-interactive workloads to get a further 50% discount. Monitor token usage through hooks and set alerts for tasks that exceed expected costs. Consider routing tasks to model tiers by complexity: Haiku for simple agent steps, and Sonnet or Opus for complex reasoning.
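The effect of these discounts is easiest to see as arithmetic. The sketch below estimates input cost only. The price passed in is a placeholder you supply, the step categories are hypothetical, and the model ignores output tokens and any surcharge for writing to the cache.

```python
def input_cost(tokens, price_per_mtok, cached_fraction=0.0, batch=False):
    """Estimate input cost in USD under the discounts named in the text."""
    cached = tokens * cached_fraction
    uncached = tokens - cached
    # Cached reads are billed at 10% of the base price (the 90% reduction).
    billable = uncached + cached * 0.10
    cost = billable / 1_000_000 * price_per_mtok
    # Batch workloads get a further 50% off.
    return cost * 0.5 if batch else cost


def pick_tier(step_kind):
    """Route simple steps to the small tier, everything else to a larger one."""
    simple_steps = {"classify", "extract", "format"}  # hypothetical categories
    return "haiku" if step_kind in simple_steps else "sonnet"
```

For example, with one million input tokens at a placeholder price of 3.00 USD per million tokens and 80% of them served from cache, the billable volume is 200,000 + 80,000 = 280,000 tokens, or about 0.84 USD. Running the same workload through the Batch API halves that to about 0.42 USD.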

Key Takeaway

The Claude Agent SDK gets you from zero to a working agent quickly thanks to its built-in tools, but production readiness requires adding hooks for observability, session persistence for reliability, and resource limits for safety.