The Tool Layer: APIs, Browser, and File Access
Function Calling: How Tools Work
Function calling is the protocol that connects language models to external tools. The process works in four steps. First, the model receives descriptions of available tools as part of its system prompt: each tool has a name, a description of what it does, and a schema defining its parameters. Second, during generation, the model can choose to emit a structured tool call instead of regular text, specifying which tool to invoke and what arguments to pass. Third, the orchestration layer intercepts this tool call, validates the arguments, executes the actual function, and captures the result. Fourth, the result is fed back to the model as a new message, and the model continues generating its response with the tool output as additional context.
This loop can repeat multiple times within a single interaction. An agent tasked with analyzing a codebase might first call a file-listing tool to see what files exist, then call a file-reading tool to examine specific files, then call a code-search tool to find related definitions. Each tool call adds information that guides the agent's next decision, building up a comprehensive understanding through iterative exploration.
Not all models support function calling equally well. Larger models (13B and above) generally follow tool schemas more reliably and produce valid JSON arguments more consistently. Smaller models sometimes generate malformed tool calls, forget available tools, or invoke tools unnecessarily. If your agents rely heavily on tools, investing in a more capable model at the LLM layer pays dividends in tool reliability.
Model Context Protocol (MCP)
MCP provides a standardized way to package and distribute AI tools as self-contained servers. An MCP server exposes one or more tools through a defined interface, and any MCP-compatible client (Claude, Cursor, custom agents) can discover and use those tools without custom integration code. The protocol handles tool discovery (what tools are available and what do they do), tool invocation (calling a tool with specific arguments), and result formatting (returning structured results to the model).
The practical benefit of MCP is a growing ecosystem of pre-built tool servers. Community-maintained MCP servers exist for file system access, GitHub operations, database queries (PostgreSQL, SQLite, MongoDB), web search (Brave, Google), web browsing (Playwright, Puppeteer), Slack integration, email access, and dozens of other services. Installing an MCP server typically means adding a few lines to your agent's configuration and restarting, rather than writing custom API integration code.
Building custom MCP servers is straightforward for developers familiar with Python, TypeScript, or other supported languages. The MCP SDK provides the protocol handling, and you implement handler functions for each tool your server exposes. A custom database tool might accept SQL queries and return results. A custom CRM tool might look up customer records. The MCP framework handles serialization, error handling, and protocol compliance.
Common Tool Categories
File system tools let agents read, write, search, and organize files on the host system or within designated directories. These are essential for coding assistants, document processors, and any agent that works with local data. Security boundaries are critical here: restrict file access to specific directories and never give agents write access to system files or configuration.
Web tools give agents the ability to search the internet and read web pages. Web search tools (SearXNG, Brave Search API, Tavily) let agents find current information beyond their training data. Web browsing tools (Playwright, Puppeteer) let agents interact with web applications, fill forms, and extract structured data from pages. These capabilities are powerful but require careful rate limiting and content filtering.
Database tools connect agents to structured data stores. A PostgreSQL tool lets agents query business data, look up records, and generate reports. A Redis tool provides access to cached data and session state. Database tools should use read-only connections by default, with write access granted only when explicitly needed and with strict input validation to prevent SQL injection or data corruption.
Code execution tools let agents write and run code to perform computations, analyze data, or test solutions. Python execution sandboxes (like E2B or local Docker containers) provide a safe environment where agents can run code without affecting the host system. This capability is essential for data analysis agents, coding assistants that need to verify their solutions, and any agent that performs computations beyond basic arithmetic.
Tool Security
Every tool represents an attack surface. An agent with file access could read sensitive credentials. An agent with web access could exfiltrate data to external servers. An agent with database access could modify or delete records. An agent with code execution could run malicious commands. Tool security is not optional, it is a core requirement for any production deployment.
Sandboxing is the first line of defense. Run tools in isolated environments (Docker containers, VM-based sandboxes, restricted user accounts) that limit what the tool process can access. A file system tool should only access designated directories. A code execution tool should run in a container with no network access and no persistent storage. Database tools should connect with read-only credentials unless write access is specifically required.
Permission scoping limits which tools each agent can access. A customer support agent needs database read access and email tools but should never have file system access or code execution. A coding assistant needs file access and code execution but should not have email or database tools. Define minimal tool sets for each agent role and resist the temptation to give every agent access to everything.
Audit logging records every tool invocation with the agent identity, tool name, arguments, timestamp, and result. This log serves both debugging (understanding why an agent produced a particular result) and security monitoring (detecting unusual patterns like unexpected file access or database queries). Store tool logs in a durable, append-only format that cannot be modified by the agents themselves.
Tool Design Principles
Well-designed tools follow several principles that maximize their usefulness to AI agents. Each tool should do one thing clearly: a tool that searches a database should not also format the results for display. Keep the input schema simple with descriptive parameter names, because the model needs to understand what each parameter does from its name and description alone. Provide detailed descriptions of what the tool does and when to use it, because models rely heavily on these descriptions to decide which tool to invoke.
Return values should be structured, concise, and directly useful to the model. A database query tool that returns raw JSON with 50 fields forces the model to parse and filter information, wasting context window space and increasing the chance of errors. A better design returns only the fields relevant to common queries, with a separate tool or parameter option for retrieving full details when needed. Think of each tool response as prompt content that the model must process, and optimize for clarity and brevity.
Error messages from tools should tell the model what went wrong and suggest what to try next. A tool that returns a generic error code gives the model no information to work with. A tool that returns a message explaining that the query failed because the specified table does not exist, and suggesting available table names, enables the model to self-correct and retry with valid input. Good tool error messages transform failures from dead ends into learning opportunities for the agent.
Test tools with the actual model you plan to use, not just with manual testing. Models interact with tools differently than humans do: they may pass unexpected argument types, call tools in unexpected order, or misinterpret ambiguous descriptions. Run a set of representative agent conversations that exercise each tool and verify that the model invokes them correctly. Fix any systematic misuse by improving the tool description or simplifying the parameter schema rather than by adding complexity.
Tools transform AI models from conversationalists into agents that can take real actions. Start with MCP for pre-built integrations, build custom tools for domain-specific operations, and treat security (sandboxing, permission scoping, audit logging) as a mandatory part of every tool deployment.