Tool Use and Function Calling in AutoGen
Function calling is the mechanism that transforms agents from conversational text generators into practical automation tools. Without function calling, agents can only discuss what should be done. With function calling, agents can actually do it: query databases, call APIs, read files, send emails, execute code, and interact with any system that has a Python interface. Understanding how to create, register, and manage tools is essential for building useful agent systems.
Create a Tool Function
A tool function is a standard Python function with three essential elements: type hints on all parameters, a descriptive docstring, and a return value. The type hints tell the LLM what data types each parameter expects, enabling it to generate correctly formatted arguments. The docstring describes what the function does, when it should be used, and what it returns, which the LLM reads to decide whether to invoke the tool for a given task.
Write docstrings that are clear and specific from the LLM's perspective. Instead of "Gets weather data," write "Returns the current temperature, humidity, and conditions for a specified city. Use this when the user asks about current weather in a specific location." The more specific the description, the more accurately the LLM will decide when to call the function and what arguments to provide.
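As a minimal sketch of these elements, the function below combines type hints, a docstring written for the LLM, and a structured return value. The name get_current_weather, the _FAKE_WEATHER table, and the field names are hypothetical; a real tool would call a weather service rather than read an in-memory dictionary. Annotated is used to attach a per-parameter description to the type hint, on the assumption that the registering framework reads it.

```python
from typing import Annotated

# Hypothetical in-memory data standing in for a real weather service.
_FAKE_WEATHER = {
    "paris": {"temperature_c": 18.5, "humidity_pct": 62, "conditions": "cloudy"},
    "cairo": {"temperature_c": 33.0, "humidity_pct": 20, "conditions": "sunny"},
}


def get_current_weather(
    city: Annotated[str, "City name, for example 'Paris'"],
) -> dict:
    """Returns the current temperature, humidity, and conditions for a specified city.

    Use this when the user asks about current weather in a specific location.
    """
    record = _FAKE_WEATHER.get(city.strip().lower())
    if record is None:
        # Return a labeled error instead of raising so the agent can recover.
        return {"city": city, "error": "No weather data available for this city."}
    return {"city": city, **record}
```

Calling get_current_weather("Paris") returns a dictionary with labeled fields, which leaves the wording of the final answer to the agent.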
Keep tool functions focused on a single responsibility. A function that queries a database, processes the results, and formats them for display is doing three things and should be split into separate tools. Smaller, focused tools give the LLM more flexibility to compose them in different ways based on the task requirements. Return structured data (dictionaries or dataclasses) rather than pre-formatted strings so the agent can interpret and present the results appropriately for the context.
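One way to picture the split is two small tools that the agent can chain: one that only retrieves records and one that only aggregates them. The Order dataclass, the _ORDERS list, and both function names are hypothetical and stand in for a real data source.

```python
from dataclasses import asdict, dataclass


@dataclass
class Order:
    order_id: int
    customer: str
    total: float


# Hypothetical stand-in for a database table.
_ORDERS = [Order(1, "acme", 120.0), Order(2, "acme", 80.0), Order(3, "globex", 45.5)]


def find_orders(customer: str) -> list[dict]:
    """Returns all orders for a customer as a list of records.

    Use this to look up raw order data before any analysis.
    """
    return [asdict(order) for order in _ORDERS if order.customer == customer.lower()]


def summarize_orders(orders: list[dict]) -> dict:
    """Returns the number of orders and their combined value.

    Use this on the output of find_orders when the user asks for totals.
    """
    return {
        "order_count": len(orders),
        "total_value": sum(order["total"] for order in orders),
    }
```

Because neither function formats text for display, the agent can use find_orders alone for a listing, or feed its result to summarize_orders for a total.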
Register the Tool with Agents
AutoGen's ConversableAgent-based API uses a two-part registration pattern. The register_for_llm decorator (or the equivalent method call) tells the assistant agent that the tool exists and supplies the schema the LLM uses to generate tool calls. The register_for_execution decorator (or method call) tells the user proxy agent how to execute the tool when the assistant requests it. A tool works only when both registrations are in place.
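To make the two halves concrete without depending on a particular AutoGen version, the sketch below uses plain dictionaries as stand-ins: one holds the schema an LLM would see, the other holds the callable an executor would run. describe_tool, llm_tools, executor_tools, and search_products are hypothetical names, and the schema layout is a simplification, not AutoGen's internal format.

```python
import inspect
from typing import get_type_hints

# Simplified mapping from Python types to JSON-schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def describe_tool(func, name=None, description=None) -> dict:
    """Build a simplified, JSON-schema-like description from a function signature."""
    hints = get_type_hints(func)
    properties = {}
    required = []
    for param_name, param in inspect.signature(func).parameters.items():
        json_type = _JSON_TYPES.get(hints.get(param_name), "string")
        properties[param_name] = {"type": json_type}
        if param.default is inspect.Parameter.empty:
            required.append(param_name)
    return {
        "name": name or func.__name__,
        "description": description or inspect.getdoc(func) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def search_products(query: str, max_results: int = 5) -> list:
    """Returns products whose name contains the query. Use this for catalog searches."""
    catalog = ["red mug", "blue mug", "red kettle"]  # hypothetical data
    return [item for item in catalog if query.lower() in item][:max_results]


# Part 1: what the LLM side knows about (schema only, no callable).
llm_tools = {"search_products": describe_tool(search_products)}
# Part 2: what the executing side can run (callable only).
executor_tools = {"search_products": search_products}
```

If an entry is missing from either dictionary, the tool is either invisible to the model or impossible to run, which is why both registrations matter. The optional name and description arguments show how LLM-facing metadata can differ from the Python function's own name and docstring.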
This split between LLM awareness and execution capability reflects AutoGen's architecture, in which the assistant generates tool call requests and the user proxy executes them. Because the assistant never executes tools directly, the handoff forms a natural approval boundary. In configurations where the user proxy requires human approval, the user sees the tool call before it executes and can approve, modify, or reject it.
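The approval boundary can be sketched as an executor that consults a callback before running anything. This is an illustrative stand-in for the user proxy's role, not AutoGen's code; execute_tool_call, the approve callback, the request layout, and send_email are all hypothetical.

```python
import json


def execute_tool_call(tool_call: dict, tools: dict, approve) -> dict:
    """Run a requested tool call only if the approval callback allows it."""
    name = tool_call["name"]
    arguments = json.loads(tool_call["arguments"])
    if name not in tools:
        return {"status": "error", "message": f"Unknown tool: {name}"}
    if not approve(name, arguments):
        return {"status": "rejected", "message": f"Call to {name} was not approved."}
    return {"status": "ok", "result": tools[name](**arguments)}


def send_email(to: str, subject: str) -> str:
    """Hypothetical tool; a real one would send mail instead of returning text."""
    return f"queued email to {to}: {subject}"


# A request shaped like one an assistant might generate (arguments as JSON text).
request = {
    "name": "send_email",
    "arguments": '{"to": "ops@example.com", "subject": "Report"}',
}
tools = {"send_email": send_email}

approved = execute_tool_call(request, tools, approve=lambda name, args: True)
rejected = execute_tool_call(request, tools, approve=lambda name, args: False)
```

In an interactive setup the approve callback would prompt a person; here it is a lambda so that both outcomes can be shown side by side.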
When registering tools, you can customize the name and description that the LLM sees independently from the Python function name and docstring. This is useful when the same function should appear differently to different agents, or when the Python naming conventions do not produce clear tool names for the LLM. Keep tool names short and descriptive: "search_products" is better than "query_database_for_product_records_matching_criteria."
Configure Tool Behavior
Handle errors inside the tool itself, because a tool failure in the middle of a conversation can confuse the agent or stall the conversation. Wrap tool implementations in try-except blocks and return descriptive error messages rather than raising exceptions. When a tool returns an error message like "Database connection failed, please try again," the agent can adapt its approach. When a tool raises an unhandled exception, the conversation may terminate unexpectedly.
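A decorator is one way to apply this consistently across tools. The sketch below is an assumption about how you might structure it, not a feature of AutoGen; safe_tool, the ok/result/error field names, and divide are hypothetical.

```python
import functools


def safe_tool(func):
    """Decorator that converts exceptions into descriptive error results."""

    @functools.wraps(func)  # keep the original name and docstring for tool metadata
    def wrapper(*args, **kwargs):
        try:
            return {"ok": True, "result": func(*args, **kwargs)}
        except Exception as exc:  # broad on purpose: the agent should always get a reply
            return {
                "ok": False,
                "error": (
                    f"{func.__name__} failed: {type(exc).__name__}: {exc}. "
                    "Please check the arguments or try again."
                ),
            }

    return wrapper


@safe_tool
def divide(numerator: float, denominator: float) -> float:
    """Returns numerator divided by denominator."""
    return numerator / denominator
```

With this wrapper, divide(1, 0) returns a dictionary whose error field names the exception type, so the agent receives something it can reason about instead of a crashed turn.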
Configure timeout limits for tools that call external services. A tool that waits indefinitely for a slow API response blocks the entire conversation. Set reasonable timeouts and return informative messages when timeouts occur. For tools that perform long-running operations, consider returning a status indicator that the agent can check in subsequent turns rather than blocking the conversation.
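A generic way to bound waiting time is to run the call in a worker thread and stop waiting after a deadline. The sketch below uses only the standard library; call_with_timeout, slow_lookup, and the status field are hypothetical. Note the limitation stated in the comments: the worker thread is not cancelled, so when the underlying client offers its own timeout parameter, that is usually the better choice.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Shared pool so a timed-out call does not block the caller while it finishes.
_POOL = ThreadPoolExecutor(max_workers=4)


def call_with_timeout(func, timeout_s: float, *args, **kwargs) -> dict:
    """Run func in a worker thread and give up waiting after timeout_s seconds."""
    future = _POOL.submit(func, *args, **kwargs)
    try:
        return {"status": "ok", "result": future.result(timeout=timeout_s)}
    except FutureTimeout:
        # The worker thread keeps running; we only stop waiting for it.
        return {
            "status": "timeout",
            "message": f"No response within {timeout_s} seconds. Try again later.",
        }


def slow_lookup(delay_s: float) -> str:
    """Hypothetical stand-in for a slow external API."""
    time.sleep(delay_s)
    return "done"
```

A call such as call_with_timeout(slow_lookup, 0.05, 0.5) returns a timeout status promptly, giving the agent an informative message rather than a blocked conversation.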
Be thoughtful about what data tools return. Returning excessively large datasets consumes context window space and increases token costs for all subsequent messages. If a database query returns 10,000 rows, summarize or paginate the results rather than returning the full dataset. The agent does not need all the raw data; it needs enough information to reason about the results and present relevant findings to the user.
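Pagination can be sketched as a small helper that returns one slice of the results plus the metadata the agent needs to ask for more. paginate_rows and its field names are hypothetical, and the rows list stands in for a real query result.

```python
def paginate_rows(rows: list, page: int = 1, page_size: int = 50) -> dict:
    """Return one page of rows plus enough metadata to request the next page."""
    total = len(rows)
    start = (page - 1) * page_size
    page_rows = rows[start:start + page_size]
    return {
        "total_rows": total,
        "page": page,
        "page_size": page_size,
        "has_more": start + len(page_rows) < total,
        "rows": page_rows,
    }


rows = [{"id": i} for i in range(10_000)]  # hypothetical query result
first_page = paginate_rows(rows, page=1, page_size=50)
```

The agent sees total_rows and has_more, so it can decide whether the first page is enough or whether to request another one, without 10,000 rows ever entering the context window.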
Test Tool Integration
Test tools in isolation first, then test them within agent conversations. Unit test the tool function directly with representative inputs to verify it produces correct outputs. Mock external dependencies (databases, APIs, file systems) to make tests fast and deterministic. Verify that the function handles edge cases: missing parameters, invalid input types, network errors, and empty results.
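The isolation tests described above might look like the following, using unittest.mock from the standard library. get_stock_price and the client's fetch_price method are hypothetical; in a registered tool the client would typically be bound in advance (for example with functools.partial) so that only symbol is exposed to the LLM.

```python
from unittest.mock import Mock


def get_stock_price(symbol: str, client) -> dict:
    """Returns the latest price for a ticker symbol using the given API client."""
    if not symbol:
        return {"error": "A ticker symbol is required."}
    try:
        price = client.fetch_price(symbol.upper())
    except ConnectionError:
        return {"error": "Price service unreachable, please try again."}
    return {"symbol": symbol.upper(), "price": price}


def test_returns_price():
    client = Mock()
    client.fetch_price.return_value = 101.5
    assert get_stock_price("abc", client) == {"symbol": "ABC", "price": 101.5}
    client.fetch_price.assert_called_once_with("ABC")


def test_handles_network_error():
    client = Mock()
    client.fetch_price.side_effect = ConnectionError("down")
    assert "unreachable" in get_stock_price("abc", client)["error"]


def test_rejects_empty_symbol():
    client = Mock()
    assert "required" in get_stock_price("", client)["error"]
    client.fetch_price.assert_not_called()
```

The test functions follow the naming convention that pytest discovers automatically, and because the client is a Mock they run without any network access.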
Integration testing within an agent conversation verifies that the LLM correctly identifies when to call the tool, generates valid arguments, and interprets the results appropriately. Run several conversations that require the tool with different phrasings of the same request to check that the LLM consistently selects the tool. Inspect the generated arguments to ensure they match expectations.
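Inspecting generated arguments can be partly automated. The helper below is a sketch that checks a JSON argument string against a function's signature and simple type hints; check_arguments and search_products are hypothetical, and only str, int, float, and bool hints are checked.

```python
import inspect
import json
from typing import get_type_hints

_CHECKED_TYPES = (str, int, float, bool)


def check_arguments(func, arguments_json: str) -> list:
    """Return a list of problems found in LLM-generated arguments (empty if none)."""
    try:
        arguments = json.loads(arguments_json)
    except json.JSONDecodeError as exc:
        return [f"arguments are not valid JSON: {exc.msg}"]
    if not isinstance(arguments, dict):
        return ["arguments must be a JSON object"]
    try:
        # Catches missing required parameters and unexpected names.
        inspect.signature(func).bind(**arguments)
    except TypeError as exc:
        return [str(exc)]
    hints = get_type_hints(func)
    problems = []
    for name, value in arguments.items():
        expected = hints.get(name)
        if expected not in _CHECKED_TYPES:
            continue  # other hints are not checked in this sketch
        # JSON has a single number type, so accept integers for float parameters.
        accepted = (int, float) if expected is float else expected
        if not isinstance(value, accepted):
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return problems


def search_products(query: str, max_results: int = 5) -> list:
    """Hypothetical tool under test."""
    return []
```

Running this check over the arguments captured from several differently phrased conversations gives a quick signal about whether the type hints and parameter descriptions are specific enough.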
Common issues include the LLM not recognizing when to call the tool (improve the docstring description), the LLM generating incorrect argument types (add more specific type hints and parameter descriptions), and the LLM misinterpreting tool results (make return values more explicit with labeled fields). Adjust tool metadata based on test results until the tool is reliably invoked and used correctly.
Organize Tools into Plugins
As the number of tools grows, organizing them into Semantic Kernel plugin classes improves maintainability and reusability. A plugin class groups related tool functions under a common namespace, shares configuration and dependencies through the class constructor, and provides a clean interface for registration and testing.
In the Semantic Kernel pattern, each plugin method marked as a kernel function (the kernel_function decorator in Python, the KernelFunction attribute in C#) becomes an available tool. The class constructor can accept dependencies like database connections, API clients, or configuration objects through dependency injection. This pattern makes plugins testable (inject mock dependencies) and portable (the same plugin works in any agent system that uses the kernel).
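The shape of such a plugin class can be sketched without the library installed. In the example below, tool_method is a local stand-in for the framework's marker decorator and list_tools is a stand-in for the framework's discovery step; both are hypothetical, as are CrmPlugin, FakeCrmClient, and the client methods find_contacts and add_lead.

```python
def tool_method(func):
    """Stand-in marker for a framework's function decorator (hypothetical)."""
    func.is_tool = True
    return func


class CrmPlugin:
    """Groups CRM-related tools and shares one injected client between them."""

    def __init__(self, crm_client):
        self._client = crm_client  # injected dependency; easy to replace in tests

    @tool_method
    def search_contacts(self, name: str) -> list:
        """Returns contacts whose name contains the given text."""
        return self._client.find_contacts(name)

    @tool_method
    def create_lead(self, name: str, email: str) -> dict:
        """Creates a lead and returns its stored record."""
        return self._client.add_lead({"name": name, "email": email})

    def _normalize(self, text: str) -> str:
        """Internal helper; not marked, so it is not exposed as a tool."""
        return text.strip().lower()


def list_tools(plugin) -> list:
    """Collect the names of methods marked as tools on a plugin instance."""
    return sorted(
        name
        for name in dir(plugin)
        if getattr(getattr(plugin, name), "is_tool", False)
    )


class FakeCrmClient:
    """In-memory test double standing in for a real CRM API client."""

    def find_contacts(self, name):
        return [c for c in ["Ada Lovelace", "Alan Turing"] if name.lower() in c.lower()]

    def add_lead(self, lead):
        return {"id": 1, **lead}


plugin = CrmPlugin(FakeCrmClient())
```

Swapping FakeCrmClient for a real client changes nothing in the plugin itself, which is the testability benefit that constructor injection is meant to provide.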
Plugin organization follows natural domain boundaries. A CRM plugin might expose functions for searching contacts, creating leads, and updating opportunities. A document plugin might offer functions for reading files, extracting text, and generating summaries. Keep plugins focused on a single domain, and compose them at the agent level by registering multiple plugins with the kernel. This modular approach makes it easy to share tools across projects and teams.
Tool use in AutoGen connects agents to external systems through Python functions registered with descriptive metadata. Create focused functions with clear docstrings, register them with both the LLM and execution agents, handle errors gracefully, test thoroughly in both isolation and conversation context, and organize growing tool collections into Semantic Kernel plugins for maintainability.