AutoGen and Microsoft Agent Framework: Complete Guide

Updated May 2026
AutoGen is Microsoft's open-source framework for building multi-agent AI systems through natural language conversations between autonomous agents. Originally launched as a research project in 2023, AutoGen introduced a conversational paradigm where multiple AI agents collaborate by exchanging messages, executing code, and coordinating complex tasks without rigid workflow definitions. In 2026, Microsoft unified AutoGen with Semantic Kernel into the production-ready Microsoft Agent Framework, creating a single SDK for building enterprise-grade agent systems in both Python and .NET.

What Is AutoGen

AutoGen is a Python framework developed by Microsoft Research that enables developers to build applications where multiple AI agents work together through structured conversations. Unlike traditional single-agent systems where one language model handles all tasks sequentially, AutoGen creates a team of specialized agents that communicate with each other, debate solutions, write and execute code, and arrive at results collaboratively.

The framework emerged from research published in August 2023 and quickly gained traction in the open-source community, accumulating over 54,000 GitHub stars by early 2026. Its core insight was that complex AI tasks could be decomposed into conversations between agents with different roles, much like how a team of human experts would tackle a difficult problem by discussing it from multiple angles.

AutoGen supports multiple LLM providers including OpenAI GPT models, Azure OpenAI, and open-source models through compatible APIs. Each agent in a system can use a different model, allowing developers to assign expensive frontier models to critical reasoning tasks while using smaller, cheaper models for routine operations. This multi-model flexibility makes AutoGen practical for real-world applications where cost management matters as much as capability.

In October 2025, Microsoft announced that AutoGen would enter maintenance mode as its team merged with the Semantic Kernel team to build the unified Microsoft Agent Framework. AutoGen continues to receive critical bug fixes and security patches, but significant new features are now developed exclusively in the Agent Framework. Existing AutoGen projects remain functional, and Microsoft provides a detailed migration guide for teams ready to transition.

Core Architecture and Design Philosophy

AutoGen's architecture centers on three fundamental concepts: conversable agents, conversation patterns, and enhanced LLM inference. Every component in the system is an agent that can send and receive messages, and the framework provides infrastructure for managing these conversations at scale.

The ConversableAgent base class defines the interface that all agents implement. Each agent has a name, a system message that defines its behavior, and configurable capabilities for LLM inference, code execution, and human interaction. Agents process incoming messages through a pipeline that can include LLM calls, function execution, or direct human input, then generate responses that continue the conversation.

The AssistantAgent is preconfigured as an AI helper that uses an LLM to generate responses. It excels at writing code, analyzing data, providing explanations, and reasoning through complex problems. Developers customize its behavior through detailed system messages that define its role, constraints, and output format preferences.

The UserProxyAgent represents the human in the loop. It can execute code written by other agents in sandboxed environments, relay human feedback into the conversation, and act as a gateway between the agent system and external tools. The proxy can operate fully autonomously, with human approval required for each action, or in a hybrid mode where only certain actions require confirmation.

Conversation patterns in AutoGen range from simple two-agent dialogues to complex group chats with dynamic speaker selection. In a two-agent setup, an AssistantAgent and UserProxyAgent take turns exchanging messages until a termination condition is met. Group chats extend this to multiple agents using either round-robin turn-taking, a manager agent that selects the next speaker based on conversation context, or custom selection functions that implement domain-specific routing logic.
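
The speaker-selection strategies described above can be illustrated without any framework. The following is a minimal sketch in plain Python, not AutoGen's API: the function names, agent names, and message shape are all assumptions made for illustration.

```python
from typing import Dict, List

Message = Dict[str, str]  # assumed shape: {"sender": ..., "content": ...}

def round_robin(agents: List[str], history: List[Message]) -> str:
    """Return the agent after the last speaker, wrapping around the list."""
    if not history or history[-1]["sender"] not in agents:
        return agents[0]
    position = agents.index(history[-1]["sender"])
    return agents[(position + 1) % len(agents)]

def route_errors_to_reviewer(agents: List[str], history: List[Message]) -> str:
    """Custom rule: a traceback goes to the reviewer, otherwise round-robin."""
    if history and "Traceback" in history[-1]["content"] and "reviewer" in agents:
        return "reviewer"
    return round_robin(agents, history)

agents = ["planner", "coder", "executor", "analyst", "reviewer"]
history = [{"sender": "planner", "content": "Step 1: load the CSV."}]
after_planner = round_robin(agents, history)          # next in list order

history.append({"sender": "executor", "content": "Traceback (most recent call last): ..."})
by_order = round_robin(agents, history)               # ignores message content
by_rule = route_errors_to_reviewer(agents, history)   # reacts to message content
```

A manager that uses an LLM to pick the next speaker would have the same shape: a function from the agent list and the history to a name, with the model call inside it.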

The enhanced LLM inference layer handles API interactions with features like caching, rate limiting, error handling, and cost tracking. It supports multiple model configurations per agent with automatic fallback, so if one model provider is unavailable or rate-limited, the agent seamlessly switches to an alternative. This resilience is essential for production systems that cannot tolerate downtime from a single provider outage.
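
The fallback and caching behavior can be sketched as a loop over an ordered list of clients. This is an illustrative sketch under stated assumptions, not the framework's implementation: `ProviderError`, `complete_with_fallback`, and the two stand-in clients are hypothetical, and no real model is called.

```python
from typing import Callable, Dict, List, Tuple

class ProviderError(Exception):
    """Stand-in for the error a model client raises when a provider is down."""

Client = Callable[[str], str]

def complete_with_fallback(
    prompt: str,
    clients: List[Tuple[str, Client]],
    cache: Dict[str, Tuple[str, str]],
) -> Tuple[str, str]:
    """Return (model_name, reply) from the cache or the first client that succeeds."""
    if prompt in cache:
        return cache[prompt]
    failures = []
    for name, client in clients:
        try:
            reply = client(prompt)
        except ProviderError as exc:
            failures.append(f"{name}: {exc}")
            continue
        cache[prompt] = (name, reply)
        return cache[prompt]
    raise ProviderError("all providers failed: " + "; ".join(failures))

calls = []  # records which hypothetical clients were actually invoked

def primary(prompt: str) -> str:
    calls.append("primary")
    raise ProviderError("rate limited")

def backup(prompt: str) -> str:
    calls.append("backup")
    return f"echo: {prompt}"

cache: Dict[str, Tuple[str, str]] = {}
clients = [("primary", primary), ("backup", backup)]
first = complete_with_fallback("hi", clients, cache)   # falls through to backup
second = complete_with_fallback("hi", clients, cache)  # served from the cache
```

The second call never reaches either client, which is the point of caching repeated prompts.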

How Multi-Agent Conversations Work

Multi-agent conversations in AutoGen follow a message-passing paradigm where the conversation history itself serves as the shared state. There is no separate state management layer or external database tracking workflow progress. Instead, every agent can read the full conversation transcript and use it to inform its next action.

A typical multi-agent workflow begins when a user sends a task description to the system. The UserProxyAgent receives this input and forwards it to an AssistantAgent, which analyzes the request and generates a plan. If the plan involves code, the AssistantAgent writes Python code and sends it back to the UserProxyAgent for execution. The proxy runs the code in a sandboxed environment, captures the output, and returns it to the AssistantAgent for analysis. This cycle continues until the task is complete or a maximum number of turns is reached.
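
The cycle described above can be sketched as a loop in which the history list is the only state. This is a framework-free illustration: `run_two_agent_chat` is a hypothetical name, and the scripted functions stand in for an LLM-backed assistant and a code-executing proxy.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]
Responder = Callable[[List[Message]], str]

def run_two_agent_chat(
    task: str, assistant: Responder, proxy: Responder, max_turns: int = 10
) -> List[Message]:
    """Alternate assistant and proxy replies until TERMINATE or the turn limit."""
    history = [{"sender": "user_proxy", "content": task}]
    for _ in range(max_turns):
        reply = assistant(history)
        history.append({"sender": "assistant", "content": reply})
        if "TERMINATE" in reply:
            break
        history.append({"sender": "user_proxy", "content": proxy(history)})
    return history

def scripted_assistant(history: List[Message]) -> str:
    """First turn proposes code; later turns interpret the proxy's output."""
    if len(history) == 1:
        return "print(2 + 2)"
    return f"The result is {history[-1]['content']}. TERMINATE"

def scripted_proxy(history: List[Message]) -> str:
    """Pretend the assistant's code was executed and printed 4."""
    return "4"

transcript = run_two_agent_chat("Add 2 and 2", scripted_assistant, scripted_proxy)
```

Because both agents receive the whole `history`, neither needs private state to know where the task stands.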

Group chats enable more sophisticated coordination. In a data analysis scenario, a Planning Agent might decompose a task into sub-tasks, a Coding Agent writes the implementation, a Review Agent checks the code for errors, and an Analysis Agent interprets the results. The GroupChatManager selects which agent speaks next based on the conversation state, ensuring that the right specialist handles each phase of the work.

Dynamic conversation topologies allow the agent graph to change during execution. An agent might spawn new sub-conversations, delegate tasks to specialized agent pairs, or escalate decisions to human operators based on confidence thresholds. This adaptability distinguishes AutoGen from more rigid workflow engines where the execution graph must be defined entirely in advance.

Termination conditions control when conversations end. Built-in options include maximum turn counts, detection of specific phrases like "TERMINATE" in agent messages, function-based checks that evaluate task completion criteria, and timeout limits. Developers can combine multiple conditions so that conversations end when the task is done or when resource limits are reached, whichever comes first.
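
One way to picture combined termination conditions is as small predicates over the history joined with a logical OR. The sketch below assumes that shape; the helper names are hypothetical and are not the framework's own condition classes.

```python
import time
from typing import Callable, Dict, List

Message = Dict[str, str]
Condition = Callable[[List[Message]], bool]

def max_messages(limit: int) -> Condition:
    """Stop once the history holds at least `limit` messages."""
    return lambda history: len(history) >= limit

def mentions(phrase: str) -> Condition:
    """Stop when the latest message contains the phrase."""
    return lambda history: bool(history) and phrase in history[-1]["content"]

def timed_out(seconds: float) -> Condition:
    """Stop after a wall-clock budget; the clock starts when this is built."""
    deadline = time.monotonic() + seconds
    return lambda history: time.monotonic() >= deadline

def any_of(*conditions: Condition) -> Condition:
    """Stop as soon as any one condition holds, whichever comes first."""
    return lambda history: any(check(history) for check in conditions)

should_stop = any_of(max_messages(20), mentions("TERMINATE"), timed_out(60.0))

working = [{"sender": "coder", "content": "Still running the analysis."}]
finished = working + [{"sender": "analyst", "content": "Report attached. TERMINATE"}]
```

Calling `should_stop(working)` returns `False` and `should_stop(finished)` returns `True`, so the same check can run after every message.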

Code Execution and Sandboxing

Code execution is a first-class capability in AutoGen rather than an add-on to a chat interface. When an AssistantAgent generates Python code, the UserProxyAgent can automatically extract and execute it in a sandboxed environment, capture the output (printed output, error messages, and the exit status), and feed that output back into the conversation for the assistant to analyze.

AutoGen supports multiple execution backends. The default local executor runs code in a subprocess with configurable timeouts and working directories. The Docker executor launches code inside isolated containers, providing stronger security boundaries for untrusted code. For cloud deployments, Azure Container Instances can serve as remote execution environments with managed scaling and resource limits.

The iterative debugging loop is what makes code execution more than a convenience. When code fails, the error traceback flows back to the AssistantAgent, which reads the error message, diagnoses the problem, generates corrected code, and sends it back for another execution attempt. This cycle often resolves common issues like import errors, type mismatches, and simple logic bugs within a few iterations, much as a human developer would debug interactively.
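
The execute-and-retry cycle can be sketched with the standard library alone. This is a minimal illustration, not AutoGen's executor: `run_python`, `debug_loop`, and `scripted_fix` are hypothetical names, the fixer stands in for the assistant, and a plain subprocess is not a security boundary (that is the role of the Docker executor described above).

```python
import subprocess
import sys
from typing import Callable, Tuple

def run_python(code: str, timeout_s: float = 10.0) -> Tuple[int, str]:
    """Run code in a separate interpreter and return (exit_code, output)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # 124 is an arbitrary choice here, borrowed from the shell `timeout` tool.
        return 124, f"timed out after {timeout_s} seconds"
    return proc.returncode, proc.stdout + proc.stderr

def debug_loop(
    code: str, fix: Callable[[str, str], str], max_attempts: int = 3
) -> Tuple[bool, int, str]:
    """Run code; after each failure ask `fix` for a corrected version."""
    output = ""
    for attempt in range(1, max_attempts + 1):
        exit_code, output = run_python(code)
        if exit_code == 0:
            return True, attempt, output
        code = fix(code, output)
    return False, max_attempts, output

def scripted_fix(code: str, error_output: str) -> str:
    """Stand-in for the assistant: repair one known error, else change nothing."""
    if "NameError: name 'math'" in error_output:
        return "import math\n" + code
    return code

succeeded, attempts, output = debug_loop("print(math.sqrt(16))", scripted_fix)
```

Here the first attempt fails with a `NameError`, the fixer adds the missing import, and the second attempt succeeds.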

Security controls govern what executed code can access. Developers configure allowed directories, network access policies, maximum execution time, and memory limits. The Docker executor provides filesystem isolation, preventing code from reading or modifying files outside its container. For sensitive environments, code can be restricted to a pre-approved set of packages and prevented from making external network calls.

The Microsoft Agent Framework

The Microsoft Agent Framework (MAF) is the production-ready successor that unifies AutoGen's conversational agent patterns with Semantic Kernel's enterprise features. Released as version 1.0 GA in April 2026, it provides stable APIs with long-term support commitments for both Python and .NET developers.

MAF preserves AutoGen's core strengths, including multi-agent conversations, code execution, and human-in-the-loop patterns, while adding capabilities that enterprise deployments demand. Session-based state management allows agent conversations to persist across service restarts. Built-in telemetry integrates with OpenTelemetry for observability. Middleware pipelines enable cross-cutting concerns like logging, authentication, and rate limiting without modifying agent logic.
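
The middleware idea can be shown as function wrapping: each layer receives the message and a handle to the next layer. This is a generic sketch of the pattern, not MAF's middleware API; `build_pipeline`, the two middleware functions, and the `token:` prefix are assumptions made for illustration.

```python
from typing import Callable, List

Handler = Callable[[str], str]
Middleware = Callable[[str, Handler], str]

def build_pipeline(agent: Handler, middleware: List[Middleware]) -> Handler:
    """Wrap the agent so the first middleware in the list runs outermost."""
    def wrap(mw: Middleware, next_handler: Handler) -> Handler:
        return lambda message: mw(message, next_handler)

    handler = agent
    for mw in reversed(middleware):
        handler = wrap(mw, handler)
    return handler

log: List[str] = []

def logging_mw(message: str, next_handler: Handler) -> str:
    """Record what enters and leaves the layers below."""
    log.append(f"in: {message}")
    reply = next_handler(message)
    log.append(f"out: {reply}")
    return reply

def auth_mw(message: str, next_handler: Handler) -> str:
    """Let through only messages carrying the (hypothetical) token prefix."""
    prefix = "token:"
    if message.startswith(prefix):
        return next_handler(message[len(prefix):])
    return "unauthorized"

def echo_agent(message: str) -> str:
    """Stand-in for agent logic, which stays unaware of logging and auth."""
    return message.upper()

pipeline = build_pipeline(echo_agent, [logging_mw, auth_mw])
reply = pipeline("token:hello")
```

The agent function contains no logging or authentication code, which is what keeps cross-cutting concerns out of agent logic.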

The framework introduces graph-based workflows alongside the conversational model. Developers can define explicit agent interaction patterns as directed graphs where nodes represent agents or processing steps and edges define the flow of messages between them. This hybrid approach lets teams use conversational flexibility where it adds value and structured workflows where predictability matters.
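
A directed-graph workflow of this kind can be sketched as a dictionary of node functions plus a dictionary of routing functions. The sketch below is a plain-Python illustration, not MAF's workflow API; `run_graph` and the node names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Step = Callable[[str], str]                # a node transforms the payload
Router = Callable[[str], Optional[str]]    # an edge rule names the next node

def run_graph(
    nodes: Dict[str, Step],
    edges: Dict[str, Router],
    start: str,
    payload: str,
    max_steps: int = 20,
) -> Tuple[List[str], str]:
    """Follow edges from `start` until a router returns None or steps run out."""
    path: List[str] = []
    current: Optional[str] = start
    while current is not None and len(path) < max_steps:
        payload = nodes[current](payload)
        path.append(current)
        current = edges[current](payload)
    return path, payload

nodes: Dict[str, Step] = {
    "write": lambda text: text + " +revision",
    "review": lambda text: text,
    "publish": lambda text: text + " [published]",
}
edges: Dict[str, Router] = {
    "write": lambda text: "review",
    # The reviewer sends the draft back until it has been revised twice.
    "review": lambda text: "publish" if text.count("+revision") >= 2 else "write",
    "publish": lambda text: None,
}

path, result = run_graph(nodes, edges, "write", "report")
```

Every possible transition is visible in `edges` before anything runs, which is the predictability the graph model trades conversational flexibility for.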

Cross-runtime interoperability rests on two open protocols. The Agent-to-Agent (A2A) protocol lets MAF agents communicate with agents built on other frameworks, while the Model Context Protocol (MCP) gives agents a standard way to connect to external tools and data sources. A Python MAF agent can coordinate with a .NET MAF agent, or even with agents running on third-party platforms that implement the A2A specification. This interoperability is critical for large organizations with polyglot technology stacks.

Multi-provider model support means MAF agents can use any LLM that exposes a compatible API, including OpenAI, Azure OpenAI, Anthropic Claude, Google Gemini, and self-hosted open-source models through inference servers like vLLM and Ollama. The framework abstracts provider differences behind a unified interface, making it straightforward to switch between models without changing agent logic.

Semantic Kernel Integration

Semantic Kernel serves as the foundation layer within the Microsoft Agent Framework, providing the plugin system, memory management, and model abstraction that agents build upon. Understanding this integration is important for developers migrating from either AutoGen or Semantic Kernel to the unified framework.

Semantic Kernel's plugin architecture lets developers expose native functions, API endpoints, and external services as callable tools that agents can use. A plugin is simply a collection of functions with descriptive metadata that the LLM uses to decide when and how to call them. This approach makes it straightforward to give agents access to databases, REST APIs, file systems, and custom business logic.
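
The "functions plus descriptive metadata" idea can be made concrete by deriving the metadata from ordinary Python signatures and docstrings. This is an illustrative sketch, not Semantic Kernel's plugin API; `describe_plugin` and the two example functions are hypothetical.

```python
import inspect
from typing import Callable, Dict, List

def describe_plugin(functions: List[Callable]) -> List[Dict]:
    """Build the name, description, and parameter types a model would be shown."""
    specs = []
    for fn in functions:
        parameters = {
            name: getattr(param.annotation, "__name__", str(param.annotation))
            for name, param in inspect.signature(fn).parameters.items()
        }
        specs.append({
            "name": fn.__name__,
            "description": inspect.getdoc(fn) or "",
            "parameters": parameters,
        })
    return specs

def get_order_status(order_id: str) -> str:
    """Look up the shipping status of an order by its ID."""
    return "shipped"  # placeholder; a real plugin would query a backend

def convert_currency(amount: float, rate: float) -> float:
    """Convert an amount using a caller-supplied exchange rate."""
    return round(amount * rate, 2)

order_plugin = describe_plugin([get_order_status, convert_currency])
```

The model never sees the function bodies, only the descriptions, so the docstrings carry most of the weight in deciding when a tool is called.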

The memory system from Semantic Kernel provides agents with persistent context beyond the immediate conversation. Vector stores enable semantic search over large document collections, allowing agents to retrieve relevant information before generating responses. This retrieval-augmented generation (RAG) pattern is built into the framework rather than requiring external tooling.
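
The retrieval step of RAG can be sketched with a toy similarity search. The example below substitutes word counts for learned embeddings and an in-memory list for a vector store, so it illustrates only the shape of the pattern; `embed`, `cosine`, `retrieve`, and the documents are assumptions.

```python
import math
from collections import Counter
from typing import List

def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a real system uses a model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve(query: str, documents: List[str], k: int = 1) -> List[str]:
    """Return the k documents most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(
        documents, key=lambda doc: cosine(query_vector, embed(doc)), reverse=True
    )
    return ranked[:k]

documents = [
    "Refunds are processed within five business days.",
    "Our office is closed on public holidays.",
    "Shipping is free for orders over fifty dollars.",
]
question = "how long do refunds take"
context = retrieve(question, documents)
prompt = f"Answer using this context:\n{context[0]}\n\nQuestion: {question}"
```

The retrieved passage is placed in the prompt before the model is called, which is how retrieval grounds the eventual answer.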

For .NET developers, the integration means they can build agent systems using familiar C# patterns with strong typing, dependency injection, and the full .NET ecosystem. Agents defined in .NET interoperate seamlessly with Python agents through the A2A protocol, so teams can use whichever language fits their expertise and requirements for each component.

The planning capabilities from Semantic Kernel enable agents to decompose complex goals into sequences of plugin calls. Earlier Semantic Kernel releases shipped dedicated planners for this, the Handlebars planner and the Stepwise planner, which have since been deprecated in favor of planning through the model's native function calling. In either form, the agent chains multiple functions together, monitors progress, and adjusts the plan based on intermediate results. This structured planning complements AutoGen's more freeform conversational approach.

Azure AI Services Integration

AutoGen and the Microsoft Agent Framework integrate deeply with Azure AI services, providing managed infrastructure for teams that want to deploy agent systems without managing their own compute and model hosting. Azure AI Foundry serves as the primary deployment platform, offering model hosting, agent runtime, and monitoring in a single managed environment.

Azure OpenAI Service provides access to GPT-4o, GPT-4.1, and other frontier models with enterprise features like content filtering, private networking, and data residency controls. Agents running on Azure can use these models through the same API interface as direct OpenAI access, with additional capabilities like provisioned throughput for predictable latency and managed fine-tuning for domain-specific models.

The Foundry Agent Service hosts agents as managed endpoints with automatic scaling, built-in monitoring, and integration with Azure identity and access management. Developers deploy agent code to the service and it handles infrastructure provisioning, load balancing, and failover. This managed approach reduces operational burden for teams that want to focus on agent logic rather than infrastructure.

Azure AI Search provides the vector store backend for RAG-enabled agents. Documents are indexed with both keyword and semantic embeddings, enabling hybrid search that combines exact matching with conceptual similarity. Agents query this index to ground their responses in specific documents, reducing hallucination and improving factual accuracy for domain-specific applications.

For organizations with existing Azure investments, the integration extends to Azure Monitor for centralized logging, Azure Key Vault for secrets management, and Microsoft Entra ID (formerly Azure Active Directory) for authentication. These integrations mean that agent systems can conform to the same governance and compliance standards as other enterprise applications in the organization's cloud environment.

Agent Types and Customization

AutoGen provides several built-in agent types that cover common use cases, along with extension points for creating custom agents with specialized behaviors. Understanding these types helps developers choose the right starting point for their applications.

The AssistantAgent is the most commonly used type, configured with an LLM and a system message that defines its expertise and behavior. A single AutoGen application might include multiple AssistantAgents with different specializations: one configured as a Python developer, another as a data analyst, and a third as a technical writer. Even when all three share the same underlying model, each produces different outputs because of its system message.

The UserProxyAgent bridges the gap between agents and the external world. Beyond human interaction, it handles code execution, file operations, and tool invocations. Its configuration determines the level of human oversight: fully automatic execution for trusted workflows, approval-required for sensitive operations, or always-human for critical decisions.

The GroupChatManager orchestrates conversations between three or more agents. It maintains the shared conversation history, selects the next speaker based on configurable strategies, and enforces conversation rules like maximum turns and termination conditions. The manager can use an LLM to intelligently select speakers based on conversation context, or follow deterministic patterns like round-robin for predictable behavior.

Custom agents extend the ConversableAgent base class with domain-specific logic. A database agent might automatically translate natural language queries into SQL. A monitoring agent might watch for specific patterns in the conversation and trigger alerts. A validation agent might check outputs against business rules before allowing them to be returned to users. The extension model is intentionally minimal, requiring developers to implement just a message-handling function.
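
The validation-agent example can be sketched as a subclass that overrides a single message-handling method. The `Agent` base class below is a hypothetical stand-in, not `ConversableAgent`, and the discount rule is an invented business rule used only for illustration.

```python
import re
from typing import Dict, List

Message = Dict[str, str]

class Agent:
    """Hypothetical minimal base class: subclasses implement handle()."""

    def __init__(self, name: str) -> None:
        self.name = name

    def handle(self, history: List[Message]) -> str:
        raise NotImplementedError

class DiscountValidator(Agent):
    """Checks the latest message against a business rule before release."""

    def __init__(self, name: str, max_discount_percent: int) -> None:
        super().__init__(name)
        self.max_discount_percent = max_discount_percent

    def handle(self, history: List[Message]) -> str:
        latest = history[-1]["content"]
        offered = [int(p) for p in re.findall(r"(\d+)% discount", latest)]
        if any(p > self.max_discount_percent for p in offered):
            return f"REJECTED: discount exceeds {self.max_discount_percent}%"
        return "APPROVED"

validator = DiscountValidator("validator", max_discount_percent=20)
ok = validator.handle([{"sender": "sales", "content": "We can offer a 15% discount."}])
blocked = validator.handle([{"sender": "sales", "content": "Take a 40% discount today!"}])
```

This agent needs no LLM at all; deterministic checks like this one can sit in a group chat alongside model-backed agents.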

Tool Use and Function Calling

Function calling in AutoGen allows agents to invoke external tools and services as part of their reasoning process. When an agent determines that it needs information or capabilities beyond what the LLM can provide directly, it generates a structured function call that the framework routes to the appropriate handler.

Tools are registered with agents as Python functions decorated with metadata describing their parameters, return types, and intended use. The LLM receives these descriptions as part of its context and decides when to call each tool based on the conversation state and user request. This approach builds on the native function-calling capabilities of models like GPT-4o and Claude, which makes structured output considerably more reliable than parsing free-form text.

Common tool patterns include database queries, API calls to external services, file system operations, web searches, and mathematical computations. AutoGen handles the serialization of function arguments, execution of the function, and injection of results back into the conversation automatically. Developers focus on implementing the tool logic without worrying about the plumbing that connects it to the agent conversation.

Error handling for tool calls follows the same iterative pattern as code execution. If a function call fails due to invalid arguments, network errors, or business logic violations, the error information flows back to the agent, which can retry with corrected parameters, try an alternative approach, or ask the user for clarification. This resilience makes agent systems practical for real-world environments where external services are not always reliable.
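
The dispatch-and-feedback path for tool calls can be sketched as follows. This is a simplified illustration rather than the framework's routing code: the `tool` decorator, `TOOLS` registry, `dispatch` function, and the JSON call shape are assumptions.

```python
import json
from typing import Callable, Dict

TOOLS: Dict[str, Callable] = {}

def tool(fn: Callable) -> Callable:
    """Register a function under its own name so calls can be routed to it."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def divide(numerator: float, denominator: float) -> float:
    """Divide one number by another."""
    return numerator / denominator

def dispatch(call_json: str) -> Dict[str, str]:
    """Run a serialized tool call and return a message for the conversation.

    Failures are returned as text instead of raised, so the agent can read
    the error and decide whether to retry with different arguments.
    """
    try:
        call = json.loads(call_json)
        result = TOOLS[call["name"]](**call["arguments"])
        return {"role": "tool", "content": json.dumps(result)}
    except Exception as exc:  # broad on purpose: every failure becomes feedback
        return {"role": "tool", "content": f"ERROR {type(exc).__name__}: {exc}"}

good = dispatch('{"name": "divide", "arguments": {"numerator": 9, "denominator": 3}}')
bad = dispatch('{"name": "divide", "arguments": {"numerator": 9, "denominator": 0}}')
unknown = dispatch('{"name": "multiply", "arguments": {}}')
```

Both failing calls come back as ordinary messages beginning with `ERROR`, so they enter the history like any other tool result.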

Production Deployment Considerations

Deploying AutoGen or MAF agents to production requires attention to reliability, cost management, security, and observability that goes beyond what a prototype environment demands. The framework provides building blocks for production systems, but architectural decisions about scaling, persistence, and error recovery remain the developer's responsibility.

State persistence is critical for long-running agent workflows. The Microsoft Agent Framework provides session-based state management that survives service restarts, allowing conversations to resume exactly where they left off. For AutoGen, developers typically implement custom persistence by serializing conversation histories to databases or message queues.
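
For the custom-persistence approach, a minimal sketch is to serialize the history as JSON and reload it on startup. The file layout, function names, and session ID below are assumptions; a production system would more likely write to a database, as the paragraph above notes.

```python
import json
import os
import tempfile
from typing import Dict, List

Message = Dict[str, str]

def save_history(path: str, history: List[Message]) -> None:
    """Write to a temporary file, then rename, so a crash leaves no partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(history, handle)
    os.replace(tmp_path, path)

def load_history(path: str) -> List[Message]:
    """Return the saved history, or an empty list for a new session."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)

session_dir = tempfile.mkdtemp()
session_file = os.path.join(session_dir, "session-123.json")  # hypothetical ID

history = load_history(session_file)      # empty on the first run
history.append({"sender": "user_proxy", "content": "Summarize Q3 sales."})
save_history(session_file, history)

resumed = load_history(session_file)      # what a restarted process would read
```

Since the conversation history is the shared state, restoring it is enough to let the agents continue from the last saved message.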

Cost control requires careful attention to token usage across multi-agent conversations. Each message in a group chat is sent to the LLM with the full conversation history as context, so cumulative input tokens grow roughly with the square of the number of turns, and a long conversation between five agents can consume hundreds of thousands of tokens or more. Strategies for managing costs include using smaller models for routine agents, summarizing long conversation histories, setting strict turn limits, and implementing caching for repeated queries.

Security in production means restricting what agents can do, what data they can access, and how they interact with external systems. Code execution must be sandboxed, tool access must be scoped to the minimum necessary permissions, and agent outputs must be validated before being acted upon. The framework provides the hooks for implementing these controls, but the specific policies depend on the application's risk profile.

Pricing and Cost Structure

AutoGen itself is free and open-source under the MIT license, with no licensing fees for commercial use. The costs of running an AutoGen-based system come entirely from the infrastructure and model APIs it consumes. The same applies to the Microsoft Agent Framework, which is also open-source.

Model API costs represent the largest expense for most deployments. Each agent call consumes input tokens (the conversation history and system message) and output tokens (the agent's response). In multi-agent systems, these costs multiply because every agent message adds to the shared context that subsequent agents must process. A five-agent group chat with a 50-turn conversation can easily consume 500,000 or more tokens per task.
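
The way these costs compound can be checked with a little arithmetic. The sketch below assumes every call resends the system message plus all earlier messages, and that every message has the same length; the 400-token message size and 200-token system message are illustrative assumptions, not measurements.

```python
from typing import Dict

def estimate_tokens(
    turns: int, tokens_per_message: int, system_tokens: int = 0
) -> Dict[str, int]:
    """Estimate token use when each call resends the full prior history."""
    # Before turn number `prior` (0-based), `prior` messages already exist.
    input_tokens = sum(
        system_tokens + prior * tokens_per_message for prior in range(turns)
    )
    output_tokens = turns * tokens_per_message
    return {
        "input": input_tokens,
        "output": output_tokens,
        "total": input_tokens + output_tokens,
    }

short_chat = estimate_tokens(turns=20, tokens_per_message=400, system_tokens=200)
long_chat = estimate_tokens(turns=50, tokens_per_message=400, system_tokens=200)
```

Under these assumptions the 20-turn chat totals 88,000 tokens and the 50-turn chat totals 520,000: 2.5 times the turns costs nearly six times the tokens, because input grows with the square of the turn count.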

Azure AI Foundry offers two pricing models for model hosting. Pay-as-you-go charges per token consumed with no upfront commitment, providing flexibility for variable workloads. Provisioned Throughput Units reserve dedicated model capacity at a fixed hourly rate, offering predictable costs and guaranteed latency for production applications with consistent demand.

For enterprise deployments using the full Microsoft stack, Agent 365 licensing adds $15 per user per month at general availability, providing managed agent services integrated with Microsoft 365 applications. This is bundled in the Microsoft 365 E7 suite at $99 per user per month for organizations that want the complete package.

Infrastructure costs for self-hosted deployments depend on compute requirements for code execution environments, storage for conversation logs and agent state, and networking for model API calls. Docker-based execution environments add container orchestration overhead. Organizations running on Azure can use the managed Foundry Agent Service to offload infrastructure management, paying only for the compute and model tokens consumed.

How AutoGen Compares to Other Frameworks

The multi-agent framework landscape includes several mature alternatives, each with distinct design philosophies and strengths. Understanding these differences helps teams choose the right tool for their specific requirements.

AutoGen vs LangGraph: LangGraph models workflows as explicit state machines with nodes and edges, giving developers fine-grained control over execution flow, state management, and error handling. AutoGen's conversational approach is more flexible for exploratory tasks but less predictable for structured workflows. LangGraph's checkpointing and time-travel debugging make it stronger for production systems that need reproducibility. In benchmark comparisons, LangGraph completes about 62% of complex tasks compared to AutoGen's 58%, primarily due to its superior error recovery in multi-step workflows.

AutoGen vs CrewAI: CrewAI uses a role-based model inspired by organizational structures, where agents are defined by their role, goal, and backstory. This makes CrewAI intuitive for business users who think in terms of team dynamics. AutoGen's conversational model offers more flexibility but requires more technical understanding to configure effectively. CrewAI has invested heavily in enterprise observability with its tracing UI, while AutoGen relies on more manual debugging approaches.

AutoGen vs OpenAI Agents SDK: The OpenAI Agents SDK provides a tightly integrated experience for teams committed to the OpenAI model ecosystem. It offers built-in tracing, guardrails, and handoff patterns optimized for GPT models. AutoGen's advantage is model provider flexibility, supporting any LLM with a compatible API. Teams locked into the OpenAI ecosystem may find the Agents SDK simpler to use, while teams that need multi-provider support or plan to use open-source models will benefit from AutoGen's abstraction layer.

Known Limitations

AutoGen's maintenance mode status means that significant new features will not be added to the framework. Teams building new projects should start with the Microsoft Agent Framework rather than AutoGen to avoid having to migrate later. Existing AutoGen deployments will continue to function, but the gap between AutoGen's capabilities and the evolving state of the art will widen over time.

Debugging multi-agent conversations remains challenging. When agents produce incorrect results, tracing the reasoning failure across multiple agents and dozens of messages requires patience and tooling that AutoGen does not provide natively. The Microsoft Agent Framework improves this with built-in telemetry, but complex agent interactions can still be difficult to diagnose.

Token costs can escalate quickly in group chat scenarios. The conversation history grows with each message, and every agent must process the full history to generate its response. Without careful management of context windows and conversation summarization, a complex multi-agent task can consume millions of tokens, making it impractical for cost-sensitive applications.

The framework does not include built-in guardrails for output quality. Agents can hallucinate, generate incorrect code, or produce outputs that violate business rules. Developers must implement their own validation layers, output filters, and approval workflows to ensure that agent outputs meet quality standards before being acted upon.

Migration Path Forward

Microsoft provides an official migration guide for moving from AutoGen to the Microsoft Agent Framework. The core concepts transfer directly: agents, conversations, tool use, and code execution all have equivalents in the new framework. The main differences are in configuration syntax, the addition of enterprise features like middleware and telemetry, and the replacement of custom patterns with built-in capabilities.

Migration is not urgent for stable deployments. AutoGen will continue to receive bug fixes and security patches for the foreseeable future. Teams should plan their migration based on their own timelines, prioritizing the transition when they need features that only the Agent Framework provides or when they want to align with Microsoft's long-term support roadmap.

The migration process involves updating import statements, adapting agent configuration to the new API, and replacing custom implementations of features like state persistence with the framework's built-in equivalents. For most projects, the migration is a refactoring exercise rather than a rewrite, with the core agent logic remaining largely unchanged.

Getting Started

New developers should install the Microsoft Agent Framework rather than AutoGen, as it provides the same capabilities with a more modern API and active feature development. The framework installs via pip for Python or NuGet for .NET and includes comprehensive documentation with tutorials covering common agent patterns.

For developers who want to learn the foundational concepts before committing to a framework, AutoGen's extensive documentation and community examples provide an excellent educational resource. The conversation-based model is intuitive for developers familiar with chat interfaces, and simple two-agent setups can be running within minutes.

The recommended learning path starts with a basic two-agent conversation, progresses to adding code execution and tool use, then explores group chats and custom agent types. Each step builds on the previous one, gradually introducing the complexity management strategies that production systems require.
