The n8n, Ollama, and Open WebUI Stack
Why This Stack Works
Each component in this stack handles a distinct responsibility without overlapping or conflicting with the others. Ollama manages model lifecycle, GPU allocation, and inference serving through a standard API on port 11434. Open WebUI provides the conversational interface for direct human interaction with models, including conversation history, model selection, and document uploads. n8n provides programmatic access to the same models through workflow automation, enabling background processing, event-driven triggers, and integration with external systems.
The key insight is that Open WebUI and n8n use Ollama independently for different purposes. Open WebUI is for interactive chat: a human sits at the keyboard and has a conversation. n8n is for automated processing: workflows trigger on events, process data through LLM calls, and take actions without human involvement. Having both in the same stack gives you both capabilities without running separate model instances. Both connect to the same Ollama endpoint, and Ollama handles request queuing automatically.
This stack runs entirely in Docker containers, shares a Docker network for internal communication, and deploys with a single docker-compose file. The total resource overhead beyond Ollama itself is modest: Open WebUI needs approximately 200 MB of RAM, and n8n needs about 300 MB. The only significant hardware requirement is the GPU for Ollama, which dictates what model sizes you can run. A machine with 16 GB of system RAM and a GPU with 8 GB of VRAM handles this stack comfortably with a 7B model.
How n8n Connects to Ollama
n8n connects to Ollama through its AI Agent node and LLM Chat Model sub-node. You configure the Ollama base URL (typically http://ollama:11434 when running in Docker), select a model, and set parameters like temperature and maximum tokens. Once configured, you can use Ollama models in any n8n workflow node that requires AI processing: the AI Agent node for tool-calling loops, the Text Classifier node for categorization, the Summarization node for content condensation, or a basic LLM Chain node for simple prompt-response interactions.
The AI Agent node is particularly powerful because it implements a full agent loop with tool calling. You connect tool nodes (database queries, HTTP requests, code execution, file operations) to the AI Agent, and it autonomously decides which tools to call based on the user's request. This means you can build sophisticated AI agents that search the web, query databases, process files, and interact with APIs, all through a visual workflow interface rather than custom code.
n8n's workflow trigger system enables event-driven AI processing. Workflows can trigger on incoming webhooks (integrating with any system that sends HTTP requests), email receipt, scheduled intervals (every hour, every day), database changes, message queue events, or file system changes. Each trigger feeds data into the workflow where Ollama processes it and n8n routes the results to downstream actions. This event-driven architecture means your AI processes information as it arrives rather than requiring manual initiation.
Common Workflow Patterns
Email classification and response drafting is one of the most practical workflows in this stack. The workflow monitors an email inbox, extracts new messages, sends the content to Ollama for classification (support request, sales inquiry, partnership opportunity, spam), searches a knowledge base for relevant context, generates a draft response, and either sends it automatically or queues it for human review depending on the classification confidence. This workflow handles the repetitive work of reading and categorizing emails while keeping a human in the loop for final approval.
Document processing workflows ingest files (PDFs, documents, spreadsheets) from a watched folder or file upload endpoint, extract text content, summarize key points using Ollama, extract structured data (names, dates, amounts, categories) into database records, and route the processed information to appropriate channels. Legal teams use this pattern for contract review, finance teams for invoice processing, and research teams for literature analysis.
Notification and alerting workflows monitor data sources (RSS feeds, social media APIs, database queries, web scrapers) for relevant changes, use Ollama to analyze whether each change is significant enough to report, generate human-readable summaries of important changes, and deliver notifications through Slack, email, or other communication channels. This pattern is valuable for competitive intelligence, security monitoring, and trend tracking where the volume of raw data is too high for manual review.
Multi-agent collaboration workflows break complex tasks into subtasks and route each to specialized processing chains. A research workflow might use one Ollama model call to generate search queries, another to evaluate source relevance, a third to extract key findings, and a fourth to synthesize a final report. n8n's visual canvas makes these multi-step, branching workflows easy to understand, modify, and debug compared to writing equivalent logic in code.
Adding Qdrant for Knowledge Search
Many n8n workflows benefit from access to a searchable knowledge base, which means adding Qdrant to the stack. n8n includes a Vector Store node that connects to Qdrant for storing and retrieving embedded documents. This enables RAG-enhanced workflows where the AI agent searches your company's documentation, knowledge base, or historical data before generating responses, significantly improving accuracy for domain-specific questions.
The typical setup involves a separate ingestion workflow that processes new documents, chunks them, generates embeddings through Ollama's embedding endpoint, and stores them in Qdrant. Then your agent workflows include a retrieval step that searches Qdrant for relevant context before calling the LLM. This separation of ingestion and retrieval keeps individual workflows simple while building a growing knowledge base that all workflows can access.
Monitoring and Debugging Workflows
n8n provides built-in execution logging that records every workflow run with its inputs, outputs, timestamps, and status. When a workflow fails, the execution log shows exactly which node failed, what data it received, and what error occurred. This visibility is essential for debugging AI workflows because LLM-based nodes can fail in subtle ways: the model might return an unexpected format, refuse a request due to content filtering, or produce output that downstream nodes cannot parse. The execution log lets you inspect the exact model response and identify the point of failure.
For ongoing monitoring, build a dedicated health-check workflow that runs on a schedule and verifies that each component is responsive. The workflow pings Ollama to confirm model availability, sends a test query to verify inference is working, checks Qdrant connectivity if you are using vector search, and sends an alert through Slack or email if any check fails. This proactive monitoring catches problems before users encounter them and provides early warning when system resources are running low.
Performance debugging in this stack usually centers on Ollama response times. If workflows are running slowly, check whether Ollama is swapping models frequently (each model load takes several seconds), whether the context window is larger than necessary (larger contexts consume more VRAM and slow generation), or whether multiple workflows are competing for model access simultaneously. n8n's execution timing data combined with Ollama's logs provides the information needed to identify and resolve performance bottlenecks.
Scaling Considerations
This stack scales vertically by adding more GPU resources and horizontally by running multiple Ollama instances behind a load balancer. For most small to medium deployments, a single machine with a good GPU handles the workload comfortably. When you need more throughput, the most effective upgrade is adding a second GPU or moving to a GPU with more VRAM, allowing Ollama to serve larger models or handle more concurrent requests without model swapping delays.
n8n itself can be configured for high availability by using PostgreSQL as its database backend instead of the default SQLite and running multiple n8n worker instances. This setup supports queue-based execution where workflow runs are distributed across workers, preventing any single instance from becoming a bottleneck. For the AI components, the key scaling constraint is always GPU throughput: all other parts of the stack scale easily with modest CPU and RAM, while the model inference layer requires deliberate GPU capacity planning based on your actual concurrent request volume.
The n8n, Ollama, and Open WebUI stack gives you both interactive AI chat and automated AI workflows from a single model server. Start with simple workflows (email classification, document summarization) and build complexity gradually as you learn what n8n and your models can handle together.