Building RAG Pipelines in n8n
Step 1: Choose Your Vector Store
n8n supports several vector databases natively. Qdrant is a good default for self-hosted setups because it is included in n8n's Self-hosted AI Starter Kit, runs efficiently in Docker, and handles moderate document collections well. Pinecone is a good default for cloud setups because it is fully managed, with no infrastructure to maintain. Supabase with pgvector is a good fit if you already use Supabase for other data. The In-Memory Vector Store works for testing but keeps its data only in n8n's memory, so everything is lost when n8n restarts.
For a self-hosted setup with Qdrant, create a collection in Qdrant for your documents. Each collection has a name and a vector dimension that must match your embedding model's output dimension (768 for nomic-embed-text, 1536 for OpenAI text-embedding-3-small). You can create collections through the Qdrant dashboard (localhost:6333/dashboard) or let n8n create them automatically on first insert.
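As a minimal sketch of the dimension check described above, the snippet below builds the URL and JSON body for Qdrant's create-collection REST call (PUT /collections/{name}). The collection name "docs", the helper name, and the choice of cosine distance are assumptions made for illustration; the two dimensions are the ones quoted in the text. Nothing is sent over the network here.

```python
import json

# Output dimensions quoted in the text above.
EMBEDDING_DIMS = {
    "nomic-embed-text": 768,
    "text-embedding-3-small": 1536,
}

def collection_request(name: str, model: str, base_url: str = "http://localhost:6333"):
    """Return (url, body) for creating a Qdrant collection sized for `model`."""
    if model not in EMBEDDING_DIMS:
        raise ValueError(f"unknown embedding model: {model}")
    # "Cosine" is an assumed distance metric; pick the one your embedding model expects.
    body = {"vectors": {"size": EMBEDDING_DIMS[model], "distance": "Cosine"}}
    return f"{base_url}/collections/{name}", body

url, body = collection_request("docs", "nomic-embed-text")
print("PUT", url)
print(json.dumps(body))
```

If the collection size and the embedding model's output size disagree, inserts fail, so it is worth keeping this mapping next to whatever creates the collection.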
Step 2: Build the Ingestion Pipeline
The ingestion workflow takes raw documents and stores them as searchable vectors. Start with a trigger (manual for testing, webhook or schedule for production). Add a document source: File Upload for local files, Google Drive for cloud documents, or HTTP Request for web pages. Then split the text with a Text Splitter configured with a chunk size (500 to 1000 characters is a good starting point) and a chunk overlap (50 to 100 characters, so that text near a chunk boundary appears in both neighboring chunks and keeps its surrounding context).
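To make chunk size and overlap concrete, here is a simplified fixed-width splitter. It is only a sketch of the idea: it cuts at exact character offsets, whereas the splitters available in n8n can also try to break on separators such as paragraphs. The function name and default values are assumptions.

```python
def split_text(text: str, chunk_size: int = 800, overlap: int = 80) -> list[str]:
    """Split text into chunks of up to chunk_size characters that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

# Tiny values so the overlap is easy to see: each chunk repeats the
# final character of the previous chunk.
print(split_text("abcdefghij", chunk_size=4, overlap=1))
```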
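Next, wire up the Vector Store node in insert mode. In n8n the splitter and the embedding model attach to it as sub-nodes rather than forming a straight chain: connect an Embeddings sub-node (OpenAI Embeddings or Ollama Embeddings with nomic-embed-text) and a Default Data Loader to the Vector Store node, and connect the Text Splitter to the Default Data Loader. Each chunk becomes a vector in your database, stored together with the chunk text and its metadata. Run the workflow to ingest your documents.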
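The stored record for each chunk can be pictured as a point with an ID, a vector, and a payload. The sketch below uses a toy hash-based function in place of a real embedding model (it is deterministic but carries no semantic meaning), and the payload key names "content" and "metadata" are assumptions; inspect a stored point in your own collection to see the actual layout.

```python
import hashlib
import uuid

def toy_embed(text: str, dim: int = 8) -> list[float]:
    """Stand-in for a real embedding model: deterministic, but NOT semantic."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()  # 32 bytes
    return [digest[i] / 255.0 for i in range(dim)]

def to_points(doc_id: str, chunks: list[str]) -> list[dict]:
    """Turn the chunks of one document into vector-store points."""
    points = []
    for index, chunk in enumerate(chunks):
        points.append({
            # Deterministic UUID so re-ingesting the same chunk position overwrites it.
            "id": str(uuid.uuid5(uuid.NAMESPACE_URL, f"{doc_id}#{index}")),
            "vector": toy_embed(chunk),
            # Assumed payload layout: chunk text plus a metadata object.
            "payload": {"content": chunk, "metadata": {"source": doc_id, "chunk": index}},
        })
    return points

points = to_points("handbook.pdf", ["Refunds are accepted within 30 days.", "Shipping takes 3 to 5 days."])
print(len(points), points[0]["payload"]["metadata"])
```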
For larger document sets, add a Loop Over Items node to process documents in small batches, which avoids the memory pressure of loading everything at once. n8n has no built-in Try/Catch node, so to keep one bad document from stopping the whole ingestion, set the On Error option of the node that may fail so that the workflow continues, and use a separate error workflow with the Error Trigger node to record or report failures.
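The control flow described above, one document at a time with failures recorded rather than fatal, looks roughly like this outside n8n. The `ingest_one` callback and the document shape are hypothetical placeholders for whatever your workflow does per document.

```python
def ingest_all(documents, ingest_one):
    """Process documents one at a time; collect failures instead of aborting."""
    succeeded, failed = [], []
    for doc in documents:
        try:
            ingest_one(doc)
            succeeded.append(doc["id"])
        except Exception as exc:  # one bad document must not stop the run
            failed.append((doc["id"], str(exc)))
    return succeeded, failed

def fake_ingest(doc):
    # Hypothetical per-document step: reject documents with no text.
    if not doc["text"].strip():
        raise ValueError("document is empty")

docs = [
    {"id": "a.txt", "text": "hello"},
    {"id": "b.txt", "text": "   "},
    {"id": "c.txt", "text": "world"},
]
ok, errors = ingest_all(docs, fake_ingest)
print(ok, errors)
```

The list of failures can then be retried or reported separately after the run finishes.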
Step 3: Build the Query Pipeline
The query workflow receives a user question and returns a context-aware answer. Start with a Chat Trigger for interactive Q&A or a Webhook node for API access. Add an AI Agent node with a Chat Model (OpenAI, Anthropic, or Ollama). Give the agent access to your documents by connecting a vector store tool to the AI Agent's tools input: in current n8n versions this is either the Vector Store node in its retrieve-as-tool mode or the Vector Store Question Answer Tool, in both cases with an Embeddings sub-node attached.
At query time, the user's question is embedded with the Embeddings sub-node attached to the Vector Store node, which must be the same embedding model used during ingestion; vectors produced by different models are not comparable. The vector database is then searched for the most similar chunks, and these are returned to the agent as context for its answer. Configure retrieval to return 3 to 5 chunks (the top-K setting), which usually provides enough context without crowding the LLM's context window.
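What the similarity search does can be sketched in a few lines: score every stored vector against the query vector and keep the k best. This brute-force version with cosine similarity is for illustration only; a vector database uses an index to avoid scanning everything, and the two-dimensional vectors are made up.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec: list[float], chunks: list[dict], k: int = 4) -> list[dict]:
    """Return the k chunks whose vectors are most similar to the query vector."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c["vector"]), reverse=True)
    return ranked[:k]

stored = [
    {"id": "a", "vector": [1.0, 0.0]},
    {"id": "b", "vector": [0.0, 1.0]},
    {"id": "c", "vector": [0.9, 0.1]},
]
print([c["id"] for c in top_k([1.0, 0.0], stored, k=2)])
```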
An alternative approach uses the Question and Answer Chain node with a Vector Store Retriever instead of an AI Agent. The chain is simpler (no tool selection loop) and faster, but less flexible: it always retrieves context and generates an answer, and it cannot decide whether retrieval is needed or call other tools. Use the Question and Answer Chain for pure document Q&A and the AI Agent for more complex scenarios where the agent might need other tools in addition to document retrieval.
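The fixed retrieve-then-generate flow of a QA chain can be sketched as a single function with no decision step. The `retrieve` and `generate` callables and the prompt wording are assumptions standing in for the vector store lookup and the chat model call.

```python
from typing import Callable

def qa_chain(
    question: str,
    retrieve: Callable[[str], list[str]],
    generate: Callable[[str], str],
) -> str:
    """Always retrieve context, then always generate: no tool selection loop."""
    chunks = retrieve(question)
    context = "\n\n".join(chunks)
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    return generate(prompt)

def fake_retrieve(question: str) -> list[str]:
    # Stand-in for the vector store lookup.
    return ["Refunds are accepted within 30 days.", "Shipping takes 3 to 5 days."]

def fake_generate(prompt: str) -> str:
    # Stand-in for the chat model call.
    return f"(model reply to a {len(prompt)}-character prompt)"

print(qa_chain("What is the refund window?", fake_retrieve, fake_generate))
```

An agent differs in that the model itself chooses, per turn, whether to call the retrieval tool at all.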
Step 4: Tune Retrieval Quality
Retrieval quality largely determines the overall quality of your RAG system. If the retriever returns irrelevant chunks, the LLM will generate inaccurate answers however capable it is. Key parameters to tune include chunk size (smaller chunks give more precise retrieval but less context per chunk), chunk overlap (keeps information near a boundary from being cut off from its context), the number of retrieved chunks (more chunks help up to a point, after which noise increases), and the embedding model (higher-quality embeddings produce better semantic matching).
Test your retrieval pipeline with representative questions and manually inspect the retrieved chunks. If the retrieved chunks are irrelevant to the question, the most common fixes are reducing chunk size (so each chunk is more focused), improving document preprocessing (removing headers, footers, and boilerplate before chunking), and trying a different embedding model.
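Manual inspection can be backed by a simple number. One common option, used here as an illustration rather than anything n8n provides, is the hit rate: the fraction of test questions for which the chunk you expected shows up among the top k results. The test cases, chunk IDs, and the `retrieve` callable below are all made up.

```python
def hit_rate(test_cases: list[dict], retrieve, k: int = 4) -> float:
    """Fraction of questions whose expected chunk appears in the top-k results."""
    if not test_cases:
        return 0.0
    hits = 0
    for case in test_cases:
        retrieved_ids = [chunk["id"] for chunk in retrieve(case["question"])[:k]]
        if case["expected_id"] in retrieved_ids:
            hits += 1
    return hits / len(test_cases)

# Canned results standing in for a real retriever.
FAKE_RESULTS = {
    "What is the refund window?": [{"id": "policy-3"}, {"id": "faq-1"}],
    "How long does shipping take?": [{"id": "faq-9"}, {"id": "policy-1"}],
}

cases = [
    {"question": "What is the refund window?", "expected_id": "policy-3"},
    {"question": "How long does shipping take?", "expected_id": "shipping-2"},
]
print(hit_rate(cases, lambda q: FAKE_RESULTS.get(q, []), k=2))
```

Re-running the same question set after each change to chunk size or embedding model shows whether the change helped.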
Metadata filtering can improve retrieval for document collections with multiple categories or sources. When ingesting documents, add metadata fields (source, category, date) to each chunk. When querying, filter by metadata to restrict retrieval to relevant subsets of your collection.
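With Qdrant, a metadata restriction is expressed as a filter object sent along with the search request. The sketch below builds such a filter from exact-match conditions using Qdrant's must/match syntax. The "metadata." key prefix assumes that chunk metadata is nested under a "metadata" key in each point's payload; that is an assumption to check against your stored points, and the helper name is hypothetical.

```python
def build_filter(prefix: str = "metadata", **equals) -> dict:
    """Build a Qdrant-style filter requiring every given field to match exactly."""
    return {
        "must": [
            {"key": f"{prefix}.{field}", "match": {"value": value}}
            for field, value in equals.items()
        ]
    }

def matches(metadata: dict, **equals) -> bool:
    """Local equivalent of the filter, handy for checking test data."""
    return all(metadata.get(field) == value for field, value in equals.items())

print(build_filter(category="hr", source="handbook.pdf"))
print(matches({"category": "hr", "source": "handbook.pdf"}, category="hr"))
```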
Step 5: Deploy for Production
Production RAG systems need error handling, monitoring, and operational procedures. Add error handling for common failure modes: LLM API rate limits (add retry logic with exponential backoff), vector store connection failures (add health checks), and malformed user input (add input validation before the AI nodes).
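Retry with exponential backoff, mentioned above for rate limits, can be sketched as follows. The `RateLimitError` class is a hypothetical placeholder for whatever error your LLM client raises on a rate limit, and the delays (doubling from one second, plus random jitter) are illustrative defaults. The example passes a recording function as `sleep` so it runs without actually waiting.

```python
import random
import time

class RateLimitError(Exception):
    """Hypothetical placeholder for a provider's rate-limit error."""

def with_backoff(call, max_attempts: int = 5, base_delay: float = 1.0, sleep=time.sleep):
    """Run call(); on RateLimitError wait base_delay * 2**attempt plus jitter, then retry."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            jitter = random.uniform(0, base_delay)  # spreads out simultaneous retries
            sleep(base_delay * (2 ** attempt) + jitter)

attempts = {"count": 0}

def flaky_llm_call():
    # Fails twice, then succeeds, to exercise the retry path.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return "ok"

delays = []
result = with_backoff(flaky_llm_call, sleep=delays.append)
print(result, len(delays))
```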
Monitor retrieval quality over time by logging retrieved chunks alongside user questions and AI responses. Periodic review of these logs reveals retrieval failures that users may not report. Track token usage and costs per query for budget management.
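One way to capture this is a single JSON line per query. The field names below are assumptions, and the per-1,000-token prices are caller-supplied placeholders, not real provider prices.

```python
import json
import time

def log_record(question, chunks, answer, prompt_tokens, completion_tokens,
               price_per_1k=(0.0, 0.0)) -> str:
    """Return one JSON line describing a query, its retrieved chunks, and its cost."""
    input_price, output_price = price_per_1k  # placeholder prices per 1,000 tokens
    cost = prompt_tokens / 1000 * input_price + completion_tokens / 1000 * output_price
    record = {
        "timestamp": time.time(),
        "question": question,
        "retrieved": [{"id": c["id"], "score": c["score"]} for c in chunks],
        "answer": answer,
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "cost": round(cost, 6),
    }
    return json.dumps(record)

line = log_record(
    "What is the refund window?",
    [{"id": "policy-3", "score": 0.91}],
    "Refunds are accepted within 30 days.",
    prompt_tokens=1000,
    completion_tokens=500,
    price_per_1k=(0.5, 1.5),  # made-up numbers for the example
)
print(line)
```

Appending these lines to a file or table gives the material for the periodic review described above.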
Plan for document updates. When source documents change, you need to re-ingest the updated content. Implement a strategy for incremental updates (adding new chunks and removing outdated ones) rather than re-ingesting the entire collection each time. Use metadata (document ID, version, timestamp) to track which chunks belong to which source documents.
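A simple way to plan an incremental update, sketched here as one possible approach, is to store a content hash with each chunk and compare the hashes of the re-chunked document against those already stored. The function names are hypothetical.

```python
import hashlib

def chunk_hash(text: str) -> str:
    """Stable fingerprint of a chunk's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def plan_update(stored_hashes: set[str], new_chunks: list[str]):
    """Return (chunks to add, hashes to remove) for one source document."""
    new_by_hash = {chunk_hash(chunk): chunk for chunk in new_chunks}
    to_add = [chunk for h, chunk in new_by_hash.items() if h not in stored_hashes]
    to_remove = sorted(stored_hashes - set(new_by_hash))
    return to_add, to_remove

# Version 1 of a document had chunks A and B; version 2 has B and C.
stored = {chunk_hash("Chunk A"), chunk_hash("Chunk B")}
to_add, to_remove = plan_update(stored, ["Chunk B", "Chunk C"])
print(to_add, len(to_remove))
```

Unchanged chunks are left alone, so only the difference needs to be embedded and written. The stored hashes would be looked up per document ID, which is where the metadata described above comes in.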