What Is RAG and How Do Agents Use It
The Detailed Answer
A language model on its own can only draw on what it learned during training. That knowledge is broad but frozen at the moment training ended, it contains nothing private or specific to your situation, and the model cannot tell you where any particular claim came from. Retrieval-augmented generation (RAG) addresses all three limits at once by adding a retrieval step before generation. When a question arrives, the system searches an external store of documents or data for the passages most relevant to that question, places those passages into the prompt, and instructs the model to answer based on them.
The effect is to separate knowledge from the model. The model supplies the language ability and reasoning, while the external store supplies the facts, which can be updated, expanded, or corrected at any time without retraining anything. This is why RAG has become the standard way to build AI systems that answer from a specific body of knowledge, whether that is a company's documentation, a user's history, or a constantly changing dataset. The retrieval machinery it relies on is vector search, the same technique used to index and query the knowledge store.
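The separation of facts from the model can be illustrated with a minimal sketch. Everything here is hypothetical: `KnowledgeStore`, `build_prompt`, the keyword lookup, and the refund policy text are invented for illustration, and the lookup is a toy stand-in for real vector search. The point is only that correcting a fact is a data write, while the code that prepares the model's input stays the same.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class KnowledgeStore:
    """Hypothetical in-memory store mapping a topic keyword to a passage."""

    passages: dict[str, str] = field(default_factory=dict)

    def upsert(self, key: str, text: str) -> None:
        # Adding or correcting a fact is a plain data write; no training happens.
        self.passages[key] = text

    def lookup(self, question: str) -> list[str]:
        # Toy retrieval: return passages whose keyword appears in the question.
        q = question.lower()
        return [text for key, text in self.passages.items() if key in q]


def build_prompt(question: str, store: KnowledgeStore) -> str:
    """Assemble the prompt that the unchanged model would receive."""
    context = "\n".join(store.lookup(question)) or "(no passages found)"
    return f"Context:\n{context}\n\nQuestion: {question}"


store = KnowledgeStore()
store.upsert("refund", "Refunds are accepted within 30 days.")  # made-up policy
before = build_prompt("What is the refund window?", store)

# The policy changes: only the store is edited, the model is untouched.
store.upsert("refund", "Refunds are accepted within 60 days.")
after = build_prompt("What is the refund window?", store)
```

The same question yields a prompt with different facts in `before` and `after`, which is all a model behind this interface would need in order to answer differently.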
How RAG Works Step by Step
RAG runs as a short pipeline each time the agent answers. First, the system takes the incoming question and converts it into an embedding, the same numeric representation of meaning used to index the knowledge store. Second, it searches the store for the passages whose embeddings are closest to the question, retrieving the handful most likely to be relevant, often refined with keyword matching and a reranking step for precision. Third, it assembles those passages into the prompt as reference material, clearly marked as context the model should use.
Fourth, the model generates an answer grounded in the supplied passages rather than from memory alone, ideally citing which source each claim came from. The quality of the whole pipeline hinges on the retrieval step, because the model can only answer well from material that was actually surfaced; if retrieval misses the relevant passage, no amount of model skill recovers it. This is why so much of building good RAG is really about building good retrieval, the subject of memory retrieval strategies, and why assembling the source material well, covered in how to build a knowledge base, matters so much.
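The four steps above can be sketched in a few lines of Python. This is a toy under stated assumptions, not a production design: `embed` uses word counts in place of a real embedding model, the `generate` argument is a placeholder for whatever model call a real system would make, and the handbook passages and source names are invented for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(docs):
    """Index time: embed every passage once, keeping its source label."""
    return [(source, text, embed(text)) for source, text in docs.items()]


def retrieve(question, index, k=2):
    """Steps 1 and 2: embed the question, return the k closest passages."""
    q_vec = embed(question)
    scored = [(cosine(q_vec, vec), source, text) for source, text, vec in index]
    scored = [s for s in scored if s[0] > 0]  # drop passages with no overlap
    scored.sort(key=lambda s: s[0], reverse=True)
    return [(source, text) for _, source, text in scored[:k]]


def build_prompt(question, passages):
    """Step 3: place the passages in the prompt, marked as context."""
    context = "\n".join(f"[{source}] {text}" for source, text in passages)
    return (
        "Answer using only the context below and cite the source in brackets.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


def answer(question, index, generate):
    """Step 4: hand the grounded prompt to the model (any callable here)."""
    return generate(build_prompt(question, retrieve(question, index)))


# Invented example passages; a real store would hold the actual documents.
docs = {
    "handbook/leave": "Parental leave is available to all full-time employees.",
    "handbook/expenses": "Travel expenses must be submitted within one month.",
    "handbook/security": "Laptops must use full disk encryption.",
}
index = build_index(docs)


def echo_model(prompt):
    """Placeholder model that returns the prompt so it can be inspected."""
    return prompt


print(answer("Who is eligible for parental leave?", index, echo_model))
```

If `retrieve` returns nothing relevant, the prompt's context is empty and the model has nothing to ground its answer in, which is the failure mode the paragraph above describes.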
A concrete example shows the value. Ask a bare model what your company's parental leave policy is, and it cannot know, because the policy lives in an internal document it never saw during training. Wrap the same model in RAG over the company handbook and the flow changes completely: the system retrieves the parental leave section, places it in the prompt, and the model answers accurately, quoting the actual policy and pointing to the document it came from. Nothing about the model itself changed, yet it went from useless to authoritative on that question purely because the right passage was retrieved and supplied at answer time. Multiply that across every internal question an organization fields, and the appeal of grounding answers in retrieved sources becomes obvious.
RAG, Memory, and Knowledge Bases
RAG is the umbrella pattern, and both agent memory and knowledge bases are specific applications of it. A knowledge base is RAG over a curated, relatively stable set of documents loaded in advance, such as product manuals or policies, giving the agent a reference library to answer from. Agent memory is RAG over information the agent writes continuously from its own interactions, such as user preferences and past outcomes, giving the agent personal recall. The two differ in where the content comes from and how often it changes, but they share the same retrieve-and-inject machinery underneath, described in how memory systems work.
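As a rough illustration of that shared machinery, the sketch below uses one hypothetical `RetrievalStore` class for both roles. The class, the word-overlap scoring (a toy substitute for embedding search), and the router example text are all assumptions made for illustration. The only difference between the two instances is when and by whom they are written.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


def tokens(text: str) -> set[str]:
    """Lowercased word set, used as a toy stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class RetrievalStore:
    """One retrieve-and-inject mechanism, reused for both roles."""

    entries: list[str] = field(default_factory=list)

    def add(self, text: str) -> None:
        self.entries.append(text)

    def search(self, query: str, k: int = 1) -> list[str]:
        q = tokens(query)
        ranked = sorted(self.entries, key=lambda e: len(q & tokens(e)), reverse=True)
        # Keep only entries that share at least one word with the query.
        return [e for e in ranked[:k] if q & tokens(e)]


# Knowledge base: curated documents loaded once, in advance.
knowledge_base = RetrievalStore()
for doc in [
    "The X200 router supports up to 50 connected devices.",  # invented manual text
    "Firmware updates are released quarterly.",
]:
    knowledge_base.add(doc)

# Agent memory: written continuously as interactions happen.
memory = RetrievalStore()
memory.add("The user owns an X200 router.")


def gather_context(question: str) -> dict[str, list[str]]:
    """Query both stores with the same mechanism and keep the results labeled."""
    return {
        "knowledge": knowledge_base.search(question),
        "memory": memory.search(question),
    }


context = gather_context(
    "Which router does the user own and how many devices does it support?"
)
```

Keeping the two result lists labeled lets the prompt distinguish reference material from personal recall, even though both came through the same search code.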
Seeing them as one family clarifies how to build a capable agent. The same embedding model, vector store, and retrieval pipeline can serve both a knowledge base of reference material and a memory of personal experience, often side by side, with the agent drawing on each as the task requires. This shared foundation is why understanding RAG is so central to agent memory, and why the broader treatment of the pattern in the RAG guide connects directly to everything covered here. Master retrieval-augmented generation and you have mastered the core mechanism behind both memory and knowledge in modern agents.
RAG (retrieval-augmented generation) gives a language model relevant external information at answer time by retrieving it and adding it to the prompt, so the model answers from current, specific, or private knowledge it was never trained on. Agents use it to ground their responses, and agent memory is simply RAG applied to the agent's own accumulated experience. Because the model can only use what retrieval surfaces, building good RAG is mostly about building good retrieval.