RAG: giving a model your sources
Retrieval-Augmented Generation addresses an LLM's habit of answering from fuzzy memory. It fetches relevant documents first, then hands them to the model alongside your question so the answer is grounded in real sources. This lesson covers the pipeline, the building blocks, and when to reach for RAG instead of fine-tuning.
01 The problem RAG solves
Imagine asking a brilliant friend a question, but they can only answer from memory and aren't allowed to look anything up. That is how an AI chatbot normally works: it replies from fuzzy memory, the patterns it picked up while being trained. That memory is frozen at a cutoff date, can't see your private documents, and will sometimes produce a confident but wrong answer (a hallucination). Retrieval-Augmented Generation (RAG) is simply letting that friend open the right book first: it fetches the relevant documents and hands them to the model alongside your question. The model then answers grounded in those real sources, so it can stay current, point back to where the answer came from, and make things up far less often.
- RAG adds knowledge at answer-time, with no model retraining.
- Answers become traceable to sources you control, instead of opaque memory.
- Update the knowledge base, and the next answer reflects it as soon as the new content is indexed.
02 Run the pipeline
RAG is a five-stage pipeline: your question goes in, the system searches a knowledge base, retrieves the most relevant chunks, augments the prompt by adding those chunks, and the model generates a grounded answer. The comparison that follows shows a model answering with and without RAG.
✕ Without RAG
The model answers from memory. If your policy changed last week, it won't know, and may invent a confident, wrong answer.
✓ With RAG
The model reads your actual policy doc first, then answers: current, specific, and traceable to the source.
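The five stages described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: the knowledge base is a hypothetical in-memory list, retrieval is plain word overlap rather than a real search index, and `generate` is a placeholder for whatever model client you use.

```python
import re

# Hypothetical in-memory knowledge base; a real system would load your own documents.
KNOWLEDGE_BASE = [
    "Remote work policy: employees may work remotely up to three days per week.",
    "Expense policy: meals are reimbursed up to 40 dollars per day while traveling.",
    "Security policy: laptops must use full-disk encryption.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: list[str], top_k: int = 1) -> list[str]:
    """Search + retrieve: rank chunks by how many words they share with the question."""
    q = tokenize(question)
    ranked = sorted(chunks, key=lambda c: len(q & tokenize(c)), reverse=True)
    return ranked[:top_k]


def augment(question: str, chunks: list[str]) -> str:
    """Augment: place the retrieved chunks in the prompt ahead of the question."""
    sources = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )


def generate(prompt: str) -> str:
    """Generate: placeholder for a model call (assumption; swap in your own client)."""
    return f"(model output for a prompt of {len(prompt)} characters)"


question = "How many days per week can employees work remotely?"
chunks = retrieve(question, KNOWLEDGE_BASE)   # search + retrieve
prompt = augment(question, chunks)            # augment
answer = generate(prompt)                     # generate
print(prompt)
```

Because the model only ever sees `prompt`, editing an entry in `KNOWLEDGE_BASE` changes what the next call reads, with no retraining involved.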
03 The building blocks
Under the hood, four pieces make retrieval work: chunking, embeddings, a vector store, and a retriever. Documents are split into chunks; each chunk is turned into an embedding and stored in a vector store; a retriever then finds the chunks closest to your question and hands them to the model (the generator).
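To make the embedding, vector store, and retriever roles concrete, here is a toy sketch. Everything in it is an assumption for illustration: `embed` uses simple word counts as a stand-in for a learned embedding model (so it captures word overlap, not meaning), and the hypothetical `VectorStore` class scans every item, whereas real vector stores typically use an index built for fast similarity search.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A stand-in for a learned embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors (1.0 means same direction)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """Minimal in-memory store: keeps each chunk next to its vector."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, chunk: str) -> None:
        """Embed the chunk once at indexing time and keep the result."""
        self.items.append((chunk, embed(chunk)))

    def search(self, query: str, top_k: int = 2) -> list[tuple[float, str]]:
        """Retriever: score every chunk against the query, return the closest."""
        q = embed(query)
        scored = [(cosine(q, vec), chunk) for chunk, vec in self.items]
        return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


store = VectorStore()
for chunk in [
    "Refunds are issued within 14 days of a return.",
    "Shipping takes 3 to 5 business days.",
    "Gift cards never expire.",
]:
    store.add(chunk)

results = store.search("How long do refunds take after a return?", top_k=1)
for score, chunk in results:
    print(f"{score:.2f}  {chunk}")
```

The split of responsibilities mirrors the description above: chunks are embedded once when added, and only the question is embedded at answer-time.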
Chunking
Long documents are split into smaller, self-contained passages called chunks, often a few paragraphs each. Smaller chunks make retrieval more precise (you fetch just the relevant bit), and they keep the added context short enough to fit in the prompt.
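One simple way to chunk, assuming paragraphs are separated by blank lines, is to group consecutive paragraphs until a size limit is reached. The function name `chunk_paragraphs` and the `max_chars` limit are illustrative choices, not part of any particular library; real pipelines often measure size in tokens and may overlap neighbouring chunks.

```python
def chunk_paragraphs(text: str, max_chars: int = 500) -> list[str]:
    """Group consecutive paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole rather than cut
    mid-sentence, so such a chunk can exceed the limit.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow: close the chunk, start a new one.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


document = (
    "First paragraph about refunds.\n\n"
    "Second paragraph about shipping.\n\n"
    "Third paragraph about gift cards."
)
chunks = chunk_paragraphs(document, max_chars=70)
for i, chunk in enumerate(chunks, start=1):
    print(f"--- chunk {i} ({len(chunk)} chars) ---\n{chunk}")
```

With the 70-character limit used here, the first two paragraphs fit together and the third starts a new chunk. Lowering `max_chars` trades context per chunk for more precise retrieval, which is the trade-off described above.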
04 RAG vs fine-tuning
People often confuse these two ways of giving a model new abilities. The short version: RAG adds knowledge at answer-time, while fine-tuning changes the model's behavior and style. They solve different problems, and can be combined.
RAG: adds knowledge at answer-time
Keeps the model unchanged and feeds it relevant documents when a question is asked. Best when facts change often or live in your own sources. Easy to update (just change the documents), and answers can cite where they came from.
Fine-tuning: changes behavior & style
Continues training the model on examples so it learns a way of responding: a tone, a format, a specialized skill. It bakes patterns into the model's weights. Heavier to do, and not the right tool for facts that change quickly, since updating means training again.
Which to use
Reach for RAG when the gap is knowledge: the model needs current or private facts it was never trained on. Reach for fine-tuning when the gap is behavior: the model knows enough but should respond in a particular style or format. Many real systems use both: fine-tune for behavior, RAG for facts.
The problem RAG solves
- A base model answers from trained-in memory that is frozen at a cutoff and can't see your private documents.
- RAG adds knowledge at answer-time, with no model retraining.
- Update the knowledge base and the next answer reflects it as soon as the new content is indexed.
The RAG pipeline
- Five stages: question → search → retrieve → augment → generate.
- The retriever matches your question against a knowledge base and returns the closest passages as evidence.
- Those chunks are placed into the prompt so the model reads them before answering.
The building blocks
- Chunking: documents are split into small, self-contained passages for precise retrieval.
- Embeddings: each chunk becomes a vector capturing meaning, so similar ideas have similar vectors.
- Vector store + retriever: the store indexes vectors; the retriever queries it for the closest chunks to the question.
RAG vs fine-tuning
- RAG adds knowledge at answer-time; the model's weights are unchanged and it's easy to update.
- Fine-tuning changes behavior, tone, or format by training on examples; not suited to fast-changing facts.
- Knowledge gap → RAG; behavior gap → fine-tuning. The two can be combined.
Benefits & failure modes
- Current info without retraining, answers traceable to sources, and fewer hallucinations.
- Quality is bounded by retrieval: if the right chunk isn't found, the answer suffers.
- When answers go wrong, retrieval is usually the first place to look.
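Since quality is bounded by retrieval, one way to check that first is to measure how often the expected chunk shows up in the top results. The sketch below is an assumption-laden illustration: `hit_rate`, `keyword_retrieve`, and the labelled `cases` are all hypothetical, and the word-overlap retriever stands in for whatever retriever your system actually uses.

```python
import re
from typing import Callable

# Hypothetical chunks and labelled questions, for illustration only.
CHUNKS = [
    "Refunds are issued within 14 days of a return.",
    "Shipping takes 3 to 5 business days.",
    "Gift cards never expire.",
]


def words(text: str) -> set[str]:
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_retrieve(question: str, k: int) -> list[str]:
    """Stand-in retriever: rank chunks by shared words with the question."""
    q = words(question)
    return sorted(CHUNKS, key=lambda c: len(q & words(c)), reverse=True)[:k]


def hit_rate(
    retrieve: Callable[[str, int], list[str]],
    cases: list[tuple[str, str]],
    k: int = 3,
) -> float:
    """Fraction of questions whose expected chunk appears in the top-k results."""
    if not cases:
        return 0.0
    hits = sum(1 for question, expected in cases if expected in retrieve(question, k))
    return hits / len(cases)


cases = [
    ("When are refunds issued?", CHUNKS[0]),
    ("Do gift cards expire?", CHUNKS[2]),
    ("What is the delivery time?", CHUNKS[1]),  # shares no words with the shipping chunk
]

print(f"hit rate at k=1: {hit_rate(keyword_retrieve, cases, k=1):.2f}")
```

In this toy setup the third question misses at `k=1` because "delivery" and "shipping" share no words, which is the kind of retrieval failure worth ruling out before blaming the model.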
Published by Tech Jacks Solutions · Reviewed June 2026. This lesson explains established concepts and is grounded in the references below; figures shown in the interactives are illustrative and labelled as such.
RAG (Retrieval-Augmented Generation), in 5 minutes
Tech Jacks Solutions · AI Knowledge Hub · educational summary
What it is
A technique that addresses an LLM's habit of answering from fuzzy memory. It fetches relevant documents first, then gives them to the model alongside the question, so the answer is grounded in real sources and is current, specific, and traceable.
The pipeline
Your question → search a knowledge base → retrieve the most relevant chunks → augment the prompt (add the chunks) → the model generates a grounded, source-backed answer.
Building blocks
Chunking: documents are split into smaller passages. Embeddings: each chunk is turned into a vector (numbers that capture meaning). Vector store: an index of those vectors for fast similarity search. Retriever: finds the chunks closest to your question. The LLM is the generator that writes the final answer.
Benefits
Current information without retraining · answers traceable to sources · far fewer hallucinations.
RAG vs fine-tuning
RAG adds knowledge at answer-time (easy to update, just change the documents). Fine-tuning changes the model's behavior and style (heavier; not for fast-changing facts). Use RAG for a knowledge gap, fine-tuning for a behavior gap, or combine both.