Learning lesson

Track 03 · Applied & Agentic · Intermediate · ~8 min

RAG: giving a model your sources

Retrieval-Augmented Generation fixes an LLM's habit of answering from fuzzy memory. It fetches relevant documents first, then hands them to the model alongside your question so the answer is grounded in real sources. Learn the pipeline, the building blocks, and when to reach for RAG instead of fine-tuning, right here on the page.

01 The problem RAG solves

Imagine asking a brilliant friend a question, but they can only answer from memory and aren't allowed to look anything up. That is how an AI chatbot normally works: it replies from fuzzy memory, the patterns it picked up while being trained. That memory is frozen at a cutoff date, can't see your private documents, and will sometimes produce a confident but wrong answer (a hallucination). Retrieval-Augmented Generation (RAG) is simply letting that friend open the right book first: it fetches the relevant documents and hands them to the model alongside your question. The model then answers grounded in those real sources, so it can stay current, point back to where the answer came from, and make things up far less often.

RAG adds knowledge at answer-time, with no model retraining.
Answers become traceable to sources you control, instead of opaque memory.
Update the knowledge base, and the next answer reflects the change as soon as the new content is indexed.

02 Run the pipeline

RAG is a five-stage pipeline: your question goes in, the system searches a knowledge base, retrieves the most relevant chunks, augments the prompt by adding those chunks, and the model generates a grounded answer. The steps below walk through each stage, followed by a comparison of a model answering with and without RAG.

your question

You ask. A plain-language question, for example, "What's our refund policy?"

search sources

Retriever runs. The question is matched against a knowledge base of your documents to find the closest passages.

retrieve chunks

Top matches returned. A handful of the most relevant text chunks, the supporting evidence, are pulled out.

augment prompt

Context added. Those chunks are placed into the prompt alongside your question, so the model reads them before answering.

grounded answer

Generator responds. The model writes an answer based on the retrieved sources: current, specific, and traceable back to the documents.
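
The five stages above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: the three knowledge-base entries are made-up sample data, `search` scores passages by shared words instead of embeddings, and `generate` is a hypothetical stub standing in for a real model call.

```python
import re

# Made-up sample knowledge base (illustrative only).
KNOWLEDGE_BASE = [
    "Refund policy: purchases can be returned within 30 days with a receipt.",
    "Shipping policy: standard delivery takes 3 to 5 business days.",
    "Support hours: Monday to Friday, 9am to 5pm.",
]

def tokenize(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def search(question, documents):
    """Stage 2: score every document by how many words it shares with the question."""
    q = tokenize(question)
    return [(len(q & tokenize(doc)), doc) for doc in documents]

def retrieve(scored, k=1):
    """Stage 3: keep the k highest-scoring chunks, dropping any with no overlap."""
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in ranked[:k] if score > 0]

def augment(question, chunks):
    """Stage 4: place the retrieved chunks into the prompt ahead of the question."""
    sources = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return f"Answer using only these sources:\n{sources}\n\nQuestion: {question}"

def generate(prompt):
    """Stage 5: hypothetical stub; a real system would call a language model here."""
    return f"(model output for a prompt of {len(prompt)} characters)"

question = "What's our refund policy?"  # Stage 1: the question goes in
chunks = retrieve(search(question, KNOWLEDGE_BASE))
prompt = augment(question, chunks)
print(prompt)
print(generate(prompt))
```

Running it prints a prompt in which the refund passage sits above the question, which is all that "augmenting" means at this level: the model reads the evidence before it answers.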

✕ Without RAG

The model answers from memory. If your policy changed last week, it won't know, and may invent a confident, wrong answer.

✓ With RAG

The model reads your actual policy doc first, then answers: current, specific, and traceable to the source.
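
The contrast can be made concrete with a small sketch. It assumes a toy knowledge base held in a dictionary and a hypothetical `find_context` helper that matches on topic names; the policy wording and day counts are invented for illustration. The point is only that editing the document changes what the model is shown, while the model itself is untouched.

```python
# Made-up policy text, keyed by topic (illustrative only).
knowledge_base = {"refund policy": "Refunds are accepted within 30 days."}

def find_context(question, kb):
    """Toy retriever: return the text of every entry whose topic appears in the question."""
    q = question.lower()
    return [text for topic, text in kb.items() if topic in q]

def build_prompt(question, kb=None):
    """Without a knowledge base the prompt is only the question; with one, sources come first."""
    if kb is None:
        return question
    context = "\n".join(find_context(question, kb))
    return f"Sources:\n{context}\n\nQuestion: {question}"

question = "What's our refund policy?"

without_rag = build_prompt(question)            # the model must answer from memory
before = build_prompt(question, knowledge_base)  # the model sees the current policy

# The policy changes: edit the document, nothing else.
knowledge_base["refund policy"] = "Refunds are accepted within 14 days."
after = build_prompt(question, knowledge_base)

print(without_rag)
print(before)
print(after)
```

The first prompt carries no evidence at all, while the second and third differ only in the retrieved text, which is why a changed policy shows up in the answer without any retraining.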

03 The building blocks

Under the hood, four pieces make retrieval work. Documents are split into chunks; each chunk is turned into an embedding and stored in a vector store; a retriever finds the chunks closest to your question and hands them to the model (the generator).

The retrieval layer (indexing → search)

Chunking: split docs

Embeddings: text → vectors

Vector store: index of vectors

Retriever: finds matches
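
A rough sketch of the retrieval layer, from indexing to search, might look like the following. It rests on loud assumptions: `embed` is a toy word-count vector, whereas real embeddings come from a trained model and capture meaning, not just shared words; `VectorStore` is a hypothetical in-memory class, not any particular product; and the sample chunks are made up.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: a sparse word-count vector (a real system would use a trained model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity of two sparse vectors: near 1.0 for same direction, 0.0 for no overlap."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class VectorStore:
    """Hypothetical in-memory index that keeps each chunk next to its vector."""

    def __init__(self):
        self.items = []  # list of (vector, chunk) pairs

    def add(self, chunk):
        """Indexing: embed the chunk once and store the result."""
        self.items.append((embed(chunk), chunk))

    def search(self, query, k=2):
        """Retrieval: return the k chunks whose vectors are closest to the query vector."""
        qv = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(qv, item[0]), reverse=True)
        return [chunk for _, chunk in ranked[:k]]

store = VectorStore()
for chunk in [
    "Refund policy: items can be returned within 30 days.",
    "Shipping policy: orders ship within two business days.",
    "Office hours: the help desk is open on weekdays.",
]:
    store.add(chunk)

top = store.search("How does the refund policy work?", k=1)
print(top)
```

The shape is the part that carries over to real systems: embed once at indexing time, embed the question at query time, and rank stored vectors by similarity.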

Step 1: preparing your data

Chunking

Long documents are split into smaller, self-contained passages called chunks, often a few paragraphs each. Smaller chunks make retrieval more precise (you fetch just the relevant bit), and they keep the added context short enough to fit in the prompt.
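
One simple way to chunk, sketched here under assumptions, is to pack whole paragraphs together until a word budget is reached. The function name `chunk_paragraphs` and the `max_words` budget are illustrative choices, not a standard; real splitters often count tokens and may overlap neighbouring chunks.

```python
def chunk_paragraphs(text, max_words=120):
    """Pack whole paragraphs into chunks of at most max_words words.

    A single paragraph longer than max_words is kept intact as its own chunk.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current, count = [], [], 0
    for para in paragraphs:
        n = len(para.split())
        # Start a new chunk when adding this paragraph would exceed the budget.
        if current and count + n > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks

document = "a b c\n\nd e\n\nf g h i\n\nj"
chunks = chunk_paragraphs(document, max_words=5)
for c in chunks:
    print(repr(c))
```

Keeping paragraph boundaries intact is one way to aim for the "self-contained" property described above, since a passage cut mid-thought is harder to match and harder for the model to use.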

04 RAG vs fine-tuning

People often confuse these two ways of giving a model new abilities. The short version: RAG adds knowledge at answer-time, while fine-tuning changes the model's behavior and style. They solve different problems, and can be combined.

RAG: adds knowledge at answer-time

Keeps the model unchanged and feeds it relevant documents when a question is asked. Best when facts change often or live in your own sources. Easy to update (just change the documents), and answers can cite where they came from.

Good for: company docs, product manuals, policies, fast-changing facts

Update: edit the knowledge base, no retraining

Traceable: answers point back to source chunks

Fine-tuning: changes behavior & style

Continues training the model on examples so it learns a way of responding: a tone, a format, a specialized skill. It bakes patterns into the model's weights. Heavier to do, and not the right tool for facts that change quickly, since updating means training again.

Good for: consistent tone, output format, a niche task or jargon

Update: retrain on new examples (slower, costlier)

Changes: the model's behavior, not its live knowledge

Which to use

Reach for RAG when the gap is knowledge: the model needs current or private facts it was never trained on. Reach for fine-tuning when the gap is behavior: the model knows enough but should respond in a particular style or format. Many real systems use both: fine-tune for behavior, RAG for facts.

need current / private facts? → RAG

need a consistent style or skill? → Fine-tuning

need both? → combine them
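
The rule of thumb above is small enough to write as a function. This is only a restatement of the three lines as code; the function and parameter names are hypothetical, and the final fallback (neither gap applies) is an assumption that the lesson does not state.

```python
def choose_approach(needs_current_or_private_facts, needs_consistent_style_or_skill):
    """Map the knowledge gap and the behavior gap onto an approach."""
    if needs_current_or_private_facts and needs_consistent_style_or_skill:
        return "combine RAG and fine-tuning"
    if needs_current_or_private_facts:
        return "RAG"
    if needs_consistent_style_or_skill:
        return "fine-tuning"
    # Assumption: with neither gap, the unmodified model may be enough.
    return "base model as-is"

print(choose_approach(True, False))
print(choose_approach(False, True))
print(choose_approach(True, True))
```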

The lesson at a glance

The problem RAG solves

A base model answers from trained-in memory that is frozen at a cutoff and can't see your private documents.
RAG adds knowledge at answer-time, with no model retraining.
Update the knowledge base and the next answer reflects the change as soon as the new content is indexed.

The RAG pipeline

Five stages: question → search → retrieve → augment → generate.
The retriever matches your question against a knowledge base and returns the closest passages as evidence.
Those chunks are placed into the prompt so the model reads them before answering.

The building blocks

Chunking: documents are split into small, self-contained passages for precise retrieval.
Embeddings: each chunk becomes a vector capturing meaning, so similar ideas have similar vectors.
Vector store + retriever: the store indexes vectors; the retriever queries it for the closest chunks to the question.

RAG vs fine-tuning

RAG adds knowledge at answer-time; the model's weights are unchanged and it's easy to update.
Fine-tuning changes behavior, tone, or format by training on examples; not suited to fast-changing facts.
Knowledge gap → RAG; behavior gap → fine-tuning. The two can be combined.

Benefits & failure modes

Current info without retraining, answers traceable to sources, and fewer hallucinations.
Quality is bounded by retrieval: if the right chunk isn't found, the answer suffers.
When answers go wrong, retrieval is usually the first place to look.
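
Since retrieval is the first place to look, one way to check it in isolation is to measure how often the expected chunk shows up in the top results for a set of test questions. The sketch below is an assumption-laden illustration: `hit_rate` and `missed_questions` are hypothetical helper names, the chunks and questions are made up, and the retriever is a toy keyword matcher chosen precisely because it fails on paraphrases.

```python
import re

# Made-up sample chunks (illustrative only).
CHUNKS = [
    "Refund policy: items are returnable within 30 days.",
    "Shipping policy: orders ship within two business days.",
    "Office hours: help desk open on weekdays.",
]

def words(text):
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, k):
    """Toy retriever: rank chunks by shared words and drop chunks with no overlap."""
    q = words(question)
    scored = [(len(q & words(c)), c) for c in CHUNKS]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [c for score, c in ranked[:k] if score > 0]

def hit_rate(test_cases, retriever, k=3):
    """Fraction of questions whose expected chunk appears in the top-k results."""
    if not test_cases:
        return 0.0
    hits = sum(1 for q, expected in test_cases if expected in retriever(q, k))
    return hits / len(test_cases)

def missed_questions(test_cases, retriever, k=3):
    """Questions whose expected chunk was not retrieved; these are the ones to inspect."""
    return [q for q, expected in test_cases if expected not in retriever(q, k)]

test_cases = [
    ("What is the refund policy?", CHUNKS[0]),
    ("When do orders ship?", CHUNKS[1]),
    ("Can I get my money back?", CHUNKS[0]),  # same meaning, no shared keywords
]

print(hit_rate(test_cases, retrieve, k=1))
print(missed_questions(test_cases, retrieve, k=1))
```

The third question is missed because it shares no words with the refund chunk. No generator, however capable, can ground an answer in a passage it was never handed, which is the sense in which answer quality is bounded by retrieval.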

Every claim below links to its primary source so you can go straight to the original.

✓ Verified · Published by Tech Jacks Solutions · Reviewed June 2026 · Grounded in 7 sources

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks · Lewis et al. (2020)
Retrieval-Augmented Generation (RAG): learn · Pinecone
Retrieval-Augmented Generation (RAG) tutorial · LangChain
Text embedding models · LangChain
Vector stores · LangChain
Vector embeddings: learn · Pinecone
Retrievers · LangChain

This lesson explains established concepts and is grounded in the references listed above; figures shown in the interactives are illustrative and labelled as such.

RAG (Retrieval-Augmented Generation), in 5 minutes

Tech Jacks Solutions · AI Knowledge Hub · educational summary

What it is

A technique that fixes an LLM's habit of answering from fuzzy memory: it fetches relevant documents first, then gives them to the model alongside the question, so the answer is grounded in real sources: current, specific, and traceable.

The pipeline

Your question → search a knowledge base → retrieve the most relevant chunks → augment the prompt (add the chunks) → the model generates a grounded, source-backed answer.

Building blocks

Chunking: documents are split into smaller passages. Embeddings: each chunk is turned into a vector (numbers that capture meaning). Vector store: an index of those vectors for fast similarity search. Retriever: finds the chunks closest to your question. The LLM is the generator that writes the final answer.

Benefits

Current information without retraining · answers traceable to sources · far fewer hallucinations.

RAG vs fine-tuning

RAG adds knowledge at answer-time (easy to update, just change the documents). Fine-tuning changes the model's behavior and style (heavier; not for fast-changing facts). Use RAG for a knowledge gap, fine-tuning for a behavior gap, or combine both.
