
Track 03 · Applied & Agentic · Intermediate · ~8 min

RAG: giving a model your sources

Retrieval-Augmented Generation fixes an LLM's habit of answering from fuzzy memory — it fetches relevant documents first, then hands them to the model alongside your question so the answer is grounded in real sources. Learn the pipeline, the building blocks, and when to reach for RAG instead of fine-tuning, right here on the page.


01 The problem RAG solves

Imagine asking a brilliant friend a question, but they can only answer from memory — they aren't allowed to look anything up. That is how an AI chatbot normally works: it replies from fuzzy memory, the patterns it picked up while being trained. That memory is frozen at a cutoff date, can't see your private documents, and will sometimes produce a confident but wrong answer (a hallucination). Retrieval-Augmented Generation (RAG) is simply letting that friend open the right book first: it fetches the relevant documents and hands them to the model alongside your question. The model then answers grounded in those real sources — so it can stay current, point back to where the answer came from, and make things up far less often.

RAG adds knowledge at answer-time — no retraining the model.
Answers become traceable to sources you control, instead of opaque memory.
Update the knowledge base, and once the change is indexed, the next answer reflects it.

02 Run the pipeline

RAG is a five-stage pipeline: your question goes in, the system searches a knowledge base, retrieves the most relevant chunks, augments the prompt by adding those chunks, and the model generates a grounded answer. Step through it, then compare a model answering with and without RAG.


your question

You ask. A plain-language question — for example, "What's our refund policy?"

search sources

Retriever runs. The question is matched against a knowledge base of your documents to find the closest passages.

retrieve chunks

Top matches returned. A handful of the most relevant text chunks — the supporting evidence — are pulled out.

augment prompt

Context added. Those chunks are placed into the prompt alongside your question, so the model reads them before answering.

grounded answer

Generator responds. The model writes an answer based on the retrieved sources — current, specific, and traceable back to the documents.
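The five stages above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: the knowledge base is a hypothetical hard-coded list, `retrieve` scores chunks by simple word overlap (a stand-in for a real retriever), and `generate` is a stub where a real system would call a language model. All function names are made up for this example.

```python
import re

# Hypothetical knowledge base; in practice these would be chunks of your own documents.
KNOWLEDGE_BASE = [
    "Refund policy: customers can request a full refund within 30 days of purchase.",
    "Shipping: standard orders arrive in 3 to 5 business days.",
    "Support hours: the help desk is open Monday to Friday, 9am to 5pm.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Stages 2-3: rank chunks by word overlap with the question, keep the best."""
    q_words = tokenize(question)
    ranked = sorted(chunks, key=lambda c: len(q_words & tokenize(c)), reverse=True)
    return ranked[:top_k]


def augment(question: str, chunks: list[str]) -> str:
    """Stage 4: place the retrieved chunks in the prompt ahead of the question."""
    context = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return f"Answer using only these sources:\n{context}\n\nQuestion: {question}"


def generate(prompt: str) -> str:
    """Stage 5: stub for a real model call (assumption: no model is wired in here)."""
    return f"(model output for a prompt of {len(prompt)} characters)"


question = "What's our refund policy?"  # Stage 1: the question
top_chunks = retrieve(question, KNOWLEDGE_BASE, top_k=1)
prompt = augment(question, top_chunks)
answer = generate(prompt)
print(prompt)
```

Printing `prompt` shows what the model would actually read: the refund passage first, then the question.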

✕ Without RAG

The model answers from memory. If your policy changed last week, it won't know — and may invent a confident, wrong answer.

✓ With RAG

The model reads your actual policy doc first, then answers — current, specific, and traceable to the source.
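The contrast comes down to what is in the prompt. The sketch below builds the same question with and without a source passage, then edits the stored policy to show that the change reaches the prompt without touching the model. The `policy_docs` store, its wording, and `build_prompt` are hypothetical, chosen only for illustration.

```python
from typing import Optional


def build_prompt(question: str, sources: Optional[list[str]] = None) -> str:
    """Return a bare prompt, or a grounded one when source passages are supplied."""
    if not sources:
        return f"Question: {question}"
    context = "\n".join(f"- {s}" for s in sources)
    return f"Use only these sources:\n{context}\n\nQuestion: {question}"


# Hypothetical document store keyed by document name.
policy_docs = {"refunds": "Refunds are accepted within 30 days of purchase."}
question = "What's our refund policy?"

without_rag = build_prompt(question)                          # the model must rely on memory
with_rag = build_prompt(question, [policy_docs["refunds"]])   # the model reads the policy

# The policy changes: edit the document, not the model.
policy_docs["refunds"] = "Refunds are accepted within 14 days of purchase."
after_update = build_prompt(question, [policy_docs["refunds"]])

print(after_update)
```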

03 The building blocks

Under the hood, four pieces make retrieval work. Documents are split into chunks; each chunk is turned into an embedding and stored in a vector store; a retriever then finds the chunks closest to your question and hands them to the model (the generator).

The retrieval layer (indexing → search)

Chunking: split docs
Embeddings: text → vectors
Vector store: index of vectors
Retriever: finds matches
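To make the last three blocks concrete, here is a toy version that assumes the documents have already been split into chunks. Note the loud assumptions: `embed` uses plain word counts instead of a learned embedding model, so it matches only shared words rather than meaning, and `VectorStore` is a hypothetical in-memory class that scans every vector rather than using a real index. Only the overall shape (embed, store, search by cosine similarity) carries over to real systems.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a sparse vector of word counts (stand-in for an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class VectorStore:
    """Minimal in-memory store: keeps (chunk, vector) pairs and scans them all on search."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, chunk: str) -> None:
        self.items.append((chunk, embed(chunk)))

    def search(self, query: str, top_k: int = 2) -> list[tuple[float, str]]:
        """Retriever step: return the top_k (score, chunk) pairs closest to the query."""
        q = embed(query)
        scored = [(cosine(q, vec), chunk) for chunk, vec in self.items]
        return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


store = VectorStore()
for chunk in [
    "Refund requests are accepted within 30 days of purchase.",
    "Standard shipping takes 3 to 5 business days.",
    "Help desk hours are Monday to Friday.",
]:
    store.add(chunk)

results = store.search("What is the refund policy for a recent purchase?", top_k=1)
print(results[0][1])
```

Because the toy embedding only counts shared words, a query phrased with different vocabulary could rank the wrong chunk first; closing that gap is the job of a learned embedding model.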

Step 1 — preparing your data

Chunking

Long documents are split into smaller, self-contained passages called chunks — often a few paragraphs each. Smaller chunks make retrieval more precise (you fetch just the relevant bit), and they keep the added context short enough to fit in the prompt.
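One simple way to chunk, sketched below, is to pack consecutive paragraphs into a chunk until a size budget is reached. The function name `chunk_by_paragraph`, the blank-line paragraph separator, and the character budget are assumptions for illustration; real pipelines often measure size in tokens and may add overlap between chunks.

```python
def chunk_by_paragraph(text: str, max_chars: int = 500) -> list[str]:
    """Pack consecutive paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)  # budget exceeded: close the current chunk
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


doc = (
    "Refunds are accepted within 30 days.\n\n"
    "Shipping takes 3 to 5 days.\n\n"
    "Support is open on weekdays."
)
chunks = chunk_by_paragraph(doc, max_chars=70)
for c in chunks:
    print(repr(c))
```

With a 70-character budget the first two paragraphs share a chunk and the third stands alone; lowering the budget yields smaller, more precise chunks at the cost of less surrounding context in each one.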

04 RAG vs fine-tuning

People often confuse these two ways of giving a model new abilities. The short version: RAG adds knowledge at answer-time, while fine-tuning changes the model's behavior and style. They solve different problems — and can be combined.


RAG — adds knowledge at answer-time

Keeps the model unchanged and feeds it relevant documents when a question is asked. Best when facts change often or live in your own sources. Easy to update — just change the documents — and answers can cite where they came from.

Good for: company docs, product manuals, policies, fast-changing facts
Update: edit the knowledge base, no retraining
Traceable: answers point back to source chunks

Fine-tuning — changes behavior & style

Continues training the model on examples so it learns a way of responding — a tone, a format, a specialized skill. It bakes patterns into the model's weights. Heavier to do, and not the right tool for facts that change quickly, since updating means training again.

Good for: consistent tone, output format, a niche task or jargon
Update: retrain on new examples (slower, costlier)
Changes: the model's behavior, not its live knowledge

Which to use

Reach for RAG when the gap is knowledge: the model needs current or private facts it was never trained on. Reach for fine-tuning when the gap is behavior: the model knows enough but should respond in a particular style or format. Many real systems use both — fine-tune for behavior, RAG for facts.

need current / private facts? → RAG

need a consistent style or skill? → Fine-tuning

need both? → combine them
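The rule of thumb above is small enough to write as a function. This is only a restatement of the three cases in code; the function name, parameter names, and the fallback for "neither" are assumptions added for illustration.

```python
def choose_approach(needs_current_or_private_facts: bool, needs_consistent_style: bool) -> str:
    """Map the two gaps (knowledge, behavior) to the suggested technique."""
    if needs_current_or_private_facts and needs_consistent_style:
        return "combine RAG and fine-tuning"
    if needs_current_or_private_facts:
        return "RAG"
    if needs_consistent_style:
        return "fine-tuning"
    # Assumption: with neither gap, the unmodified model may already be enough.
    return "neither"


print(choose_approach(needs_current_or_private_facts=True, needs_consistent_style=False))
```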

05 Check your understanding


06 Take it with you & go deeper

"RAG in 5 minutes" — one-page summary

The whole module distilled to a printable cheat-sheet.

▸ Already on the site: go deeper

RAG (AI Glossary): The concise definition, plus related terms, in the AI Glossary.
Embeddings (AI Glossary): What it means to turn text into vectors, and why similarity search works.

▸ Coming next: deeper progression

Embeddings & vector search (coming soon): How text becomes vectors, what similarity means, and how a vector store finds the closest chunks.
RAG vs fine-tuning, deep dive (coming soon): A fuller comparison with cost, latency, and update trade-offs, plus patterns for combining both.


Sources & review

Published by Tech Jacks Solutions · Reviewed June 2026. This lesson explains established concepts and is grounded in the references below; figures shown in the interactives are illustrative and labelled as such.

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks — Lewis et al. (2020)
Retrieval-Augmented Generation (RAG) — learn — Pinecone
Retrieval-Augmented Generation (RAG) tutorial — LangChain
Text embedding models — LangChain
Vector stores — LangChain
Vector embeddings — learn — Pinecone
Retrievers — LangChain

RAG (Retrieval-Augmented Generation) — in 5 minutes

Tech Jacks Solutions · AI Knowledge Hub · educational summary

What it is

A technique that fixes an LLM's habit of answering from fuzzy memory: it fetches relevant documents first, then gives them to the model alongside the question, so the answer is grounded in real sources — current, specific, and traceable.

The pipeline

Your question → search a knowledge base → retrieve the most relevant chunks → augment the prompt (add the chunks) → the model generates a grounded, source-backed answer.

Building blocks

Chunking — documents are split into smaller passages. Embeddings — each chunk is turned into a vector (numbers that capture meaning). Vector store — an index of those vectors for fast similarity search. Retriever — finds the chunks closest to your question. The LLM is the generator that writes the final answer.

Benefits

Current information without retraining · answers traceable to sources · far fewer hallucinations.

RAG vs fine-tuning

RAG adds knowledge at answer-time (easy to update — just change the documents). Fine-tuning changes the model's behavior and style (heavier; not for fast-changing facts). Use RAG for a knowledge gap, fine-tuning for a behavior gap, or combine both.
