Track 01 · Language & Generation Novice · start here ~8 min

How do large language models work?

A chatbot can feel like it "knows" things. Under the hood it's doing something simpler and stranger: chopping your words into tokens, turning them into numbers that capture meaning, and predicting a likely next token, over and over. There is no database lookup. Here's the whole idea, on one page.

01 Tokens: the unit an LLM actually reads

Think of how you might break a long word into smaller chunks to sound it out. A large language model — the "LLM" behind a chatbot — does something similar before it reads anything: it chops your text into small pieces called tokens, often a whole word or just part of one. It then does one thing astonishingly well: predict the next token, again and again, to build a reply. There's no encyclopedia being consulted; it's pattern-completion learned from enormous amounts of text. Type below to watch text get chopped into tokens.

Tokens are the model's unit of cost and context — pricing and limits are measured in tokens, not words.
"Generation" is just next-token prediction repeated until the answer is done.
Because it predicts what sounds plausible, not what has been verified, a model can sound certain while being wrong.
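Because pricing is counted in tokens, a rough cost estimate is simple arithmetic. The sketch below is illustrative only: the function name `estimate_cost` and the prices are hypothetical, and it assumes input and output tokens are priced separately per 1,000 tokens, which is not true of every provider.

```python
def estimate_cost(prompt_tokens: int, reply_tokens: int,
                  price_per_1k_input: float, price_per_1k_output: float) -> float:
    """Estimate the cost of one request from its token counts.

    All prices are hypothetical placeholders; check your provider's
    actual price list before relying on any figure.
    """
    input_cost = prompt_tokens / 1000 * price_per_1k_input
    output_cost = reply_tokens / 1000 * price_per_1k_output
    return input_cost + output_cost


# Example with made-up prices: 1,000 prompt tokens and 500 reply tokens.
cost = estimate_cost(1000, 500, price_per_1k_input=0.5, price_per_1k_output=1.5)
print(f"Estimated cost: {cost:.2f}")
```

The point is that the word count never appears in the calculation. Two prompts with the same number of words can cost different amounts if they split into different numbers of tokens.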

Interactive: Type to see tokens

Simplified illustration. Real models use learned sub-word pieces (e.g. BPE) that don't line up with letters or word-length — so treat this as "roughly how text gets chopped up," not an exact token count.
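To make "chopping text into sub-word pieces" concrete, here is a toy tokenizer. It is a sketch under loud assumptions: `VOCAB` is a tiny hand-written vocabulary invented for this example, and greedy longest-match is a simplification, not the learned BPE procedure real models use.

```python
# Hypothetical, hand-written vocabulary. Real tokenizers learn theirs
# (for example with BPE) from large amounts of text.
VOCAB = {"token", "iz", "ation", "un", "believ", "able", "the", "sky", "is", " "}


def tokenize(text: str) -> list[str]:
    """Split text by repeatedly taking the longest vocabulary piece that matches."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until something matches.
        for end in range(len(text), i, -1):
            piece = text[i:end]
            if piece in VOCAB:
                tokens.append(piece)
                i = end
                break
        else:
            # Nothing matched: fall back to a single-character token.
            tokens.append(text[i])
            i += 1
    return tokens


print(tokenize("tokenization"))
print(tokenize("the sky is unbelievable"))
```

Short common words such as "the" come out as one token, while longer words such as "tokenization" break into several pieces. That is the pattern to remember, even though the exact splits in a real model will differ.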

Predict the next token: “The sky is ___”

blue 42%
clear 21%
grey 15%
falling

Illustrative probabilities, for intuition only — to show that the model ranks candidates rather than "knowing" one answer.
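The ranking idea can be sketched in a few lines. A model produces a raw score for each candidate token, and a softmax turns those scores into probabilities that sum to 1. The scores in `logits` below are hypothetical values chosen for illustration; they are not taken from any real model and are not meant to reproduce the percentages above.

```python
import math


def softmax(scores: dict[str, float]) -> dict[str, float]:
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores.values())
    # Subtracting the highest score keeps exp() numerically stable.
    exps = {tok: math.exp(s - highest) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


# Hypothetical raw scores for candidates after "The sky is".
logits = {"blue": 2.0, "clear": 1.3, "grey": 1.0, "falling": -1.0}

probs = softmax(logits)
ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
for token, p in ranked:
    print(f"{token:8s} {p:.0%}")
```

Every candidate gets some probability, including the unlikely "falling". The model is ranking options, not retrieving a single stored answer.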

02 The four moves: tokenize, embed, attend, predict

Everything an LLM does chains together four ideas. Tap each one to see what it means — notice how each step hands its output to the next: tokens become embeddings, attention weighs which earlier tokens matter, and the model predicts the next token.

Explore: Tap each step

Tokenize: split text
Embed: to vectors
Attention: weigh context
Predict next: pick a token

Step 1

Tokenize

The model splits your text into tokens — sub-word pieces that are its smallest unit. Common words may be one token; rarer or longer words break into several. This is also why cost and context limits are counted in tokens, not words.

03 The generation loop, step by step

Putting it together: an LLM answers by running a loop. It reads your prompt as tokens, turns them into vectors that capture meaning, weighs which earlier tokens matter, predicts one likely next token, appends it, and repeats until it produces an end-of-text signal or reaches a length limit. Step through it.

Walkthrough: Step or run the loop

Tokenize prompt

Read the input. Your prompt is split into tokens — the model's smallest units of text.

Embed

Turn tokens into vectors. Each token becomes a list of numbers that captures aspects of its meaning, so related words sit near each other.

Attention

Weigh the context. The transformer uses attention to decide which earlier tokens matter most for what should come next.

Predict next token

Rank the candidates. The model assigns a probability to every token in its vocabulary and picks one likely option.

Append & repeat

Grow the answer. The chosen token is added to the text, and the whole loop runs again with the longer context.

Stop

Finish. When the model predicts an end-of-text signal or hits a length limit, it stops and returns the completed reply.
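The loop above can be sketched with a toy stand-in for the model. Everything here is an assumption made for illustration: `NEXT_TOKEN_PROBS` is a hand-written lookup table, and it looks only at the last token, whereas a real LLM computes its probabilities from the whole context using embeddings and attention.

```python
import random

END = "<end>"  # hypothetical end-of-text signal for this toy example

# Hand-written table standing in for a trained model (illustrative only).
NEXT_TOKEN_PROBS = {
    "the": {"sky": 0.6, "sea": 0.4},
    "sky": {"is": 1.0},
    "sea": {"is": 1.0},
    "is": {"blue": 0.5, "clear": 0.3, "grey": 0.2},
    "blue": {END: 1.0},
    "clear": {END: 1.0},
    "grey": {END: 1.0},
}


def generate(prompt_tokens: list[str], max_new_tokens: int = 10, seed: int = 0) -> list[str]:
    """Predict a token, append it, and repeat until a stop condition is met."""
    rng = random.Random(seed)
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):  # stop condition 1: length limit
        dist = NEXT_TOKEN_PROBS.get(tokens[-1])
        if dist is None:
            break  # the toy table has no prediction for this token
        candidates = list(dist)
        weights = [dist[c] for c in candidates]
        # Pick one likely option, weighted by probability.
        next_token = rng.choices(candidates, weights=weights, k=1)[0]
        if next_token == END:
            break  # stop condition 2: end-of-text signal
        tokens.append(next_token)  # the context grows by one token
    return tokens


print(" ".join(generate(["the"])))
```

Changing `seed` can change the reply, because the next token is sampled from the probabilities rather than looked up. Lowering `max_new_tokens` cuts the reply short, which mirrors the length limit in the final step.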

Worth knowing: because an LLM predicts plausible-sounding text rather than looking up facts, it can state something confidently and still be wrong — often called a hallucination. The context window caps how much text it can consider at once. Treat answers as a strong first draft: review and verify before relying on them, especially for anything important.
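One simple way to picture the context window is as a cap on how many of the most recent tokens are kept. The helper below is a hypothetical sketch of that single strategy, assuming older tokens are dropped first; real applications may handle overflow differently, for example by summarizing earlier text or rejecting the request.

```python
def fit_to_context(tokens: list[str], context_window: int) -> list[str]:
    """Keep only the most recent tokens that fit in the context window."""
    if context_window <= 0:
        return []
    # A negative slice keeps the last `context_window` items.
    return tokens[-context_window:]


conversation = ["the", "sky", "is", "blue", "and", "the", "sea", "is", "grey"]
print(fit_to_context(conversation, 4))
```

Anything that falls outside the window is invisible to the model when it predicts the next token, which is one reason long conversations can lose track of early details.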

04 Check your understanding

05 Take it with you & go deeper

"How LLMs work in 5 minutes" — one-page summary

The whole module distilled to a printable cheat-sheet.

▸ Look up a term: AI glossary

Large language model
The one-line definition plus the key terms around it.

Token
What a token is and why billing and limits are measured in them.

▸ Coming next: deeper progression

Embeddings & vectors (coming soon)
How tokens turn into numbers that capture meaning, and why similar ideas sit close together.

Attention explained (coming soon)
The mechanism that lets a transformer weigh which earlier tokens matter most.

Sources & review

Published by Tech Jacks Solutions · Reviewed June 2026. This lesson explains established concepts and is grounded in the references below; figures shown in the interactives are illustrative and labelled as such.

CS224n — Natural Language Processing with Deep Learning — Stanford
Language Models are Few-Shot Learners (GPT-3) — Brown et al. (2020)
The Illustrated Transformer — Jay Alammar
Attention Is All You Need — Vaswani et al. (2017)

How large language models work — in 5 minutes

Tech Jacks Solutions · AI Knowledge Hub · educational summary

Tokens

An LLM splits text into tokens — sub-word pieces. Tokens are the model's unit of cost and context: pricing and limits are measured in tokens, not words.

Embeddings

Each token is turned into an embedding — a list of numbers (a vector) that captures aspects of its meaning, so related words sit near one another.
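"Near one another" can be measured. A common choice is cosine similarity, which compares the direction of two vectors. The three-number vectors in `EMBEDDINGS` below are invented for illustration; real embeddings are learned during training and typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return a value near 1 when two vectors point in similar directions."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings, hand-picked so related words are close.
EMBEDDINGS = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

cat_dog = cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["dog"])
cat_car = cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["car"])
print(f"cat vs dog: {cat_dog:.2f}")
print(f"cat vs car: {cat_car:.2f}")
```

With these made-up numbers, "cat" scores much closer to "dog" than to "car", which is the behavior the definition describes.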

Transformer & attention

A transformer uses attention to weigh which earlier tokens matter most for deciding what should come next.
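The weighing step can be sketched as scaled dot-product attention for a single position, the formulation described in "Attention Is All You Need". This is a simplified sketch: the `query`, `keys`, and `values` vectors are hand-written for illustration, whereas a real transformer derives them from the embeddings with learned projections and runs many attention heads in parallel.

```python
import math


def attention(query: list[float], keys: list[list[float]],
              values: list[list[float]]) -> tuple[list[float], list[float]]:
    """Return (weights, output) for one query attending over earlier tokens."""
    d = len(query)
    # Score each earlier token by how well its key matches the query.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # Softmax turns the scores into weights that sum to 1.
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The output is the weighted average of the value vectors.
    dim = len(values[0])
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return weights, output


tokens = ["The", "sky", "is"]
keys = [[0.1, 0.0], [1.0, 0.5], [0.2, 0.1]]    # hypothetical key vectors
values = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]  # hypothetical value vectors
query = [1.0, 1.0]                             # hypothetical query for the next position

weights, output = attention(query, keys, values)
for token, w in zip(tokens, weights):
    print(f"{token:4s} {w:.2f}")
```

With these toy numbers the largest weight lands on "sky", so that token contributes most to what the model predicts next.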

Generation = predict the next token

The model predicts a likely next token, appends it, and repeats until it stops. There is no database lookup; it is learned pattern-completion. That is why it can sound confident but be wrong (a hallucination).

Context window

The context window is how much text the model can consider at once, measured in tokens — which is also how cost and limits are counted.

Use it wisely

Treat answers as a strong first draft. Review and verify before relying on them, especially for anything important.
