Track 01 · Language & Generation · Novice · ~8 min

How do large language models work?

A chatbot can feel like it "knows" things. Under the hood it's doing something simpler and stranger: chopping your words into tokens, turning them into numbers that capture meaning, and predicting a likely next token, over and over. There's no database lookup. Here's the whole idea, on one page.


01 Tokens: the unit an LLM actually reads

Think of how you might break a long word into smaller chunks to sound it out. A large language model (the "LLM" behind a chatbot) does something similar before it reads anything: it chops your text into small pieces called tokens, often a whole word or just part of one. It then does one thing astonishingly well: it predicts the next token, again and again, to build a reply. No encyclopedia is consulted; it's pattern-completion learned from enormous amounts of text.

  • Tokens are the model's unit of cost and context — pricing and limits are measured in tokens, not words.
  • "Generation" is just next-token prediction repeated until the answer is done.
  • Because it predicts plausibility, a model can sound certain while being wrong.
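Because pricing is counted in tokens, estimating what a request costs is simple arithmetic once you know the token counts. The sketch below assumes hypothetical prices quoted per million tokens, with separate input and output rates; that split is a common pattern, but the numbers here are placeholders, not any provider's real prices, and `estimate_cost` is an illustrative name.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Return the estimated cost of one request, in the same currency as the prices.

    All prices are hypothetical placeholders, quoted per million tokens.
    """
    input_cost = input_tokens * input_price_per_million / 1_000_000
    output_cost = output_tokens * output_price_per_million / 1_000_000
    return input_cost + output_cost


# Hypothetical request: a 1,200-token prompt and an 800-token reply.
cost = estimate_cost(1_200, 800, input_price_per_million=3.00, output_price_per_million=15.00)
print(f"${cost:.4f}")  # prints $0.0156
```

Note that a word count alone can't feed this formula; you need the token counts the model's own tokenizer produces.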

Real tokenizers use learned sub-word pieces (for example, byte-pair encoding, or BPE) that don't line up neatly with letters or word boundaries, so a simple word-based split shows only roughly how text gets chopped up, not an exact token count.
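The core idea behind BPE fits in a few lines: start from single characters and repeatedly merge the most frequent adjacent pair into a new piece. The toy sketch below shows only that training step on a made-up four-word corpus; production tokenizers typically work on bytes, pre-split text, handle word boundaries, and train on vastly more data, so their pieces will differ. The function names are illustrative.

```python
from collections import Counter


def merge_symbols(symbols: tuple[str, ...], pair: tuple[str, str]) -> tuple[str, ...]:
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    merged: list[str] = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(word_freqs: dict[str, int], num_merges: int) -> list[tuple[str, str]]:
    """Learn up to `num_merges` merge rules from a word -> frequency table."""
    # Start with every word split into single characters.
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges: list[tuple[str, str]] = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word appears.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        # Merge the most frequent pair (ties go to the pair counted first).
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        vocab = {merge_symbols(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def apply_merges(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Tokenize a word by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_symbols(symbols, pair)
    return list(symbols)


# Tiny made-up corpus: word -> how often it appears.
corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges = learn_bpe(corpus, num_merges=3)
print(merges)                          # [('e', 's'), ('es', 't'), ('l', 'o')]
print(apply_merges("lowest", merges))  # ['lo', 'w', 'est']
```

The word "lowest" never appears in the corpus, yet it still gets tokenized, as pieces learned from other words.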

Predict the next token: “The sky is ___”
  • blue: 42%
  • clear: 21%
  • grey: 15%
  • falling: 6%

Illustrative probabilities, for intuition only — to show that the model ranks candidates rather than "knowing" one answer.
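A model that ranks candidates still has to choose one. Two common strategies are greedy decoding (always take the top token) and sampling (pick at random in proportion to probability, often reshaped by a temperature setting). Below is a minimal sketch using the illustrative numbers above; the function names are hypothetical, it samples only among the four listed words, and a real system would do this over a probability for every token in its vocabulary.

```python
import random
from collections import Counter

# Illustrative probabilities from the example above, not real model output.
# The missing 16% would be spread across every other token in the vocabulary.
CANDIDATES = {"blue": 0.42, "clear": 0.21, "grey": 0.15, "falling": 0.06}


def greedy_pick(probs: dict[str, float]) -> str:
    """Always take the single most likely token."""
    return max(probs, key=probs.get)


def sample_pick(probs: dict[str, float], rng: random.Random, temperature: float = 1.0) -> str:
    """Draw a token at random, in proportion to its temperature-adjusted probability.

    Raising each probability to the power 1/temperature (random.choices then
    normalizes the weights) matches dividing the model's raw scores by the
    temperature before the softmax. Lower temperature favors the top token more.
    """
    tokens = list(probs)
    weights = [p ** (1.0 / temperature) for p in probs.values()]
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the run is repeatable
print(greedy_pick(CANDIDATES))  # blue
draws = Counter(sample_pick(CANDIDATES, rng) for _ in range(1_000))
print(draws.most_common())  # mostly "blue", but the other words show up too
```

Greedy decoding always answers "blue"; sampling occasionally picks "clear" or even "falling", which is one reason the same prompt can produce different replies.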

02 The four moves: tokenize, embed, attend, predict

Everything an LLM does chains together four ideas, and each step hands its output to the next: text is split into tokens, tokens become embeddings, attention weighs which earlier tokens matter, and the model predicts the next token.

  • Tokenize: split text
  • Embed: turn tokens into vectors
  • Attention: weigh context
  • Predict next: pick a token

Tokenize

The model splits your text into tokens — sub-word pieces that are its smallest unit. Common words may be one token; rarer or longer words break into several. This is also why cost and context limits are counted in tokens, not words.
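To see why a common word can cost one token while a rarer one costs several, here is a deliberately simple scheme: greedy longest-match against a fixed vocabulary. This is not BPE and not any particular model's tokenizer; the vocabulary is made up, and real tokenizers also deal with spaces, punctuation, and raw bytes.

```python
def tokenize(text: str, vocab: set[str]) -> list[str]:
    """Split text into the longest vocabulary pieces, scanning left to right."""
    tokens: list[str] = []
    for word in text.lower().split():
        start = 0
        while start < len(word):
            # Try the longest piece first, shrinking until one is in the vocabulary.
            for end in range(len(word), start, -1):
                piece = word[start:end]
                if piece in vocab:
                    tokens.append(piece)
                    start = end
                    break
            else:
                # Nothing matched: fall back to a single character.
                tokens.append(word[start])
                start += 1
    return tokens


# Hypothetical vocabulary: common words are whole entries, a rarer word is not.
VOCAB = {"the", "cat", "sat", "un", "believ", "able"}

print(tokenize("The cat sat", VOCAB))   # ['the', 'cat', 'sat']
print(tokenize("unbelievable", VOCAB))  # ['un', 'believ', 'able']
```

Three common words cost three tokens; the single rarer word also costs three.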

03 The generation loop, step by step

Putting it together: an LLM answers by running a loop. It reads your prompt as tokens, turns them into vectors that capture meaning, weighs which earlier tokens matter, predicts one likely next token, appends it, and repeats until it decides the answer is done.

  • Tokenize prompt: read the input. Your prompt is split into tokens, the model's smallest units of text.
  • Embed: turn tokens into vectors. Each token becomes a list of numbers that captures aspects of its meaning, so related words sit near each other.
  • Attention: weigh the context. The transformer uses attention to decide which earlier tokens matter most for what should come next.
  • Predict next token: rank the candidates. The model produces a probability for every possible next token and picks one likely option.
  • Append & repeat: grow the answer. The chosen token is added to the text, and the whole loop runs again with the longer context.
  • Stop: finish. When the model predicts an end-of-text signal or hits a length limit, it stops and returns the completed reply.
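The loop itself is short. The sketch below keeps only the predict, append, and stop steps: a hypothetical lookup table stands in for the model (so there is no real tokenizing, embedding, or attention), and the token names, the `<end>` marker, and the function names are made up for illustration.

```python
import random

END = "<end>"  # hypothetical end-of-text token

# Hypothetical stand-in for a trained model: next-token probabilities keyed by
# the previous token only. A real LLM conditions on the whole context through
# embeddings and attention; this table just keeps the loop visible.
NEXT_TOKEN_PROBS = {
    "the": {"sky": 0.6, "sea": 0.4},
    "sky": {"is": 1.0},
    "sea": {"is": 1.0},
    "is": {"blue": 0.7, "clear": 0.3},
    "blue": {END: 1.0},
    "clear": {END: 1.0},
}


def predict_next(context: list[str], rng: random.Random) -> str:
    """Rank the candidates for the next token and sample one."""
    probs = NEXT_TOKEN_PROBS.get(context[-1], {END: 1.0})
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]


def generate(prompt_tokens: list[str], max_new_tokens: int = 10, seed: int = 0) -> list[str]:
    """Run the predict / append / repeat loop until an end token or the length limit."""
    rng = random.Random(seed)
    context = list(prompt_tokens)
    for _ in range(max_new_tokens):  # length limit
        token = predict_next(context, rng)
        if token == END:  # end-of-text signal: stop
            break
        context.append(token)  # append, then loop again with the longer context
    return context


print(" ".join(generate(["the"])))
```

Starting from "the", the loop always produces four words about the sky or the sea; which words depends on the random draws, and setting `max_new_tokens=1` cuts the reply short at the length limit.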
Worth knowing: because an LLM predicts plausible-sounding text rather than looking up facts, it can state something confidently and still be wrong — often called a hallucination. The context window caps how much text it can consider at once. Treat answers as a strong first draft: review and verify before relying on them, especially for anything important.


05 Take it with you & go deeper

"How LLMs work in 5 minutes" — one-page summary
The whole module distilled to a printable cheat-sheet.
▸ Look up a term — AI glossary
▸ Coming next: deeper progression (coming soon)
  • Embeddings & vectors: how tokens turn into numbers that capture meaning, and why similar ideas sit close together.
  • Attention explained: the mechanism that lets a transformer weigh which earlier tokens matter most.


Sources & review

Published by Tech Jacks Solutions · Reviewed June 2026. This lesson explains established concepts and is grounded in the references below; figures shown in the interactives are illustrative and labelled as such.

How large language models work — in 5 minutes

Tech Jacks Solutions · AI Knowledge Hub · educational summary

Tokens

An LLM splits text into tokens — sub-word pieces. Tokens are the model's unit of cost and context: pricing and limits are measured in tokens, not words.

Embeddings

Each token is turned into an embedding — a list of numbers (a vector) that captures aspects of its meaning, so related words sit near one another.
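"Sitting near one another" is usually measured with a similarity score between vectors, and cosine similarity is a common choice. In this sketch the three-dimensional vectors are invented by hand for illustration; real embeddings are learned during training and have hundreds or thousands of dimensions.

```python
import math

# Hand-made toy "embeddings"; the numbers are invented for illustration.
EMBEDDINGS = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.75, 0.2],
    "car": [0.1, 0.2, 0.9],
}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """1.0 means the vectors point the same way; 0 means perpendicular (unrelated)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(round(cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["kitten"]), 3))  # about 0.996
print(round(cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["car"]), 3))     # about 0.303
```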

Transformer & attention

A transformer uses attention to weigh which earlier tokens matter most for deciding what should come next.
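The standard form of this weighing is scaled dot-product attention. Below is a minimal single-head sketch in plain Python, with a causal mask so each token looks only at itself and earlier tokens. In a real transformer the queries, keys, and values come from learned projections of the embeddings, and many heads and layers run together; here the same toy vectors are simply reused for all three.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(
    queries: list[list[float]],
    keys: list[list[float]],
    values: list[list[float]],
) -> tuple[list[list[float]], list[list[float]]]:
    """Single-head scaled dot-product attention with a causal mask.

    Position i only attends to positions 0..i (itself and earlier tokens).
    Returns (outputs, attention_weights).
    """
    d = len(queries[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        # Score each allowed position j: dot(query_i, key_j) / sqrt(d).
        scores = [
            sum(qk * kk for qk, kk in zip(q, keys[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # Output = attention-weighted average of the allowed positions' values.
        out = [
            sum(w * values[j][dim] for j, w in enumerate(weights))
            for dim in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Three toy token vectors, reused as queries, keys, and values for brevity.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = causal_attention(x, x, x)
for i, row in enumerate(weights):
    print(f"token {i} attends with weights {[round(w, 2) for w in row]}")
```

The first token can only attend to itself, so its weight is 1.0; later tokens spread their attention over everything before them.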

Generation = predict the next token

The model predicts a likely next token, appends it, and repeats until it stops. There is no database lookup; it's learned pattern-completion. That's why it can sound confident but be wrong (a hallucination).

Context window

The context window is how much text the model can consider at once, measured in tokens — which is also how cost and limits are counted.
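When a conversation outgrows the context window, something has to give. Below is a minimal sketch of one simple policy, keeping only the most recent tokens. It assumes the reply's tokens share the same window, as they do in many models, and the function name and parameters are hypothetical.

```python
def fit_to_context(tokens: list[str], context_window: int, reserve_for_reply: int = 0) -> list[str]:
    """Keep only the most recent tokens that fit, leaving room for the reply.

    This is one simple, hypothetical policy (drop the oldest text); real
    applications may instead summarize, retrieve, or reject long inputs.
    """
    budget = context_window - reserve_for_reply
    if budget <= 0:
        raise ValueError("reserve_for_reply leaves no room for the prompt")
    return tokens[-budget:]  # keeps the last `budget` tokens (or all, if fewer)


history = [f"t{i}" for i in range(10)]  # ten placeholder tokens
print(fit_to_context(history, context_window=6, reserve_for_reply=2))  # ['t6', 't7', 't8', 't9']
```

Whatever falls outside the window is simply invisible to the model on that turn, which is why very long chats can "forget" early details.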

Use it wisely

Treat answers as a strong first draft. Review and verify before relying on them, especially for anything important.