Supervised, unsupervised, or reinforcement learning?
Almost every machine-learning system fits into one of three families — and the family is decided by one thing: what feedback the model learns from. Labels, hidden structure, or reward. Learn to tell them apart, then sort real tasks into the right bucket yourself, right here on the page.
01 Three families, decided by one thing
When people say "machine learning," they usually mean one of three broad paradigms. They aren't different algorithms so much as different learning settings — and the thing that separates them is the feedback signal each one learns from. Supervised learning gets labels (the right answer attached to every example). Unsupervised learning gets no labels at all and has to find structure on its own. Reinforcement learning gets a reward — a score earned by acting in an environment over time. Keep that one question in mind — "what does it learn from?" — and the rest follows.
Supervised
Trained on input–output pairs where each example carries its correct answer. It learns a mapping from inputs to known targets.
Unsupervised
Works on unlabeled data and discovers structure within it — grouping similar points, or compressing the data into a simpler form.
Reinforcement
An agent acts in an environment and learns a policy by maximizing a cumulative reward earned through trial-and-error interaction.
- The paradigms differ chiefly in their feedback signal: explicit answers (labels), no labels at all, or a scalar reward earned over time.
- The boundaries can blur: semi-supervised learning mixes a small labeled set with a larger unlabeled one, and self-supervised learning derives its own training signal from unlabeled data, so neither fits cleanly into the three-way split.
02 Supervised: learning from labeled answers
Supervised learning trains a model on labeled examples (input–output pairs) so it learns a mapping from inputs to known target outputs. You give it thousands of emails already marked "spam" or "not spam," and it learns the pattern that connects an email to its label, so it can label new ones it has never seen. There are two classic shapes: classification, where the answer is a discrete label (spam / not spam, cat / dog), and regression, where the answer is a continuous number (tomorrow's temperature, a house price). The whole point is to learn a mapping that generalizes, which is why supervised models are evaluated on held-out data they didn't train on: accuracy on the training examples alone can't show whether the model learned the pattern or merely memorized the answers.
- Needs labels: every training example comes with its correct answer attached.
- Two task types: classification (discrete labels) and regression (continuous values).
- Example methods: linear and logistic regression, support vector machines, decision trees, and neural-network classifiers.
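The two task types can be sketched in a few lines of plain Python. This is a minimal illustration, not a production method: the data is invented, `fit_line` is an ordinary least-squares fit on one feature, and `fit_stump` is a hypothetical one-threshold classifier (the simplest possible decision tree). Both are checked on examples that were held out of training.

```python
# Toy data, invented for illustration. Every training example carries its answer.

def fit_line(xs, ys):
    """Regression: least-squares fit of y ~ slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def fit_stump(xs, labels):
    """Classification: pick the cut-off on x that gets the most training labels right."""
    best_cut, best_correct = None, -1
    for cut in sorted(xs):
        correct = sum((x >= cut) == bool(label) for x, label in zip(xs, labels))
        if correct > best_correct:
            best_cut, best_correct = cut, correct
    return best_cut


# Regression: floor area (m^2) -> price (thousands). The target is a continuous number.
train_area, train_price = [50, 60, 80, 100], [150, 180, 240, 300]
slope, intercept = fit_line(train_area, train_price)
predicted_price = slope * 70 + intercept  # 70 m^2 was not in the training set

# Classification: count of suspicious words -> 1 (spam) or 0 (not spam).
train_count, train_label = [0, 1, 2, 6, 7, 9], [0, 0, 0, 1, 1, 1]
cut = fit_stump(train_count, train_label)

# Held-out examples the model never saw while fitting.
held_out = [(1, 0), (8, 1), (3, 0), (7, 1)]
accuracy = sum(int(x >= cut) == y for x, y in held_out) / len(held_out)

print(f"predicted price for 70 m^2: {predicted_price:.1f}")
print(f"spam cut-off: {cut}, held-out accuracy: {accuracy:.2f}")
```

In both cases the labels do the teaching: remove `train_price` or `train_label` and neither function has anything to fit against.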
03 Unsupervised: finding structure with no labels
Unsupervised learning works on unlabeled data, and its job is to discover structure within it; nobody hands it the right answers. The most common version is clustering: grouping similar data points together, like sorting a pile of customer records into segments that behave alike, without anyone defining the segments in advance. The other big version is dimensionality reduction: compressing data into a simpler representation that keeps what matters and drops the noise. A related task, density estimation, models how the data is distributed rather than compressing it. Because there's no answer key, success is harder to measure than in supervised learning: you're looking for useful patterns, not a known target.
- No labels: the system is given raw data and must find the patterns itself.
- Two big families: clustering (grouping by similarity) and dimensionality reduction (compressing data into fewer dimensions), with density estimation (modeling how the data is distributed) as a related task.
- Example methods: k-means and DBSCAN clustering, and PCA for dimensionality reduction.
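As a rough sketch of clustering, here is a bare-bones k-means loop in plain Python. The customer records are invented, and starting from the first `k` points is a simplifying assumption made to keep the run deterministic; library implementations typically use more careful initialization.

```python
from math import dist


def kmeans(points, k, max_iters=100):
    """Alternate between assigning points to the nearest centroid and re-averaging."""
    centroids = list(points[:k])  # naive deterministic start (an assumption of this sketch)
    labels = None
    for _ in range(max_iters):
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Unlabeled customer records: (visits per month, average basket size). No answers attached.
customers = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.2), (8.0, 8.0), (8.5, 7.5), (9.0, 8.2)]
labels, centroids = kmeans(customers, k=2)
print(labels)
print(centroids)
```

Note that the function never sees a "correct" segment for any customer. The cluster numbers it returns are arbitrary names for groups it found, which is why judging the result takes more care than checking answers against a key.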
A note for the curious: self-supervised learning — which underpins modern large language models — generates its own training signal from unlabeled data and is usually described separately from this classic unsupervised setting.
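To make "generates its own training signal" concrete, here is a small sketch that turns raw text into (context, next word) training pairs. The function name and the whitespace splitting are illustrative assumptions; real language models work on subword tokens and far longer contexts, but the idea of cutting the targets out of the data itself is the same.

```python
def next_word_pairs(text, context_size=2):
    """Build (context, target) pairs from unlabeled text; no human labeling involved."""
    words = text.split()
    return [
        (tuple(words[i - context_size:i]), words[i])
        for i in range(context_size, len(words))
    ]


pairs = next_word_pairs("the cat sat on the mat")
for context, target in pairs:
    print(context, "->", target)
```

Once the pairs exist, training looks like supervised learning. The difference is where the targets came from: they were derived from the unlabeled text rather than supplied by an annotator.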
04 Reinforcement: learning by reward and trial
Reinforcement learning is the odd one out. There's no fixed dataset of answers at all. Instead, an agent takes actions in an environment and learns a policy (a strategy for what to do) by maximizing a cumulative reward signal it earns through trial-and-error interaction over time. Think of a program that plays a game thousands of times, rewarded when it wins and penalized when it loses, gradually shifting its behavior toward whatever earns more reward. A defining feature is the exploration-versus-exploitation trade-off: the agent has to balance trying new actions to gather information against exploiting the actions it already knows pay off. Landmark examples include DeepMind's deep Q-network (DQN), which learned to play Atari games directly from screen pixels, and AlphaGo Zero, which mastered the board game Go through self-play.
- No answer key: feedback is a scalar reward earned by acting, not a label attached to each example.
- Learns a policy: a strategy mapping situations to actions, tuned to maximize reward over time.
- Exploration vs exploitation: balancing new actions (to learn) against known good actions (to score).
- Example methods: Q-learning, deep Q-networks (DQN), and policy-gradient methods.
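The pieces above (reward, policy, exploration versus exploitation) can be seen together in a tabular Q-learning sketch. The environment is a made-up five-cell corridor where only reaching the rightmost cell pays a reward, and the constants `ALPHA`, `GAMMA`, and `EPSILON` are illustrative choices, not recommended settings.

```python
import random

N_STATES = 5          # corridor cells 0..4; reaching cell 4 ends the episode
ACTIONS = (-1, +1)    # index 0 = step left, index 1 = step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # learning rate, discount, exploration rate


def step(state, action):
    """Environment: move, stay inside the corridor, pay 1.0 only at the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def choose(q, state, rng):
    """Epsilon-greedy: explore with probability EPSILON, otherwise exploit."""
    if rng.random() < EPSILON:
        return rng.randrange(len(ACTIONS))
    best = max(q[state])
    return rng.choice([a for a, v in enumerate(q[state]) if v == best])  # random tie-break


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action index]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            a = choose(q, state, rng)
            nxt, reward, done = step(state, ACTIONS[a])
            # Q-learning update: nudge the estimate toward reward + discounted best next value.
            target = reward + (0.0 if done else GAMMA * max(q[nxt]))
            q[state][a] += ALPHA * (target - q[state][a])
            state = nxt
            if done:
                break
    return q


q_table = train()
# Greedy policy read off the table for the non-terminal cells.
policy = ["left" if row[0] > row[1] else "right" for row in q_table[:-1]]
print(policy)
```

Nothing ever tells the agent that "right" is the correct action in a given cell. It only receives a number after acting, and the policy emerges from how those numbers propagate back through the table.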
05 Sort the task: which paradigm fits?
Here's where it clicks. To sort a real machine-learning task into Supervised, Unsupervised, or Reinforcement, look at the signal it would learn from. The fastest way to decide is to ask: does this task come with labeled answers, just raw data to find structure in, or a reward earned by acting?
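That question can be written down as a tiny helper. The function and its two flags are hypothetical, and checking for reward first is a design choice of this sketch; blurred cases such as semi-supervised or self-supervised learning won't fit either flag cleanly.

```python
def paradigm(has_labels: bool, reward_from_acting: bool) -> str:
    """Sort a task by its feedback signal (a teaching heuristic, not a formal rule)."""
    if reward_from_acting:
        return "reinforcement"
    if has_labels:
        return "supervised"
    return "unsupervised"


# Example tasks taken from the sections above.
tasks = {
    "spam filtering on emails marked spam / not spam": (True, False),
    "segmenting customer records with no predefined segments": (False, False),
    "an agent learning to win a game by playing it": (False, True),
}
results = {name: paradigm(*flags) for name, flags in tasks.items()}
for name, family in results.items():
    print(f"{family:13s} <- {name}")
```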
Sources & further reading
Published by Tech Jacks Solutions · Reviewed June 2026. This lesson explains established, definitional concepts and is grounded in the canonical references below; the interactive uses illustrative example tasks chosen to teach the distinction.
- Reinforcement Learning: An Introduction (2nd ed.) — Richard S. Sutton & Andrew G. Barto
- Deep Learning — Ch. 5: Machine Learning Basics — Goodfellow, Bengio & Courville
- Pattern Recognition and Machine Learning — Christopher M. Bishop
- The Elements of Statistical Learning (2nd ed.) — Hastie, Tibshirani & Friedman
- Reinforcement Learning: A Survey — Kaelbling, Littman & Moore (JAIR)
- scikit-learn — Supervised learning — scikit-learn developers
- scikit-learn — Unsupervised learning — scikit-learn developers
- ISO/IEC 22989 — AI concepts and terminology — ISO/IEC
- Machine Learning Crash Course — Google
This is an educational explainer covering established machine-learning concepts. The example tasks in the interactive are illustrative teaching cases, not claims about any specific commercial product. Definitions follow the cited primary sources; where a topic is genuinely contested or evolving (for example, where self-supervised and semi-supervised learning sit relative to the classic three-way split), we say so in the text.
AI systems can produce plausible-sounding but incorrect guidance. For decisions with real-world stakes, verify against the primary sources linked above and consult a qualified professional. This lesson is a self-assessment study aid, not a professional certification.