Why "think step by step" makes models smarter
Ask a model a hard question and it often blurts out a confident, wrong answer. Ask it to show its working first and the answer frequently gets better. That is chain-of-thought prompting. This lesson explains why intermediate steps help, walks through one chain being built, and then introduces self-consistency and the wider family of reasoning techniques.
01 What chain-of-thought prompting is
Chain-of-thought (CoT) prompting asks a language model to produce a series of intermediate reasoning steps before giving its final answer, rather than jumping straight to the answer (Wei et al., 2022). The idea is simple but powerful: a tricky multi-step problem is far easier to get right if you work through it one piece at a time — and the same turns out to be true for the model. Instead of compressing all the reasoning into a single leap, the model "shows its working," and the act of writing out those steps tends to make the final answer more reliable on problems that need arithmetic, commonsense, or symbolic reasoning.
An important caveat from the original research: this benefit was reported to emerge with model scale — smaller models of that era gained little, and could even be hurt, while sufficiently large models improved markedly (Wei et al., 2022). Treat that as a finding about the models of its time, not a universal law; today's instruction-tuned and reasoning-trained models behave differently.
- CoT = intermediate reasoning steps first, final answer last, instead of answering directly.
- It helps most on multi-step problems — arithmetic, logic, commonsense chains.
- The original "emerges with scale" finding is tied to 2022-era models; don't overgeneralize the threshold (Wei et al., 2022).
02 Two ways to ask: few-shot vs. zero-shot
There are two main ways to elicit a chain of thought, and they differ in how much you have to write into the prompt.
Few-shot CoT
You include a few worked examples (exemplars) in the prompt, each one showing a question and the step-by-step reasoning that leads to its answer. The model imitates that pattern on your new question, which comes last in the same format (Wei et al., 2022).
Q: [your question] A:
Zero-shot CoT
No exemplars at all. You simply append a trigger phrase such as "Let's think step by step", and the model produces step-by-step reasoning on its own (Kojima et al., 2022).
Let's think step by step.
A later refinement, Auto-CoT, removes the hand-written exemplars entirely: it clusters questions for diversity and uses the zero-shot trigger to generate reasoning chains automatically, then reuses those as demonstrations (Zhang et al., 2022). Vendor guidance describes a similar ladder — from a plain "think step-by-step," to a guided outline of the steps, to structured tags that separate the reasoning from the final answer (Anthropic docs).
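The two prompt styles in this section can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the `Exemplar` tuple, the function names, the "The answer is X." suffix, and the sample questions are all assumptions made for the example.

```python
from typing import List, Tuple

# (question, worked reasoning, final answer); a hypothetical structure for this sketch.
Exemplar = Tuple[str, str, str]

ZERO_SHOT_TRIGGER = "Let's think step by step."


def few_shot_cot_prompt(exemplars: List[Exemplar], question: str) -> str:
    """Few-shot CoT: worked examples first, then the new question left open."""
    blocks = [
        f"Q: {q}\nA: {reasoning} The answer is {answer}."
        for q, reasoning, answer in exemplars
    ]
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


def zero_shot_cot_prompt(question: str) -> str:
    """Zero-shot CoT: no exemplars, only the trigger phrase after 'A:'."""
    return f"Q: {question}\nA: {ZERO_SHOT_TRIGGER}"


if __name__ == "__main__":
    exemplars = [(
        "A shelf holds 3 boxes with 4 books each. How many books are there?",
        "There are 3 boxes with 4 books each, so 3 * 4 = 12.",
        "12",
    )]
    new_question = "A crate holds 5 bags with 6 apples each. How many apples are there?"
    print(few_shot_cot_prompt(exemplars, new_question))
    print("---")
    print(zero_shot_cot_prompt(new_question))
```

The only difference between the two builders is what precedes the open "A:" slot: worked examples in one case, a single trigger phrase in the other.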
03 See it work: direct vs. chain-of-thought vs. self-consistency
Consider one word problem solved three ways. A direct answer jumps straight to a guess and gets it wrong. Chain-of-thought builds the steps one at a time and lands on the right number. Self-consistency samples several independent reasoning chains and takes the majority vote over their final answers, so one bad chain doesn't decide the outcome (Wang et al., 2022). The walkthrough is an illustration of the technique, not output from any specific model.
- Direct compresses everything into one leap — easy to drop a step and answer confidently but wrong.
- Chain-of-thought writes the intermediate steps, so each small calculation can be right before the next.
- Self-consistency samples multiple chains and takes the majority answer, smoothing out one-off mistakes (Wang et al., 2022).
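The self-consistency mode described above can be sketched as a sample-then-vote loop. This is a toy under stated assumptions: `make_stub_sampler` is a hypothetical stand-in for sampling a real model at a non-zero temperature, the canned chains are invented for illustration, and the "The answer is X." convention is an assumed output format.

```python
import itertools
import re
from collections import Counter
from typing import Callable, List, Optional


def extract_final_answer(chain: str) -> Optional[str]:
    """Read the final answer from a chain that ends with 'The answer is X.'"""
    match = re.search(r"The answer is\s+(.+?)\.?\s*$", chain.strip())
    return match.group(1) if match else None


def self_consistency(sample_chain: Callable[[], str], n_samples: int = 5) -> Optional[str]:
    """Sample several chains and return the majority-vote final answer."""
    answers = []
    for _ in range(n_samples):
        answer = extract_final_answer(sample_chain())
        if answer is not None:  # chains without a parsable answer do not vote
            answers.append(answer)
    if not answers:
        return None
    # most_common breaks ties in favor of the answer seen first.
    return Counter(answers).most_common(1)[0][0]


def make_stub_sampler(chains: List[str]) -> Callable[[], str]:
    """Hypothetical stand-in for a model: cycles through canned chains."""
    iterator = itertools.cycle(chains)
    return lambda: next(iterator)


GOOD_CHAIN = "5 bags times 6 apples is 30. 30 minus 4 rotten is 26. The answer is 26."
BAD_CHAIN = "5 plus 6 is 11. 11 minus 4 is 7. The answer is 7."
CANNED_CHAINS = [GOOD_CHAIN, BAD_CHAIN, GOOD_CHAIN, "I am not sure.", GOOD_CHAIN]

if __name__ == "__main__":
    print(self_consistency(make_stub_sampler(CANNED_CHAINS), n_samples=5))
```

Note that the vote is taken over final answers only; the chains themselves can differ freely in wording and route.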
04 Why sampling many chains helps: self-consistency & supervision
A single chain of thought can still go wrong — one slip early on and the whole answer is off. Self-consistency addresses this by sampling several diverse reasoning paths for the same prompt and choosing the final answer by majority vote, replacing the usual single greedy path (Wang et al., 2022). Different chains may take different routes, but correct reasoning tends to converge on the same answer, so the majority is more often right than any one chain.
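A rough calculation shows why the vote can beat a single chain. The sketch below assumes a deliberately simplified model that is not taken from the cited paper: each chain is independently correct with probability `p`, and the vote succeeds only when more than half of the chains are correct. Real chains are not independent, so treat the numbers as intuition only.

```python
from math import comb


def majority_correct_probability(p: float, n: int) -> float:
    """P(more than half of n independent chains are correct), each correct with probability p."""
    needed = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(needed, n + 1))


if __name__ == "__main__":
    for n in (1, 5, 15):
        print(n, round(majority_correct_probability(0.6, n), 3))
```

With `p = 0.6`, five chains give a majority that is correct about 68% of the time, compared with 60% for one chain. In this binary model the vote only helps when `p` is above 0.5; in practice wrong answers tend to scatter across many different values, so a plurality vote can still succeed at lower per-chain accuracy.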
A related thread looks at how we reward reasoning. Process supervision gives feedback on each intermediate step, while outcome supervision rewards only the final answer; rewarding the steps was found to outperform rewarding only the outcome on the MATH dataset (Lightman et al., 2023). And STaR shows models can bootstrap their own reasoning — generate rationales, keep the ones that reach correct answers, and fine-tune on them (Zelikman et al., 2022). These ideas point toward a bigger shift: modern reasoning models (such as OpenAI's o1) produce an internal chain of thought learned via reinforcement learning, rather than relying only on a clever prompt (OpenAI, 2024).
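The "keep the rationales that reach correct answers" step of STaR can be sketched as a simple filter. This is a simplified illustration, not the procedure from the paper: `star_filter_round` and `stub_generate` are hypothetical names, the canned outputs are invented, and the fine-tuning step that would consume the kept examples is omitted.

```python
from typing import Callable, Dict, List, Tuple

Generation = Tuple[str, str]  # (rationale, final answer) from some model; hypothetical


def star_filter_round(
    problems: List[Tuple[str, str]],
    generate: Callable[[str], Generation],
) -> List[Dict[str, str]]:
    """Generate one rationale per problem and keep it only if its answer matches the reference."""
    kept = []
    for question, reference in problems:
        rationale, answer = generate(question)
        if answer.strip() == reference.strip():
            kept.append({"question": question, "rationale": rationale, "answer": answer})
    return kept


# Hypothetical stand-in for a model: canned (rationale, answer) pairs.
CANNED = {
    "What is 5 * 6 - 4?": ("5 * 6 = 30, and 30 - 4 = 26.", "26"),
    "What is 7 + 8?": ("7 + 8 = 14.", "14"),  # wrong on purpose
}


def stub_generate(question: str) -> Generation:
    return CANNED[question]


PROBLEMS = [("What is 5 * 6 - 4?", "26"), ("What is 7 + 8?", "15")]

if __name__ == "__main__":
    for example in star_filter_round(PROBLEMS, stub_generate):
        print(example)
```

Only the first problem survives the filter, so only its rationale would be added to the fine-tuning set in this toy round.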
- Self-consistency: sample multiple chains, take the majority-vote answer (Wang et al., 2022).
- Process vs. outcome supervision: rewarding each step beat rewarding only the final answer on MATH (Lightman et al., 2023).
- Prompted CoT ≠ trained reasoning: reasoning models learn an internal chain of thought via RL, a different mechanism from prompt-elicited CoT (OpenAI, 2024).
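The difference between outcome and process supervision is easiest to see on a chain that reaches the right answer through a wrong step. The sketch below is illustrative only: the function names, the example chain, and the hand-written step labels are assumptions, and it does not reproduce the reward models used in the cited work.

```python
from typing import List


def outcome_reward(final_answer: str, reference_answer: str) -> float:
    """Outcome supervision: a single signal based only on the final answer."""
    return 1.0 if final_answer.strip() == reference_answer.strip() else 0.0


def process_rewards(step_labels: List[bool]) -> List[float]:
    """Process supervision: one signal per intermediate step."""
    return [1.0 if ok else 0.0 for ok in step_labels]


def first_bad_step(step_labels: List[bool]) -> int:
    """Index of the first step marked incorrect, or -1 if every step is fine."""
    for index, ok in enumerate(step_labels):
        if not ok:
            return index
    return -1


# A hypothetical chain that lands on the right answer despite a bad middle step.
STEPS = ["5 * 6 = 30", "30 - 4 = 25", "Rounding up gives 26"]
STEP_LABELS = [True, False, False]  # hand-labeled for this example

if __name__ == "__main__":
    print(outcome_reward("26", "26"))
    print(process_rewards(STEP_LABELS))
    print(first_bad_step(STEP_LABELS))
```

The outcome signal treats this chain as fully correct, while the per-step signal pinpoints where the reasoning went wrong.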
05 Beyond a single chain: the reasoning family
Chain-of-thought is the starting point, not the whole story. Several techniques extend the idea by decomposing the problem, branching the search, or letting the model act and critique itself.
Least-to-most prompting
Decompose a complex problem into a sequence of simpler subproblems and solve them in order, feeding each earlier answer into the next step (Zhou et al., 2022). Where CoT reasons in one pass, least-to-most explicitly stages the problem from easiest to hardest.
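The "solve in order, feeding each earlier answer into the next step" part can be sketched as a loop over subproblems. This is a toy under stated assumptions: the subproblems are written by hand here (in the technique itself the decomposition is also produced by prompting), and `stub_solve` is a hypothetical stand-in for a model call.

```python
from typing import Callable, List, Tuple

Solved = List[Tuple[str, str]]  # (subproblem, answer) pairs solved so far


def format_prompt(subproblem: str, solved: Solved) -> str:
    """Build the prompt for one stage: earlier Q/A pairs, then the next subproblem."""
    lines = [f"Q: {q}\nA: {a}" for q, a in solved]
    lines.append(f"Q: {subproblem}\nA:")
    return "\n".join(lines)


def least_to_most(subproblems: List[str], solve: Callable[[str, Solved], str]) -> Solved:
    """Solve subproblems from easiest to hardest, passing earlier answers forward."""
    solved: Solved = []
    for subproblem in subproblems:
        answer = solve(subproblem, list(solved))
        solved.append((subproblem, answer))
    return solved


def stub_solve(subproblem: str, solved: Solved) -> str:
    """Hypothetical solver for: a crate has 5 bags of 6 apples, 4 apples are rotten."""
    if subproblem == "How many apples are in the crate?":
        return str(5 * 6)
    if subproblem == "How many apples are good?":
        total = int(solved[-1][1])  # reuse the previous stage's answer
        return str(total - 4)
    raise ValueError(f"unknown subproblem: {subproblem}")


SUBPROBLEMS = ["How many apples are in the crate?", "How many apples are good?"]

if __name__ == "__main__":
    result = least_to_most(SUBPROBLEMS, stub_solve)
    print(result)
    print(format_prompt("How many bags are needed for the good apples?", result))
```

The second stage cannot be answered without the first stage's result, which is the dependency least-to-most is designed to make explicit.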
Tree of Thoughts
Explore a tree of intermediate reasoning states, letting the model self-evaluate options and look ahead or backtrack — generalizing the single linear chain of CoT into a search (Yao et al., 2023).
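A toy search makes the contrast with a single chain concrete. The sketch below is a breadth-first search with a fixed beam, which is one plausible way to realize the idea rather than the paper's implementation. In a real setup `propose` and `score` would be model calls; here they are hypothetical arithmetic stand-ins on a puzzle invented for the example (reach 14 from 1 using "+3" and "*2").

```python
from typing import Callable, List, Optional, Tuple

State = Tuple[int, Tuple[str, ...]]  # (current value, operations applied so far)

TARGET = 14


def propose(state: State) -> List[State]:
    """Generate candidate next thoughts from a state."""
    value, path = state
    return [(value + 3, path + ("+3",)), (value * 2, path + ("*2",))]


def score(state: State) -> float:
    """Self-evaluation stand-in: closer to the target is better."""
    return -abs(TARGET - state[0])


def is_goal(state: State) -> bool:
    return state[0] == TARGET


def tree_search(
    root: State,
    propose: Callable[[State], List[State]],
    score: Callable[[State], float],
    is_goal: Callable[[State], bool],
    beam_width: int = 2,
    max_depth: int = 4,
) -> Optional[State]:
    """Expand the frontier level by level, keeping only the best beam_width states."""
    frontier = [root]
    for _ in range(max_depth):
        candidates = [child for state in frontier for child in propose(state)]
        for candidate in candidates:
            if is_goal(candidate):
                return candidate
        # Prune: keep the most promising states and drop the rest.
        frontier = sorted(candidates, key=score, reverse=True)[:beam_width]
        if not frontier:
            return None
    return None


if __name__ == "__main__":
    print(tree_search((1, ()), propose, score, is_goal, beam_width=2))
    print(tree_search((1, ()), propose, score, is_goal, beam_width=1))
```

With `beam_width=1` the search behaves like a single greedy chain and misses the target within four steps; with `beam_width=2` it keeps the second-best state alive and finds the path "+3", "+3", "*2". A depth-first variant would backtrack explicitly instead of carrying alternatives in a beam.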
Graph of Thoughts
Model reasoning units as vertices in an arbitrary graph with dependency edges, so thoughts can be aggregated and refined through feedback loops — beyond the strict chain or tree (Besta et al., 2023).
ReAct: reason + act
Interleave reasoning traces with actions such as tool or API calls, so the model can gather external information and update its plan — which can reduce hallucination compared with reasoning alone (Yao et al., 2022).
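The interleaving of thoughts, actions, and observations can be sketched as a short loop. Everything here is an assumption for illustration: `scripted_policy` is a hypothetical stand-in for the model, `calculator` is a toy tool, and the "Thought / Action / Observation" line format and the `finish` action are conventions chosen for the example.

```python
from typing import Callable, Dict, List, Optional, Tuple

Decision = Tuple[str, str, str]  # (thought, action name, action input)


def calculator(expression: str) -> str:
    """Toy tool: evaluate 'a op b' for integers, with op in + - *."""
    left, op, right = expression.split()
    a, b = int(left), int(right)
    results = {"+": a + b, "-": a - b, "*": a * b}
    return str(results[op])


def scripted_policy(transcript: List[str]) -> Decision:
    """Hypothetical stand-in for the model: picks the next step from what it has observed."""
    observations = [
        line.split(": ", 1)[1] for line in transcript if line.startswith("Observation: ")
    ]
    if len(observations) == 0:
        return ("I need the total number of apples.", "calculator", "5 * 6")
    if len(observations) == 1:
        return ("Now remove the rotten ones.", "calculator", f"{observations[0]} - 4")
    return ("I have the result.", "finish", observations[-1])


def react_loop(
    policy: Callable[[List[str]], Decision],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 5,
) -> Tuple[Optional[str], List[str]]:
    """Alternate reasoning and tool use until the policy emits a finish action."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        thought, action, action_input = policy(transcript)
        transcript.append(f"Thought: {thought}")
        transcript.append(f"Action: {action}[{action_input}]")
        if action == "finish":
            return action_input, transcript
        # The observation is appended so the next decision can build on it.
        transcript.append(f"Observation: {tools[action](action_input)}")
    return None, transcript


if __name__ == "__main__":
    answer, transcript = react_loop(
        scripted_policy, {"calculator": calculator},
        "A crate has 5 bags of 6 apples and 4 are rotten. How many are good?",
    )
    print("\n".join(transcript))
    print(answer)
```

The key design point is that each observation comes from outside the model and is fed back into the transcript, so later reasoning is grounded in tool results rather than in the model's own guesses.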
Self-Refine & Reflexion
Self-Refine has one model generate an output, critique it, and revise iteratively from its own feedback, with no extra training (Madaan et al., 2023). Reflexion goes further: an agent verbally reflects on task feedback and stores those reflections in episodic memory to improve on later attempts, again without updating weights (Shinn et al., 2023).
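The generate, critique, revise cycle of Self-Refine can be sketched as follows. This is a minimal illustration only: the three `stub_` functions are hypothetical stand-ins for calls to the same model with different prompts, and the formatting checks are invented for the example. Reflexion's episodic memory across attempts is not shown.

```python
from typing import Callable, Optional


def self_refine(
    generate: Callable[[str], str],
    critique: Callable[[str, str], Optional[str]],
    revise: Callable[[str, str, str], str],
    task: str,
    max_rounds: int = 3,
) -> str:
    """Draft once, then alternate critique and revision until no feedback remains."""
    draft = generate(task)
    for _ in range(max_rounds):
        feedback = critique(task, draft)
        if feedback is None:  # the critic has nothing left to fix
            break
        draft = revise(task, draft, feedback)
    return draft


def stub_generate(task: str) -> str:
    return "the answer is 26"


def stub_critique(task: str, draft: str) -> Optional[str]:
    if not draft[0].isupper():
        return "capitalize the first letter"
    if not draft.endswith("."):
        return "end with a period"
    return None


def stub_revise(task: str, draft: str, feedback: str) -> str:
    if feedback == "capitalize the first letter":
        return draft[0].upper() + draft[1:]
    if feedback == "end with a period":
        return draft + "."
    return draft


if __name__ == "__main__":
    print(self_refine(stub_generate, stub_critique, stub_revise, "State the result."))
```

No weights change anywhere in the loop; the improvement comes entirely from feeding the critique back in as input, and `max_rounds` bounds the cost if the critic never becomes satisfied.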
A note on faithfulness. The visible "thinking" or "reasoning" a model shows you is a controlled output or summary — it is not guaranteed to be a complete, literal record of the computation inside the model. Treat exposed reasoning as a useful aid, not proof of exactly how the answer was reached.
07 Take it with you & continue learning
Concept map
The branches below show how the ideas in this lesson connect: what chain-of-thought is, how to prompt it, and the wider reasoning family.
What chain-of-thought prompting is
- Elicits intermediate reasoning steps before the final answer, instead of answering directly (Wei et al., 2022).
- Helps most on multi-step problems — arithmetic, commonsense, and symbolic reasoning chains.
- In the original research the benefit emerged with model scale; smaller models gained little or were hurt (Wei et al., 2022).
Two ways to ask: few-shot vs. zero-shot
- Few-shot CoT includes worked exemplars that each show a reasoning chain for the model to imitate (Wei et al., 2022).
- Zero-shot CoT appends a trigger phrase such as “Let’s think step by step” with no exemplars (Kojima et al., 2022).
- Auto-CoT clusters questions and uses the zero-shot trigger to build demonstrations automatically, removing hand-crafted exemplars (Zhang et al., 2022).
Direct vs. chain-of-thought vs. self-consistency
- Direct answering compresses everything into one leap — easy to drop a step and be confidently wrong.
- Chain-of-thought writes the intermediate steps so each small calculation can be right before the next.
- Self-consistency samples several diverse chains and takes a majority vote, so one bad chain doesn’t decide the outcome (Wang et al., 2022).
Why sampling many chains helps: supervision
- Process supervision rewards each intermediate step; outcome supervision rewards only the final answer — rewarding steps beat outcome-only on MATH (Lightman et al., 2023).
- STaR bootstraps reasoning: generate rationales, keep those that reach correct answers, and fine-tune on them (Zelikman et al., 2022).
- Reasoning models such as OpenAI’s o1 learn an internal chain of thought via reinforcement learning — distinct from prompt-elicited CoT (OpenAI, 2024).
Beyond a single chain: the reasoning family
- Least-to-most decomposes a problem into ordered subproblems, reusing each earlier answer in the next (Zhou et al., 2022).
- Tree of Thoughts and Graph of Thoughts generalize the linear chain into a search with branching, scoring, and feedback loops (Yao et al., 2023; Besta et al., 2023).
- ReAct interleaves reasoning with actions (tool calls); Self-Refine and Reflexion add self-critique and remembered feedback (Yao et al., 2022; Madaan et al., 2023; Shinn et al., 2023).
Published by Tech Jacks Solutions · Reviewed June 2026. This lesson explains established concepts grounded in the primary references below; the worked example in the interactive is illustrative, not output from any specific model. Performance figures in the source papers are tied to specific models and datasets, and provider docs change quickly — treat the linked sources as the live source of truth.
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models — Wei et al. (Google Research)
- Large Language Models are Zero-Shot Reasoners — Kojima et al.
- Self-Consistency Improves Chain of Thought Reasoning in Language Models — Wang et al. (Google Research)
- Least-to-Most Prompting Enables Complex Reasoning in LLMs — Zhou et al. (Google Research)
- Automatic Chain of Thought Prompting (Auto-CoT) — Zhang et al. (Amazon)
- ReAct: Synergizing Reasoning and Acting in Language Models — Yao et al.
- Tree of Thoughts: Deliberate Problem Solving with LLMs — Yao et al.
- Graph of Thoughts: Solving Elaborate Problems with LLMs — Besta et al. (ETH Zurich)
- Self-Refine: Iterative Refinement with Self-Feedback — Madaan et al.
- Reflexion: Language Agents with Verbal Reinforcement Learning — Shinn et al.
- Let's Verify Step by Step — Lightman et al. (OpenAI)
- STaR: Bootstrapping Reasoning With Reasoning — Zelikman et al.
- Learning to Reason with LLMs (o1) — OpenAI
- Chain-of-thought prompting (Let Claude think) — Anthropic
- Gemini thinking — Google
Chain-of-thought & reasoning prompting — in 9 minutes
Tech Jacks Solutions · AI Knowledge Hub · educational summary
What it is
Chain-of-thought (CoT) prompting asks a model to produce intermediate reasoning steps before the final answer, instead of answering directly. It helps most on multi-step problems. In 2022-era research the benefit emerged with model scale (Wei et al., 2022).
Few-shot vs. zero-shot
Few-shot CoT includes worked examples that each show a reasoning chain. Zero-shot CoT just appends a trigger like "Let's think step by step" (Kojima et al., 2022). Auto-CoT builds the demonstrations automatically (Zhang et al., 2022).
Self-consistency & supervision
Self-consistency samples multiple diverse chains and takes a majority vote over the answers (Wang et al., 2022). Process supervision rewards each step and beat outcome-only supervision on MATH (Lightman et al., 2023). Reasoning models (o1) learn an internal chain of thought via RL — related to, but not the same as, prompted CoT (OpenAI, 2024).
The reasoning family
Least-to-most decomposes into ordered subproblems (Zhou et al., 2022); Tree of Thoughts searches a tree of states (Yao et al., 2023); Graph of Thoughts uses a graph (Besta et al., 2023); ReAct interleaves reasoning with actions (Yao et al., 2022); Self-Refine and Reflexion add self-critique loops (Madaan et al., 2023; Shinn et al., 2023).
One caution
Visible "thinking" is a controlled output or summary, not a guaranteed faithful record of the model's internal computation.