Language lesson

Track 02 · Language Intermediate ~9 min

Why "think step by step" makes models smarter

Ask a model a hard question and it often blurts out a confident, wrong answer. Ask it to show its working first and the answer frequently gets better. That is chain-of-thought prompting. This lesson shows you why intermediate steps help, watch one being built live, then meet self-consistency and the wider family of reasoning techniques.

Module progress

01What chain-of-thought prompting is

Chain-of-thought (CoT) prompting asks a language model to produce a series of intermediate reasoning steps before giving its final answer, rather than jumping straight to the answer (Wei et al., 2022). The idea is simple but powerful: a tricky multi-step problem is far easier to get right if you work through it one piece at a time — and the same turns out to be true for the model. Instead of compressing all the reasoning into a single leap, the model "shows its working," and the act of writing out those steps tends to make the final answer more reliable on problems that need arithmetic, commonsense, or symbolic reasoning.

An important caveat from the original research: this benefit was reported to emerge with model scale — smaller models of that era gained little, and could even be hurt, while sufficiently large models improved markedly (Wei et al., 2022). Treat that as a finding about the models of its time, not a universal law; today's instruction-tuned and reasoning-trained models behave differently.

CoT = intermediate reasoning steps first, final answer last, instead of answering directly.
It helps most on multi-step problems — arithmetic, logic, commonsense chains.
The original "emerges with scale" finding is tied to 2022-era models; don't overgeneralize the threshold (Wei et al., 2022).

02Two ways to ask: few-shot vs. zero-shot

There are two main ways to elicit a chain of thought, and they differ in how much you have to write into the prompt.

Few-shot CoT

You include a few worked examples (exemplars) in the prompt, each one showing a question and the step-by-step reasoning that leads to its answer. The model imitates that pattern on your new question (Wei et al., 2022).

prompt: Q: [example] A: step 1… step 2… so the answer is X.
Q: [your question] A:

Zero-shot CoT

No exemplars at all. You simply append a trigger phrase such as Let's think step by step, and the model produces step-by-step reasoning on its own (Kojima et al., 2022).

prompt: Q: [your question]
Let's think step by step.

A later refinement, Auto-CoT, removes the hand-written exemplars entirely: it clusters questions for diversity and uses the zero-shot trigger to generate reasoning chains automatically, then reuses those as demonstrations (Zhang et al., 2022). Vendor guidance describes a similar ladder — from a plain "think step-by-step," to a guided outline of the steps, to structured tags that separate the reasoning from the final answer (Anthropic docs).

03See it work: direct vs. chain-of-thought vs. self-consistency

Here is one word problem solved three ways. Direct answer jumps straight to a guess — and gets it wrong. Chain-of-thought builds the steps one at a time and lands on the right number. Self-consistency samples several independent reasoning chains and takes the majority vote over their final answers, so one bad chain doesn't decide the outcome (Wang et al., 2022). Pick a mode and run it. The walkthrough below is an illustration of the technique, not output from any specific model.

InteractivePick a mode, then Run

Problem. A cafe had 23 muffins. They baked 4 trays of 6 more, then sold 17 over the morning. How many muffins are left?

The model's response

What's happening

Direct compresses everything into one leap — easy to drop a step and answer confidently but wrong.
Chain-of-thought writes the intermediate steps, so each small calculation can be right before the next.
Self-consistency samples multiple chains and takes the majority answer, smoothing out one-off mistakes (Wang et al., 2022).

04Why sampling many chains helps: self-consistency & supervision

A single chain of thought can still go wrong — one slip early on and the whole answer is off. Self-consistency addresses this by sampling several diverse reasoning paths for the same prompt and choosing the final answer by majority vote, replacing the usual single greedy path (Wang et al., 2022). Different chains may take different routes, but correct reasoning tends to converge on the same answer, so the majority is more often right than any one chain.

A related thread looks at how we reward reasoning. Process supervision gives feedback on each intermediate step, while outcome supervision rewards only the final answer; rewarding the steps was found to outperform rewarding only the outcome on the MATH dataset (Lightman et al., 2023). And STaR shows models can bootstrap their own reasoning — generate rationales, keep the ones that reach correct answers, and fine-tune on them (Zelikman et al., 2022). These ideas point toward a bigger shift: modern reasoning models (such as OpenAI's o1) produce an internal chain of thought learned via reinforcement learning, rather than relying only on a clever prompt (OpenAI, 2024).

Self-consistency: sample multiple chains, take the majority-vote answer (Wang et al., 2022).
Process vs. outcome supervision: rewarding each step beat rewarding only the final answer on MATH (Lightman et al., 2023).
Prompted CoT ≠ trained reasoning: reasoning models learn an internal chain of thought via RL, a different mechanism from prompt-elicited CoT (OpenAI, 2024).

05Beyond a single chain: the reasoning family

Chain-of-thought is the starting point, not the whole story. Several techniques extend the idea — by decomposing the problem, branching the search, or letting the model act and critique itself. Switch between them to see the core idea of each.

InteractiveSwitch the technique

Least-to-most prompting

Decompose a complex problem into a sequence of simpler subproblems and solve them in order, feeding each earlier answer into the next step (Zhou et al., 2022). Where CoT reasons in one pass, least-to-most explicitly stages the problem from easiest to hardest.

idea: break it down → solve sub-problem 1 → reuse its answer in sub-problem 2 → …

Tree of Thoughts

Explore a tree of intermediate reasoning states, letting the model self-evaluate options and look ahead or backtrack — generalizing the single linear chain of CoT into a search (Yao et al., 2023).

idea: branch into candidate thoughts → score them → expand the promising ones, abandon dead ends

Graph of Thoughts

Model reasoning units as vertices in an arbitrary graph with dependency edges, so thoughts can be aggregated and refined through feedback loops — beyond the strict chain or tree (Besta et al., 2023).

idea: thoughts as nodes → combine, merge and loop back, not just branch forward

ReAct: reason + act

Interleave reasoning traces with actions such as tool or API calls, so the model can gather external information and update its plan — which can reduce hallucination compared with reasoning alone (Yao et al., 2022).

idea: thought → action (search/tool) → observation → thought → … → answer

Self-Refine & Reflexion

Self-Refine has one model generate an output, critique it, and revise iteratively from its own feedback, with no extra training (Madaan et al., 2023). Reflexion goes further: an agent verbally reflects on task feedback and stores those reflections in episodic memory to improve on later attempts, again without updating weights (Shinn et al., 2023).

idea: draft → self-critique → revise; remember what went wrong for next time

A note on faithfulness. The visible "thinking" or "reasoning" a model shows you is a controlled output or summary — it is not guaranteed to be a complete, literal record of the computation inside the model. Treat exposed reasoning as a useful aid, not proof of exactly how the answer was reached.

06Knowledge check

TJS Quiz

07Take it with you & continue learning

"Chain-of-thought & reasoning prompting" — one-page summary

The whole lesson distilled to a printable cheat-sheet.

▸ Already on the site — go deeper

⊕Concept map

Expand each branch to see how the ideas in this lesson connect — from what chain-of-thought is, to how to prompt it, to the wider reasoning family.

What chain-of-thought prompting is

Elicits intermediate reasoning steps before the final answer, instead of answering directly (Wei et al., 2022).
Helps most on multi-step problems — arithmetic, commonsense, and symbolic reasoning chains.
In the original research the benefit emerged with model scale; smaller models gained little or were hurt (Wei et al., 2022).

Two ways to ask: few-shot vs. zero-shot

Few-shot CoT includes worked exemplars that each show a reasoning chain for the model to imitate (Wei et al., 2022).
Zero-shot CoT appends a trigger phrase such as “Let’s think step by step” with no exemplars (Kojima et al., 2022).
Auto-CoT clusters questions and uses the zero-shot trigger to build demonstrations automatically, removing hand-crafted exemplars (Zhang et al., 2022).

Direct vs. chain-of-thought vs. self-consistency

Direct answering compresses everything into one leap — easy to drop a step and be confidently wrong.
Chain-of-thought writes the intermediate steps so each small calculation can be right before the next.
Self-consistency samples several diverse chains and takes a majority vote, so one bad chain doesn’t decide the outcome (Wang et al., 2022).

Why sampling many chains helps: supervision

Process supervision rewards each intermediate step; outcome supervision rewards only the final answer — rewarding steps beat outcome-only on MATH (Lightman et al., 2023).
STaR bootstraps reasoning: generate rationales, keep those that reach correct answers, and fine-tune on them (Zelikman et al., 2022).
Reasoning models such as OpenAI’s o1 learn an internal chain of thought via reinforcement learning — distinct from prompt-elicited CoT (OpenAI, 2024).

Beyond a single chain: the reasoning family

Least-to-most decomposes a problem into ordered subproblems, reusing each earlier answer in the next (Zhou et al., 2022).
Tree of Thoughts and Graph of Thoughts generalize the linear chain into a search with branching, scoring, and feedback loops (Yao et al., 2023; Besta et al., 2023).
ReAct interleaves reasoning with actions (tool calls); Self-Refine and Reflexion add self-critique and remembered feedback (Yao et al., 2022; Madaan et al., 2023; Shinn et al., 2023).

Sources & further reading

Published by Tech Jacks Solutions · Reviewed June 2026. This lesson explains established concepts grounded in the primary references below; the worked example in the interactive is illustrative, not output from any specific model. Performance figures in the source papers are tied to specific models and datasets, and provider docs change quickly — treat the linked sources as the live source of truth.

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models — Wei et al. (Google Research)
Large Language Models are Zero-Shot Reasoners — Kojima et al.
Self-Consistency Improves Chain of Thought Reasoning in Language Models — Wang et al. (Google Research)
Least-to-Most Prompting Enables Complex Reasoning in LLMs — Zhou et al. (Google Research)
Automatic Chain of Thought Prompting (Auto-CoT) — Zhang et al. (Amazon)
ReAct: Synergizing Reasoning and Acting in Language Models — Yao et al.
Tree of Thoughts: Deliberate Problem Solving with LLMs — Yao et al.
Graph of Thoughts: Solving Elaborate Problems with LLMs — Besta et al. (ETH Zurich)
Self-Refine: Iterative Refinement with Self-Feedback — Madaan et al.
Reflexion: Language Agents with Verbal Reinforcement Learning — Shinn et al.
Let's Verify Step by Step — Lightman et al. (OpenAI)
STaR: Bootstrapping Reasoning With Reasoning — Zelikman et al.
Learning to Reason with LLMs (o1) — OpenAI
Chain-of-thought prompting (Let Claude think) — Anthropic
Gemini thinking — Google

Educational use only. This lesson is a conceptual introduction to chain-of-thought and reasoning prompting. The worked problem, reasoning steps, and votes in the interactive are illustrative mock-ups, not output from any specific model. Reported results in the cited papers depend on particular models, datasets, and dates, and provider APIs change frequently; always verify details against the official documentation linked above before relying on them. Nothing here is professional engineering, legal, or medical advice.

Chain-of-thought & reasoning prompting — in 9 minutes

Tech Jacks Solutions · AI Knowledge Hub · educational summary

What it is

Chain-of-thought (CoT) prompting asks a model to produce intermediate reasoning steps before the final answer, instead of answering directly. It helps most on multi-step problems. In 2022-era research the benefit emerged with model scale (Wei et al., 2022).

Few-shot vs. zero-shot

Few-shot CoT includes worked examples that each show a reasoning chain. Zero-shot CoT just appends a trigger like "Let's think step by step" (Kojima et al., 2022). Auto-CoT builds the demonstrations automatically (Zhang et al., 2022).

Self-consistency & supervision

Self-consistency samples multiple diverse chains and takes a majority vote over the answers (Wang et al., 2022). Process supervision rewards each step and beat outcome-only supervision on MATH (Lightman et al., 2023). Reasoning models (o1) learn an internal chain of thought via RL — related to, but not the same as, prompted CoT (OpenAI, 2024).

The reasoning family

Least-to-most decomposes into ordered subproblems (Zhou et al., 2022); Tree of Thoughts searches a tree of states (Yao et al., 2023); Graph of Thoughts uses a graph (Besta et al., 2023); ReAct interleaves reasoning with actions (Yao et al., 2022); Self-Refine and Reflexion add self-critique loops (Madaan et al., 2023; Shinn et al., 2023).

One caution

Visible "thinking" is a controlled output or summary, not a guaranteed faithful record of the model's internal computation.

Gallery

Contacts

Why "think step by step" makes models smarter

01What chain-of-thought prompting is

02Two ways to ask: few-shot vs. zero-shot

03See it work: direct vs. chain-of-thought vs. self-consistency

04Why sampling many chains helps: self-consistency & supervision