How do neural networks work?
Loosely inspired by the brain, a neural network is just layers of simple units that multiply, add, and decide. This page covers the single neuron, how layers stack into "deep" learning, how a prediction flows through a network, and the network families behind modern AI.
01 The shape of a neural network
Picture an assembly line where information goes in one end, gets handed from station to station, and comes out the other end as an answer. A neural network works like that: it's built from rows of small workers called neurons, each doing one tiny bit of arithmetic and passing the result on. Those rows are called layers. Data enters the input layer, flows through one or more hidden layers, and leaves through the output layer as a prediction. A network with many hidden layers is called "deep", which is where the name "deep learning" comes from; that depth is what lets it learn complex patterns.
A shallow network: 3 inputs → one hidden layer of 4 neurons → 1 output. Every line is a connection carrying a weight.
- Neurons are organized in layers: an input layer, one or more hidden layers, and an output layer.
- Each line between nodes is a connection with its own learned weight — that's where the network stores what it knows.
- "Deep" learning simply means many hidden layers — more depth, more capacity to model complex relationships.
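To make "each line carries a weight" concrete, here is a minimal Python sketch that counts the learnable numbers in a fully connected network like the one in the diagram. It assumes every neuron connects to every neuron in the next layer and that each hidden and output neuron also has one bias (introduced in the next section). The function name `count_parameters` and the deeper layer sizes are illustrative, not taken from the diagram.

```python
# Sketch: count learnable parameters (weights + biases) in a fully connected
# network, given the number of neurons in each layer.
# Assumptions: every neuron feeds every neuron in the next layer, and each
# non-input neuron has exactly one bias.

def count_parameters(layer_sizes):
    total = 0
    # Walk over consecutive pairs of layers, e.g. (3, 4) then (4, 1).
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = n_in * n_out   # one weight per connection line
        biases = n_out           # one bias per receiving neuron
        total += weights + biases
    return total

shallow = [3, 4, 1]        # the diagram: 3 inputs, 4 hidden, 1 output
deeper = [3, 4, 4, 4, 1]   # hypothetical deeper version: three hidden layers

print(count_parameters(shallow))  # 21
print(count_parameters(deeper))   # 61
```

Adding two more hidden layers of 4 neurons roughly triples the count here, which is one way to see why depth adds capacity.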
02 Inside a single neuron
Every network is built from one repeated unit. A neuron takes its inputs, multiplies each by a weight, sums the results, adds a bias, then passes that total through an activation function to produce its output.
Neuron
A neuron is the network's smallest building block. It receives one or more inputs, does a small calculation — multiply each input by a weight, add them up with a bias, then apply an activation function — and emits a single output number that feeds the next layer.
Edge thickness = weight size · blue = positive, red = negative. With a ReLU activation, the neuron outputs its sum when that sum is positive and 0 otherwise (ReLU zeroes negatives).
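The calculation described above fits in a few lines. This is a sketch assuming a ReLU activation; `relu`, `neuron`, and all the numbers are hypothetical, chosen so the arithmetic is easy to check by hand.

```python
# Minimal sketch of one neuron: weighted sum + bias, then ReLU.
# The inputs, weights, and biases below are made-up illustrative numbers.

def relu(z):
    # ReLU keeps positive values and replaces negatives with 0.
    return max(0.0, z)

def neuron(inputs, weights, bias):
    # Multiply each input by its weight, add everything up, then add the bias.
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # The activation function turns the raw sum into the neuron's output.
    return relu(z)

x = [1.0, 2.0, 3.0]
w = [0.5, -0.25, 0.25]   # a mix of positive and negative weights
b = 0.25

print(neuron(x, w, b))     # 0.5 - 0.5 + 0.75 + 0.25 = 1.0
print(neuron(x, w, -1.0))  # sum is -0.25, so ReLU outputs 0.0
```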
03 The forward pass: how a prediction is made
Making a prediction is called the forward pass: data flows from the input layer, through each layer's weighted sums and activations, to the output. The steps below follow one neuron as it computes its output, then the signal as it moves to the next layer and out as a prediction.
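1. Inputs arrive: the input layer receives the values x₁, x₂, x₃ describing one example.
2. Weighted sum: the neuron computes z = (x₁·w₁ + x₂·w₂ + x₃·w₃) + b.
3. Activation: z passes through an activation function f (e.g., ReLU) to produce the neuron's output a = f(z), adding the non-linearity that lets networks model curves, not just straight lines.
4. Next layer: the outputs of every neuron in this layer become the inputs to the next layer, and the process repeats until the output layer emits the prediction.
- The forward pass only reads the weights and biases; it doesn't change them. Learning happens separately, during training.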
- Without activation functions, stacking layers would collapse to a single straight-line model — the non-linearity is what makes depth useful.
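Putting the steps together, here is a minimal sketch of a forward pass through a 3 → 4 → 1 network shaped like the one in section 01. It assumes ReLU in the hidden layer and no activation on the output neuron (a common choice when predicting a single number); the helper names `dense` and `forward` and every weight value are made up for illustration.

```python
# Sketch of a forward pass through a 3 -> 4 -> 1 network.
# Assumptions: hidden neurons use ReLU, the output neuron uses no activation,
# and all weights and biases are made-up illustrative values.

def relu(z):
    return max(0.0, z)

def dense(inputs, weights, biases, activation=None):
    # One fully connected layer: each row of `weights` belongs to one neuron.
    outputs = []
    for neuron_weights, b in zip(weights, biases):
        z = sum(x * w for x, w in zip(inputs, neuron_weights)) + b
        outputs.append(activation(z) if activation else z)
    return outputs

def forward(x, params):
    # Data flows one way: input -> hidden (ReLU) -> output. Nothing is updated.
    hidden = dense(x, params["W1"], params["b1"], activation=relu)
    return dense(hidden, params["W2"], params["b2"])

params = {
    "W1": [
        [1.0, 0.0, 0.0],     # hidden neuron 1 looks only at x1
        [0.0, 1.0, 0.0],     # hidden neuron 2 looks only at x2
        [0.0, 0.0, 1.0],     # hidden neuron 3 looks only at x3
        [-1.0, -1.0, -1.0],  # hidden neuron 4: negative sum, ReLU zeroes it
    ],
    "b1": [0.0, 0.0, 0.0, 0.0],
    "W2": [[0.5, 0.5, 0.5, 0.5]],  # the single output neuron
    "b2": [1.0],
}

print(forward([1.0, 2.0, 3.0], params))  # hidden = [1, 2, 3, 0] -> [4.0]
```

Notice that `forward` only reads `params`; nothing in it changes a weight, matching the point that learning is a separate step. Hidden neuron 4 shows ReLU at work: its sum is negative, so it contributes nothing to the output.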
04 How a network learns: training in one breath
A fresh network starts with essentially random weights, so its first predictions are mostly wrong. Training fixes that. The network makes a prediction, and that prediction is compared with the correct answer to measure the error, a number called the loss. Backpropagation then works backward through the layers to figure out how much each weight contributed to the loss, and gradient descent nudges every weight a small step in the direction that reduces it. Repeat this over many examples, many times, and the weights gradually settle into values that make good predictions. That repeated "predict → measure error → nudge weights" loop is, at its core, how neural networks learn.
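The loop above can be shown end to end on the smallest possible model: a single neuron with one weight, one bias, and no activation. This sketch assumes mean squared error as the loss and plain gradient descent, and uses a made-up dataset that follows y = 2x + 1. For one neuron, backpropagation reduces to a single application of the chain rule, so the gradients are written out by hand; real frameworks compute them automatically across many layers.

```python
# Sketch of "predict -> measure error -> nudge weights" for one linear neuron.
# Assumptions: mean squared error loss, plain gradient descent, and a made-up
# dataset generated by y = 2x + 1.

data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

w, b = 0.0, 0.0        # zeros instead of random values, so runs are repeatable
learning_rate = 0.05   # size of each nudge

for epoch in range(2000):
    grad_w = grad_b = 0.0
    loss = 0.0
    for x, y in data:
        pred = w * x + b            # forward pass: make a prediction
        error = pred - y            # compare with the correct answer
        loss += error ** 2
        # Chain rule for this tiny model:
        # d(error^2)/dw = 2 * error * x,  d(error^2)/db = 2 * error
        grad_w += 2 * error * x
        grad_b += 2 * error
    n = len(data)
    loss /= n                       # mean squared error for this pass
    # Gradient descent: step each parameter against its average gradient.
    w -= learning_rate * grad_w / n
    b -= learning_rate * grad_b / n

print(round(w, 3), round(b, 3))   # approaches 2.0 and 1.0
```

After enough repetitions, `w` and `b` settle close to 2 and 1, the values that generated the data. A learning rate that is too large would make the updates overshoot and diverge instead.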
05 The main families of network
"Neural network" is an umbrella term. Different architectures wire neurons together in different ways to suit different data. The four you'll meet most:
Feedforward — the basic network
The simplest design: data moves in one direction only, input → hidden → output, with no loops. It's the forward pass described above. Great for straightforward prediction and classification on fixed-size inputs, and the foundation the other architectures build on.
CNN — convolutional, great for images
A Convolutional Neural Network slides small filters across an image to detect local patterns — edges, then textures, then shapes — building up to whole objects. Because it reuses the same filters everywhere, it's efficient and excels at vision tasks.
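The sliding-filter idea can be sketched directly. This assumes a made-up 4×4 grayscale image, a hand-picked 2×2 vertical-edge filter, stride 1, and no padding; in a real CNN the filter values are learned during training rather than chosen by hand. `conv2d` is a hypothetical helper, not a library function.

```python
# Sketch of the core CNN operation: slide one small filter across an image
# and record how strongly each patch matches it. Assumptions: stride 1, no
# padding, and (as in most deep learning libraries) the filter is not flipped,
# so strictly speaking this is cross-correlation.

def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # The same filter weights are reused at every position.
            total = 0.0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output

image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]  # dark left half, bright right half: one vertical edge in the middle

edge_filter = [
    [-1, 1],
    [-1, 1],
]  # responds where brightness increases from left to right

for row in conv2d(image, edge_filter):
    print(row)
# [0.0, 18.0, 0.0] on every row: the filter responds only at the edge
```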
RNN — recurrent, built for sequences
A Recurrent Neural Network processes data in order and keeps a running internal state, so earlier items influence later ones. That memory of what came before makes it suited to sequences and time: text, audio, or any series of steps.
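A minimal sketch of the recurrence, assuming a single scalar hidden unit, a tanh activation, and made-up weights (`rnn`, `w_x`, `w_h`, and `b` are illustrative names). The same weights are reused at every step, and the state `h` is what lets an earlier input affect later outputs.

```python
import math

# Sketch of a recurrent neuron. Assumptions: one scalar hidden unit,
# tanh activation, and made-up weight values.

def rnn(sequence, w_x=0.5, w_h=0.8, b=0.0):
    h = 0.0                  # the running internal state starts empty
    states = []
    for x in sequence:
        # The new state mixes the current input with the previous state.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

# Same final input (0.0), different histories -> different final states.
print(rnn([1.0, 0.0, 0.0])[-1])   # still positive: it remembers the 1.0
print(rnn([0.0, 0.0, 0.0])[-1])   # 0.0: nothing earlier to remember
```

Both sequences end with the same input, yet their final states differ, because the first one still carries a trace of the 1.0 it saw two steps earlier.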
Transformer — attention, powers modern LLMs
A Transformer uses a mechanism called "attention" to weigh how much every part of the input relates to every other part — all at once, rather than strictly in sequence. That ability to model long-range relationships is what powers modern large language models.
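Here is a stripped-down sketch of scaled dot-product self-attention, the form of attention used in the original Transformer. To keep it short it assumes the queries, keys, and values are the input vectors themselves; real Transformers first multiply the inputs by learned matrices and run several attention heads in parallel. The token vectors and the helper names `softmax` and `self_attention` are made up for illustration.

```python
import math

# Sketch of scaled dot-product self-attention.
# Simplifying assumption: queries = keys = values = the input vectors.

def softmax(scores):
    m = max(scores)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    d = len(vectors[0])
    outputs, all_weights = [], []
    for q in vectors:
        # Score this position against every position, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in vectors]
        weights = softmax(scores)            # scores -> weights that sum to 1
        # Output = weighted average of all value vectors.
        out = [sum(wt * v[i] for wt, v in zip(weights, vectors))
               for i in range(d)]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]]   # made-up 2-d token vectors
_, weights = self_attention(tokens)
for row in weights:
    print([round(wt, 2) for wt in row])
```

Each printed row sums to 1 and shows how much one position draws on every other. Tokens 1 and 3 point in similar directions, so they attend to each other more than to token 2.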