What Is a Neural Network? An Intuitive Explanation
Forget the brain analogy. A neural network is a mathematical function with a lot of knobs and a procedure for turning those knobs until the function does what you want. That’s it.
Why the brain analogy misleads
Every popular-science introduction starts with “a neural network is modelled on the brain.” That’s mostly wrong, and it leads beginners in unhelpful directions.
Neural networks were originally inspired by a drastically simplified model of a neuron proposed in 1943. The simplification was so extreme that cognitive scientists don’t consider modern networks to have any meaningful connection to how biological brains work. The math behind a transformer has more in common with statistics and linear algebra than with neuroscience.
So forget the brain. You don’t need biology to understand neural networks. You need arithmetic.
A function with many knobs
A neural network is a mathematical function. It takes numbers as input, runs them through a series of multiplications and simple non-linear operations, and outputs more numbers.
What makes it a learning function is that the multiplications are governed by adjustable knobs, called weights. A small network might have 1,000 weights. A modern language model has hundreds of billions. During training, an optimiser nudges every weight, little by little, to make the output match the training targets.
The whole technology is: (1) a function shape flexible enough to express any pattern, (2) a procedure (gradient descent) for finding good weights. Nothing else. The “learning” is just that procedure running at scale.
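That “function with knobs” picture can be sketched in a few lines of Python (a minimal illustration with made-up layer sizes, using NumPy; the weights here are random, not trained):

```python
import numpy as np

rng = np.random.default_rng(0)

# The "knobs": two weight matrices and two bias vectors,
# initialised randomly and adjusted during training.
W1, b1 = rng.standard_normal((4, 3)), np.zeros(3)
W2, b2 = rng.standard_normal((3, 2)), np.zeros(2)

def network(x):
    """Numbers in, numbers out: multiply, add bias, non-linearity, repeat."""
    h = np.maximum(0, x @ W1 + b1)   # layer 1: matmul + bias + ReLU
    return h @ W2 + b2               # layer 2: matmul + bias

y = network(np.array([1.0, 2.0, 3.0, 4.0]))
print(y.shape)  # (2,): 4 numbers in, 2 numbers out
```

Training never changes this code; it only changes the numbers inside `W1`, `b1`, `W2`, and `b2`.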
Layers, neurons, weights, and all those words
The function is built up in layers. Each layer takes the previous layer’s output, applies a matrix multiplication, adds a bias, then applies a non-linear activation. The layers stack. The output of the last layer is the model’s prediction.
- Neuron: one slot in one layer. It has a bias and a row of weights.
- Weight: the knob. One weight per (input feature, output neuron) pair.
- Layer: a collection of neurons that operate on the same input.
- Activation: a simple non-linear function (ReLU, tanh, sigmoid). Without these, stacking layers would mathematically collapse into one layer.
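The collapse claim in the last bullet is easy to verify: two linear layers with no activation between them are exactly one linear layer. A NumPy check (arbitrary sizes, biases omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.standard_normal((5, 4))
W2 = rng.standard_normal((4, 3))
x = rng.standard_normal(5)

# Two stacked linear layers, no activation in between...
two_layers = (x @ W1) @ W2
# ...equal one linear layer whose weights are the matrix product.
one_layer = x @ (W1 @ W2)

print(np.allclose(two_layers, one_layer))  # True
```

Insert a ReLU between the two multiplications and the equivalence breaks, which is exactly why the activation is what lets depth add expressive power.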
A feed-forward network is a stack of fully-connected layers. A convolutional network has layers that share weights across spatial positions. A transformer has layers that include attention operations. They’re all variations on the same theme.
How the network actually learns
The learning process has three repeating steps (one full pass through the whole training set is called an epoch):
- Forward pass: feed a batch of inputs through the network, get predictions.
- Compute loss: a single number describing how wrong the predictions were.
- Backward pass (backpropagation): compute, for every weight, how a small change in that weight would change the loss. Nudge each weight in the direction that reduces the loss, by a tiny amount set by the learning rate.
Repeat for thousands or millions of batches. Over time, the loss drops. The weights settle into values that, collectively, make the network compute a useful function. The “magic” is that this simple loop, scaled enough, produces networks that translate languages, play Go, and generate code.
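The loop can be run end to end on a toy problem. This sketch has one weight and a mean-squared loss, with the gradient derived by hand for this tiny model (all names illustrative):

```python
# Toy training loop: learn w such that prediction = w * x fits y = 3x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]

w = 0.0              # one "knob", starting from an arbitrary value
learning_rate = 0.01

for step in range(200):
    # Forward pass: predictions for the whole batch.
    preds = [w * x for x in xs]
    # Loss: mean squared error, a single number for "how wrong".
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)
    # Backward pass: d(loss)/d(w), worked out by hand for this model.
    grad = sum(2 * (p - y) * x for p, x, y in zip(preds, xs, ys)) / len(xs)
    # Nudge the knob against the gradient.
    w -= learning_rate * grad

print(round(w, 3))  # 3.0 -- the loop recovered the pattern
```

A real network runs the same loop; the only differences are that there are billions of knobs instead of one, and the gradients are computed automatically rather than by hand.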
A simple concrete example
Suppose you want a network that classifies 28×28 greyscale handwritten digits (MNIST). The input is 784 numbers (one per pixel). The output is 10 numbers (one per digit 0-9), where the biggest number indicates the predicted digit.
A minimal network:
- Input layer: 784 numbers.
- Hidden layer: 128 neurons, ReLU activation. This has 784 × 128 = 100,352 weights, plus 128 biases.
- Output layer: 10 neurons, softmax activation. This has 128 × 10 = 1,280 weights, plus 10 biases.
Total: about 102,000 parameters to tune. Train for a few minutes on a laptop and you can reach around 98% accuracy. No hand-written rules about what digits look like. The network invents its own internal representation of “what makes an 8 different from a 3.”
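The arithmetic above is easy to check in code. A NumPy sketch of that architecture’s shapes and parameter count (random untrained weights, so the predictions are meaningless; only the structure is the point):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hidden layer: 784 pixel inputs -> 128 neurons.
W1, b1 = rng.standard_normal((784, 128)) * 0.01, np.zeros(128)
# Output layer: 128 hidden values -> 10 digit scores.
W2, b2 = rng.standard_normal((128, 10)) * 0.01, np.zeros(10)

n_params = W1.size + b1.size + W2.size + b2.size
print(n_params)  # 101770 -- the "about 102,000" in the text

def predict(image):
    """image: flat vector of 784 pixel values -> 10 probabilities."""
    h = np.maximum(0, image @ W1 + b1)      # ReLU hidden layer
    logits = h @ W2 + b2
    scores = np.exp(logits - logits.max())  # numerically stable softmax
    return scores / scores.sum()

probs = predict(rng.random(784))
print(probs.shape)  # (10,), summing to 1.0
```

Training would fill `W1`, `b1`, `W2`, and `b2` with useful values; everything else stays exactly as written.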
When do networks beat other methods?
Networks aren’t universally best. For small tabular data (spreadsheets with thousands of rows and tens of columns), gradient-boosted trees almost always beat neural networks.
Networks dominate when:
- The input is raw and high-dimensional: images, audio, text, video.
- There’s a lot of data: hundreds of thousands of examples minimum, ideally millions.
- You have GPUs: training a network on a CPU is painful. GPUs speed it up by 10–100×.
If you have 1,000 rows of CSV data, use XGBoost. If you have a million images, use a neural network.
What “deep” learning actually means
Deep learning is just “neural networks with lots of layers.” There’s no bright-line definition, but anything over 5-10 layers usually qualifies. The word “deep” became marketing, then became the default.
The reason depth matters: each layer can learn a feature that builds on the features from the layer below. Layer 1 might learn edges; layer 2 learns shapes; layer 3 learns objects. A deep network can represent complex hierarchies that a shallow one can’t.
Modern language models are typically 60-120 transformer layers deep. Each layer does a small refinement on the representation of the text passing through. Stacked, they can reason about paragraphs, translate languages, and explain code.