Supervised vs Unsupervised vs Reinforcement Learning
Three terms cover most of machine learning. Here is what each one actually means, what problems each solves, and how to recognise which family a new algorithm belongs to.
The shape of each category
The three paradigms differ in one thing: what kind of feedback the model gets during training.
- Supervised: the model sees both input and the correct output. It learns to map one to the other.
- Unsupervised: the model sees only inputs. It learns to describe the structure it finds.
- Reinforcement: the model takes actions in an environment and receives a reward signal. It learns a policy that maximises long-term reward.
Almost every algorithm you’ll read about in the next year falls into one of these three buckets, or a hybrid.
Supervised learning: learn from labels
The classical setting. You have a dataset of (input, correct-output) pairs. The model's job is to learn a function that maps inputs to outputs, so that it predicts correctly on new, unseen inputs.
Two sub-flavours depending on what the output looks like:
- Classification: output is a category. Spam vs not-spam, dog vs cat vs bird, fraud vs legitimate.
- Regression: output is a continuous number. House price, expected click-through-rate, tomorrow’s temperature.
Nearly every production ML model you interact with daily (search ranking, recommendations, fraud detection, autocomplete) is a supervised model, often with hundreds of millions of labelled examples behind it.
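The learn-a-mapping-from-labelled-pairs idea can be sketched in a few lines. Here is a toy regression (all numbers are invented for illustration): fit a line to labelled (input, output) pairs with closed-form least squares, then predict an unseen input.

```python
# Supervised learning in miniature: labelled (input, output) pairs,
# fit y = w*x + b by least squares, then predict an unseen input.
xs = [1.0, 2.0, 3.0, 4.0]   # inputs
ys = [2.1, 3.9, 6.2, 7.8]   # "correct outputs" (labels), roughly y = 2x

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Closed-form least-squares slope and intercept
w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
b = mean_y - w * mean_x

def predict(x):
    return w * x + b

print(predict(5.0))   # generalise to an input the model never saw
```

The same shape (labelled pairs in, predictive function out) holds whether the model is a line with two parameters or a network with billions.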
Unsupervised learning: find structure
No labels. The model is given raw data and asked to describe it. The two main tasks are:
- Clustering: group similar items together. Customer segments, document topics, patient cohorts. The algorithm doesn’t know what the groups should be; it proposes them.
- Dimensionality reduction: take data with many features and describe each item with fewer. Useful for visualisation (compress 1,000-dimensional data to 2 dimensions so you can plot it) and as a preprocessing step before supervised learning.
Unsupervised is often the first step in a data-science workflow. You cluster customers to understand segments, then build a supervised model for each segment.
Reinforcement learning: learn from consequences
An agent acts in an environment, observes a reward, and updates its policy (a function from state to action) to get more reward over time. RL is the right fit when:
- You can’t hand-label “correct” outputs, but you can score outcomes (a game won/lost, a robot arm reaching/missing, a recommended route’s actual travel time).
- Decisions are sequential: what you do now affects what options you have later.
RL is what trained AlphaGo and its successor AlphaZero, robot locomotion controllers, and the reinforcement-learning-from-human-feedback step in modern chatbots. It is more sample-hungry and harder to stabilise than supervised learning, which is why most production systems use it as the final polish rather than the main engine.
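The reward-not-labels distinction shows up clearly in the simplest RL setting, the multi-armed bandit. In this sketch (payout probabilities invented for illustration), an epsilon-greedy agent learns which of two slot machines pays more purely from reward feedback; nobody ever tells it the correct action.

```python
import random
random.seed(0)

# Reinforcement learning in miniature: an epsilon-greedy bandit agent.
# No labelled "correct action" exists -- only a reward signal after acting.
true_payout = [0.3, 0.7]   # hidden from the agent
estimates = [0.0, 0.0]     # agent's running value estimate per arm
counts = [0, 0]
epsilon = 0.1              # fraction of steps spent exploring

for step in range(2000):
    if random.random() < epsilon:
        action = random.randrange(2)              # explore: try a random arm
    else:
        action = estimates.index(max(estimates))  # exploit: pull the best-looking arm
    reward = 1.0 if random.random() < true_payout[action] else 0.0
    counts[action] += 1
    # Incremental mean: nudge the chosen arm's estimate toward the reward
    estimates[action] += (reward - estimates[action]) / counts[action]

print(estimates.index(max(estimates)))   # the arm the agent came to prefer
```

Full RL adds states and sequential decisions on top of this, but "act, observe reward, update the policy" is the core loop.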
Self-supervised learning: the newcomer
A fourth paradigm worth knowing because it powers every large language model you’ve used. The trick is simple: take unlabelled data and invent a supervised task from it.
For text, the invented task is “given these words, predict the next word.” The label is free because the next word is right there in the training sentence. That unlocks internet-scale training without a human labelling anything.
Self-supervised techniques are what took ML from “needs millions of labelled examples per task” to “train once on the whole internet, then fine-tune for any task.”
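The invent-a-task trick can be shown with a deliberately tiny stand-in for a language model: a bigram counter. The corpus below is made up, but the mechanism is the real one: every (word, next-word) pair is a free (input, label) example manufactured from raw text.

```python
from collections import Counter, defaultdict

# Self-supervised learning in miniature: turn unlabelled text into a
# supervised "predict the next word" task. The label for each word is
# simply the word that follows it -- no human labelling required.
corpus = "the cat sat on the mat and the cat slept"
words = corpus.split()

# Manufacture (input, label) pairs straight from the raw text
next_counts = defaultdict(Counter)
for current, nxt in zip(words, words[1:]):
    next_counts[current][nxt] += 1

def predict_next(word):
    # Predict the most frequently observed follower
    return next_counts[word].most_common(1)[0][0]

print(predict_next("the"))   # prints "cat"
```

A large language model replaces the count table with a neural network and the ten-word corpus with a large slice of the internet, but the supervised task is manufactured the same way.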
Which family for which problem
| Your situation | Family to start with |
|---|---|
| You have labelled data | Supervised |
| You have data but no labels | Unsupervised |
| You have a game or simulator | Reinforcement |
| You have oceans of unlabelled text/images | Self-supervised pretraining, then fine-tune supervised |
Which one to start with as a beginner
Supervised. Without question. Three reasons:
- It’s the most used. The large majority of applied-ML job postings describe supervised-learning work, so learning it first maximises your job-market relevance.
- Debugging is easiest. Because you have correct labels, you can directly measure how wrong your model is. Unsupervised and reinforcement settings both have much more ambiguous success criteria.
- The concepts transfer. Cross-entropy loss, gradient descent, overfitting, cross-validation: the whole supervised foundation shows up in the other families. Learning it in the simpler setting makes the advanced settings readable.
Start with a supervised classifier on a well-known dataset. The MNIST digit-classification dataset and the Titanic survival dataset are both beginner-friendly, have strong online tutorials, and teach the full workflow in an afternoon.
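That full workflow fits in one short script. Here is a sketch on synthetic data (a stand-in for MNIST or Titanic, with all numbers invented): split off a held-out test set, "train" a nearest-class-mean classifier, and measure accuracy on data the model never saw.

```python
import random
random.seed(1)

# The full supervised workflow in miniature: split, train, evaluate.
# Synthetic 1-D data: class 0 centred near 0.0, class 1 centred near 5.0.
data = [(random.gauss(0.0, 1.0), 0) for _ in range(100)] + \
       [(random.gauss(5.0, 1.0), 1) for _ in range(100)]
random.shuffle(data)

split = int(0.8 * len(data))
train, test = data[:split], data[split:]   # hold out 20% for evaluation

# "Train": compute each class's mean feature value on the training set
means = {}
for label in (0, 1):
    vals = [x for x, y in train if y == label]
    means[label] = sum(vals) / len(vals)

def classify(x):
    # Predict the class whose training mean is closest
    return min(means, key=lambda label: abs(x - means[label]))

# Evaluate: because we have labels, "how wrong is it?" is one line
accuracy = sum(classify(x) == y for x, y in test) / len(test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Swap the synthetic data for a real dataset and the mean-per-class rule for a proper model, and this is the same split-train-evaluate loop you will use everywhere.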