Machine learning explained simply: supervised vs unsupervised learning
Machine learning is often talked about as if it’s mysterious or magical. In reality, it’s a set of methods that allow computers to learn patterns from data and use those patterns to make predictions, group data, or highlight unusual behaviour.
At its core, machine learning answers a simple question:
Can a computer learn something useful from data without being explicitly told every rule?
What is machine learning?
Machine learning (ML) is a branch of artificial intelligence, widely used in data science, in which algorithms learn from data rather than following fixed, hand-written rules.
Rather than telling a computer:
“If X happens, then do Y”
we show it many examples and let it learn the relationship itself.
This approach is especially useful when:
Data is complex or high-dimensional
Patterns are subtle
Rules are hard to define in advance
This is why machine learning is widely used in areas such as genomics, imaging, finance, and recommendation systems.
The two main types of machine learning
Most machine learning methods fall into two broad categories:
Supervised learning
Unsupervised learning
The key difference is whether the data comes with known answers.
Supervised learning: learning with labels
In supervised learning, the algorithm is trained on data where the correct answer is already known. These known answers are called labels.
A simple example: predicting house prices
Imagine you want to predict house prices.
For houses that were sold in the past, you already know the final sale price, and you also have information such as:
Location
Number of bedrooms
Property type
Size or floor area
The machine-learning model is trained to learn how these inputs (called features) relate to the sale price, which is the label.
Once the model is trained, you can give it a new house it has never seen before, for example:
Location: city
Bedrooms: 3
and ask it to predict the price, even though the true value is unknown.
Because the model learns from labelled examples, this approach is called supervised learning.
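As a minimal sketch of the house-price example, the code below fits a straight line relating floor area to sale price using ordinary least squares. The numbers are invented for illustration, and the choice of a single feature and a linear model is an assumption made to keep the example short. A real model would normally use several features and an established library.

```python
# Toy training data (invented for illustration): floor area in square metres
# and the known sale price (the label) for houses sold in the past.
areas = [50.0, 70.0, 90.0, 110.0, 130.0]
prices = [150_000.0, 200_000.0, 250_000.0, 300_000.0, 350_000.0]


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_price(area, slope, intercept):
    """Predict the price of a house the model has not seen before."""
    return slope * area + intercept


# "Training" is the step that learns slope and intercept from labelled examples.
slope, intercept = fit_line(areas, prices)

# Prediction for a new 100 m^2 house whose true price is unknown.
predicted = predict_price(100.0, slope, intercept)
print(f"Predicted price for 100 m^2: {predicted:.0f}")
```

The known prices are used only during fitting. At prediction time the model receives the features alone.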
Common supervised learning tasks
Classification
Predicting a category
Example: disease vs no disease
Regression
Predicting a numerical value
Example: gene expression level, house price
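To make the classification case concrete, here is a small sketch that predicts a category rather than a number. It uses a k-nearest-neighbours vote, which is one simple technique chosen here for illustration and not the only option. The measurements, the class names, and the `classify` helper are all hypothetical.

```python
import math
from collections import Counter

# Toy labelled data (invented): two measurements per sample and a known class.
training_samples = [
    ((1.0, 1.2), "no disease"),
    ((0.8, 1.0), "no disease"),
    ((1.1, 0.9), "no disease"),
    ((3.0, 3.2), "disease"),
    ((3.2, 2.9), "disease"),
    ((2.9, 3.1), "disease"),
]


def classify(sample, training, k=3):
    """Predict a category by majority vote among the k nearest labelled samples."""
    by_distance = sorted(training, key=lambda item: math.dist(sample, item[0]))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


print(classify((1.0, 1.0), training_samples))
print(classify((3.1, 3.0), training_samples))
```

The output is one of the known categories. A regression model for the same inputs would return a numerical value instead.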
Supervised learning in biology and genomics
In life sciences, supervised learning is often used when:
Outcomes are known
Labels are reliable
Examples include:
Predicting sample classes (case vs control)
Classifying cell types
Predicting functional effects of genetic variants
The quality of supervised learning depends heavily on good labels. If labels are incorrect or inconsistent, the model will learn incorrect patterns.
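The effect of a bad label can be sketched with a very simple model. The example below uses a 1-nearest-neighbour rule (an assumption chosen for brevity) on invented data, and compares predictions from a clean training set with predictions from the same set after one sample has been mislabelled.

```python
import math


def nearest_label(sample, training):
    """Return the label of the single closest training example."""
    _, label = min(training, key=lambda item: math.dist(sample, item[0]))
    return label


# Toy data (invented): two measurements per sample plus a case/control label.
clean = [
    ((1.0, 1.0), "control"),
    ((1.2, 0.9), "control"),
    ((4.0, 4.1), "case"),
    ((4.2, 3.9), "case"),
]

# Same measurements, but the second control sample is mislabelled as a case.
noisy = [
    ((1.0, 1.0), "control"),
    ((1.2, 0.9), "case"),
    ((4.0, 4.1), "case"),
    ((4.2, 3.9), "case"),
]

new_sample = (1.3, 0.9)
print(nearest_label(new_sample, clean))
print(nearest_label(new_sample, noisy))
```

The measurements are identical in both training sets. Only the label differs, and that alone is enough to change the prediction for the new sample.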
Unsupervised learning: finding structure without labels
In unsupervised learning, no labels are provided. The algorithm is given raw data and asked to find structure on its own.
Instead of being told what to look for, it explores the data and answers questions like:
Are there natural groups?
Which samples look similar?
Are there unusual outliers?
A simple example
Imagine you have gene expression data from hundreds of samples, but you don’t know:
How many groups exist
Whether groups exist at all
An unsupervised algorithm can:
Cluster samples based on similarity
Reveal hidden structure
Suggest patterns you might not have anticipated
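A basic clustering loop can be sketched as follows. This is a simplified k-means on invented two-dimensional data, not a production implementation. Note the assumptions: the number of clusters `k` is set to 2 by hand (in practice it has to be chosen or estimated), and the starting centroids are taken from evenly spaced positions in the list, whereas real implementations usually use randomised initialisation.

```python
import math

# Toy unlabelled data (invented): two expression-like measurements per sample.
samples = [
    (1.0, 1.1), (0.9, 1.0), (1.2, 0.8),
    (5.0, 5.1), (5.2, 4.9), (4.8, 5.0),
]


def mean_point(points):
    """Return the coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, k, iterations=10):
    """Assign each point to one of k clusters using a basic k-means loop."""
    # Simplified start: take evenly spaced points as the initial centroids.
    centroids = [points[i * len(points) // k] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Step 1: assign every point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Step 2: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_point(members)
    return assignments, centroids


assignments, centroids = k_means(samples, k=2)
print(assignments)
```

No labels are supplied anywhere. The cluster numbers are arbitrary identifiers, and interpreting what each group means is left to the analyst.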
Common unsupervised learning tasks
Clustering
Grouping similar samples together
Dimensionality reduction
Simplifying complex data for visualisation
Outlier detection
Finding unusual or unexpected samples
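As one illustration of outlier detection, the sketch below flags values that sit far from the median, using the modified z-score based on the median absolute deviation. The data are invented, the `find_outliers` helper is hypothetical, and the cut-off of 3.5 is a commonly quoted rule of thumb and not a universal setting.

```python
import statistics

# Toy measurements (invented): one value per sample, with one unusual sample.
values = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 25.0, 9.7]


def find_outliers(data, threshold=3.5):
    """Return indices of values whose modified z-score exceeds the threshold."""
    median = statistics.median(data)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = statistics.median(abs(x - median) for x in data)
    if mad == 0:
        # At least half the values equal the median, so the score is undefined.
        # Returning nothing here is a simplification for this sketch.
        return []
    return [
        i for i, x in enumerate(data)
        if abs(0.6745 * (x - median) / mad) > threshold
    ]


print(find_outliers(values))
```

The median is used in place of the mean because a single extreme value can shift the mean and standard deviation enough to hide itself.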
Unsupervised learning in biology and genomics
Unsupervised learning is particularly powerful when:
Exploring new datasets
Labels are unavailable or uncertain
Hypotheses are still forming
Examples include:
Identifying subpopulations of cells
Exploring heterogeneity in RNA-seq data
Detecting batch effects or technical artefacts
Unsupervised learning is often used early in analysis to understand the data before making assumptions.
Supervised vs unsupervised: which is better?
Neither approach is “better” — they answer different questions.
In practice, many analyses use both:
Unsupervised learning to explore and understand the data
Supervised learning to test specific predictions