Machine learning explained simply: supervised vs unsupervised learning

1 Feb

Supervised vs Unsupervised Learning

Machine learning is often talked about as if it’s mysterious or magical. In reality, it’s a set of methods that allow computers to learn patterns from data and use those patterns to make predictions, group data, or highlight unusual behaviour.

At its core, machine learning answers a simple question:

Can a computer learn something useful from data without being explicitly told every rule?

What is machine learning?

Machine learning (ML) is a branch of data science in which algorithms learn from data rather than following fixed, hand-written rules.

Rather than telling a computer:

“If X happens, then do Y”

we show it many examples and let it learn the relationship itself.
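As a rough illustration of that difference, the sketch below puts a hand-written rule next to a rule whose cut-off is learned from examples. The temperature readings, the fever labels, and the function names are all made up for this illustration. Trying every midpoint between neighbouring values is just one simple way to pick a cut-off, not a method taken from any particular library.

```python
# Hypothetical labelled examples: (body temperature in degrees Celsius, has fever?)
examples = [
    (36.5, False), (36.8, False), (37.0, False), (37.2, False),
    (38.1, True), (38.5, True), (39.0, True),
]

def hand_written_rule(temperature):
    # A fixed rule chosen by a person: "if X happens, then do Y".
    return temperature > 38.0

def learn_threshold(examples):
    # Try the midpoint between each pair of neighbouring values and keep
    # the cut-off that classifies the most examples correctly.
    values = sorted(value for value, _ in examples)
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def correct_count(threshold):
        return sum((value > threshold) == label for value, label in examples)

    return max(candidates, key=correct_count)

threshold = learn_threshold(examples)

def learned_rule(temperature):
    # Same form as the hand-written rule, but the cut-off came from the data.
    return temperature > threshold

print(f"Learned threshold: {threshold:.2f}")
print(learned_rule(38.4), learned_rule(36.9))
```

Nobody typed the learned cut-off in. It falls out of the examples, so different examples would give a different rule.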

This approach is especially useful when:

Data is complex or high-dimensional
Patterns are subtle
Rules are hard to define in advance

This is why machine learning is widely used in areas such as genomics, imaging, finance, and recommendation systems.

The two main types of machine learning

Most machine learning methods fall into two broad categories:

Supervised learning
Unsupervised learning

The key difference is whether the data comes with known answers.

Supervised learning: learning with labels

In supervised learning, the algorithm is trained on data where the correct answer is already known. These known answers are called labels.

A simple example: predicting house prices

Imagine you want to predict house prices.

For houses that were sold in the past, you already know the final sale price, and you also have information such as:

Location
Number of bedrooms
Property type
Size or floor area

The machine-learning model learns how these features (the inputs) relate to the sale price (the label).

Once trained, you can give the model a new house it has never seen before — for example:

Location: city
Bedrooms: 3

and ask it to predict the price, even though the true value is unknown.

Because the model learns from labelled examples, this approach is called supervised learning.
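The house-price idea can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions: the sales figures are invented, only one feature (floor area) is used to keep the arithmetic visible, and a straight-line fit by least squares stands in for whatever model you would choose in practice.

```python
# Hypothetical past sales: floor area in square metres and final sale price.
floor_areas = [50, 70, 90, 110, 130]
sale_prices = [152_000, 188_000, 231_000, 268_000, 311_000]

def fit_line(xs, ys):
    # Ordinary least squares for a single feature:
    # price is modelled as intercept + slope * floor_area.
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum_xy / sum_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Training: learn the relationship from houses with known prices (the labels).
slope, intercept = fit_line(floor_areas, sale_prices)

def predict_price(floor_area):
    return intercept + slope * floor_area

# Prediction: a new house whose true price is unknown.
new_house_area = 100
predicted_price = predict_price(new_house_area)
print(f"Predicted price for {new_house_area} square metres: {predicted_price:,.0f}")
```

A real model would use several features at once, such as location and number of bedrooms, but the two steps stay the same: fit on labelled examples, then predict for unseen ones.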

Common supervised learning tasks

Classification: predicting a category (example: disease vs no disease)
Regression: predicting a numerical value (examples: gene expression level, house price)
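To make the classification case concrete, here is a small sketch of a disease vs no disease classifier. It uses a k-nearest-neighbours vote, which is one possible technique among many and is chosen here only because it is short. The two measurements per patient, the labels, and the function name are hypothetical.

```python
import math
from collections import Counter

# Hypothetical training data: two measurements per patient, plus a known label.
training_data = [
    ((1.0, 1.2), "no disease"),
    ((0.8, 1.0), "no disease"),
    ((1.1, 0.9), "no disease"),
    ((3.0, 3.2), "disease"),
    ((2.8, 3.1), "disease"),
    ((3.2, 2.9), "disease"),
]

def classify(sample, training_data, k=3):
    # Sort the labelled examples by distance to the new sample,
    # then take a majority vote among the k closest ones.
    nearest = sorted(training_data, key=lambda item: math.dist(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(classify((2.9, 3.0), training_data))
print(classify((1.0, 1.0), training_data))
```

The output is a category, not a number. That is the only thing separating this task from regression.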

Supervised learning in biology and genomics

In life sciences, supervised learning is often used when:

Outcomes are known
Labels are reliable

Examples include:

Predicting sample classes (case vs control)
Classifying cell types
Predicting functional effects of genetic variants

The quality of a supervised model depends heavily on the quality of its labels. If labels are incorrect or inconsistent, the model will learn incorrect patterns.
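The effect of bad labels can be shown with a toy case vs control example. This is a sketch with invented measurements and a deliberately simple nearest-centroid classifier (each class is summarised by its mean). The function names are hypothetical, and real datasets will rarely behave this neatly.

```python
from statistics import mean

def train_centroids(values, labels):
    # Nearest-centroid classifier in one dimension: store the mean value per label.
    return {
        label: mean(v for v, lab in zip(values, labels) if lab == label)
        for label in set(labels)
    }

def predict(centroids, value):
    # Assign the label whose mean is closest to the new value.
    return min(centroids, key=lambda label: abs(value - centroids[label]))

# Hypothetical measurements: the first four are controls, the last four are cases.
values = [1.0, 1.2, 0.9, 1.1, 3.0, 3.1, 2.9, 3.2]
clean_labels = ["control"] * 4 + ["case"] * 4
# Same measurements, but two cases were wrongly recorded as controls.
noisy_labels = ["control"] * 6 + ["case"] * 2

clean_model = train_centroids(values, clean_labels)
noisy_model = train_centroids(values, noisy_labels)

new_sample = 2.2
print("Trained on clean labels:", predict(clean_model, new_sample))
print("Trained on noisy labels:", predict(noisy_model, new_sample))
```

The measurements are identical in both runs. Only the labels differ, and that alone moves the learned boundary enough to change the prediction for the new sample.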

Unsupervised learning: finding structure without labels

In unsupervised learning, no labels are provided. The algorithm is given raw data and asked to find structure on its own.

Instead of being told what to look for, it explores the data and answers questions like:

Are there natural groups?
Which samples look similar?
Are there unusual outliers?

A simple example

Imagine you have gene expression data from hundreds of samples, but you don’t know:

How many groups exist
Whether groups exist at all

An unsupervised algorithm can:

Cluster samples based on similarity
Reveal hidden structure
Suggest patterns you might not have anticipated
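Clustering samples by similarity can be sketched with k-means, one common clustering method among several. The assumptions here are loud ones: each sample is reduced to two made-up numbers instead of a full expression profile, the number of clusters k is set by hand (in practice choosing it is part of the problem), and the starting centroids are simply the first k points so the result is repeatable.

```python
import math

def k_means(points, k, iterations=10):
    # Start from the first k points so the result is deterministic.
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids

# Hypothetical samples, each described by two expression values. No labels given.
samples = [(1.0, 1.1), (5.0, 5.2), (0.9, 1.0), (5.1, 4.9), (1.2, 0.8), (4.8, 5.0)]

labels, centroids = k_means(samples, k=2)
print(labels)
print(centroids)
```

The cluster numbers carry no meaning of their own. The algorithm only reports which samples belong together, and interpreting the groups is left to you.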

Common unsupervised learning tasks

Clustering: grouping similar samples together
Dimensionality reduction: simplifying complex data for visualisation
Outlier detection: finding unusual or unexpected samples
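Of these tasks, outlier detection is the easiest to show in a few lines. The sketch below flags values that sit far from the mean, measured in standard deviations (a z-score). The measurements, the function name, and the threshold of 2.0 are assumptions made for this illustration. With small or skewed datasets a more robust method may be a better choice.

```python
from statistics import mean, pstdev

def find_outliers(values, threshold=2.0):
    # Flag values whose distance from the mean exceeds
    # `threshold` standard deviations.
    centre = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return []  # all values identical, so nothing stands out
    return [i for i, v in enumerate(values) if abs(v - centre) / spread > threshold]

# Hypothetical measurements with one suspicious value at the end.
measurements = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 25.0]

outlier_indices = find_outliers(measurements)
print(outlier_indices)
```

No labels are involved. The sample is flagged only because it looks different from the rest.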

Unsupervised learning in biology and genomics

Unsupervised learning is particularly powerful when:

You are exploring a new dataset
Labels are unavailable or uncertain
Hypotheses are still forming