Machine learning explained simply: supervised vs unsupervised learning

Supervised vs Unsupervised Learning

Machine learning is often talked about as if it’s mysterious or magical. In reality, it’s a set of methods that allow computers to learn patterns from data and use those patterns to make predictions, group data, or highlight unusual behaviour.

At its core, machine learning answers a simple question:

Can a computer learn something useful from data without being explicitly told every rule?

What is machine learning?

Machine learning (ML) is a branch of artificial intelligence, widely used in data science, in which algorithms learn from data rather than following fixed, hand-written rules.

Rather than telling a computer:

“If X happens, then do Y”

we show it many examples and let it learn the relationship itself.

This approach is especially useful when:

  • Data is complex or high-dimensional

  • Patterns are subtle

  • Rules are hard to define in advance

This is why machine learning is widely used in areas such as genomics, imaging, finance, and recommendation systems.

The two main types of machine learning

Most machine learning methods fall into two broad categories:

  • Supervised learning

  • Unsupervised learning

The key difference is whether the data comes with known answers.

Supervised learning: learning with labels

In supervised learning, the algorithm is trained on data where the correct answer is already known. These known answers are called labels.

A simple example: predicting house prices

Imagine you want to predict house prices.

For houses that were sold in the past, you already know the final sale price, and you also have information such as:

  • Location

  • Number of bedrooms

  • Property type

  • Size or floor area

The machine-learning model is trained to learn how these inputs (called features) relate to the sale price, which is the label.

Once trained, you can give the model a new house it has never seen before — for example:

  • Location: city

  • Bedrooms: 3

and ask it to predict the price, even though the true value is unknown.

Because the model learns from labelled examples, this approach is called supervised learning.
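
As a rough illustration of the house-price example, the sketch below fits a straight line relating floor area to sale price using ordinary least squares. The figures are invented for illustration, only one feature is used, and the names fit_line and predict_price are hypothetical. A real analysis would use more features and, typically, an established library.

```python
from statistics import mean

# Hypothetical training data: past sales where the price (the label) is known.
floor_area_m2 = [50, 70, 90, 110, 130]                       # feature
sale_price = [155_000, 198_000, 252_000, 297_000, 348_000]   # label


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# "Training": learn the relationship from the labelled examples.
slope, intercept = fit_line(floor_area_m2, sale_price)


def predict_price(area_m2):
    """Predict the price of a house the model has never seen."""
    return slope * area_m2 + intercept


new_house_area = 100  # a new house whose true price is unknown
print(f"Predicted price: {predict_price(new_house_area):,.0f}")
```

The important point is the split between the two steps: the line is fitted only on houses with known prices, and is then applied to a house without one.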

Common supervised learning tasks

  • Classification
    Predicting a category
    Example: disease vs no disease

  • Regression
    Predicting a numerical value
    Examples: gene expression level, house price
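
To make the classification task concrete, here is a minimal sketch of a k-nearest-neighbours classifier, one simple technique among many. The two marker measurements and their values are invented for illustration, and the function name classify is hypothetical.

```python
from collections import Counter
from math import dist

# Hypothetical labelled samples: (marker_a, marker_b) with a known class.
training = [
    ((1.0, 1.2), "no disease"),
    ((0.8, 1.0), "no disease"),
    ((1.3, 0.9), "no disease"),
    ((3.1, 3.0), "disease"),
    ((2.9, 3.4), "disease"),
    ((3.3, 2.8), "disease"),
]


def classify(sample, k=3):
    """Return the majority label among the k nearest training samples."""
    nearest = sorted(training, key=lambda item: dist(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(classify((3.0, 3.1)))  # a new sample close to the "disease" group
print(classify((1.1, 1.0)))  # a new sample close to the "no disease" group
```

The output here is a category rather than a number, which is what separates classification from regression.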

Supervised learning in biology and genomics

In life sciences, supervised learning is often used when:

  • Outcomes are known

  • Labels are reliable

Examples include:

  • Predicting sample classes (case vs control)

  • Classifying cell types

  • Predicting functional effects of genetic variants

The quality of supervised learning depends heavily on good labels. If labels are incorrect or inconsistent, the model will learn incorrect patterns.

Unsupervised learning: finding structure without labels

In unsupervised learning, no labels are provided. The algorithm is given raw data and asked to find structure on its own.

Instead of being told what to look for, it explores the data and answers questions like:

  • Are there natural groups?

  • Which samples look similar?

  • Are there unusual outliers?

A simple example

Imagine you have gene expression data from hundreds of samples, but you don’t know:

  • How many groups exist

  • Whether groups exist at all

An unsupervised algorithm can:

  • Cluster samples based on similarity

  • Reveal hidden structure

  • Suggest patterns you might not have anticipated
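
The clustering idea above can be sketched with a bare-bones k-means implementation. This is an illustration under stated assumptions rather than a recommended pipeline: the expression values are invented, the number of clusters (k=2) is chosen by hand, the starting centroids are picked deterministically to keep the example reproducible, and the names assign and k_means are hypothetical.

```python
from math import dist
from statistics import mean

# Hypothetical expression of two genes in six unlabelled samples.
samples = [
    (2.1, 8.0), (1.9, 7.6), (2.4, 8.3),
    (7.8, 1.9), (8.1, 2.2), (7.5, 1.6),
]


def assign(points, centroids):
    """Give each point the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        for p in points
    ]


def k_means(points, k=2, iterations=10):
    # Simplified start: the first point, then the point farthest from
    # the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    for _ in range(iterations):
        labels = assign(points, centroids)
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # leave the centroid unchanged if its cluster is empty
                centroids[i] = tuple(mean(axis) for axis in zip(*members))
    return assign(points, centroids), centroids


labels, centroids = k_means(samples)
print(labels)
```

No labels are supplied at any point. The groups come only from how similar the samples are to one another, and deciding what those groups mean biologically is still up to the analyst.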

Common unsupervised learning tasks

  • Clustering
    Grouping similar samples together

  • Dimensionality reduction
    Representing high-dimensional data with fewer variables, for example to visualise it

  • Outlier detection
    Finding unusual or unexpected samples
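
As one simple example of outlier detection, the sketch below flags samples whose value lies far from the median, using a modified z-score based on the median absolute deviation (MAD). The sample names and library sizes are invented, the function name find_outliers is hypothetical, and the 3.5 cut-off is a commonly used convention rather than a rule.

```python
from statistics import median

# Hypothetical sequencing library sizes (millions of reads) per sample.
library_size = {
    "S1": 21.3, "S2": 19.8, "S3": 20.5, "S4": 22.1,
    "S5": 20.9, "S6": 4.2, "S7": 21.7,
}


def find_outliers(values, threshold=3.5):
    """Return names whose modified z-score exceeds the threshold."""
    med = median(values.values())
    mad = median(abs(v - med) for v in values.values())
    if mad == 0:
        return []  # no spread around the median, nothing to flag this way
    return [
        name
        for name, v in values.items()
        if abs(0.6745 * (v - med) / mad) > threshold
    ]


print(find_outliers(library_size))
```

Using the median and MAD rather than the mean and standard deviation keeps the outlier itself from distorting the estimate of what is typical.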

Unsupervised learning in biology and genomics

Unsupervised learning is particularly powerful when:

  • Exploring new datasets

  • Labels are unavailable or uncertain

  • Hypotheses are still forming

Examples include:

  • Identifying subpopulations of cells

  • Exploring heterogeneity in RNA-seq data

  • Detecting batch effects or technical artefacts

Unsupervised learning is often used early in analysis to understand the data before making assumptions.

Supervised vs unsupervised: which is better?

Neither approach is “better” — they answer different questions.

In practice, many analyses use both:

  • Unsupervised learning to explore and understand the data

  • Supervised learning to build and evaluate predictive models once reliable labels are available
