What is a Sankey plot (and when not to use it)?
A Sankey plot is a directed flow diagram in which the width of each link is proportional to the quantity it carries. It is designed to show how a total amount splits, flows, and sometimes re-splits across successive stages of a process.
Sankey plots are especially effective when you want readers to understand where things go, how much is lost, and how categories change over steps.
When should you use a Sankey plot?
Use a Sankey plot when your data represents flow through stages and when conservation or loss matters.
Typical and effective use cases include:
✔ Bioinformatics and genomics
Read fate across an RNA-seq or ATAC-seq pipeline
(raw reads → QC pass → aligned → assigned → filtered)
Variant consequences across annotation tiers
(all variants → coding → missense → pathogenic)
✔ Sample and project tracking
Sample triage
(collected → passed QC → sequenced → analysed)
Cohort filtering
(initial cohort → exclusions → final analysis set)
✔ Resources and cost breakdowns
Cost or time allocation across pipeline steps
Compute usage split by analysis stage
If the question is “where did the data go?”, a Sankey plot is often the right answer.
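As a minimal sketch of how stage-wise counts map onto Sankey links, the snippet below turns an ordered list of (stage, count) pairs into "retained" and "lost" links. The `Link` type, the `stage_counts_to_links` helper, and the read counts are all hypothetical and chosen only for illustration. The resulting source/target/value triples are the general shape most Sankey plotting libraries expect, although the exact input format depends on the library.

```python
from typing import NamedTuple


class Link(NamedTuple):
    """One Sankey link: `value` units flow from `source` to `target`."""
    source: str
    target: str
    value: int


def stage_counts_to_links(stages: list[tuple[str, int]]) -> list[Link]:
    """Turn ordered (stage, count) pairs into retained and lost links.

    Assumes a strictly linear pipeline in which counts never increase.
    """
    links: list[Link] = []
    for (src, src_n), (dst, dst_n) in zip(stages, stages[1:]):
        if dst_n > src_n:
            raise ValueError(f"{dst!r} has more items than {src!r}")
        # Items that survive this step flow on to the next stage.
        links.append(Link(src, dst, dst_n))
        # Items that do not survive get their own "lost" node,
        # so the loss is visible rather than silently dropped.
        lost = src_n - dst_n
        if lost:
            links.append(Link(src, f"Lost after {src}", lost))
    return links


# Hypothetical read counts, for illustration only.
example_stages = [
    ("Raw reads", 1_000_000),
    ("QC pass", 900_000),
    ("Aligned", 800_000),
]
for link in stage_counts_to_links(example_stages):
    print(link)
```

Giving each loss its own node keeps the total conserved at every stage, which is what lets a reader see where the data went.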
When should you not use a Sankey plot?
Sankey plots are powerful, but very easy to misuse.
Avoid them in the following situations:
✖ Time series data
If you want to show trends over time, use:
Line charts
Area charts
Sankey plots do not encode time well.
✖ Too many small categories
If you have dozens of very thin flows (roughly more than 40–50):
The diagram becomes unreadable
Important patterns disappear into visual noise
In these cases, use:
Bar charts
Faceted plots
Or group small categories into “Other”
✖ Cycles or bidirectional networks
Sankey plots assume a left-to-right, acyclic flow.
They are not suitable for:
Feedback loops
Iterative processes
Bidirectional interactions
For those, use:
Network diagrams
Chord diagrams
Graph layouts
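Before committing to a Sankey plot, it can help to check that the flow really is acyclic. The sketch below uses Kahn's topological-sort algorithm on a list of (source, target) pairs. The `is_acyclic` function name and the node names in the usage lines are hypothetical.

```python
from collections import defaultdict, deque


def is_acyclic(links: list[tuple[str, str]]) -> bool:
    """Return True if the directed links contain no cycle (Kahn's algorithm)."""
    indegree: dict[str, int] = defaultdict(int)
    children: dict[str, list[str]] = defaultdict(list)
    nodes: set[str] = set()
    for source, target in links:
        children[source].append(target)
        indegree[target] += 1
        nodes.update((source, target))

    # Start from nodes with no incoming links and peel them off one by one.
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    # Any node that was never reached sits on, or downstream of, a cycle.
    return visited == len(nodes)


print(is_acyclic([("raw", "qc"), ("qc", "aligned")]))   # linear pipeline
print(is_acyclic([("model", "fit"), ("fit", "model")]))  # feedback loop
```

If the check fails, that is a signal to use one of the network-style alternatives listed above rather than to force the data into a Sankey layout.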
Design best practices (the ones reviewers actually like)
1) Stages and ordering matter
Arrange stages logically from left to right
Keep the same order across related figures
Never reorder stages just to “make it look nicer”
Consistency builds trust.
2) Use colour with restraint
Assign consistent colours to the same category across all stages
Avoid rainbow palettes
Use muted backgrounds
Group related categories using colour families
If colour encodes meaning, explain it.
3) Labels and tooltips
Node labels: short, precise, unambiguous
(“Aligned reads”, not “Aligned”)
Link values: show both absolute counts and percentages
e.g. “700,000 (11.2%)”
For interactive figures, use tooltips.
For static figures, annotate only the key flows.
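A small helper keeps the "count plus percentage" label format consistent across links and tooltips. This is a sketch: `format_flow_label` is a hypothetical name, and the total of 6,250,000 is an assumed value chosen so that the label matches the example above.

```python
def format_flow_label(count: int, total: int) -> str:
    """Format a link value as 'count (percent of total)'."""
    if total <= 0:
        raise ValueError("total must be positive")
    # ',' adds thousands separators; '.1%' multiplies by 100 and keeps one decimal.
    return f"{count:,} ({count / total:.1%})"


# Assumed total, for illustration only.
print(format_flow_label(700_000, 6_250_000))  # 700,000 (11.2%)
```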
4) Handle small categories carefully
Very small flows (<1–2% of the total):
Add clutter
Distract from the main story
Group them into “Other”, and explain this choice in the caption.
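One way to apply this rule before plotting is to fold every flow below a chosen share of the total into a single "Other" entry. The sketch below assumes flows are held in a name-to-value dictionary. The function name `group_small_flows`, the 2% default threshold, and the category names are illustrative assumptions.

```python
def group_small_flows(
    flows: dict[str, float],
    min_share: float = 0.02,
    other_label: str = "Other",
) -> dict[str, float]:
    """Merge flows smaller than `min_share` of the total into `other_label`."""
    total = sum(flows.values())
    if total <= 0:
        raise ValueError("flows must sum to a positive total")

    grouped: dict[str, float] = {}
    other = 0
    for name, value in flows.items():
        if value / total < min_share:
            other += value  # too thin to draw on its own
        else:
            grouped[name] = value
    if other:
        # Add to an existing "Other" entry if the input already had one.
        grouped[other_label] = grouped.get(other_label, 0) + other
    return grouped


# Hypothetical variant counts, for illustration only.
print(group_small_flows({"missense": 900, "synonymous": 90, "stop gained": 10}))
```

The grouped total equals the original total, so the widths still add up. The threshold used belongs in the caption.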
5) Always state units and totals
Make it explicit what the widths represent:
Reads
Samples
Variants
Compute hours
A subtitle helps enormously, for example:
Total reads analysed: N = 6.5 million
6) Accessibility matters
Use colour-blind friendly palettes
Ensure sufficient contrast
Provide a text summary or table alongside the figure
Remember: figures should support your story, not replace it.
7) Avoid cycles and double-counting
Sankey diagrams must not loop.
If your process is iterative:
Summarise each iteration separately, or
Switch to a network-based visualisation
Never allow the same quantity to be counted twice without explicit explanation.
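A simple conservation check can catch accidental double-counting before the figure is drawn: at every intermediate node, the total flowing out should not exceed the total flowing in. The sketch below assumes links are (source, target, value) triples. `find_overcounted_nodes` and the node names are hypothetical.

```python
from collections import defaultdict


def find_overcounted_nodes(
    links: list[tuple[str, str, float]],
) -> dict[str, tuple[float, float]]:
    """Return {node: (inflow, outflow)} for nodes that emit more than they receive.

    Only intermediate nodes (those with both incoming and outgoing links)
    are checked; pure sources and pure sinks are skipped.
    """
    inflow: dict[str, float] = defaultdict(float)
    outflow: dict[str, float] = defaultdict(float)
    for source, target, value in links:
        outflow[source] += value
        inflow[target] += value

    return {
        node: (inflow[node], outflow[node])
        for node in inflow
        if node in outflow and outflow[node] > inflow[node]
    }


# Hypothetical example: 100 reads enter "qc" but 110 leave it,
# which suggests some reads were counted in two output categories.
links = [
    ("raw", "qc", 100),
    ("qc", "aligned", 80),
    ("qc", "duplicates", 30),
]
print(find_overcounted_nodes(links))
```

An empty result does not prove the figure is correct, but a non-empty one points to a node where the same quantity has probably been counted twice and needs either a fix or an explicit explanation.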