Solving PCR Bias in RNA-Seq with Unique Molecular Identifiers (UMIs)
What Is PCR Bias, and Why Does It Matter?
PCR (polymerase chain reaction) is a cornerstone of modern molecular biology. It allows small amounts of DNA or RNA to be amplified into quantities that can be analysed and sequenced. In RNA-seq workflows, PCR amplification is essential to generate sufficient material for sequencing, but it introduces a well-known problem: PCR bias.
PCR bias arises because not all molecules amplify equally well. Some sequences are preferentially amplified, while others are under-amplified. As a result, the number of sequencing reads no longer reflects the number of original molecules in the sample.
This makes it difficult to distinguish a true biological signal from technical artefacts introduced during amplification. The problem is particularly acute in sensitive applications such as:
single-cell RNA-seq
low-input RNA-seq
experiments targeting rare transcripts
In these settings, even small amplification biases can substantially distort quantitative results.
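To make the effect concrete, here is a minimal sketch of how a small difference in amplification efficiency compounds over PCR cycles. It assumes a deliberately simplified model in which each cycle multiplies a molecule's copy number by (1 + efficiency). The transcript names, efficiencies, and cycle number are hypothetical values chosen for illustration, not measurements.

```python
def amplified_copies(molecules: int, efficiency: float, cycles: int) -> float:
    """Expected copies after PCR under a simple exponential model.

    Assumption: every cycle multiplies the pool by (1 + efficiency),
    where efficiency = 1.0 would mean perfect doubling.
    """
    return molecules * (1.0 + efficiency) ** cycles


# Hypothetical input: two transcripts present at identical abundance
# that amplify with slightly different per-cycle efficiencies.
start_molecules = {"transcript_A": 100, "transcript_B": 100}
efficiency = {"transcript_A": 0.95, "transcript_B": 0.85}
cycles = 15

copies = {
    name: amplified_copies(count, efficiency[name], cycles)
    for name, count in start_molecules.items()
}
total_copies = sum(copies.values())

# Fraction of the amplified pool (and hence, roughly, of the reads)
# contributed by each transcript.
read_fraction = {name: c / total_copies for name, c in copies.items()}

for name in start_molecules:
    print(f"{name}: starting fraction 0.50, read fraction {read_fraction[name]:.2f}")
```

Under these assumed numbers, the two transcripts start at equal abundance, yet the more efficiently amplified one ends up with well over half of the reads. Real amplification is noisier and plateaus in later cycles, so this should be read as an illustration of the mechanism rather than a quantitative model.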
Enter Unique Molecular Identifiers (UMIs)
UMIs are short, random nucleotide sequences that are added to each individual RNA (or cDNA) molecule before PCR amplification. You can think of UMIs as molecular barcodes that uniquely label each original molecule in the sample.
Because the tag is added prior to amplification, all PCR copies derived from the same original molecule carry the same UMI. This makes it possible to identify and collapse PCR duplicates during downstream analysis.
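Whether a random tag really labels each molecule uniquely depends on how many distinct UMIs are available. The sketch below computes the number of possible UMIs for a given length (4 to the power of the length, one factor per base) and the chance that at least two molecules receive the same UMI. It assumes UMIs are drawn uniformly and independently, which is a simplification, and the function names are illustrative only.

```python
import math


def umi_space(length: int) -> int:
    """Number of distinct UMIs of the given length over A, C, G, T."""
    return 4 ** length


def collision_probability(n_molecules: int, length: int) -> float:
    """Probability that at least two of n molecules share a UMI.

    Birthday-problem calculation, assuming each UMI is drawn
    uniformly and independently from all possible sequences.
    """
    space = umi_space(length)
    if n_molecules > space:
        return 1.0  # more molecules than labels: a collision is certain
    # log P(all distinct) = sum over i of log(1 - i / space)
    log_p_distinct = sum(math.log1p(-i / space) for i in range(n_molecules))
    return -math.expm1(log_p_distinct)


# Hypothetical scenario: 50 original molecules that would otherwise be
# indistinguishable (for example, because they map to the same position).
for length in (6, 8, 10, 12):
    p = collision_probability(50, length)
    print(f"UMI length {length}: {umi_space(length)} possible UMIs, "
          f"collision probability {p:.4f}")
```

Longer UMIs give a larger label space and fewer accidental collisions, so the right length depends on how many molecules are expected to share a position.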
How UMIs Work in Practice
The UMI workflow can be broken down into three key steps:
1. Tagging
During reverse transcription or early library preparation, a UMI is attached to each RNA or cDNA molecule. Ideally, each original molecule receives a different UMI.
2. Amplification
PCR amplification is performed as usual. Multiple copies of each original molecule are generated, but all copies retain the same UMI.
3. Deduplication
After sequencing, reads are grouped by mapping position and UMI. Reads that share both are interpreted as PCR duplicates and counted only once. This allows the number of original molecules to be estimated more accurately.
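The deduplication step can be sketched in a few lines. The example below groups reads by mapping position and UMI and counts each distinct combination once per gene. The `Read` record, its field names, and the example reads are hypothetical; a real pipeline would take these values from aligned sequencing data.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    gene: str      # hypothetical label for the feature the read maps to
    position: int  # mapping position of the read
    umi: str       # UMI sequence extracted from the read


def count_molecules(reads):
    """Return (raw read counts, deduplicated molecule counts) per gene.

    Reads that share both mapping position and UMI are treated as
    PCR duplicates and contribute a single molecule.
    """
    raw_counts = defaultdict(int)
    distinct_tags = defaultdict(set)
    for read in reads:
        raw_counts[read.gene] += 1
        distinct_tags[read.gene].add((read.position, read.umi))
    molecule_counts = {gene: len(tags) for gene, tags in distinct_tags.items()}
    return dict(raw_counts), molecule_counts


reads = [
    Read("geneA", 100, "ACGT"),
    Read("geneA", 100, "ACGT"),  # PCR duplicate of the first read
    Read("geneA", 100, "ACGT"),  # PCR duplicate of the first read
    Read("geneA", 100, "TTAG"),  # same position, different UMI
    Read("geneB", 500, "ACGT"),
    Read("geneB", 520, "ACGT"),  # same UMI, different position
]

raw_counts, molecule_counts = count_molecules(reads)
print(raw_counts)       # {'geneA': 4, 'geneB': 2}
print(molecule_counts)  # {'geneA': 2, 'geneB': 2}
```

By raw read count, geneA looks twice as abundant as geneB, but after collapsing duplicates both genes are represented by two original molecules. This sketch only matches UMIs exactly; it makes no attempt to tolerate sequencing errors within the UMI itself.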
Benefits of Using UMIs in RNA-Seq
Because reads that share a mapping position and UMI are collapsed into a single count, UMI-based counts estimate the number of original molecules rather than the number of PCR copies. This makes it easier to separate true biological signal from amplification artefacts, which matters most in the sensitive applications described above.
Should You Use UMIs?
UMIs are particularly beneficial when accurate molecule counting is required. They are commonly used in:
Single-cell RNA-seq
Where each cell contributes very little RNA, and amplification bias is unavoidable.
Low-input or rare samples
Including clinical material or studies of low-abundance transcripts.
For standard bulk RNA-seq with high-quality input RNA, UMIs may be less critical, but they can still improve confidence in quantitative results.