Next Generation Sequencing (NGS): a beginner’s guide

This blog post provides an overview of the essential concepts behind next generation sequencing (NGS). It covers the history of DNA discovery, the main sequencing methodologies, common applications of NGS, and the fundamental procedural steps: sample collection, library preparation, cluster generation, sequencing, and the crucial phase of data analysis. By the end of this article, readers will have a solid foundation in the fundamentals of NGS and an appreciation of its scientific and technological complexity.

The discovery of DNA

DNA was first identified in 1869 by the Swiss physiological chemist Friedrich Miescher, who called it “nuclein”. Unfortunately for him, his discovery attracted little interest for the next 50 years. In 1919, the Russian biochemist Phoebus Levene was the first to identify how RNA and DNA molecules are put together. Later, Erwin Chargaff expanded on Levene’s work and showed that, in all organisms studied, the amount of adenine (A) was roughly equal to the amount of thymine (T) and the amount of guanine (G) was roughly equal to the amount of cytosine (C). This idea that G=C and A=T, in combination with the X-ray crystallography work done by Rosalind Franklin and Maurice Wilkins, contributed to Watson and Crick’s proposal in 1953 that DNA consists of a double helix.
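Chargaff's observation follows naturally from base pairing in a double helix, and a few lines of Python can illustrate why. This is only a sketch: the sequence is made up, and the function names are my own. It builds the complementary strand and counts the bases over both strands, where A=T and G=C hold exactly by construction.

```python
from collections import Counter

# Watson-Crick base pairing: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Return the complementary strand, read in the 5'->3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def duplex_base_counts(seq):
    """Count the bases over both strands of a double-stranded molecule."""
    return Counter(seq + reverse_complement(seq))

# Hypothetical example sequence, not taken from any real genome.
counts = duplex_base_counts("ATGGCGTACGATTAGC")
print(counts["A"] == counts["T"], counts["G"] == counts["C"])  # True True
```

On a single strand the counts do not have to match; the equality appears once both strands are counted together.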

From Sanger sequencing to next generation sequencing to third generation sequencing

Sanger sequencing was developed by Frederick Sanger in 1977. It was one of the first methods of DNA sequencing and is based on the random incorporation of chain-terminating nucleotides during in vitro DNA replication. At first, it was a very labour-intensive technique, until it was commercialized in 1986. Today, Sanger sequencing is still frequently used for small-scale projects.
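To make the chain-termination idea more concrete, here is a toy simulation in Python. It is a simplified sketch, not a model of a real sequencing reaction: it works directly on the newly synthesized strand, the termination probability and number of molecules are arbitrary assumptions, and the function names are my own. Each copy of the strand stops at a random position, and sorting the resulting fragments by length (as a gel or capillary would) lets you read the sequence base by base.

```python
import random

def sanger_fragments(new_strand, n_molecules=5000, p_terminate=0.1, seed=0):
    """Simulate chain termination: each copy stops at a random position."""
    rng = random.Random(seed)
    fragments = set()
    for _ in range(n_molecules):
        for position, base in enumerate(new_strand, start=1):
            # With probability p_terminate a labelled chain terminator is
            # incorporated instead of a normal nucleotide; extension stops.
            if rng.random() < p_terminate:
                fragments.add((position, base))
                break
    return fragments

def read_sequence(fragments):
    """Sort fragments by length and read off the terminal base of each."""
    return "".join(base for _, base in sorted(fragments))

# Hypothetical 20-base strand used only for illustration.
strand = "ATGGCGTACGATTAGCCTGA"
print(read_sequence(sanger_fragments(strand)) == strand)
```

With enough molecules, every fragment length is represented at least once, which is why the full sequence can be recovered. Longer strands need more molecules or a lower termination probability, which hints at why the method suits small-scale projects.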

The development of the sequencing by synthesis method, better known as next generation sequencing, marked the start of an explosion in DNA/RNA sequencing. This technique allows massively parallel sequencing of millions of fragments simultaneously. At first, the cost of sequencing was enormous, but it has since become far more affordable. Although there are several platforms, the Illumina clonal bridge amplification method (see this video to understand how it works) is currently the most popular.

A few years later, the newest sequencing technology appeared: single molecule sequencing, also described as third generation sequencing. The most popular platform is Oxford Nanopore’s technology, in which the DNA or RNA molecule passes through a nanoscale pore and the machine measures the resulting changes in the electrical current through the pore.

More recently, a fourth generation of sequencing has been described, which can read the nucleic acid composition directly in fixed cells and tissues.

Popular applications for NGS

NGS has become an invaluable tool for both scientific research and clinical diagnosis. Examples are:

- Rapid sequencing of whole genomes
- Deep sequencing of target regions
- Transcriptome analysis (RNA-Seq) to study gene expression
- Analysis of epigenetic factors such as targeted or genome-wide DNA or RNA methylation patterns
- Analysis of DNA-protein interactions
- Analysis of RNA-protein interactions
- Sequencing of cancer samples to study rare somatic variants, tumour subclones, etc.
- Study of the human microbiome
- Identification of novel pathogens

How does NGS sequencing work?

Here, I’ll just explain the basic principle. In later blogs, I will discuss some specific methods in more detail.

- Sample preparation

DNA or RNA is extracted from the selected samples (blood, tissue, cultured cells, …) and checked for quality using standard methods such as NanoDrop, TapeStation or Bioanalyzer.

Depending on the method, additional sample preparation steps might be necessary, such as reverse transcription into cDNA for some types of RNA library preparation, although some library preparation protocols include this step.

For RNA-Seq experiments, it’s usually necessary to remove the ribosomal RNA (rRNA), as this accounts for roughly 80% of the RNA. This can be done with poly(A) selection or rRNA depletion. We’ll describe these methods and their pros and cons in a different blog.

- Library preparation

There are many different types of library preparation, but the basic principles are usually the same.

The DNA (or cDNA) has to be processed into relatively short fragments (100-800 bp). This requires random fragmentation of the DNA or cDNA by enzymatic treatment or sonication. After end-repair, small adapter sequences are ligated to the fragments. Adapters that carry sample-specific indexes allow multiple samples to be pooled together, also known as multiplexing (note: some library prep kits have a combined fragmentation/adapter ligation step).

Often, a size selection step is performed next, usually by gel electrophoresis or magnetic beads. This will remove all fragments that are too short or too long for sequencing.
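The fragmentation and size selection steps can be mimicked in a few lines of Python. This is a deliberately crude sketch: cutting at uniformly random positions is only a stand-in for enzymatic treatment or sonication, the "genome" is a random string, and the function names and the mean fragment length of 300 bp are assumptions for illustration. Only the 100-800 bp window comes from the text above.

```python
import random

def fragment(sequence, mean_length=300, seed=0):
    """Cut a sequence at random positions (a crude stand-in for sonication)."""
    if len(sequence) < 2:
        return [sequence]
    rng = random.Random(seed)
    n_cuts = max(1, len(sequence) // mean_length)
    # Distinct cut positions, so no fragment is empty.
    cuts = sorted(rng.sample(range(1, len(sequence)), n_cuts))
    edges = [0] + cuts + [len(sequence)]
    return [sequence[a:b] for a, b in zip(edges, edges[1:])]

def size_select(fragments, min_bp=100, max_bp=800):
    """Keep only the fragments that fall inside the size window."""
    return [f for f in fragments if min_bp <= len(f) <= max_bp]

# Hypothetical 20 kb random sequence standing in for extracted DNA.
rng = random.Random(1)
genome = "".join(rng.choice("ACGT") for _ in range(20_000))
pieces = fragment(genome)
library = size_select(pieces)
print(len(pieces), "fragments,", len(library), "kept after size selection")
```

Random cutting produces a broad spread of fragment lengths, so some material is always lost at the size selection step.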

Usually, a library enrichment/amplification step (PCR) is performed next. In most protocols this is combined with adding indexes, so samples can be pooled together on the sequencer. In a later blog, I’ll explain why you need adapters and indexes.

After a final clean-up, usually with magnetic beads, the final libraries can undergo quality checks using TapeStation or Bioanalyzer. Although not required, it is good practice to use qPCR to accurately determine the final quantity.

- Cluster generation

Depending on the selected platform and chemistry, methods differ, but as Illumina’s technology is the most widely used, I’ll focus on that method here. Before the actual sequencing reaction can happen, the library must be attached to a solid surface, the flow cell, and clonally amplified so that the signal from each fragment is strong enough to be detected during the sequencing reaction. This video about paired-end sequencing is also useful for understanding bridge amplification.

- Sequencing reaction

All the fragments are sequenced at the same time. In every cycle, one base is incorporated and detected on each fragment. When the run has finished, the data can be “demultiplexed” to generate a FASTQ file (or two FASTQ files in the case of paired-end sequencing) with the raw data for each sample.
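The idea behind demultiplexing can be sketched in Python as follows. This is not how the sequencer's own software works internally; it is a minimal illustration under simplifying assumptions: reads are given as (index, sequence, quality) tuples, indexes must match exactly, and the sample sheet, index sequences, read names, and function names are all made up. Each record is written in the standard four-line FASTQ layout (name, sequence, "+", quality string).

```python
from collections import defaultdict

# Hypothetical sample sheet: index sequence -> sample name.
SAMPLE_SHEET = {"ATCACG": "sample_A", "CGATGT": "sample_B"}

def demultiplex(reads, sample_sheet):
    """Group (index, sequence, quality) reads by the sample they belong to."""
    per_sample = defaultdict(list)
    for index, sequence, quality in reads:
        # Reads with an unknown index end up in an "undetermined" bin.
        sample = sample_sheet.get(index, "undetermined")
        per_sample[sample].append((sequence, quality))
    return dict(per_sample)

def to_fastq(sample, records):
    """Format records as four-line FASTQ entries."""
    lines = []
    for i, (sequence, quality) in enumerate(records, start=1):
        lines += [f"@{sample}_read{i}", sequence, "+", quality]
    return "\n".join(lines) + "\n"

# Made-up pooled reads from one run.
reads = [
    ("ATCACG", "GATTACA", "IIIIIII"),
    ("CGATGT", "TTGACCA", "IIIHHGF"),
    ("ATCACG", "CCGTAAG", "IIIIIH#"),
    ("NNNNNN", "ACGTACG", "#######"),
]
by_sample = demultiplex(reads, SAMPLE_SHEET)
print(to_fastq("sample_A", by_sample["sample_A"]))
```

Real demultiplexing tools typically tolerate a small number of mismatches in the index, which this sketch leaves out for brevity.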

- Data analysis

Data analysis starts with a whole range of quality checks. You can see a few examples of our quality checks here. The good-quality reads are then mapped against the reference genome of the relevant species. From here, each analysis has to be tailored to the research project to extract meaningful information from the data.
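As a small illustration of one such quality check, the Python sketch below decodes the quality string of a FASTQ read into Phred scores and filters on the mean score. It assumes the common Phred+33 encoding, and the threshold of 20 is an arbitrary choice for illustration rather than a recommendation; the function names are my own. A Phred score Q corresponds to an error probability of 10^(-Q/10), so Q=20 means a 1 in 100 chance that the base call is wrong.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality_string]

def error_probability(phred_score):
    """Convert a Phred score Q into an error probability 10^(-Q/10)."""
    return 10 ** (-phred_score / 10)

def mean_quality(quality_string):
    """Mean Phred score of a read; 0.0 for an empty quality string."""
    scores = phred_scores(quality_string)
    return sum(scores) / len(scores) if scores else 0.0

def passes_filter(quality_string, min_mean_quality=20):
    """Hypothetical filter: keep reads with a high enough mean Phred score."""
    return mean_quality(quality_string) >= min_mean_quality

print(phred_scores("I5#"))  # [40, 20, 2]
print(passes_filter("IIIIIII"), passes_filter("#######"))  # True False
```

Dedicated quality control tools look at much more than the mean score, such as per-position quality and adapter content.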

And that’s where we come into the picture. Do you have questions about the analysis of your project? Do you want us to analyse your data? Please don’t hesitate to contact us!
