This will be the first of a series of posts that will detail
the process I have used to analyze a set of RNA-seq data for our lab.
Library Preparation
We’ll start by
assuming that RNA has been successfully extracted from some biological source
and ribosomal RNA has been removed.
Step 1A: RNA -> cDNA
To create a DNA library that can be
sequenced, it is first necessary to convert our pool of RNA into complementary
DNA (cDNA). One technique is to use a short DNA primer made of T's (an oligo(dT)
primer) that anneals to the poly-A tail on mRNA strands; the enzyme reverse
transcriptase then extends the primer to create cDNA.
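To make the base-pairing concrete, here is a minimal Python sketch of what reverse transcription produces. It assumes sequences are plain strings written 5’ to 3’; the function names and the 12-nt minimum tail used for the oligo(dT) check are illustrative choices, not part of any real tool.

```python
# Reverse transcription base-pairing: each RNA base templates its partner
# DNA base (RNA uracil pairs with DNA adenine, RNA adenine with DNA thymine).
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(mrna: str) -> str:
    """Return the first-strand cDNA (5'->3') for an mRNA given 5'->3'.

    The cDNA is antiparallel to its template, so each base is complemented
    and the result is reversed to report it in the usual 5'->3' direction.
    """
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna.upper()))


def anneals_oligo_dt(mrna: str, min_tail: int = 12) -> bool:
    """Hypothetical check: does the mRNA end in a poly-A tail at least
    `min_tail` bases long (an assumed primer length) for oligo(dT) to bind?"""
    return mrna.upper().endswith("A" * min_tail)


if __name__ == "__main__":
    mrna = "AUGGCCUUCGAA" + "A" * 20  # toy transcript with a 20-nt poly-A tail
    if anneals_oligo_dt(mrna):
        # The oligo(dT)-primed cDNA starts (5' end) with T's copied from the tail.
        print(reverse_transcribe(mrna))
```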
Step 1B: Strand specificity (optional)
During the creation of the cDNA library,
the first cDNA strand is synthesized from a primer annealed to an RNA strand.
When the complementary (second) cDNA strand is synthesized, it is possible to use
dUTP instead of dTTP to mark the second strand with “U”s. This method results
in double-stranded cDNA where the two strands are distinguishable from each other
(one contains T and the other contains U). The U-marked second strand can then be
targeted and degraded, leaving a single strand that can be sequenced. The
benefit of this method is that each sequenced fragment can be assigned to the
specific strand of the reference genome it came from. For instance, if two
genes overlap on opposite strands of a genome, this strand-specific
sequencing strategy can determine which strand (and therefore which gene)
each RNA fragment was derived from.
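The logic of dUTP marking can be sketched as a toy string model in Python. This is a hypothetical illustration, not a simulation of the chemistry: it assumes strands are strings written 5’ to 3’, that the degradation step removes every strand containing U, and that reads are taken from the surviving first strand (which is antisense to the original mRNA).

```python
# Toy model of dUTP second-strand marking. Assumptions: strands are strings
# written 5'->3', and "degradation" simply discards any strand containing U.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def revcomp(dna: str) -> str:
    """Reverse complement of a DNA strand made of A/C/G/T."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna))


def synthesize_second_strand_dutp(first_strand: str) -> str:
    """Copy the first strand, but with dUTP in place of dTTP: every T the
    polymerase would insert is incorporated as U instead."""
    return revcomp(first_strand).replace("T", "U")


def degrade_marked_strands(strands: list[str]) -> list[str]:
    """Keep only strands without U, i.e. the unmarked first strand."""
    return [strand for strand in strands if "U" not in strand]


def gene_of_origin(read_aligns_to: str, plus_gene: str, minus_gene: str) -> str:
    """The surviving first strand is antisense to its mRNA, so a read that
    aligns to the '-' reference strand came from the '+' strand gene, and
    a read on the '+' strand came from the '-' strand gene."""
    return plus_gene if read_aligns_to == "-" else minus_gene


if __name__ == "__main__":
    first = "ACGTTGCA"  # toy first-strand cDNA
    second = synthesize_second_strand_dutp(first)
    print(second, degrade_marked_strands([first, second]))
```

The gene names passed to `gene_of_origin` are placeholders; in practice the strand a read aligns to comes from the aligner, and which read is antisense depends on the specific kit.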
Step 2: Size selection
At this point, the cDNA is broken into
fragments of a common size. Typically, this is done with a physical process
such as sonication. The ends of these fragments are repaired and short synthetic
DNA segments (adapters) are ligated onto both ends of the fragments. These
include the priming sites required to initiate sequencing and barcode sequences
that identify which fragments belong to which sample (crucial for multiplexed
experiments, where multiple samples are sequenced at the same time).
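Barcodes are what let the reads be split back out by sample after a multiplexed run. Here is a minimal demultiplexing sketch in Python, assuming each read arrives paired with the barcode (index) sequence read from it; the sample sheet, barcode sequences, and the one-mismatch tolerance are hypothetical choices for illustration.

```python
from collections import defaultdict

# Hypothetical sample sheet: barcode (index) sequence -> sample name.
SAMPLE_BARCODES = {"ACGTAC": "control_1", "TGCATG": "treated_1"}


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read: str, barcodes: dict[str, str], max_mismatches: int = 1) -> str:
    """Return the sample whose barcode is within `max_mismatches` of the index
    read, or 'undetermined' if no barcode (or more than one) qualifies."""
    hits = [
        sample
        for barcode, sample in barcodes.items()
        if len(barcode) == len(index_read) and hamming(barcode, index_read) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else "undetermined"


def demultiplex(reads: list[tuple[str, str]], barcodes: dict[str, str]) -> dict[str, list[str]]:
    """Group (index_read, insert_sequence) pairs by their assigned sample."""
    by_sample = defaultdict(list)
    for index_read, insert in reads:
        by_sample[assign_sample(index_read, barcodes)].append(insert)
    return dict(by_sample)


if __name__ == "__main__":
    reads = [("ACGTAC", "AAAA"), ("TGCATT", "CCCC"), ("GGGGGG", "TTTT")]
    print(demultiplex(reads, SAMPLE_BARCODES))
```

Requiring exactly one matching barcode keeps a read from being assigned when sequencing errors make it ambiguous between two samples.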
Step 3: Amplification
Finally, the library is PCR amplified and is ready to be sequenced.
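As a rough worked example, ideal PCR doubles the number of molecules each cycle, so N0 starting molecules become N0 * (1 + E)**n after n cycles, where E is the per-cycle efficiency (E = 1 for perfect doubling). A small Python helper follows; the efficiency values used are assumptions, not measurements from our library.

```python
def pcr_copies(initial_molecules: float, cycles: int, efficiency: float = 1.0) -> float:
    """Expected molecule count after PCR: N = N0 * (1 + E) ** n.

    `efficiency` (E) is the assumed fraction of molecules copied per cycle;
    E = 1.0 means perfect doubling every cycle.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_molecules * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # 1,000 starting fragments, 10 cycles: ideal doubling vs. an assumed 90% efficiency.
    print(pcr_copies(1000, 10))
    print(pcr_copies(1000, 10, efficiency=0.9))
```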
Illumina Sequencing
For those out there (including myself)
who are visually oriented, this video demonstrates the basic steps:
http://youtu.be/77r5p8IBwJk?t=45s
After the library is prepared, the first step is to wash the
fragments over a flow cell, where they anneal to the surface and end up randomly
distributed across it. Next, a technique called bridge amplification copies each
annealed fragment many times in place, creating a clonal cluster of identical
DNA molecules.
This is where the sequencing
begins. First, a primer is annealed to every DNA fragment attached to the flow
cell. A series of cycles is then performed. In each cycle, a single fluorescently
labeled nucleotide carrying a reversible terminator is added to each growing DNA
strand, so only one base can be incorporated per cycle. Each cluster then emits
light at a specific wavelength corresponding to one of the four nucleotides. By
measuring the wavelength of light emitted from each cluster after each cycle, the
sequence of each fragment can be determined. After each image is taken, the
fluorescent label and the terminator are removed so that the next nucleotide can
be added.
Finally, a technique known as paired-end sequencing can be
used to double the sequence information obtained from each fragment. The general idea is that each fragment is sequenced from both ends: one read starts on each strand, and each read proceeds 5’ to 3’. This results in two sets of sequences (read 1 and read 2), one corresponding to each DNA strand. The two reads of a pair may or may not overlap, depending on the size of the fragment and the length of each read.
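Whether the two reads of a pair overlap follows directly from the fragment length and the read length. A minimal Python sketch, assuming a fragment is a plain A/C/G/T string written 5’ to 3’ and that each read starts exactly at one end of the fragment (adapters ignored):

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def revcomp(dna: str) -> str:
    """Reverse complement: the other strand, written 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna))


def paired_reads(fragment: str, read_length: int) -> tuple[str, str]:
    """Read 1 runs 5'->3' from the start of one strand; read 2 runs 5'->3'
    from the start of the other strand (the reverse complement)."""
    return fragment[:read_length], revcomp(fragment)[:read_length]


def overlap_length(fragment_length: int, read_length: int) -> int:
    """Number of fragment positions covered by both reads."""
    read1_end = min(read_length, fragment_length)          # read 1 covers [0, read1_end)
    read2_start = max(0, fragment_length - read_length)    # read 2 covers [read2_start, end)
    return max(0, read1_end - read2_start)


if __name__ == "__main__":
    print(paired_reads("AAACCCGT", 3))
    print(overlap_length(150, 100), overlap_length(300, 100))
```

For example, 100-base reads from a 150-base fragment share 50 bases, while the same reads from a 300-base fragment leave a 100-base gap between them.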
(For further details and figures, please check out this page: http://nextgen.mgh.harvard.edu/IlluminaChemistry.html)
Once all this is done, the Illumina platform analyzes a huge set of images to derive the sequence of all the fragments. Our RNA-seq experiment uses random priming and paired-end sequencing, and is not strand specific. The blog posts that follow will be written for this type of analysis.
In the next post, I will explain the output of this
sequencing and how to analyze its quality.