Contributed by Shawn Higdon
Interpreting sequencing data from experiments that target gene expression in any biological system requires a series of computational and analytical steps. After pre-processing the sequencing reads to high quality (trimming low-quality base calls and removing adapter sequences that slip through the cracks), the next step is to generate count information for each transcript present in each sample’s cDNA sequencing library. Previously, the standard approach for this step was to use alignment-based bioinformatics software to “map” each sequencing read to a reference genome for the organism under investigation, and then to quantify transcript counts from the mapping results. While this approach is reliable, its caveats include heavy computational requirements, lengthy runtimes, and mapping files that tend to occupy large volumes of disk space.
Within the past five years, Rob Patro and his colleagues have introduced an alternative approach to transcript quantification: software that quantifies transcripts much faster by ditching the alignment-based approach altogether. With programs such as Sailfish (Patro, Mount, and Kingsford 2014) and Salmon (Patro et al. 2017), transcript-level abundance and count estimates can now be generated in significantly less time and without the drawback of consuming large amounts of storage space. Rather than mapping to a reference genome, one can quantify directly against the transcriptome, should one be fortunate enough to have this resource available for the system under investigation.
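To make the transcriptome-based workflow concrete, here is a minimal sketch of the two Salmon commands usually involved: building an index from a transcriptome FASTA file, then quantifying reads against it. It assumes paired-end reads, and the file and directory names (`transcripts.fa`, `salmon_index`, and so on) are hypothetical placeholders. The helper functions are mine, written for illustration only. They just assemble the command lines, so the sketch runs even without Salmon installed.

```python
import shlex


def salmon_index_cmd(transcriptome_fasta, index_dir):
    """Assemble the command that indexes a transcriptome FASTA file."""
    return ["salmon", "index", "-t", str(transcriptome_fasta), "-i", str(index_dir)]


def salmon_quant_cmd(index_dir, reads_1, reads_2, out_dir, threads=4):
    """Assemble the command that quantifies paired-end reads against an index.

    "-l A" asks Salmon to infer the library type automatically.
    """
    return [
        "salmon", "quant",
        "-i", str(index_dir),
        "-l", "A",
        "-1", str(reads_1),
        "-2", str(reads_2),
        "-p", str(threads),
        "-o", str(out_dir),
    ]


# Hypothetical file names, for illustration only.
index_cmd = salmon_index_cmd("transcripts.fa", "salmon_index")
quant_cmd = salmon_quant_cmd(
    "salmon_index", "sample_R1.fastq.gz", "sample_R2.fastq.gz", "quant/sample"
)

# Print the shell-ready commands; to actually run them, pass each list to
# subprocess.run(cmd, check=True) on a machine where Salmon is installed.
print(shlex.join(index_cmd))
print(shlex.join(quant_cmd))
```

The index only needs to be built once per transcriptome. The quantification command is then repeated for each sample.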
Salmon is well supported within the scientific community focused on gene expression analysis, as evidenced by its ease of installation and strong documentation. Salmon can easily be installed using Anaconda through the Anaconda cloud (https://anaconda.org/bioconda/salmon), and several vignettes have been posted online to help biologists move along the road of Big Sequencing Data analysis towards answering some of our greatest questions!
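Once Salmon has been installed and run, each sample's output directory contains a tab-separated `quant.sf` file with the columns `Name`, `Length`, `EffectiveLength`, `TPM`, and `NumReads`. As a rough sketch of how those abundance and count estimates might be read into Python, the example below parses a small made-up table in that layout. The transcript names and values are invented for illustration and are not real results, and `read_quant` is a hypothetical helper, not part of Salmon.

```python
import csv
import io

# Made-up example in the quant.sf column layout; the values are not real data.
EXAMPLE_QUANT_SF = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "tx1\t1500\t1320.5\t12.5\t100.0\n"
    "tx2\t800\t620.5\t0.0\t0.0\n"
)


def read_quant(handle):
    """Return {transcript name: {"tpm": float, "num_reads": float}}."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row["Name"]: {
            "tpm": float(row["TPM"]),
            "num_reads": float(row["NumReads"]),
        }
        for row in reader
    }


# Parse the in-memory example; for a real run, open the quant.sf file instead.
estimates = read_quant(io.StringIO(EXAMPLE_QUANT_SF))

# List the transcripts that received at least one estimated read.
expressed = [name for name, est in estimates.items() if est["num_reads"] > 0]
print(expressed)
```

For a real sample, the same function could be called on an open file handle, for example `with open("quant/sample/quant.sf") as fh: read_quant(fh)`, where the path is again a placeholder.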