Reproducible Bioinformatic Workflows with Snakemake

Contributed by Shawn Higdon

As I begin the newest chapter of my PhD journey, a cross-college internship at UC Davis with the Weimer lab in the School of Veterinary Medicine, I am faced with new computational tasks that demand the use of numerous bioinformatic programs. Specifically, my internship focuses on investigating potential shifts in microbial metabolism and population structure within various components of the human microbiome. The lab takes a systems biology approach, using metabolomics and meta-transcriptomics to uncover potential shifts in metabolic output that result from dietary changes or adverse human health conditions.

My particular role on these projects will be to carry out bioinformatic analysis of meta-RNAseq data sets generated from patients in clinical experiments. While this type of analysis may be new to me, studies like these will undoubtedly continue to be carried out to answer the myriad biological questions surrounding the impact of the human microbiome on human population health and ecology. For that reason, my primary deliverable for this internship is a set of reproducible bioinformatic analysis pipelines that can be reused for future studies that incorporate meta-RNAseq experiments.

After spending time with computationally savvy members of The Lab for Data Intensive Biology at UC Davis, I learned of an incredibly powerful Python package called Snakemake that provides the programming community with a workflow management tool. While my experience with Snakemake is just beginning, I aim to emerge from the internship able to use it to develop at least two different flavors of bioinformatic workflows: one that can be used to analyze meta-RNAseq datasets and another that can be used to annotate microbial isolates for plant-growth-promoting functionalities.

 
