Contributed by Allison Weis
The big, scary world of genomics can be overwhelming, to say the least, when you first enter it. But with a deep breath and these tools, you’ll soon be whizzing down ATG Avenue with the other polymerases.
First, if you’ve received sequence data, you need an assembler! We use ABySS in the Weimer lab, but there are a few other good options for bacterial assembly. A5, or Andrew and Aaron’s Awesome Assembly Pipeline, is a popular one that you can read about here: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0042304. SPAdes is another good assembler, and some consider it the best for bacterial assemblies. If you aren’t familiar with command line commandeering, check out CLC or the Geneious assembler instead.
Once you have your genome assembled, you need to annotate the thing! That’s right: you need to figure out what genes are present and where they live on the chromosome. We use the program Prokka because it’s a great ready-made pipeline that incorporates several different layers of annotation, which you can read about here: http://www.vicbioinformatics.com/software.shtml. Execute it on the command line and you’ll have a bacterial genome annotated in about 3 minutes!
Now that you have an annotated genome, you can run it through the genome aligner Mauve: http://darlinglab.org/mauve/mauve.html. If you have a closed genome, this is easy; if you have a draft genome, you need to align the contigs to a reference first. But it’s not too hard, and soon you’ll have beautiful comparative visualizations of your genomes! Super cool!
MUMmer is a popular tool for pairwise comparisons between genomes, and the comparisons can be made at the nucleotide level or the protein level. Genome-to-genome distance calculation tools are available here: http://ggdc.dsmz.de/distcalc2.php. They will give you a distance matrix from which you can build a phylogenetic tree.
Now comes the interesting part: asking questions of your genomes.
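One question you can ask right away is how your genomes relate to one another. As a rough illustration of how a distance matrix becomes a tree, here is a minimal sketch of UPGMA (average-linkage clustering) in plain Python. This is not what the distance calculator or any of the tools above do internally, and UPGMA is only one simple way to build a tree. The function name `upgma`, the genome labels, and the distance values are all made up for the example.

```python
from itertools import combinations


def upgma(names, matrix):
    """Cluster genomes by average linkage (UPGMA) and return a Newick string.

    names  -- list of genome labels (hypothetical labels in the example below)
    matrix -- symmetric list of lists of pairwise distances, same order as names
    """
    # Each active cluster id maps to (newick_text, leaf_count, node_height).
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    # Distances between active clusters, keyed by an unordered pair of ids.
    dist = {
        frozenset((i, j)): float(matrix[i][j])
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)

    while len(clusters) > 1:
        # Pick the closest pair of clusters that are still active.
        pair = min(dist, key=dist.get)
        a, b = sorted(pair)
        d_ab = dist.pop(pair)
        text_a, size_a, height_a = clusters.pop(a)
        text_b, size_b, height_b = clusters.pop(b)

        # The new internal node sits halfway between the two clusters.
        height = d_ab / 2.0
        newick = "({}:{:.4f},{}:{:.4f})".format(
            text_a, height - height_a, text_b, height - height_b
        )

        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted average of the two old distances.
        for c in clusters:
            d_ac = dist.pop(frozenset((a, c)))
            d_bc = dist.pop(frozenset((b, c)))
            dist[frozenset((next_id, c))] = (
                size_a * d_ac + size_b * d_bc
            ) / (size_a + size_b)

        clusters[next_id] = (newick, size_a + size_b, height)
        next_id += 1

    # Exactly one cluster is left: the root of the tree.
    (newick, _, _), = clusters.values()
    return newick + ";"


# Made-up distances between three hypothetical genomes.
labels = ["genome_A", "genome_B", "genome_C"]
distances = [
    [0.0, 0.2, 0.6],
    [0.2, 0.0, 0.8],
    [0.6, 0.8, 0.0],
]
print(upgma(labels, distances))
```

The result is a Newick string, a plain-text tree format that most tree viewers can open. For real analyses you would probably reach for dedicated phylogenetics software rather than this toy.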
Want to know about the antibiotic resistance genes in your genomes? Try running them through the CARD RGI tool: http://arpcard.mcmaster.ca/?q=CARD/tools/RGI. To begin analyzing your genomes for classic virulence factors, give this database a try: http://www.mgc.ac.cn/cgi-bin/VFs/vfs.cgi?VFID=VF0053#VF0053. Happy hunting!