Contributed by Shawn Higdon
How much of the microbial biodiversity on planet Earth has mankind managed to grow in the laboratory? I think the number thrown around as a rough approximation in early biology classes is 5 percent. Five percent, right? Or is it one percent? One percent sounds more realistic…or is it one tenth of one percent? I’m not sure we will ever be able to answer this question with absolute certainty, and the figure will likely stay up for debate depending on who you ask.
A fascinating approach to the problem of devising microbial culturing strategies was brought to my attention when I met with Titus Brown several months ago. As it turns out, Adrian Viehweger and his colleagues at the Friedrich Schiller University of Jena in Germany thought that it might be a good idea to think of protein sequences in microbial genomes in a way that is similar to how we view words in a text document. Essentially, they’ve developed an incredible way to adapt the Word2Vec algorithm for deep learning applications in genome biology! What do they call this tool? Nanotext, of course! The group’s preprint is currently available on bioRxiv and is a great read. I find it particularly amazing that they are able to show how Nanotext can be used to predict a culture medium for metagenome-assembled genomes.
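To make the "genome as a text document" idea a little more concrete, here is a minimal sketch in Python. It is not Nanotext and it is not Word2Vec: it swaps in plain co-occurrence counting as a much simpler stand-in for a learned embedding, and the genomes and labels (domA, domB, and so on) are made up purely for illustration. The point is only that items appearing in similar neighborhoods end up with similar vectors.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical toy "genomes": each one is an ordered list of made-up labels,
# read the way a sentence is read as an ordered list of words.
genomes = [
    ["domA", "domB", "domC", "domD"],
    ["domA", "domB", "domC", "domE"],
    ["domF", "domG", "domH", "domD"],
]


def context_vectors(documents, window=1):
    """Count, for every word, which other words appear within `window` positions.

    This is a count-based stand-in for a learned embedding, not Word2Vec itself.
    """
    vectors = defaultdict(Counter)
    for doc in documents:
        for i, word in enumerate(doc):
            start = max(0, i - window)
            stop = min(len(doc), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    vectors[word][doc[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


vectors = context_vectors(genomes)

# domD and domE both follow domC, so they share context; domA and domE do not.
sim_shared = cosine(vectors["domD"], vectors["domE"])
sim_unrelated = cosine(vectors["domA"], vectors["domE"])
print(sim_shared, sim_unrelated)
```

In this toy setup, the two labels that share a neighbor score higher than the pair that does not, which is the same intuition that word embeddings build on at a much larger scale.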
While I view this as a powerful tool for the future of developing microbial culture media tailored for growing and storing previously uncultured microbes that we can see in microbiome samples through DNA sequencing, I see the Nanotext approach as having additional utility for describing the functions of proteins that are assigned the oh-so-popular annotation of “hypothetical protein”. Is it possible for two protein sequences to have the same domains present but display them in different configurations, or “architectures”? I see Nanotext being used to overcome hurdles in functional annotation associated with sequence-alignment-based approaches in the near future!
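The question about architectures can be illustrated with a tiny Python sketch. The two proteins and their domain labels below are hypothetical, chosen only to show the distinction between having the same domains and having them in the same order.

```python
# Hypothetical domain annotations, listed in order along each protein.
protein_1 = ("domA", "domB", "domC")
protein_2 = ("domC", "domB", "domA")


def same_domain_content(a, b):
    """True if both proteins carry the same domains, ignoring order."""
    return sorted(a) == sorted(b)


def same_architecture(a, b):
    """True only if the domains also appear in the same order."""
    return tuple(a) == tuple(b)


print(same_domain_content(protein_1, protein_2))
print(same_architecture(protein_1, protein_2))
```

A comparison that only looks at which domains are present would treat these two proteins as identical, while one that respects order would not. That difference is what makes context-aware representations interesting here.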
Here are links to the Nanotext GitHub documentation and source code, and to the bioRxiv preprint…