Contributed by Cory Schlesener, B.S.
One important component of a genome’s overall composition is its larger-scale structure: how conserved blocks of genetic sequence are arranged. As segments of DNA recombine, sequences are introduced into new locations and/or orientations in a genome. However, this arrangement of large genetic blocks can also be rearranged artificially in a constructed genome sequence, depending on the strategies and methods used for sequencing and assembly. Constructing genome assemblies from short sequencing reads can yield regions of low sequence coverage and low assembly accuracy, which can lead to inaccurate reconstruction of the order and orientation of genetic blocks. When a reference genome is used to guide assembly, the composition can also be artificially fitted to that reference. Comparing the structural similarity of genome assemblies gives another metric of how divergent two genomes are, and comparing it against differences in sequence identity can give hints about assembly accuracy. Additionally, assemblies of the same genome produced by different methods can be compared, which can give insight into the most accurate and consistent assembly method for a particular organism’s genomes and the sequencing platform used. Here is an interesting newer approach for quantifying genome structural similarity for such comparisons.
“GMASS: a novel measure for genome assembly structural similarity”
Daehong Kwon, Jongin Lee & Jaebum Kim
BMC Bioinformatics volume 20, Article number: 147 (2019)