TY - GEN
T1 - Data-driven insights into deletions of Mycobacterium tuberculosis complex chromosomal DR region using spoligoforests
AU - Ozcaglar, Cagri
AU - Shabbeer, Amina
AU - Kurepina, Natalia
AU - Yener, Bulent
AU - Bennett, Kristin P.
PY - 2011
Y1 - 2011
N2 - Biomarkers of Mycobacterium tuberculosis complex (MTBC) mutate over time. Among the biomarkers of MTBC, spacer oligonucleotide type (spoligotype) and Mycobacterium Interspersed Repetitive Unit (MIRU) patterns are commonly used to genotype clinical MTBC strains. In this study, we present an evolution model of spoligotype rearrangements that uses MIRU patterns to disambiguate the ancestors of spoligotypes, applied to a large patient dataset from the United States Centers for Disease Control and Prevention (CDC). Based on the contiguous deletion assumption and the rare observation of convergent evolution, we first generate the most parsimonious forest of spoligotypes, called a spoligoforest, using three genetic distance measures. An analysis of the topological attributes of the spoligoforest and the number of variations at the direct repeat (DR) locus of each strain reveals interesting properties of deletions in the DR region. First, we compare our mutation model to existing mutation models of spoligotypes and find that our mutation model produces as many within-lineage mutation events as other models, with slightly higher segregation accuracy. Second, based on our mutation model, the number of descendant spoligotypes follows a power law distribution. Third, contrary to prior studies, the power law distribution does not plausibly fit the mutation length frequency. Finally, the total number of mutation events at consecutive DR loci follows a bimodal distribution. The two modes are spacers 13 and 40, which are hotspots for chromosomal rearrangements. The change point in the bimodal distribution is spacer 34, which is absent in most MTBC strains. This bimodal separation results in an accumulation of shorter deletions in the DR region, which explains why a power law distribution is not a plausible fit to the mutation length frequency.
KW - DR locus
KW - MIRU-VNTR
KW - Mycobacterium tuberculosis complex
KW - mutation
KW - spoligotype
KW - tuberculosis
UR - http://www.scopus.com/inward/record.url?scp=84856043741&partnerID=8YFLogxK
U2 - 10.1109/BIBM.2011.64
DO - 10.1109/BIBM.2011.64
M3 - Conference contribution
SN - 9780769545745
T3 - Proceedings - 2011 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2011
SP - 75
EP - 82
BT - Proceedings - 2011 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2011
T2 - 2011 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2011
Y2 - 12 November 2011 through 15 November 2011
ER -