Project Details
Description
Recent breakthroughs in biotechnology and sequencing create numerous new opportunities for comparative analysis of complex genomic sequences. Comparative genomic analysis requires building databases and tools to address important biological questions, including elucidating the modes of conservation and the patterns of divergence that lead to species diversification, robustness, fitness, and taxonomic organization. Evolutionary selection imposes different rates of conservation on different functional sites, thereby producing distinctive comparative signatures in different genomic regions. Computational methods can exploit these signatures to improve the detection of functionally important regions such as protein-coding exons, RNA genes, promoter regions, initiation sites, and 3' UTRs. Because the pattern of conservation in coding regions differs from the pattern in intronic or intergenic regions, comparative analysis against a reference genome can, in principle, significantly improve the computational identification of genes in a target genome. This work systematically investigates these fundamental questions with the goal of producing a modular, adaptive prototype gene identification system that can be trained on different species. A modular and adaptive architecture for modeling genomic sequences and evolutionary alignments in comparative genomic analysis has three important features:
a) A compositional organization that allows global model training for cross-species analysis of specific organisms.
b) A novel mechanism for Bayesian adaptation of the model to the specific pair of orthologous genes being compared, which refines the interpretation and thereby improves prediction accuracy.
c) A flexible capability to integrate other sources of evidence for comparative gene identification, including matches to proteins, ESTs, and gene expression databases.
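To make the idea of a coding-region comparative signature concrete, the following is a minimal illustrative sketch, not the project's method: the function name `codon_position_mismatches` is hypothetical. It counts mismatches in a pairwise alignment separately for each codon position. In protein-coding regions, substitutions tend to accumulate at the third codon position, where many changes are synonymous, whereas intronic or intergenic DNA shows no such period-3 pattern. The sketch assumes gap-free, equal-length aligned sequences and a known reading frame.

```python
from typing import List


def codon_position_mismatches(ref: str, target: str, frame: int = 0) -> List[float]:
    """Return the mismatch rate at codon positions 1, 2 and 3.

    Hypothetical helper for illustration. Assumes `ref` and `target` are
    gap-free, equal-length aligned sequences and that the first codon
    starts at offset `frame`.
    """
    if len(ref) != len(target):
        raise ValueError("sequences must be aligned to equal length")
    totals = [0, 0, 0]      # aligned columns seen at each codon position
    mismatches = [0, 0, 0]  # differing columns at each codon position
    for i in range(frame, len(ref)):
        pos = (i - frame) % 3
        totals[pos] += 1
        if ref[i].upper() != target[i].upper():
            mismatches[pos] += 1
    # Avoid division by zero for very short inputs.
    return [m / t if t else 0.0 for m, t in zip(mismatches, totals)]


if __name__ == "__main__":
    # Toy sequences (not real data): CTG GCT GAA AAA vs CTA GCC GAG AAG.
    print(codon_position_mismatches("CTGGCTGAAAAA", "CTAGCCGAGAAG"))
```

On this toy pair, whose four differences are all synonymous third-position changes, the profile is `[0.0, 0.0, 1.0]`. A practical comparative gene finder would combine such evidence with sequence models and alignment probabilities rather than rely on it alone.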
The proposed research has several broad implications: cataloging the evolutionary patterns that can be extracted from comparative analysis and discovering novel genes; building a highly usable software prototype for comparative genomic analysis that will be distributed freely to the academic community in order to improve the annotation accuracy of genomic regions; producing comparative databases of functional sites that can be used for evolutionary analysis and biological research; and training a new generation of computational biology researchers to attain a deeper understanding of biological processes as well as sophisticated computational methodologies.
| Status | Finished |
|---|---|
| Effective start/end date | 5/1/03 → 4/30/08 |