On the entropy of DNA: Algorithms and measurements based on memory and rapid convergence

Martin Farach, Michiel Noordewier, Serap Savari, Larry Shepp, Abraham Wyner, Jacob Ziv

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

61 Scopus citations

Abstract

We have applied the information theoretic notion of entropy to characterize DNA sequences. We consider a genetic sequence signal that is too small for asymptotic entropy estimates to be accurate, and for which similar approaches have previously failed. We prove that the match length entropy estimator has a relatively fast convergence rate and demonstrate experimentally that by using this entropy estimator, we can indeed extract a meaningful signal from segments of DNA. Further, we derive a method for detecting certain signals within DNA, known as splice junctions, with significantly better performance than previously known methods. The main result of this paper is that the entropy of genetic material which is ultimately expressed in protein sequences is higher than that of the material which is discarded. This is an unexpected result, since current biological theory holds that the discarded sequences ("introns") are capable of tolerating random changes to a greater degree than the retained sequences ("exons").
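As a rough illustration of the idea behind a match length entropy estimator, the following Python sketch computes one common sliding-window form: for each position, find the shortest prefix that does not start anywhere in the preceding window, then divide the log of the window size by the mean of those lengths. This is an assumption-laden sketch, not the authors' implementation; the function names `match_length` and `match_length_entropy`, the `window` parameter, and the exact normalization are illustrative choices, and the paper may use a different variant.

```python
import math


def match_length(seq, i, window):
    """Length of the shortest prefix of seq[i:] that does not begin
    at any of the `window` positions immediately preceding i.

    Matches are allowed to run past position i (overlap is permitted).
    """
    longest = 0
    for j in range(max(0, i - window), i):
        k = 0
        # j < i, so j + k is always in range whenever i + k is.
        while i + k < len(seq) and seq[j + k] == seq[i + k]:
            k += 1
        longest = max(longest, k)
    return longest + 1


def match_length_entropy(seq, window):
    """Estimate entropy in bits per symbol as log2(window) divided by
    the mean match length (illustrative sliding-window variant)."""
    lengths = [match_length(seq, i, window) for i in range(window, len(seq))]
    if not lengths:
        raise ValueError("sequence must be longer than the window")
    return math.log2(window) / (sum(lengths) / len(lengths))
```

Calling `match_length_entropy("ACGTTGCA" * 50, 32)` on a highly repetitive string gives a much lower estimate than the same call on a shuffled string of equal length, because repeats produce long matches. The brute-force search here costs time proportional to the window size at every position; a suffix-tree based search would be the natural replacement for long sequences.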

Original language: English (US)
Title of host publication: Proceedings of the 6th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 1995
Publisher: Association for Computing Machinery
Pages: 48-57
Number of pages: 10
ISBN (Electronic): 0898713498
State: Published - Jan 22 1995
Event: 6th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 1995 - San Francisco, United States
Duration: Jan 22 1995 to Jan 24 1995

Publication series

Name: Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms

Conference

Conference: 6th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 1995
Country/Territory: United States
City: San Francisco
Period: 1/22/95 to 1/24/95

ASJC Scopus subject areas

  • Software
  • Mathematics (all)
