TY - GEN

T1 - On the entropy of DNA

T2 - 6th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 1995

AU - Farach, Martin

AU - Noordewier, Michiel

AU - Savari, Serap

AU - Shepp, Larry

AU - Wyner, Abraham

AU - Ziv, Jacob

N1 - Funding Information: Supported by DIMACS (Center for Discrete Mathematics and Theoretical Computer Science), a National Science Foundation Science and Technology Center under NSF contract STC-8809648.

PY - 1995/1/22

Y1 - 1995/1/22

N2 - We have applied the information-theoretic notion of entropy to characterize DNA sequences. We consider a genetic sequence signal that is too small for asymptotic entropy estimates to be accurate, and for which similar approaches have previously failed. We prove that the match length entropy estimator has a relatively fast convergence rate and demonstrate experimentally that by using this entropy estimator, we can indeed extract a meaningful signal from segments of DNA. Further, we derive a method for detecting certain signals within DNA - known as splice junctions - with significantly better performance than previously known methods. The main result of this paper is that we find that the entropy of genetic material which is ultimately expressed in protein sequences is higher than that which is discarded. This is an unexpected result, since current biological theory holds that the discarded sequences ("introns") are capable of tolerating random changes to a greater degree than the retained sequences ("exons").

AB - We have applied the information-theoretic notion of entropy to characterize DNA sequences. We consider a genetic sequence signal that is too small for asymptotic entropy estimates to be accurate, and for which similar approaches have previously failed. We prove that the match length entropy estimator has a relatively fast convergence rate and demonstrate experimentally that by using this entropy estimator, we can indeed extract a meaningful signal from segments of DNA. Further, we derive a method for detecting certain signals within DNA - known as splice junctions - with significantly better performance than previously known methods. The main result of this paper is that we find that the entropy of genetic material which is ultimately expressed in protein sequences is higher than that which is discarded. This is an unexpected result, since current biological theory holds that the discarded sequences ("introns") are capable of tolerating random changes to a greater degree than the retained sequences ("exons").

UR - http://www.scopus.com/inward/record.url?scp=84994364597&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84994364597&partnerID=8YFLogxK

M3 - Conference contribution

T3 - Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms

SP - 48

EP - 57

BT - Proceedings of the 6th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 1995

PB - Association for Computing Machinery

Y2 - 22 January 1995 through 24 January 1995

ER -