Statistical Mechanics of Transcription-Factor Binding Site Discovery Using Hidden Markov Models

Pankaj Mehta, David J. Schwab, Anirvan Sengupta

Research output: Contribution to journal, Article

4 Scopus citations

Abstract

Hidden Markov Models (HMMs) are a commonly used tool for inference of transcription factor (TF) binding sites from DNA sequence data. We exploit the mathematical equivalence between HMMs for TF binding and the "inverse" statistical mechanics of hard rods in a one-dimensional disordered potential to investigate learning in HMMs. We derive analytic expressions for the Fisher information, a commonly employed measure of confidence in learned parameters, in the biologically relevant limit where the density of binding sites is low. We then use techniques from statistical mechanics to derive a scaling principle relating the specificity (binding energy) of a TF to the minimum amount of training data necessary to learn it.
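The hard-rod picture mentioned in the abstract can be illustrated with a short sketch. It is not taken from the paper. It is a generic transfer-matrix recursion for non-overlapping rods on a one-dimensional lattice, which has the same form as the forward recursion of an HMM with a background state and a fixed-length motif. The function names, the `fugacity` parameter, and the example energies are all hypothetical, and energies are assumed to be in units of kT.

```python
import math
from itertools import combinations


def rod_partition_function(energies, rod_len, fugacity=1.0):
    """Partition function of non-overlapping rods on a 1D lattice.

    energies[i] is the (assumed) binding energy, in units of kT, of a rod
    whose left end sits at site i. The lattice therefore has
    len(energies) + rod_len - 1 sites.
    """
    n_sites = len(energies) + rod_len - 1
    # z[n] is the partition function of the first n sites only.
    z = [1.0] * (n_sites + 1)
    for n in range(1, n_sites + 1):
        # Case 1: the last site (index n - 1) is empty.
        z[n] = z[n - 1]
        # Case 2: a rod ends on the last site, so it starts at n - rod_len.
        if n >= rod_len:
            start = n - rod_len
            z[n] += fugacity * math.exp(-energies[start]) * z[start]
    return z[n_sites]


def brute_force_partition_function(energies, rod_len, fugacity=1.0):
    """Same quantity by explicit enumeration; only practical for tiny lattices."""
    total = 0.0
    for k in range(len(energies) + 1):
        for starts in combinations(range(len(energies)), k):
            # Rods must not overlap: consecutive left ends differ by >= rod_len.
            if all(b - a >= rod_len for a, b in zip(starts, starts[1:])):
                weight = math.exp(-sum(energies[i] for i in starts))
                total += fugacity ** k * weight
    return total


# Made-up energies for illustration only.
example_energies = [0.5, -1.0, 2.0, 0.0, -0.3]
z_fast = rod_partition_function(example_energies, rod_len=2, fugacity=0.1)
z_slow = brute_force_partition_function(example_energies, rod_len=2, fugacity=0.1)
```

The recursion runs in time linear in the number of sites, whereas the enumeration grows exponentially. A small `fugacity` corresponds to the low-density regime the abstract calls biologically relevant.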

Original language: English (US)
Pages (from-to): 1187-1205
Number of pages: 19
Journal: Journal of Statistical Physics
Volume: 142
Issue number: 6
State: Published - Apr 1 2011

All Science Journal Classification (ASJC) codes

  • Statistical and Nonlinear Physics
  • Mathematical Physics

Keywords

  • Bioinformatics
  • Fisher information
  • Hidden Markov Models
  • Machine learning
  • One-dimensional statistical mechanics
