TY - JOUR
T1 - A biophysical approach to predicting protein–DNA binding energetics
AU - Locke, George
AU - Morozov, Alexandre V.
N1 - Publisher Copyright: © 2015 by the Genetics Society of America.
PY - 2015/8/1
Y1 - 2015/8/1
N2 - Sequence-specific interactions between proteins and DNA play a central role in DNA replication, repair, recombination, and control of gene expression. These interactions can be studied in vitro using microfluidics, protein-binding microarrays (PBMs), and other high-throughput techniques. Here we develop a biophysical approach to predicting protein–DNA binding specificities from high-throughput in vitro data. Our algorithm, called BindSter, can model alternative DNA-binding modes and multiple protein species competing for access to DNA, while rigorously taking into account all sterically allowed configurations of DNA-bound factors. BindSter can be used with a hierarchy of protein–DNA interaction models of increasing complexity, including contributions of mononucleotides, dinucleotides, and longer words to the total protein–DNA binding energy. We observe that the quality of BindSter predictions does not change significantly as some of the energy parameters vary over a sizable range. To take this degeneracy into account, we have developed a graphical representation of parameter uncertainties called IntervalLogo. We find that our simplest model, in which each nucleotide in the binding site is treated independently, performs better than previous biophysical approaches. The extensions of this model, in which contributions of longer words are also considered, result in further improvements, underscoring the importance of higher-order effects in protein–DNA energetics. In contrast, we find little evidence of multiple binding modes for the transcription factors (TFs) and experimental conditions in our data set. Furthermore, there is limited consistency in predictions for the same TF based on microfluidics and PBM data.
AB - Sequence-specific interactions between proteins and DNA play a central role in DNA replication, repair, recombination, and control of gene expression. These interactions can be studied in vitro using microfluidics, protein-binding microarrays (PBMs), and other high-throughput techniques. Here we develop a biophysical approach to predicting protein–DNA binding specificities from high-throughput in vitro data. Our algorithm, called BindSter, can model alternative DNA-binding modes and multiple protein species competing for access to DNA, while rigorously taking into account all sterically allowed configurations of DNA-bound factors. BindSter can be used with a hierarchy of protein–DNA interaction models of increasing complexity, including contributions of mononucleotides, dinucleotides, and longer words to the total protein–DNA binding energy. We observe that the quality of BindSter predictions does not change significantly as some of the energy parameters vary over a sizable range. To take this degeneracy into account, we have developed a graphical representation of parameter uncertainties called IntervalLogo. We find that our simplest model, in which each nucleotide in the binding site is treated independently, performs better than previous biophysical approaches. The extensions of this model, in which contributions of longer words are also considered, result in further improvements, underscoring the importance of higher-order effects in protein–DNA energetics. In contrast, we find little evidence of multiple binding modes for the transcription factors (TFs) and experimental conditions in our data set. Furthermore, there is limited consistency in predictions for the same TF based on microfluidics and PBM data.
KW - Gene regulation
KW - Protein–DNA interactions
KW - Thermodynamic modeling
KW - Transcription factor
UR - http://www.scopus.com/inward/record.url?scp=84939448388&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84939448388&partnerID=8YFLogxK
U2 - https://doi.org/10.1534/genetics.115.178384
DO - https://doi.org/10.1534/genetics.115.178384
M3 - Article
C2 - 26081193
SN - 0016-6731
VL - 200
SP - 1349
EP - 1361
JO - Genetics
JF - Genetics
IS - 4
ER -