Clustering algorithms for identifying core atom sets and for assessing the precision of protein structure ensembles

David A. Snyder, Gaetano T. Montelione

Research output: Contribution to journal › Article › peer-review

41 Scopus citations


An important open question in the field of NMR-based biomolecular structure determination is how best to characterize the precision of the resulting ensemble of structures. Typically, the RMSD, as minimized in superimposing the ensemble of structures, is the preferred measure of precision. However, the presence of poorly determined atomic coordinates and of multiple "RMSD-stable domains" (locally well-defined regions that are not aligned in global superimpositions) complicates RMSD calculations. In this paper, we present a method, based on a novel, structurally defined order parameter, for identifying a set of core atoms to use in determining superimpositions for RMSD calculations. In addition, we present a method for deciding whether to partition that core atom set into "RMSD-stable domains" and, if so, how to partition it. We demonstrate our algorithm and its application in calculating statistically sound RMSD values by applying it to a set of NMR-derived structural ensembles, superimposing each RMSD-stable domain (or the entire core atom set, where appropriate) found in each protein structure under consideration. The ε-value, a parameter calculated by our algorithm using a novel, kurtosis-based criterion, is a measure of the precision of the superimposition that complements the RMSD. We also compare our algorithm with previously described algorithms for determining core atom sets. The methods presented in this paper for biomolecular structure superimposition are quite general and have applications in many areas of structural bioinformatics and structural biology.
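To make concrete the quantity the abstract refers to as the RMSD "as minimized in superimposing the ensemble", the following is a minimal sketch of a superposition-minimized RMSD restricted to a chosen subset of atoms. It is not the algorithm of the paper: the order parameter, the domain partitioning, and the ε-value are not reproduced here. The superposition uses the standard Kabsch (SVD-based) method as a stand-in, and the names `kabsch_rmsd`, `mean_pairwise_rmsd`, and the `core` index argument are hypothetical, introduced only for illustration.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate arrays after optimal rigid superposition.

    Standard Kabsch method, used here as an assumption; the paper's own
    superimposition procedure is not specified in the abstract.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centering both coordinate sets.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # SVD of the 3x3 covariance matrix gives the optimal rotation.
    U, _, Vt = np.linalg.svd(P.T @ Q)
    # Flip the last axis if needed so the result is a proper rotation.
    d = np.sign(np.linalg.det(U @ Vt))
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    diff = P @ R - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))


def mean_pairwise_rmsd(ensemble, core=None):
    """Mean pairwise RMSD over an (M, N, 3) ensemble with M >= 2 models.

    `core` is an optional array of atom indices (a hypothetical "core atom
    set"); when given, only those atoms are superimposed and scored.
    """
    ensemble = np.asarray(ensemble, dtype=float)
    if core is not None:
        ensemble = ensemble[:, core, :]
    m = len(ensemble)
    values = [
        kabsch_rmsd(ensemble[i], ensemble[j])
        for i in range(m)
        for j in range(i + 1, m)
    ]
    return float(np.mean(values))
```

Calling `mean_pairwise_rmsd(ensemble)` and `mean_pairwise_rmsd(ensemble, core=indices)` on the same ensemble shows how a disordered region can inflate the all-atom value. That inflation is the motivation the abstract gives for selecting core atoms before superimposing.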

Original language: English (US)
Pages (from-to): 673-686
Number of pages: 14
Journal: Proteins: Structure, Function and Genetics
Issue number: 4
State: Published - Jun 1 2005

All Science Journal Classification (ASJC) codes

  • Molecular Biology
  • Structural Biology
  • Biochemistry


Keywords

  • Biomolecular NMR
  • Hierarchical clustering
  • RMSD-stable domains
  • Superimposition

