Using random forest for protein fold prediction problem: An empirical study

Abdullah Dehzangi, Somnuk Phon-Amnuaisuk, Omid Dehzangi

Research output: Contribution to journal › Article › peer-review

43 Scopus citations


The functioning of a protein in biological reactions crucially depends on its three-dimensional structure. Prediction of the three-dimensional structure of a protein (tertiary structure) from its amino acid sequence (primary structure) is considered a challenging task in bioinformatics and molecular biology. Recently, due to tremendous advances in the pattern recognition field, there has been growing interest in applying classification approaches to the protein fold prediction problem. In this paper, Random Forest, an ensemble method, is employed to address this problem. Random Forest is a recently introduced method based on the bagging algorithm: it trains a group of base classifiers on randomly selected sets of features and then combines the results obtained from the base classifiers by majority voting. To investigate the effect of the number of base learners on the performance of the Random Forest, twelve different numbers of base classifiers (between 30 and 600) are tested. To study the performance of the Random Forest and compare its results with previously reported results, the dataset produced by Ding and Dubchak is used. Our experimental results show that the Random Forest improves the prediction accuracy (using the same set of features proposed by Dubchak et al.) and reduces the time consumption of the protein fold prediction task, compared to previous works in the literature.
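The ensemble procedure summarized in the abstract (bootstrap sampling, random feature subsets, majority voting, and a sweep over the number of base learners) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes decision stumps as base learners, one feature subset per learner rather than per split, and synthetic two-class data in place of the Ding and Dubchak dataset. All function names, the toy labels `fold_a` and `fold_b`, and the ensemble sizes shown are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def make_toy_data(n, seed=1):
    """Synthetic stand-in data (an assumption, NOT the Ding and Dubchak dataset)."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        label = rng.choice(["fold_a", "fold_b"])
        centre = 0.0 if label == "fold_a" else 2.0
        rows.append([rng.gauss(centre, 1.0) for _ in range(4)])
        labels.append(label)
    return rows, labels


def train_stump(rows, labels, feature_ids):
    """Fit a one-split tree using only the given features (fewest training errors)."""
    best = None
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            # Each side predicts its majority label; an empty right side reuses the left label.
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    _, f, t, left_label, right_label = best
    return lambda row: left_label if row[f] <= t else right_label


def train_forest(rows, labels, n_learners, n_features, seed=0):
    """Train n_learners stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    forest = []
    for _ in range(n_learners):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap: sample with replacement
        feats = rng.sample(range(d), n_features)      # random subset of feature indices
        forest.append(train_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest


def predict(forest, row):
    """Combine the base learners' predictions by majority vote."""
    return Counter(tree(row) for tree in forest).most_common(1)[0][0]


def accuracy(forest, rows, labels):
    """Fraction of rows whose majority-vote prediction matches the label."""
    return sum(predict(forest, r) == y for r, y in zip(rows, labels)) / len(rows)


if __name__ == "__main__":
    train_rows, train_labels = make_toy_data(60, seed=1)
    test_rows, test_labels = make_toy_data(60, seed=2)
    # Illustrative ensemble sizes only; the paper's twelve values are not listed in the abstract.
    for n_learners in (30, 60, 120):
        forest = train_forest(train_rows, train_labels, n_learners, n_features=2)
        print(n_learners, round(accuracy(forest, test_rows, test_labels), 3))
```

A full Random Forest grows deeper trees and typically draws a fresh feature subset at every split. The stump-based version above keeps only the three ingredients named in the abstract so that the effect of changing the number of base learners is easy to inspect.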

Original language: English (US)
Pages (from-to): 1941-1956
Number of pages: 16
Journal: Journal of Information Science and Engineering
Issue number: 6
State: Published - Nov 2010
Externally published: Yes

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Hardware and Architecture
  • Library and Information Sciences
  • Computational Theory and Mathematics


Keywords

  • Bagging
  • Bootstrap sampling
  • Classifier ensemble
  • Feature selection
  • Prediction performance
  • Protein fold prediction problem
  • Random forest
  • Random sampling
  • Weak learner

