Abstract
Determining the three-dimensional structure of a protein is a challenging task in biological science. Classifying a protein into one of its folds is an intermediate step toward deciphering that structure. Protein fold recognition can be performed by developing feature extraction techniques that accurately extract the relevant information from a protein sequence and then employing a suitable classifier to label an unknown protein. Several feature extraction techniques have been developed in the past, but they achieve only limited recognition accuracy. In this work, we develop a feature extraction technique based on bi-grams computed directly from Position Specific Scoring Matrices (PSSMs) and demonstrate its effectiveness on a benchmark dataset. The proposed technique exhibits an absolute improvement of around 10% compared with existing feature extraction techniques.
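The abstract does not spell out how the bi-gram features are computed, so the following is only an illustrative sketch under a stated assumption: for a PSSM P with one row per residue position and one column per amino acid type, entry (m, n) of the bi-gram matrix is taken as the sum over consecutive positions i of P[i][m] * P[i+1][n], and the matrix is flattened into a feature vector (400 values for 20 amino acid types). The function name `pssm_bigram_features` is hypothetical, and any normalization of the PSSM before this step is left out.

```python
from typing import List


def pssm_bigram_features(pssm: List[List[float]]) -> List[float]:
    """Return flattened bi-gram features for an L x K PSSM.

    Assumed formulation (not taken from the abstract):
        B[m][n] = sum over i of pssm[i][m] * pssm[i + 1][n]
    The result has K * K entries, in row-major order.
    """
    if not pssm:
        return []
    k = len(pssm[0])
    bigram = [[0.0] * k for _ in range(k)]
    # Walk over every pair of consecutive sequence positions.
    for row, next_row in zip(pssm, pssm[1:]):
        for m in range(k):
            for n in range(k):
                bigram[m][n] += row[m] * next_row[n]
    # Flatten the K x K matrix into one feature vector.
    return [value for bigram_row in bigram for value in bigram_row]


# Toy PSSM with 3 positions and 2 "amino acid" columns, for illustration only.
toy_pssm = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
features = pssm_bigram_features(toy_pssm)
print(features)  # [18.0, 22.0, 26.0, 32.0]
```

Under this assumption the feature vector has a fixed length regardless of sequence length, which is what allows proteins of different lengths to be passed to the same classifier.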
Original language | English (US) |
---|---|
Pages (from-to) | 41-46 |
Number of pages | 6 |
Journal | Journal of Theoretical Biology |
Volume | 320 |
State | Published - Mar 7 2013 |
Externally published | Yes |
All Science Journal Classification (ASJC) codes
- Statistics and Probability
- Modeling and Simulation
- General Biochemistry, Genetics and Molecular Biology
- General Immunology and Microbiology
- General Agricultural and Biological Sciences
- Applied Mathematics
Keywords
- Bi-gram features
- Position specific scoring matrix (PSSM)
- Protein fold recognition
- Protein sequence