An Alzheimer's disease related genes identification method based on multiple classifier integration

Yu Miao, Huiyan Jiang, Huiling Liu, Yu-Dong Yao

Research output: Contribution to journal › Article

6 Citations (Scopus)

Abstract

Background and Objective: Alzheimer's disease (AD) is a fatal neurodegenerative disease with an insidious onset. The set of AD-related genes (ADGs) is not yet fully understood. The National Center for Biotechnology Information (NCBI) provides an AD dataset of 22,283 genes, of which 71 have been identified as ADGs. However, the remaining 22,212 genes may still include ADGs that have not yet been identified. This paper aims to identify additional ADGs using machine learning techniques. Methods: To improve the accuracy of ADG identification, we propose a gene identification method based on multiple classifier integration. First, a feature selection algorithm is applied to select the most relevant attributes. Second, a two-stage cascading classifier is developed to identify ADGs. The first-stage classification is based on the relevance vector machine; in the second stage, the results of three classifiers (support vector machine, random forest and extreme learning machine) are combined through voting. Results: According to our results, feature selection improves accuracy and reduces training time, and the voting-based classifier reduces classification errors. The proposed ADG identification system achieves accuracy, sensitivity and specificity of 78.77%, 83.10% and 74.67%, respectively. Using the proposed method, potential additional ADGs are identified and the top 13 genes (predicted ADGs) are presented. Conclusions: This paper presents an ADG identification method. The proposed method, which combines feature selection, a cascading classifier and majority voting, leads to higher specificity and significantly increases the accuracy and sensitivity of ADG identification. Potential new ADGs are identified.
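
The two-stage cascade with voting, and the three reported metrics, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not say how the two stages interact, so the sketch assumes that a sample rejected by stage one is labeled non-ADG and only the remaining samples reach the vote. The threshold functions standing in for the relevance vector machine, support vector machine, random forest and extreme learning machine are hypothetical, as are the toy samples and labels.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

# A classifier maps a feature vector to 1 (ADG) or 0 (non-ADG).
Classifier = Callable[[Sequence[float]], int]


def majority_vote(votes: Sequence[int]) -> int:
    """Return the most frequent label; an odd number of voters avoids ties."""
    return Counter(votes).most_common(1)[0][0]


def cascade_predict(x: Sequence[float],
                    stage_one: Classifier,
                    stage_two: Sequence[Classifier]) -> int:
    # Assumption: stage one acts as a filter, so a rejected sample
    # is labeled non-ADG without consulting the voters.
    if stage_one(x) == 0:
        return 0
    return majority_vote([clf(x) for clf in stage_two])


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity (TP rate) and specificity (TN rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical stand-ins for the trained models (simple thresholds).
stage_one: Classifier = lambda x: int(x[0] > 0.2)
voters: List[Classifier] = [
    lambda x: int(x[0] > 0.5),
    lambda x: int(x[1] > 0.5),
    lambda x: int(x[0] + x[1] > 1.0),
]

# Toy feature vectors and labels, invented for illustration only.
samples = [(0.1, 0.9), (0.6, 0.7), (0.9, 0.2), (0.3, 0.4)]
y_true = [0, 1, 1, 1]

y_pred = [cascade_predict(x, stage_one, voters) for x in samples]
metrics = evaluate(y_true, y_pred)
print(y_pred, metrics)
```

Using three voters keeps the binary vote free of ties. Reporting sensitivity and specificity separately matters for a dataset like the one described, where known ADGs are a small fraction of all genes and accuracy alone would be dominated by the negative class.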

Original language: English (US)
Pages (from-to): 107-115
Number of pages: 9
Journal: Computer Methods and Programs in Biomedicine
Volume: 150
DOIs: https://doi.org/10.1016/j.cmpb.2017.08.006
State: Published - Oct 1 2017

Fingerprint

Alzheimer Disease
Classifiers
Genes
Politics
Feature extraction
Learning systems
Neurodegenerative diseases
Information Centers
Biotechnology
Support vector machines
Identification (control systems)

All Science Journal Classification (ASJC) codes

  • Software
  • Health Informatics
  • Computer Science Applications

Cite this

@article{6e915971cbfa404e8e9c3968e976fe24,
title = "An Alzheimer's disease related genes identification method based on multiple classifier integration",
abstract = "Background and Objective: Alzheimer's disease (AD) is a fatal neurodegenerative disease with an insidious onset. The set of AD-related genes (ADGs) is not yet fully understood. The National Center for Biotechnology Information (NCBI) provides an AD dataset of 22,283 genes, of which 71 have been identified as ADGs. However, the remaining 22,212 genes may still include ADGs that have not yet been identified. This paper aims to identify additional ADGs using machine learning techniques. Methods: To improve the accuracy of ADG identification, we propose a gene identification method based on multiple classifier integration. First, a feature selection algorithm is applied to select the most relevant attributes. Second, a two-stage cascading classifier is developed to identify ADGs. The first-stage classification is based on the relevance vector machine; in the second stage, the results of three classifiers (support vector machine, random forest and extreme learning machine) are combined through voting. Results: According to our results, feature selection improves accuracy and reduces training time, and the voting-based classifier reduces classification errors. The proposed ADG identification system achieves accuracy, sensitivity and specificity of 78.77{\%}, 83.10{\%} and 74.67{\%}, respectively. Using the proposed method, potential additional ADGs are identified and the top 13 genes (predicted ADGs) are presented. Conclusions: This paper presents an ADG identification method. The proposed method, which combines feature selection, a cascading classifier and majority voting, leads to higher specificity and significantly increases the accuracy and sensitivity of ADG identification. Potential new ADGs are identified.",
author = "Yu Miao and Huiyan Jiang and Huiling Liu and Yu-Dong Yao",
year = "2017",
month = "10",
day = "1",
doi = "10.1016/j.cmpb.2017.08.006",
language = "English (US)",
volume = "150",
pages = "107--115",
journal = "Computer Methods and Programs in Biomedicine",
issn = "0169-2607",
publisher = "Elsevier Ireland Ltd",

}

An Alzheimer's disease related genes identification method based on multiple classifier integration. / Miao, Yu; Jiang, Huiyan; Liu, Huiling; Yao, Yu-Dong.

In: Computer Methods and Programs in Biomedicine, Vol. 150, 01.10.2017, p. 107-115.

TY - JOUR

T1 - An Alzheimer's disease related genes identification method based on multiple classifier integration

AU - Miao, Yu

AU - Jiang, Huiyan

AU - Liu, Huiling

AU - Yao, Yu-Dong

PY - 2017/10/1

Y1 - 2017/10/1

AB - Background and Objective: Alzheimer's disease (AD) is a fatal neurodegenerative disease with an insidious onset. The set of AD-related genes (ADGs) is not yet fully understood. The National Center for Biotechnology Information (NCBI) provides an AD dataset of 22,283 genes, of which 71 have been identified as ADGs. However, the remaining 22,212 genes may still include ADGs that have not yet been identified. This paper aims to identify additional ADGs using machine learning techniques. Methods: To improve the accuracy of ADG identification, we propose a gene identification method based on multiple classifier integration. First, a feature selection algorithm is applied to select the most relevant attributes. Second, a two-stage cascading classifier is developed to identify ADGs. The first-stage classification is based on the relevance vector machine; in the second stage, the results of three classifiers (support vector machine, random forest and extreme learning machine) are combined through voting. Results: According to our results, feature selection improves accuracy and reduces training time, and the voting-based classifier reduces classification errors. The proposed ADG identification system achieves accuracy, sensitivity and specificity of 78.77%, 83.10% and 74.67%, respectively. Using the proposed method, potential additional ADGs are identified and the top 13 genes (predicted ADGs) are presented. Conclusions: This paper presents an ADG identification method. The proposed method, which combines feature selection, a cascading classifier and majority voting, leads to higher specificity and significantly increases the accuracy and sensitivity of ADG identification. Potential new ADGs are identified.

UR - http://www.scopus.com/inward/record.url?scp=85027867396&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85027867396&partnerID=8YFLogxK

U2 - https://doi.org/10.1016/j.cmpb.2017.08.006

DO - https://doi.org/10.1016/j.cmpb.2017.08.006

M3 - Article

VL - 150

SP - 107

EP - 115

JO - Computer Methods and Programs in Biomedicine

JF - Computer Methods and Programs in Biomedicine

SN - 0169-2607

ER -