Perceptron ensemble of graph-based positive-unlabeled learning for disease gene identification. (October 2016)
- Record Type:
- Journal Article
- Title:
- Perceptron ensemble of graph-based positive-unlabeled learning for disease gene identification. (October 2016)
- Main Title:
- Perceptron ensemble of graph-based positive-unlabeled learning for disease gene identification
- Authors:
- Jowkar, Gholam-Hossein
Mansoori, Eghbal G.
- Abstract:
- Identification of disease genes using computational methods is an important issue in biomedical and bioinformatics research. Based on the observation that diseases with the same or similar phenotypes share biological characteristics, researchers have tried to identify such genes using machine learning tools. Recently, semi-supervised methods known as positive-unlabeled learning have been used for disease gene identification. In this paper, we present a Perceptron ensemble of graph-based positive-unlabeled learning (PEGPUL) on three types of biological attributes: gene ontologies, protein domains and protein-protein interaction networks. In our method, a reliable set of positive and negative genes is first extracted using a co-training schema. Then, the similarity graph of genes is built using metric learning, concentrating on the multi-rank-walk method to perform inference from labeled genes. Finally, a Perceptron ensemble is learned from three weighted classifiers: multilevel support vector machine, k-nearest neighbor and decision tree. The main contributions of this paper are: (i) incorporating the statistical properties of gene data by choosing proper metrics, (ii) statistical evaluation of biological features, and (iii) the noise robustness of PEGPUL, achieved through a multilevel schema. To assess PEGPUL, we applied it to 12,950 genes: 949 positive genes from six classes of diseases and 12,001 unlabeled genes. Compared with some popular disease gene identification methods, the experimental results show that PEGPUL has reasonable performance.
- Is Part Of:
- Computational biology and chemistry. Volume 64(2016)
- Journal:
- Computational biology and chemistry
- Issue:
- Volume 64(2016)
- Issue Display:
- Volume 64, Issue 2016 (2016)
- Year:
- 2016
- Volume:
- 64
- Issue:
- 2016
- Issue Sort Value:
- 2016-0064-2016-0000
- Page Start:
- 263
- Page End:
- 270
- Publication Date:
- 2016-10
- Subjects:
- Disease gene identification -- Biological networks -- Positive-unlabeled learning -- Ensemble of classifiers -- Perceptron
Chemistry -- Data processing -- Periodicals
Biology -- Data processing -- Periodicals
Biochemistry -- Data processing
Biology -- Data processing
Molecular biology -- Data processing
Periodicals
Electronic journals
542.85
- Journal URLs:
- http://www.sciencedirect.com/science/journal/14769271
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compbiolchem.2016.07.004
- Languages:
- English
- ISSNs:
- 1476-9271
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3390.576700
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 7371.xml
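
The abstract in this record describes a Perceptron ensemble learned from three weighted classifiers. As a rough illustration of that final combination step only, the sketch below trains a classic perceptron on the +1/-1 votes of three base classifiers. It is not the authors' implementation: the function names `train_perceptron` and `predict`, the learning rate, the epoch limit and the vote data are all assumptions made up for this example, and the base classifiers themselves (multilevel support vector machine, k-nearest neighbor, decision tree) are represented only by their hypothetical outputs.

```python
# Illustrative sketch only: a perceptron that learns one weight per base
# classifier from their +1/-1 votes. All data here is invented for the demo.

def train_perceptron(votes, labels, epochs=20, lr=1.0):
    """Learn weights and a bias with the classic perceptron update rule."""
    w = [0.0] * len(votes[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(votes, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            pred = 1 if score > 0 else -1
            if pred != y:
                # Move the decision boundary toward the misclassified sample.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                mistakes += 1
        if mistakes == 0:
            break  # every training sample is classified correctly
    return w, b

def predict(w, b, x):
    """Combine the three votes into a single +1/-1 ensemble decision."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Hypothetical votes, one tuple per gene: (SVM, k-NN, decision tree).
votes = [
    (1, 1, 1), (1, 1, -1), (1, -1, 1), (-1, 1, 1),
    (-1, -1, -1), (-1, -1, 1), (-1, 1, -1), (1, -1, -1),
]
# Hypothetical labels: +1 = disease gene, -1 = non-disease gene.
labels = [1, 1, 1, 1, -1, -1, -1, -1]

weights, bias = train_perceptron(votes, labels)
ensemble = [predict(weights, bias, x) for x in votes]
```

In this toy data the labels follow the majority vote, so the problem is linearly separable and the perceptron rule is guaranteed to converge. With real classifier outputs that guarantee does not hold, which is why the sketch caps the number of epochs.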