Classification of gene expression data: A hubness-aware semi-supervised approach. Issue 127 (April 2016)
- Record Type:
- Journal Article
- Title:
- Classification of gene expression data: A hubness-aware semi-supervised approach. Issue 127 (April 2016)
- Main Title:
- Classification of gene expression data: A hubness-aware semi-supervised approach
- Authors:
- Buza, Krisztian
- Abstract:
- Highlights: A semi-supervised hubness-aware classifier is proposed. The classifier is evaluated on publicly available real gene expression data. We made the implementation of hubness-aware machine learning techniques available in the PyHubs software package. Abstract: Background and objective: Classification of gene expression data is the common denominator of various biomedical recognition tasks. However, obtaining class labels for large training samples may be difficult or even impossible in many cases. Therefore, semi-supervised classification techniques are required, as semi-supervised classifiers take advantage of unlabeled data. Methods: Gene expression data are high-dimensional, which gives rise to the phenomena known under the umbrella of the curse of dimensionality; one of its recently explored aspects is the presence of hubs, or hubness for short. Therefore, hubness-aware classifiers have been developed recently, such as the Naive Hubness-Bayesian k-Nearest Neighbor (NHBNN). In this paper, we propose a semi-supervised extension of NHBNN which follows the self-training schema. As one of the core components of self-training is the certainty score, we propose a new hubness-aware certainty score. Results: We performed experiments on publicly available gene expression data. These experiments show that the proposed classifier outperforms its competitors. We investigated the impact of each of the components (classification algorithm, semi-supervised technique, hubness-aware certainty score) separately and showed that each of these components is relevant to the performance of the proposed approach. Conclusions: Our results imply that our approach may increase classification accuracy and reduce computational costs (i.e., runtime). Based on the promising results presented in the paper, we envision that hubness-aware techniques will be used in various other biomedical machine learning tasks. In order to accelerate this process, we made an implementation of hubness-aware machine learning techniques publicly available in the PyHubs software package (http://www.biointelligence.hu/pyhubs), implemented in Python, one of the most popular programming languages of data science.
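The abstract names two mechanisms: the k-occurrence counts that define hubness, and a self-training loop wrapped around NHBNN and driven by a certainty score. The Python sketch below illustrates both on toy data. It is not the author's implementation and does not use the PyHubs API; all function names (`knn_indices`, `k_occurrence`, `nhbnn_proba`, `self_train`) are hypothetical. The Laplace-style smoothing, the extra self-occurrence count per labeled point, and the use of the maximum posterior as the certainty score are assumptions of this sketch; the paper's own hubness-aware certainty score is not reproduced here.

```python
import numpy as np


def knn_indices(A, B, k, exclude_self=False):
    """For each row of A, indices of its k nearest rows of B (Euclidean)."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    if exclude_self:  # only meaningful when A and B are the same array
        np.fill_diagonal(d, np.inf)
    return np.argsort(d, axis=1)[:, :k]


def k_occurrence(X, k):
    """N_k(x): in how many k-NN lists each point appears.
    A strongly right-skewed distribution of N_k is what 'hubness' refers to."""
    nn = knn_indices(X, X, k, exclude_self=True)
    return np.bincount(nn.ravel(), minlength=len(X))


def nhbnn_proba(X_lab, y_lab, X_query, k, n_classes, lam=1.0):
    """Simplified Naive Hubness-Bayesian k-NN class posteriors for X_query.
    The smoothing and the self-occurrence term are assumptions of this sketch."""
    n = len(X_lab)
    nn = knn_indices(X_lab, X_lab, k, exclude_self=True)
    # N_{k,c}(x_i): occurrences of x_i in the k-NN lists of class-c points
    Nkc = np.zeros((n, n_classes))
    np.add.at(Nkc, (nn.ravel(), np.repeat(y_lab, k)), 1)
    # Hypothetical anti-hub safeguard: each point also "occurs" for its own label
    np.add.at(Nkc, (np.arange(n), y_lab), 1)
    n_c = np.bincount(y_lab, minlength=n_classes).astype(float)
    log_prior = np.log((n_c + lam) / (n + lam * n_classes))
    # log P(x_i in N_k(q) | class c), Laplace-style smoothing
    log_lik = np.log((Nkc + lam) / (n_c + lam * n))
    q_nn = knn_indices(X_query, X_lab, k)
    # Naive Bayes: sum the log-likelihoods over the query's k neighbours
    scores = log_prior + log_lik[q_nn].sum(axis=1)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    proba = np.exp(scores)
    return proba / proba.sum(axis=1, keepdims=True)


def self_train(X_lab, y_lab, X_unl, k=3, n_classes=2, per_round=1):
    """Self-training: repeatedly move the most certain unlabeled points,
    with their predicted labels, into the labeled set. Certainty is the
    maximum posterior, a placeholder, NOT the paper's hubness-aware score."""
    X_lab, y_lab, X_unl = X_lab.copy(), y_lab.copy(), X_unl.copy()
    while len(X_unl) > 0:
        proba = nhbnn_proba(X_lab, y_lab, X_unl, k, n_classes)
        pick = np.argsort(-proba.max(axis=1))[:per_round]
        X_lab = np.vstack([X_lab, X_unl[pick]])
        y_lab = np.concatenate([y_lab, proba[pick].argmax(axis=1)])
        X_unl = np.delete(X_unl, pick, axis=0)
    return X_lab, y_lab


# Toy usage: two well-separated synthetic clusters, 5 labeled points each.
rng = np.random.default_rng(0)
A = rng.normal(0.0, 0.2, size=(20, 5))
B = rng.normal(3.0, 0.2, size=(20, 5))
X_aug, y_aug = self_train(np.vstack([A[:5], B[:5]]), np.array([0] * 5 + [1] * 5),
                          np.vstack([A[5:], B[5:]]), k=3)
print(y_aug)
print(k_occurrence(X_aug, k=3))
```

Adding one point per round keeps the loop easy to follow; a practical variant would add several of the most certain points per round and could stop early, for example when the certainty falls below a threshold.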
- Is Part Of:
- Computer methods and programs in biomedicine. Issue 127(2016)
- Journal:
- Computer methods and programs in biomedicine
- Issue:
- Issue 127(2016)
- Issue Display:
- Volume 127, Issue 127 (2016)
- Year:
- 2016
- Volume:
- 127
- Issue:
- 127
- Issue Sort Value:
- 2016-0127-0127-0000
- Page Start:
- 105
- Page End:
- 113
- Publication Date:
- 2016-04
- Subjects:
- Gene expression -- Machine learning -- Semi-supervised classification -- High dimensionality
Medicine -- Computer programs -- Periodicals
Biology -- Computer programs -- Periodicals
Computers -- Periodicals
Medicine -- Periodicals
Médecine -- Logiciels -- Périodiques
Biologie -- Logiciels -- Périodiques
Biology -- Computer programs
Medicine -- Computer programs
Periodicals
Electronic journals
610.28
- Journal URLs:
- http://www.sciencedirect.com/science/journal/01692607
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.cmpb.2016.01.016
- Languages:
- English
- ISSNs:
- 0169-2607
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.095000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 1846.xml