Analysis of high-dimensional genomic data using MapReduce based probabilistic neural network. (October 2020)
- Record Type:
- Journal Article
- Title:
- Analysis of high-dimensional genomic data using MapReduce based probabilistic neural network. (October 2020)
- Main Title:
- Analysis of high-dimensional genomic data using MapReduce based probabilistic neural network
- Authors:
- Baliarsingh, Santos Kumar
Vipsita, Swati
Gandomi, Amir H.
Panda, Abhijeet
Bakshi, Sambit
Ramasubbareddy, Somula
- Abstract:
- Highlights: MapReduce-based Fisher score and ReliefF algorithms are proposed for feature selection. A MapReduce-based probabilistic neural network (PNN) is implemented for classification. A weighted chaotic grey wolf optimization algorithm is used to select the optimal value of σ in the PNN.
Abstract: Background: The size of genomics data has been growing rapidly over the last decade, and conventional data analysis techniques are incapable of processing such large volumes of data. Efficient processing of high-dimensional datasets therefore requires new parallel methods. Methods: In this work, a novel distributed method is presented using a MapReduce (MR)-based approach. The proposed algorithm consists of an MR-based Fisher score (mrFScore), an MR-based ReliefF (mrReliefF), and an MR-based probabilistic neural network (mrPNN) using a weighted chaotic grey wolf optimization technique (WCGWO). The mrFScore and mrReliefF methods are introduced for feature selection (FS), and mrPNN is implemented for microarray classification. The choice of the smoothing parameter (σ) plays a major role in the prediction ability of the PNN, so the novel WCGWO technique is used to select the optimal value of σ. Results: These algorithms have been successfully implemented using the Hadoop framework. The proposed model is tested on three large and one small microarray datasets, and a comparative analysis is carried out against existing FS and classification techniques. The results suggest that WCGWO-mrPNN can outperform other methods for high-dimensional microarray classification. Conclusion: The effectiveness of the proposed methods is compared with that of other existing schemes. Experimental results show that the proposed scheme is accurate and robust; hence, it is considered a reliable framework for microarray data analysis. Significance: Such a method promotes the application of parallel programming on a Hadoop cluster to the analysis of large-scale genomics data, particularly when the dataset is high-dimensional.
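The abstract combines two ideas: ranking features by Fisher score from statistics that can be computed in map and reduce phases, and classifying with a PNN whose output depends on the smoothing parameter σ. The following is a minimal single-process Python sketch of those two ideas only, assuming the textbook Fisher score (between-class scatter divided by within-class scatter) and a Gaussian-kernel PNN. It is not the authors' Hadoop implementation: Python's `map` and `functools.reduce` merely imitate the map and reduce phases, ReliefF and WCGWO are not implemented (σ is fixed by hand), and the function names (`map_stats`, `reduce_stats`, `fisher_scores`, `pnn_predict`) and toy data are hypothetical.

```python
import math
from collections import defaultdict
from functools import reduce

# Hypothetical sketch: Python's map/functools.reduce stand in for Hadoop's
# map and reduce phases; each "chunk" plays the role of one input split.


def map_stats(chunk):
    """Map: per-class count, per-feature sum and sum of squares for one chunk."""
    stats = {}
    for features, label in chunk:
        if label not in stats:
            d = len(features)
            stats[label] = [0, [0.0] * d, [0.0] * d]
        entry = stats[label]
        entry[0] += 1
        for j, v in enumerate(features):
            entry[1][j] += v
            entry[2][j] += v * v
    return stats


def reduce_stats(a, b):
    """Reduce: merge two partial per-class summaries by adding them."""
    merged = {k: [n, list(s), list(sq)] for k, (n, s, sq) in a.items()}
    for label, (n, s, sq) in b.items():
        if label not in merged:
            merged[label] = [n, list(s), list(sq)]
        else:
            m = merged[label]
            m[0] += n
            m[1] = [x + y for x, y in zip(m[1], s)]
            m[2] = [x + y for x, y in zip(m[2], sq)]
    return merged


def fisher_scores(stats):
    """Textbook Fisher score per feature: between-class / within-class scatter."""
    total = sum(n for n, _, _ in stats.values())
    d = len(next(iter(stats.values()))[1])
    scores = []
    for j in range(d):
        overall_mean = sum(s[j] for _, s, _ in stats.values()) / total
        between = within = 0.0
        for n, s, sq in stats.values():
            mean = s[j] / n
            var = max(sq[j] / n - mean * mean, 0.0)  # guard tiny negative rounding
            between += n * (mean - overall_mean) ** 2
            within += n * var
        if within > 0:
            scores.append(between / within)
        else:
            scores.append(math.inf if between > 0 else 0.0)
    return scores


def pnn_predict(train, x, sigma):
    """Gaussian-kernel PNN: pick the class with the largest mean kernel density.

    The shared normalising constant (2*pi*sigma^2)^(-d/2) is omitted because it
    does not change which class wins.
    """
    density = defaultdict(float)
    count = defaultdict(int)
    for features, label in train:
        dist2 = sum((a - b) ** 2 for a, b in zip(features, x))
        density[label] += math.exp(-dist2 / (2 * sigma ** 2))
        count[label] += 1
    return max(density, key=lambda k: density[k] / count[k])


# Toy data (invented for illustration): two "splits", three features each.
chunks = [
    [([1.0, 5.0, 0.2], "A"), ([1.2, 4.0, 0.1], "A")],
    [([3.0, 5.5, 0.2], "B"), ([3.1, 4.2, 0.3], "B")],
]
stats = reduce(reduce_stats, map(map_stats, chunks))
scores = fisher_scores(stats)
top = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:1]

# Keep only the top-ranked feature, then classify with a hand-picked sigma
# (the paper tunes sigma with WCGWO, which this sketch does not implement).
train = [([f[j] for j in top], y) for chunk in chunks for f, y in chunk]
print(top, pnn_predict(train, [1.1], 0.5), pnn_predict(train, [2.9], 0.5))
# prints: [0] A B
```

Because the per-class counts, sums and sums of squares are additive, the reduce step can merge partial results from any number of splits in any order, which is what makes this kind of feature scoring a natural fit for MapReduce. A smaller σ makes the PNN follow individual training samples more closely, while a larger σ smooths the class densities; that sensitivity is why the paper searches for σ rather than fixing it.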
- Is Part Of:
- Computer methods and programs in biomedicine. Volume 195(2020)
- Journal:
- Computer methods and programs in biomedicine
- Issue:
- Volume 195(2020)
- Issue Display:
- Volume 195, Issue 2020 (2020)
- Year:
- 2020
- Volume:
- 195
- Issue:
- 2020
- Issue Sort Value:
- 2020-0195-2020-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-10
- Subjects:
- Probabilistic neural network -- Grey wolf optimization -- MapReduce -- ReliefF -- Hadoop -- Fisher score
Medicine -- Computer programs -- Periodicals
Biology -- Computer programs -- Periodicals
Computers -- Periodicals
Medicine -- Periodicals
Médecine -- Logiciels -- Périodiques
Biologie -- Logiciels -- Périodiques
Biology -- Computer programs
Medicine -- Computer programs
Periodicals
Electronic journals
610.28
- Journal URLs:
- http://www.sciencedirect.com/science/journal/01692607
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.cmpb.2020.105625
- Languages:
- English
- ISSNs:
- 0169-2607
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.095000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 14021.xml