A MapReduce approach to diminish imbalance parameters for big deoxyribonucleic acid dataset. Issue 131 (July 2016)
- Record Type:
- Journal Article
- Title:
- A MapReduce approach to diminish imbalance parameters for big deoxyribonucleic acid dataset. Issue 131 (July 2016)
- Main Title:
- A MapReduce approach to diminish imbalance parameters for big deoxyribonucleic acid dataset
- Authors:
- Kamal, Sarwar
Ripon, Shamim Hasnat
Dey, Nilanjan
Ashour, Amira S.
Santhi, V.
- Abstract:
- Highlights: Imbalanced data sets are considered a special case of classification problems. MapReduce with prototype reduction can easily handle large-scale data sets with good speedup and less time consumption. For high quality, four reduction types were compared: data cleaning, rule aggregation, rule synthesis and rule update. A real DNA dataset consisting of 90 million pairs has been used with the reduction types. The proposed MapReduce-based K-NN classifier reduced the imbalance data set and achieved accurate results for the DNA data.
Abstract:
Background: In the age of the information superhighway, big data play a significant role in information processing, extraction, retrieval and management. In computational biology, the continuous challenge is to manage the biological data. Data mining techniques are sometimes imperfect for new space and time requirements. Thus, it is critical to process massive amounts of data to retrieve knowledge. The existing software and automated tools to handle big data sets are not sufficient. As a result, an expandable mining technique that enfolds the large storage and processing capability of distributed or parallel processing platforms is essential.
Method: In this analysis, a contemporary distributed clustering methodology for imbalance data reduction using a k-nearest neighbor (K-NN) classification approach has been introduced. The pivotal objective of this work is to illustrate real training data sets with a reduced number of elements or instances. These reduced data sets will ensure faster data classification and standard storage management with less sensitivity. However, general data reduction methods cannot manage very big data sets. To minimize these difficulties, a MapReduce-oriented framework is designed using various clusters of automated contents, comprising multiple algorithmic approaches.
Results: To test the proposed approach, a real DNA (deoxyribonucleic acid) dataset that consists of 90 million pairs has been used. The proposed model reduces the imbalance data sets from large-scale data sets without loss of accuracy.
Conclusions: The obtained results show that the MapReduce-based K-NN classifier provided accurate results for big data of DNA.
- Is Part Of:
- Computer methods and programs in biomedicine. Issue 131(2016)
- Journal:
- Computer methods and programs in biomedicine
- Issue:
- Issue 131(2016)
- Issue Display:
- Volume 131, Issue 131 (2016)
- Year:
- 2016
- Volume:
- 131
- Issue:
- 131
- Issue Sort Value:
- 2016-0131-0131-0000
- Page Start:
- 191
- Page End:
- 206
- Publication Date:
- 2016-07
- Subjects:
- MapReduce -- K-nearest neighbor -- Big data -- DNA (deoxyribonucleic acid) -- Computational biology -- Imbalance data
Medicine -- Computer programs -- Periodicals
Biology -- Computer programs -- Periodicals
Computers -- Periodicals
Medicine -- Periodicals
Médecine -- Logiciels -- Périodiques
Biologie -- Logiciels -- Périodiques
Biology -- Computer programs
Medicine -- Computer programs
Periodicals
Electronic journals
610.28
- Journal URLs:
- http://www.sciencedirect.com/science/journal/01692607
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.cmpb.2016.04.005
- Languages:
- English
- ISSNs:
- 0169-2607
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.095000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 2093.xml