Iterative big data clustering algorithms: a review. (7th July 2015)
- Record Type:
- Journal Article
- Title:
- Iterative big data clustering algorithms: a review. (7th July 2015)
- Main Title:
- Iterative big data clustering algorithms: a review
- Authors:
- Mohebi, Amin
Aghabozorgi, Saeed
Ying Wah, Teh
Herawan, Tutut
Yahyapour, Ramin
- Abstract:
- Summary: Enterprises today are dealing with the massive size of data, which have been explosively increasing. The key requirements to address this challenge are to extract, analyze, and process data in a timely manner. Clustering is an essential data mining tool that plays an important role for analyzing big data. However, large‐scale data clustering has become a challenging task because of the large amount of information that emerges from technological progress in many areas, including finance and business informatics. Accordingly, researchers have dealt with parallel clustering algorithms using parallel programming models to address this issue. MapReduce is one of the most famous frameworks, and it has attracted great attention because of its flexibility, ease of programming, and fault tolerance. However, the framework has evident performance limitations, especially for iterative programs. This study will first review the proposed iterative frameworks that extended MapReduce to support iterative algorithms. We summarize these techniques, discuss their uniqueness and limitations, and explain how they address the challenging issues of iterative programs. We also perform an in‐depth review to understand the problems and the solving techniques for parallel clustering algorithms. Hence, we believe that no well‐rounded review provides a significant comparison among parallel clustering algorithms using MapReduce. This work aims to serve as a stepping stone for researchers who are studying big data clustering algorithms. Copyright © 2015 John Wiley & Sons, Ltd.
- Is Part Of:
- Software, practice & experience. Volume 46:Number 1(2016)
- Journal:
- Software, practice & experience
- Issue:
- Volume 46:Number 1(2016)
- Issue Display:
- Volume 46, Issue 1 (2016)
- Year:
- 2016
- Volume:
- 46
- Issue:
- 1
- Issue Sort Value:
- 2016-0046-0001-0000
- Page Start:
- 107
- Page End:
- 129
- Publication Date:
- 2015-07-07
- Subjects:
- big data -- large‐scale -- MapReduce -- clustering -- Hadoop
Computer software -- Periodicals
Computer programming -- Periodicals
Computer programs -- Periodicals
005.3
- Journal URLs:
- http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/spe.2341
- Languages:
- English
- ISSNs:
- 0038-0644
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 8321.453000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 9865.xml