Cludoop: An Efficient Distributed Density-Based Clustering for Big Data Using Hadoop. (1st June 2015)
- Record Type:
- Journal Article
- Title:
- Cludoop: An Efficient Distributed Density-Based Clustering for Big Data Using Hadoop. (1st June 2015)
- Main Title:
- Cludoop: An Efficient Distributed Density-Based Clustering for Big Data Using Hadoop
- Authors:
- Yu, Yanwei
Zhao, Jindong
Wang, Xiaodong
Wang, Qin
Zhang, Yonggang - Other Names:
- Rao Praveen Academic Editor.
- Abstract:
- Abstract : Density-based clustering for big data is critical for many modern applications ranging from Internet data processing to massive-scale moving object management. This paper proposes Cludoop algorithm, an efficient distributed density-based clustering for big data using Hadoop. First, we propose a serial clustering algorithm CluC by leveraging cell partition optimization and c-cluster to fast find clusters. CluC completes classification of the points using the relationships of connected cells around points instead of expensive completed neighbor query, which significantly reduce the number of distance calculations. Second, we propose the Cludoop, which can efficiently cluster very-large-scale data in parallel using already existing data partition on Map/Reduce platform. It employs the proposed serial clustering CluC as a plugged-in clustering on parallel mapper, along with a cell description instead of completed cell in transmission to reduce both network and I/O costs. Guided by proposed cell-based principles, we also design a Merging-Refinement-Merging 3-step framework to merge c-clusters on the overlay of assigned preclustering result on reducer. Finally, our comprehensive experimental evaluation on 10 network-connected commercial PCs, using both huge-volume real and synthetic data, demonstrates (1) the effectiveness of our algorithm in finding correct clusters with arbitrary shape and (2) the fact that our proposed algorithm exhibits better scalability andAbstract : Density-based clustering for big data is critical for many modern applications ranging from Internet data processing to massive-scale moving object management. This paper proposes Cludoop algorithm, an efficient distributed density-based clustering for big data using Hadoop. First, we propose a serial clustering algorithm CluC by leveraging cell partition optimization and c-cluster to fast find clusters. CluC completes classification of the points using the relationships of connected cells around points instead of expensive completed neighbor query, which significantly reduce the number of distance calculations. Second, we propose the Cludoop, which can efficiently cluster very-large-scale data in parallel using already existing data partition on Map/Reduce platform. It employs the proposed serial clustering CluC as a plugged-in clustering on parallel mapper, along with a cell description instead of completed cell in transmission to reduce both network and I/O costs. Guided by proposed cell-based principles, we also design a Merging-Refinement-Merging 3-step framework to merge c-clusters on the overlay of assigned preclustering result on reducer. Finally, our comprehensive experimental evaluation on 10 network-connected commercial PCs, using both huge-volume real and synthetic data, demonstrates (1) the effectiveness of our algorithm in finding correct clusters with arbitrary shape and (2) the fact that our proposed algorithm exhibits better scalability and efficiency than state-of-the-art method. … (more)
- Is Part Of:
- International journal of distributed sensor networks. (2015)
- Journal:
- International journal of distributed sensor networks
- Issue:
- (2015)
- Issue Display:
- Volume 2015, Issue 2015 (2015)
- Year:
- 2015
- Volume:
- 2015
- Issue:
- 2015
- Issue Sort Value:
- 2015-2015-2015-0000
- Page Start:
- Page End:
- Publication Date:
- 2015-06-01
- Subjects:
- Sensor networks -- Periodicals
Intelligent agents (Computer software) -- Periodicals
Multisensor data fusion -- Periodicals
681.2 - Journal URLs:
- http://www.informaworld.com/smpp/title~content=t714578688~db=all ↗
http://www.metapress.com/openurl.asp?genre=journal&issn=1550-1329 ↗
http://dsn.sagepub.com/ ↗
http://www.tandfonline.com/ ↗ - DOI:
- 10.1155/2015/579391 ↗
- Languages:
- English
- ISSNs:
- 1550-1329
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms) ↗
- Physical Locations:
- British Library DSC - 4542.186400
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store - Ingest File:
- 22830.xml