Efficient clustering techniques on Hadoop and Spark. (4th June 2019)
- Record Type:
- Journal Article
- Title:
- Efficient clustering techniques on Hadoop and Spark. (4th June 2019)
- Main Title:
- Efficient clustering techniques on Hadoop and Spark
- Authors:
- Ghamdi, Sami Al
Fatta, Giuseppe Di
- Abstract:
- Clustering is an essential data mining technique that divides observations into groups, where each group contains similar observations. K-means is one of the most popular clustering algorithms and has been in use for over 50 years. With the exponential growth of data, it has become necessary to further improve the efficiency and scalability of K-means to cope with large-scale datasets, known as big data. This paper presents K-means optimisations based on the triangle inequality on two well-known distributed computing platforms: Hadoop and Spark. K-means variants that use the triangle inequality usually require caching extra information from the previous iteration, which is challenging on Hadoop. Hence, this work introduces two methods for passing information from one iteration to the next on Hadoop to accelerate K-means. The experimental work shows that the efficiency of K-means on Hadoop and Spark can be significantly improved by using triangle inequality optimisations.
- Is Part Of:
- International journal of big data intelligence. Volume 6:Number 3/4(2019)
- Journal:
- International journal of big data intelligence
- Issue:
- Volume 6:Number 3/4(2019)
- Issue Display:
- Volume 6, Issue 3/4 (2019)
- Year:
- 2019
- Volume:
- 6
- Issue:
- 3/4
- Issue Sort Value:
- 2019-0006-NaN-0000
- Page Start:
- 269
- Page End:
- 290
- Publication Date:
- 2019-06-04
- Subjects:
- K-means -- Hadoop -- Spark -- MapReduce -- efficient clustering -- triangle inequality K-means
Big data -- Periodicals
005.705
- Journal URLs:
- http://www.inderscience.com/jhome.php?jcode=ijbdi
http://www.inderscience.com/
- Languages:
- English
- ISSNs:
- 2053-1389
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 11026.xml
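
Illustrative note on the abstract: the record does not describe the paper's algorithms in detail, so the following is only a minimal single-machine sketch of the general idea behind triangle inequality pruning in the K-means assignment step, not the authors' Hadoop or Spark method. It relies on one known fact: if d(c_a, c_b) >= 2 * d(x, c_a), then d(x, c_b) >= d(x, c_a), so the distance from x to c_b need not be computed. The function names `assign_with_pruning` and `assign_brute_force` are hypothetical and chosen for illustration.

```python
import math


def assign_brute_force(points, centroids):
    """Assign each point to its nearest centroid by computing every distance."""
    labels = []
    for x in points:
        distances = [math.dist(x, c) for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels


def assign_with_pruning(points, centroids):
    """Assign each point to its nearest centroid, skipping distance
    computations that the triangle inequality proves cannot win.

    Returns (labels, skipped), where skipped counts the point-to-centroid
    distances that were never computed.
    """
    k = len(centroids)
    # Centroid-to-centroid distances are computed once per assignment pass.
    cc = [[math.dist(centroids[i], centroids[j]) for j in range(k)]
          for i in range(k)]
    labels = []
    skipped = 0
    for x in points:
        best = 0
        best_d = math.dist(x, centroids[0])
        for j in range(1, k):
            # If d(c_best, c_j) >= 2 * d(x, c_best), then by the triangle
            # inequality d(x, c_j) >= d(x, c_best), so c_j cannot be closer.
            if cc[best][j] >= 2 * best_d:
                skipped += 1
                continue
            d = math.dist(x, centroids[j])
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, skipped
```

Fuller variants of this idea, such as Elkan's algorithm, also keep per-point distance bounds from one iteration to the next. That carried-over state is the kind of "extra information from the previous iteration" the abstract refers to, and this sketch deliberately omits it.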