Self-Adaptive K-Means Based on a Covering Algorithm. (1st August 2018)
- Record Type:
- Journal Article
- Title:
- Self-Adaptive K-Means Based on a Covering Algorithm. (1st August 2018)
- Main Title:
- Self-Adaptive K-Means Based on a Covering Algorithm
- Authors:
- Zhang, Yiwen
Zhou, Yuanyuan
Guo, Xing
Wu, Jintao
He, Qiang
Liu, Xiao
Yang, Yun
- Other Names:
- Zhang Xiuzhen Academic Editor.
- Abstract:
- The K-means algorithm is one of the ten classic algorithms in the area of data mining and has been studied by researchers in numerous fields for a long time. However, the value of the clustering number k in the K-means algorithm is not always easy to determine, and the selection of the initial centers is vulnerable to outliers. This paper proposes an improved K-means clustering algorithm called the covering K-means algorithm (C-K-means). The C-K-means algorithm can not only acquire efficient and accurate clustering results but also self-adaptively provide a reasonable number of clusters based on the data features. It includes two phases: the initialization of the covering algorithm (CA) and the Lloyd iteration of K-means. The first phase executes the CA. The CA self-organizes and recognizes the number of clusters k based on the similarities in the data, and it requires neither the number of clusters to be prespecified nor the initial centers to be manually selected. Therefore, it has a "blind" feature, that is, k is not preselected. The second phase performs the Lloyd iteration based on the results of the first phase. The C-K-means algorithm combines the advantages of CA and K-means. Experiments are carried out on the Spark platform, and the results verify the good scalability of the C-K-means algorithm. This algorithm can effectively solve the problem of large-scale data clustering. Extensive experiments on real data sets show that the accuracy and efficiency of the C-K-means algorithm outperform the existing algorithms under both sequential and parallel conditions.
- Is Part Of:
- Complexity. Volume 2018(2018)
- Journal:
- Complexity
- Issue:
- Volume 2018(2018)
- Issue Display:
- Volume 2018, Issue 2018 (2018)
- Year:
- 2018
- Volume:
- 2018
- Issue:
- 2018
- Issue Sort Value:
- 2018-2018-2018-0000
- Page Start:
- Page End:
- Publication Date:
- 2018-08-01
- Subjects:
- Chaotic behavior in systems -- Periodicals
Complexity (Philosophy) -- Periodicals
003
- Journal URLs:
- https://onlinelibrary.wiley.com/journal/10990526
http://onlinelibrary.wiley.com/
https://www.hindawi.com/journals/complexity/
- DOI:
- 10.1155/2018/7698274
- Languages:
- English
- ISSNs:
- 1076-2787
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3364.585500
British Library HMNTS - ELD Digital store
- Ingest File:
- 22600.xml
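
The two-phase structure described in the abstract (a covering step that finds k and the initial centers, followed by Lloyd iteration) can be illustrated with a minimal Python sketch. This is not the authors' covering algorithm, whose details the abstract does not give. The covering rule used here (a single radius equal to the mean distance to the global centroid, and seeds chosen nearest to the centroid of the uncovered points) and the function names `covering_init`, `lloyd`, and `ck_means_sketch` are all assumptions made for illustration.

```python
import math


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def covering_init(points):
    """Phase 1 (assumed stand-in, not the paper's CA): greedily cover the
    data with balls of one fixed radius; each ball yields one initial center,
    so k is the number of balls needed rather than a user input."""
    global_centroid = mean(points)
    # Assumed radius rule: mean distance of all points to the global centroid.
    radius = sum(math.dist(p, global_centroid) for p in points) / len(points)
    uncovered = list(points)
    centers = []
    while uncovered:
        centroid = mean(uncovered)
        # Seed: the uncovered point nearest to the centroid of the uncovered set.
        seed = min(uncovered, key=lambda p: math.dist(p, centroid))
        covered = [p for p in uncovered if math.dist(seed, p) <= radius]
        centers.append(mean(covered))
        # The seed is always covered, so the loop shrinks the set every pass.
        uncovered = [p for p in uncovered if math.dist(seed, p) > radius]
    return centers


def lloyd(points, centers, max_iter=100):
    """Phase 2: standard Lloyd iteration starting from the given centers."""
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for each point.
        labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centers.append(mean(members) if members else old)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


def ck_means_sketch(points):
    """Run the assumed covering initialization, then Lloyd iteration."""
    return lloyd(points, covering_init(points))
```

Calling `ck_means_sketch` on a list of coordinate tuples returns the final centers and one cluster label per point; no value of k is passed in, which mirrors the "blind" feature the abstract describes. The sketch is sequential and does not attempt the Spark-based parallel execution mentioned in the abstract.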