Scalable clustering by aggregating representatives in hierarchical groups. (April 2023)
- Record Type:
- Journal Article
- Title:
- Scalable clustering by aggregating representatives in hierarchical groups. (April 2023)
- Main Title:
- Scalable clustering by aggregating representatives in hierarchical groups
- Authors:
- Xie, Wen-Bo
Liu, Zhen
Das, Debarati
Chen, Bin
Srivastava, Jaideep
- Abstract:
- Highlights: A scalable hierarchical clustering framework for discrete data segments is proposed. Multiple indices are proposed to effectively spot the most representative node in each sub-MST. Swapping data points in overlap areas between pairwise sub-MSTs ensures the accuracy of the hierarchical clustering across multiple data segments. The proposed model provides a good balance between accuracy and scalability of hierarchical clustering.
Abstract: Appropriately handling the scalability of clustering is a long-standing challenge for the study of clustering techniques and is of fundamental interest to researchers in the community of data mining and knowledge discovery. In comparison to other clustering methods, hierarchical clustering demonstrates better interpretability of clustering results but poor scalability while handling large-scale data. Thus, more comprehensive studies on this problem need to be conducted. This paper develops a new scalable hierarchical clustering model called Election Tree, which can detect the most representative point for each sub-cluster via the process of node election in split data and adjust the members in sub-clusters by the operations of node merging and swap. Extensive experiments on real-world datasets reveal that the proposed computational framework has better clustering accuracy as opposed to the competing baseline methods. Meanwhile, with respect to the scalability tests on incremental synthetic datasets, the results show that the new model has a significantly lower time consumption than the state-of-the-art hierarchical clustering models such as PERCH, GRINCH, SCC and other classic baselines.
- Is Part Of:
- Pattern recognition. Volume 136(2023)
- Journal:
- Pattern recognition
- Issue:
- Volume 136(2023)
- Issue Display:
- Volume 136, Issue 2023 (2023)
- Year:
- 2023
- Volume:
- 136
- Issue:
- 2023
- Issue Sort Value:
- 2023-0136-2023-0000
- Page Start:
- Page End:
- Publication Date:
- 2023-04
- Subjects:
- Hierarchical clustering -- Election tree -- Representative node -- Root
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2022.109230
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 25681.xml