Clustering ensemble selection for categorical data based on internal validity indices. (September 2017)
- Record Type:
- Journal Article
- Title:
- Clustering ensemble selection for categorical data based on internal validity indices. (September 2017)
- Main Title:
- Clustering ensemble selection for categorical data based on internal validity indices
- Authors:
- Zhao, Xingwang
Liang, Jiye
Dang, Chuangyin
- Abstract:
- Highlights: Propose a clustering ensemble selection algorithm for categorical data (SIVID). SIVID measures the quality of base clusterings with internal validity indices. SIVID measures the diversity of base clusterings with NMI. Experimental results show the effectiveness and robustness of the proposed algorithm.
Abstract: Clustering ensemble selection is an effective technique for improving the quality of clustering results. However, traditional methods usually measure quality and diversity from the cluster labels of the base clusterings alone, ignoring the information in the original data. To solve this problem, a new clustering ensemble selection algorithm for categorical data is presented. In this algorithm, five popular internal validity indices and the normalized mutual information are used to measure the quality and the diversity of the base clusterings, respectively. According to the quality measure, the partition with the highest value is selected first to participate in the ensemble. Then, the base partitions with the highest clustering quality and diversity with respect to the base partitions selected in previous iterations are selected iteratively, until the required number of base clusterings has been selected. The effectiveness and robustness of the proposed algorithm are evaluated in comparison with the full ensemble, random selection ensemble and state-of-the-art ensemble selection algorithms. Experimental results on real categorical data sets show that the proposed algorithm is competitive with the existing ensemble selection algorithms in terms of clustering quality.
- Is Part Of:
- Pattern recognition. Volume 69(2017:Sep.)
- Journal:
- Pattern recognition
- Issue:
- Volume 69(2017:Sep.)
- Issue Display:
- Volume 69 (2017)
- Year:
- 2017
- Volume:
- 69
- Issue Sort Value:
- 2017-0069-0000-0000
- Page Start:
- 150
- Page End:
- 168
- Publication Date:
- 2017-09
- Subjects:
- Clustering ensemble selection -- Categorical data -- Clustering validity indices -- Quality -- Diversity
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2017.04.019
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 2641.xml
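
Note on the abstract: the greedy selection it describes (pick the highest-quality partition first, then repeatedly add the partition that best balances quality and NMI-based diversity) can be illustrated with a minimal Python sketch. This is not the authors' SIVID implementation. The function names `nmi` and `select_clusterings`, the weight `alpha`, the weighted-sum scoring rule, and the square-root normalization of NMI are all assumptions made for illustration, and the quality scores are taken as given inputs because the record does not name the five internal validity indices.

```python
from collections import Counter
from math import log, sqrt


def nmi(a, b):
    """Normalized mutual information of two label lists.

    Assumption: normalized by sqrt(H(a) * H(b)); other normalizations exist.
    """
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum(c / n * log(c * n / (ca[x] * cb[y])) for (x, y), c in cab.items())
    ha = -sum(c / n * log(c / n) for c in ca.values())
    hb = -sum(c / n * log(c / n) for c in cb.values())
    if ha == 0 or hb == 0:
        # A single-cluster partition has zero entropy; treat two such
        # partitions as identical and any other pairing as unrelated.
        return 1.0 if ha == hb else 0.0
    return mi / sqrt(ha * hb)


def select_clusterings(partitions, quality, k, alpha=0.5):
    """Greedily pick k partitions by quality and diversity.

    partitions: list of label lists, one per base clustering
    quality:    precomputed quality score per partition (hypothetical input,
                e.g. from an internal validity index scaled to [0, 1])
    alpha:      hypothetical weight trading quality against diversity
    """
    indices = range(len(partitions))
    # Step 1: start from the single highest-quality partition.
    selected = [max(indices, key=lambda i: quality[i])]
    # Step 2: repeatedly add the best remaining candidate.
    while len(selected) < min(k, len(partitions)):
        def score(i):
            # Diversity = mean (1 - NMI) against everything selected so far.
            diversity = sum(
                1 - nmi(partitions[i], partitions[j]) for j in selected
            ) / len(selected)
            return alpha * quality[i] + (1 - alpha) * diversity

        remaining = [i for i in indices if i not in selected]
        selected.append(max(remaining, key=score))
    return selected


# Toy data: p1 is p0 with relabelled clusters, p2 is unrelated to p0.
p0 = [0, 0, 1, 1, 2, 2]
p1 = [1, 1, 0, 0, 2, 2]
p2 = [0, 1, 0, 1, 0, 1]
chosen = select_clusterings([p0, p1, p2], quality=[0.9, 0.85, 0.6], k=2)
print(chosen)  # [0, 2]
```

In the toy data the second partition scores well on quality but is a relabelling of the first, so its diversity term is zero and the lower-quality but independent third partition is chosen instead.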