Soft subspace clustering of categorical data with probabilistic distance. (March 2016)
- Record Type:
- Journal Article
- Title:
- Soft subspace clustering of categorical data with probabilistic distance. (March 2016)
- Main Title:
- Soft subspace clustering of categorical data with probabilistic distance
- Authors:
- Chen, Lifei
Wang, Shengrui
Wang, Kaijun
Zhu, Jianping
- Abstract:
- Categorical data clustering is an important subject in pattern recognition. Currently, subspace clustering of categorical data remains an open problem due to the difficulties in estimating attribute interestingness according to the statistics of categories in clusters. In this paper, a new algorithm is proposed for clustering categorical data with a novel soft feature-selection scheme, by which each categorical attribute is automatically assigned a weight that correlates with the smoothed dispersion of the categories in a cluster. In the proposed algorithm, dissimilarity between categorical data objects is measured using a probabilistic distance function, based on kernel density estimation for categorical attributes. We also make use of the probabilistic distances to define a cluster validity index for estimating the number of categorical clusters. The suitability of the proposal is demonstrated in an empirical study done with some widely used real-world data sets and synthetic data sets, and the results show its outstanding performance.
Highlights:
We define the cluster scatter on object-to-cluster distances for categorical data.
We propose a probabilistic distance function using a kernel density estimation method.
Categorical attributes are weighted based on the smoothed dispersion of categories.
Two weighting schemes are offered depending on the attribute types.
Significantly improve clustering performance compared to mode-based algorithms.
- Is Part Of:
- Pattern recognition. Volume 51(2016:Mar.)
- Journal:
- Pattern recognition
- Issue:
- Volume 51(2016:Mar.)
- Issue Display:
- Volume 51 (2016)
- Year:
- 2016
- Volume:
- 51
- Issue Sort Value:
- 2016-0051-0000-0000
- Page Start:
- 322
- Page End:
- 332
- Publication Date:
- 2016-03
- Subjects:
- Subspace clustering -- Categorical data -- Distance measure -- Attribute weighting -- Kernel density estimation
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2015.09.027
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 59.xml