An improved algorithm for partial clustering. (1st May 2019)
- Record Type:
- Journal Article
- Title:
- An improved algorithm for partial clustering. (1st May 2019)
- Main Title:
- An improved algorithm for partial clustering
- Authors:
- Melendez-Melendez, G.
Cruz-Paz, D.
Carrasco-Ochoa, J.A.
Martínez-Trinidad, José Fco.
- Abstract:
- Highlights: Outlier detection is important for improving clustering results. Over-detection of outliers leads to information loss. Our proposal reduces the number of over-detected outliers. Experiments show that clustering quality can be improved while runtime is reduced.
Abstract: Expert and intelligent systems use a variety of machine learning techniques to obtain and understand the information inherent in the data. Clustering is one of these techniques; it has become important and popular because it allows an unlabeled dataset to be classified into clusters of similar objects. Many clustering algorithms have been proposed in the literature. Among them, the Cross-Clustering algorithm is one of the most recent algorithms for partial clustering (clustering where not necessarily all the objects are grouped into clusters). It has provided good results, estimating a suitable set of clusters as well as eliminating outliers. However, this algorithm tends to eliminate too many objects as outliers, which leads to discarding many non-outlier objects. Additionally, the Cross-Clustering algorithm spends a lot of time evaluating several combinations of clusterings while trying to determine a suitable number of clusters. To overcome these problems, this paper proposes an improved version of the Cross-Clustering algorithm (ICC). ICC changes the clustering algorithm used for detecting outliers and modifies the way outliers are detected. Moreover, a stopping criterion that allows a fast decision on a suitable number of clusters is also introduced. The performance of the improved Cross-Clustering algorithm is compared with the original algorithm on artificial and real datasets. Our results show that ICC improves on the original algorithm and other state-of-the-art clustering algorithms in both runtime and clustering quality.
- Is Part Of:
- Expert systems with applications. Volume 121(2019)
- Journal:
- Expert systems with applications
- Issue:
- Volume 121(2019)
- Issue Display:
- Volume 121, Issue 2019 (2019)
- Year:
- 2019
- Volume:
- 121
- Issue:
- 2019
- Issue Sort Value:
- 2019-0121-2019-0000
- Page Start:
- 282
- Page End:
- 291
- Publication Date:
- 2019-05-01
- Subjects:
- Clustering -- Estimation of the number of clusters -- Outlier detection
Expert systems (Computer science) -- Periodicals
Systèmes experts (Informatique) -- Périodiques
Electronic journals
006.33
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09574174
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.eswa.2018.12.027
- Languages:
- English
- ISSNs:
- 0957-4174
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3842.004220
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 9383.xml