Combining K-Means and a genetic algorithm through a novel arrangement of genetic operators for high quality clustering. (January 2018)
- Record Type:
- Journal Article
- Title:
- Combining K-Means and a genetic algorithm through a novel arrangement of genetic operators for high quality clustering. (January 2018)
- Main Title:
- Combining K-Means and a genetic algorithm through a novel arrangement of genetic operators for high quality clustering
- Authors:
- Islam, Md Zahidul
Estivill-Castro, Vladimir
Rahman, Md Anisur
Bossomaier, Terry
- Abstract:
- Highlights: Combining K-Means with a Genetic Algorithm in multiple stages. New genetic operators. Use of a short length K-Means to quickly repair the chromosomes. Extensive experiments on 18 data sets and a few clustering techniques.
Abstract: Knowledge discovery from data can be broadly categorized into two types: supervised and unsupervised. A supervised knowledge discovery process such as classification by decision trees typically requires class labels which are sometimes unavailable in datasets. Unsupervised knowledge discovery techniques such as an unsupervised clustering technique can handle datasets without class labels. They aim to let data reveal the groups (i.e. the data elements in each group) and the number of groups. For the ubiquitous task of clustering, K-Means is the most used algorithm applied in a broad range of areas to identify groups where intra-group distances are much smaller than inter-group distances. As a representative-based clustering approach, K-Means offers an extremely efficient gradient descent approach to the total squared error of representation; however, it not only demands the parameter k, but it also makes assumptions about the similarity of density among the clusters. Therefore, it is profoundly affected by noise. Perhaps more seriously, it can often be attracted to local optima despite its immersion in a multi-start scheme. We present an effective genetic algorithm that combines the capacity of genetic operators to conglomerate different solutions of the search space with the exploitation of the hill-climber. We advance a previous genetic-searching approach called GenClust, with the intervention of fast hill-climbing cycles of K-Means and obtain an algorithm that is faster than its predecessor and achieves clustering results of higher quality. We demonstrate this across a series of 18 commonly researched datasets. …
- Is Part Of:
- Expert systems with applications. Volume 91(2018)
- Journal:
- Expert systems with applications
- Issue:
- Volume 91(2018)
- Issue Display:
- Volume 91, Issue 2018 (2018)
- Year:
- 2018
- Volume:
- 91
- Issue:
- 2018
- Issue Sort Value:
- 2018-0091-2018-0000
- Page Start:
- 402
- Page End:
- 417
- Publication Date:
- 2018-01
- Subjects:
- Clustering -- Genetic algorithm -- K-Means -- Data mining -- Cluster evaluation
Expert systems (Computer science) -- Periodicals
Systèmes experts (Informatique) -- Périodiques
Electronic journals
006.33
- Journal URLs:
- http://www.sciencedirect.com/science/journal/09574174
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.eswa.2017.09.005
- Languages:
- English
- ISSNs:
- 0957-4174
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3842.004220
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 4747.xml