On strategies for building effective ensembles of relative clustering validity criteria. Issue 2 (May 2016)
- Record Type:
- Journal Article
- Title:
- On strategies for building effective ensembles of relative clustering validity criteria. Issue 2 (May 2016)
- Main Title:
- On strategies for building effective ensembles of relative clustering validity criteria
- Authors:
- Jaskowiak, Pablo
Moulavi, Davoud
Furtado, Antonio
Campello, Ricardo
Zimek, Arthur
Sander, Jörg
- Abstract:
- Evaluation and validation are essential tasks for achieving meaningful clustering results. Relative validity criteria are measures usually employed in practice to select and validate clustering solutions, as they enable the evaluation of single partitions and the comparison of partition pairs in relative terms based only on the data under analysis. There is a plethora of relative validity measures described in the clustering literature, thus making it difficult to choose an appropriate measure for a given application. One reason for such a variety is that no single measure can capture all different aspects of the clustering problem and, as such, each of them is prone to fail in particular application scenarios. In the present work, we take advantage of the diversity in relative validity measures from the clustering literature. Previous work showed that when randomly selecting different relative validity criteria for an ensemble (from an initial set of 28 different measures), one can expect with great certainty to only improve results over the worst criterion included in the ensemble. In this paper, we propose a method for selecting measures with minimum effectiveness and some degree of complementarity (from the same set of 28 measures) into ensembles, which show superior performance when compared to any single ensemble member (and not just the worst one) over a variety of different datasets. One can also expect greater stability in terms of evaluation over different datasets, even when considering different ensemble strategies. Our results are based on more than a thousand datasets, synthetic and real, from different sources.
- Is Part Of:
- Knowledge and information systems. Volume 47:Issue 2(2016:May)
- Journal:
- Knowledge and information systems
- Issue:
- Volume 47:Issue 2(2016:May)
- Issue Display:
- Volume 47, Issue 2 (2016)
- Year:
- 2016
- Volume:
- 47
- Issue:
- 2
- Issue Sort Value:
- 2016-0047-0002-0000
- Page Start:
- 329
- Page End:
- 354
- Publication Date:
- 2016-05
- Subjects:
- Clustering -- Clustering validation -- Relative validity criteria -- Relative validity indices -- Ensemble -- Combination -- Aggregation
Expert systems (Computer science) -- Periodicals
Information storage and retrieval systems -- Periodicals
006.33
- Journal URLs:
- http://link.springer-ny.com/link/service/journals/10115/index.htm
http://www.springerlink.com/content/0219-1377
http://www.springer.com/gb/
- DOI:
- 10.1007/s10115-015-0851-6
- Languages:
- English
- ISSNs:
- 0219-1377
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 5100.437300
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 9889.xml