A comparison of central‐tendency and interconnectivity approaches to clustering multivariate data with irregular structure. Issue 11 (18th November 2022)
- Record Type:
- Journal Article
- Title:
- A comparison of central‐tendency and interconnectivity approaches to clustering multivariate data with irregular structure. Issue 11 (18th November 2022)
- Main Title:
- A comparison of central‐tendency and interconnectivity approaches to clustering multivariate data with irregular structure
- Authors:
- Tozer, Mark
Keith, David
- Abstract:
- Questions: Most clustering methods assume data are structured as discrete hyperspheroidal clusters to be evaluated by measures of central tendency. If vegetation data do not conform to this model, then vegetation data may be clustered incorrectly. What are the implications for cluster stability and evaluation if clusters are of irregular shape or density?
Location: Southeast Australia.
Methods: We define misplacement as the placement of a sample in a cluster other than (distinct from) its nearest neighbor and hypothesize that optimizing homogeneity incurs the cost of higher rates of misplacement. Chameleon is a graph‐theoretic algorithm that emphasizes interconnectivity and thus is sensitive to the shape and distribution of clusters. We contrasted its solutions with those of traditional nonhierarchical and hierarchical (agglomerative and divisive) approaches.
Results: Chameleon‐derived solutions had lower rates of misplacement and only marginally higher heterogeneity than those of k‐means in the range of 15–60 clusters, but their metrics converged with larger numbers of clusters. Solutions derived by agglomerative clustering had the best metrics (and divisive clustering the worst) but both produced inferior high‐level solutions to those of Chameleon by merging distantly‐related clusters.
Conclusions: Graph‐theoretic algorithms, such as Chameleon, have an advantage over traditional algorithms when data exhibit discontinuities and variable structure, typically producing more stable solutions (due to lower rates of misplacement) but scoring lower on traditional metrics of central tendency. Advantages are less obvious in the partitioning of data from continuous gradients; however, graph‐based partitioning protocols facilitate the hierarchical integration of solutions.
Abstract: Our results suggest that Chameleon may have an advantage over traditional algorithms at thematic scales at which data exhibit discontinuities and variable structure, potentially producing more stable solutions (due to lower rates of misplacement), but scoring lower on traditional metrics of central tendency. Chameleon's advantages are less obvious in the partitioning of continuous data; however, its graph‐based partitioning protocol facilitates hierarchical integration of solutions. …
- Is Part Of:
- Ecology and evolution. Volume 12:Issue 11(2022)
- Journal:
- Ecology and evolution
- Issue:
- Volume 12:Issue 11(2022)
- Issue Display:
- Volume 12, Issue 11 (2022)
- Year:
- 2022
- Volume:
- 12
- Issue:
- 11
- Issue Sort Value:
- 2022-0012-0011-0000
- Page Start:
- n/a
- Page End:
- n/a
- Publication Date:
- 2022-11-18
- Subjects:
- Chameleon -- classification -- cluster metrics -- cluster optimization -- clustering -- CLUTO -- homogeneity -- misplacement -- vegetation databases
Ecology -- Periodicals
Evolution -- Periodicals
577.05
- Journal URLs:
- http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)2045-7758
http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/ece3.9496
- Languages:
- English
- ISSNs:
- 2045-7758
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 24416.xml
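
Note on the misplacement measure: the abstract defines misplacement as the placement of a sample in a cluster other than that of its nearest neighbor. The following is a minimal sketch of how a misplacement rate could be computed from that definition. It is not the authors' implementation: the function name `misplacement_rate`, the use of Euclidean distance, and the toy data are all assumptions made for illustration, and a vegetation study would likely substitute an ecological dissimilarity measure.

```python
from math import dist  # Euclidean distance; an assumption, not taken from the record


def misplacement_rate(points, labels):
    """Return the fraction of samples whose nearest neighbor has a different label.

    points: sequence of equal-length numeric coordinate tuples (hypothetical input)
    labels: sequence of cluster labels, one per point
    """
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")
    n = len(points)
    if n < 2:
        return 0.0  # a lone sample has no neighbor to disagree with

    misplaced = 0
    for i in range(n):
        # Index of the nearest other sample (ties resolve to the lowest index).
        nearest = min(
            (k for k in range(n) if k != i),
            key=lambda k: dist(points[i], points[k]),
        )
        if labels[nearest] != labels[i]:
            misplaced += 1
    return misplaced / n


# Toy data: two well-separated groups along the x axis.
points = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.5), (5.0, 0.0), (5.0, 1.0)]
labels_a = [0, 0, 0, 1, 1]  # each sample shares a cluster with its nearest neighbor
labels_b = [0, 0, 1, 1, 1]  # the third sample is split from its nearest neighbor

rate_a = misplacement_rate(points, labels_a)
rate_b = misplacement_rate(points, labels_b)
```

In this sketch `labels_a` places every sample with its nearest neighbor, while `labels_b` assigns the third sample to the distant group, so one of the five samples counts as misplaced. The pairwise scan is quadratic in the number of samples and is meant only to make the definition concrete.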