Testing the rogue taxa hypothesis for clustering instability. (7th July 2019)
- Record Type:
- Journal Article
- Title:
- Testing the rogue taxa hypothesis for clustering instability. (7th July 2019)
- Main Title:
- Testing the rogue taxa hypothesis for clustering instability
- Authors:
- Saunders, Amanda M.
Ashlock, Daniel
Graether, Steffen P.
- Abstract:
- Highlights: Instability in hierarchical trees measured using a novel tree distance. Low tree consensus due to flaws in tree building algorithm and not rogue taxa. Standard neighbor joining algorithm stability depends on the sample subset used. Our novel bubble clustering method creates more stable hierarchical trees.
Abstract: There have been longstanding concerns about the stability of hierarchical clustering. A suggested explanation for this instability is the presence of "rogue taxa", i.e. taxa whose removal from a data set can apparently restore stability. In this study, the rogue taxa hypothesis is tested by partitioning a large data set into many smaller ones and checking for rogue behavior. The checking was performed with a standard hierarchical clustering algorithm and with a novel algorithm designed to have greater stability. It was found that rogue taxa cannot reasonably be said to exist because the status of being a rogue taxon depends on the data partition in which the taxon is embedded. In addition to the choice of data used, the choice of algorithm and algorithm parameters can have a large effect on the degree to which a taxon appears rogue. Instability in hierarchical clustering can be increased by problematic data points, but the status of data points being problematic depends not on their biological antecedents, but on their position in the local geometry of the data. The results of this study strongly suggest that instability in traditional hierarchical clustering routines is primarily a problem with the algorithm design.
- Is Part Of:
- Journal of theoretical biology. Volume 472(2019)
- Journal:
- Journal of theoretical biology
- Issue:
- Volume 472(2019)
- Issue Display:
- Volume 472, Issue 2019 (2019)
- Year:
- 2019
- Volume:
- 472
- Issue:
- 2019
- Issue Sort Value:
- 2019-0472-2019-0000
- Page Start:
- 36
- Page End:
- 45
- Publication Date:
- 2019-07-07
- Subjects:
- Phylogenetics -- Hierarchical clustering -- Bioinformatics -- Bootstrapping -- Clustering stability
Biology -- Periodicals
Biological Science Disciplines -- Periodicals
Biologie -- Périodiques
Theoretische biologie
Biology
Periodicals
571.05
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00225193/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.jtbi.2019.04.002
- Languages:
- English
- ISSNs:
- 0022-5193
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 5069.075000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 10095.xml