Unsupervised random forests. (5th February 2021)
- Record Type:
- Journal Article
- Title:
- Unsupervised random forests. (5th February 2021)
- Main Title:
- Unsupervised random forests
- Authors:
- Mantero, Alejandro
Ishwaran, Hemant
- Abstract:
- sidClustering is a new random forests unsupervised machine learning algorithm. The first step in sidClustering involves what is called sidification of the features: staggering the features to have mutually exclusive ranges (called the staggered interaction data [SID] main features) and then forming all pairwise interactions (called the SID interaction features). Then a multivariate random forest (able to handle both continuous and categorical variables) is used to predict the SID main features. We establish uniqueness of sidification and show how multivariate impurity splitting is able to identify clusters. The proposed sidClustering method is adept at finding clusters arising from categorical and continuous variables and retains all the important advantages of random forests. The method is illustrated using simulated and real data as well as two in depth case studies, one from a large multi‐institutional study of esophageal cancer, and the other involving hospital charges for cardiovascular patients.
- Is Part Of:
- Statistical analysis and data mining. Volume 14:Number 2(2021)
- Journal:
- Statistical analysis and data mining
- Issue:
- Volume 14:Number 2(2021)
- Issue Display:
- Volume 14, Issue 2 (2021)
- Year:
- 2021
- Volume:
- 14
- Issue:
- 2
- Issue Sort Value:
- 2021-0014-0002-0000
- Page Start:
- 144
- Page End:
- 167
- Publication Date:
- 2021-02-05
- Subjects:
- impurity -- staggered interaction data -- sidClustering -- unsupervised learning
Data mining -- Statistical methods -- Periodicals
006.312
- Journal URLs:
- http://www3.interscience.wiley.com/journal/112701062/home
http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/sam.11498
- Languages:
- English
- ISSNs:
- 1932-1864
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 8447.424100
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 16157.xml
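
Illustrative note: the sidification step described in the abstract (staggering features into mutually exclusive ranges, then forming all pairwise interactions) might be sketched as follows. This is a minimal sketch and not the authors' implementation. It assumes numeric features only, and it assumes that an interaction is the product of two staggered features, which the abstract does not state. The function name `sidify` and the `gap` parameter are hypothetical.

```python
from itertools import combinations


def sidify(rows, gap=1.0):
    """Stagger numeric columns into disjoint ranges, then add pairwise products.

    rows: list of equal-length lists of numbers, one inner list per observation.
    gap:  hypothetical spacing that separates consecutive staggered ranges.
    Returns (main, interactions): the staggered columns and a dict mapping each
    column pair (i, j) to its product column (an assumed form of interaction).
    """
    n_cols = len(rows[0])
    columns = [[row[j] for row in rows] for j in range(n_cols)]

    main = []
    lower = gap  # start of the range reserved for the next column
    for col in columns:
        lo, hi = min(col), max(col)
        # Translate the column so that its minimum sits at `lower`.
        main.append([x - lo + lower for x in col])
        # The next column starts strictly above this column's maximum.
        lower = lower + (hi - lo) + gap

    interactions = {
        (i, j): [a * b for a, b in zip(main[i], main[j])]
        for i, j in combinations(range(n_cols), 2)
    }
    return main, interactions


main, inter = sidify([[0.0, 5.0], [2.0, 7.0], [1.0, 6.0]], gap=1.0)
print(main)           # staggered main features, one list per column
print(inter[(0, 1)])  # assumed interaction: product of columns 0 and 1
```

Because the staggered ranges do not overlap, a split value on any one of these columns identifies which original feature it refers to; the multivariate random forest step and the handling of categorical variables mentioned in the abstract are not covered by this sketch.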