A Bayesian mixture model for clustering and selection of feature occurrence rates under mean constraints. (8th June 2017)
- Record Type:
- Journal Article
- Title:
- A Bayesian mixture model for clustering and selection of feature occurrence rates under mean constraints. (8th June 2017)
- Main Title:
- A Bayesian mixture model for clustering and selection of feature occurrence rates under mean constraints
- Authors:
- Li, Qiwei
Guindani, Michele
Reich, Brian J.
Bondell, Howard D.
Vannucci, Marina
- Abstract:
- In this paper, we consider the problem of modeling a matrix of count data, where multiple features are observed as counts over a number of samples. Due to the nature of the data generating mechanism, such data are often characterized by a high number of zeros and overdispersion. In order to take into account the skewness and heterogeneity of the data, some type of normalization and regularization is necessary for conducting inference on the occurrences of features across samples. We propose a zero‐inflated Poisson mixture modeling framework that incorporates a model‐based normalization through prior distributions with mean constraints, as well as a feature selection mechanism, which allows us to identify a parsimonious set of discriminatory features, and simultaneously cluster the samples into homogeneous groups. We show how our approach improves on the accuracy of the clustering with respect to more standard approaches for the analysis of count data, by means of a simulation study and an application to a bag‐of‐words benchmark data set, where the features are represented by the frequencies of occurrence of each word.
- Is Part Of:
- Statistical analysis and data mining. Volume 10:Number 6(2017)
- Journal:
- Statistical analysis and data mining
- Issue:
- Volume 10:Number 6(2017)
- Issue Display:
- Volume 10, Issue 6 (2017)
- Year:
- 2017
- Volume:
- 10
- Issue:
- 6
- Issue Sort Value:
- 2017-0010-0006-0000
- Page Start:
- 393
- Page End:
- 409
- Publication Date:
- 2017-06-08
- Subjects:
- Bayesian nonparametrics -- count data -- feature selection -- Poisson mixture -- text analysis
Data mining -- Statistical methods -- Periodicals
006.312
- Journal URLs:
- http://www3.interscience.wiley.com/journal/112701062/home
http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/sam.11350
- Languages:
- English
- ISSNs:
- 1932-1864
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 8447.424100
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 5367.xml