Unsupervised pattern recognition of mixed data structures with numerical and categorical features using a mixture regression modelling framework. (April 2019)
- Record Type:
- Journal Article
- Title:
- Unsupervised pattern recognition of mixed data structures with numerical and categorical features using a mixture regression modelling framework. (April 2019)
- Main Title:
- Unsupervised pattern recognition of mixed data structures with numerical and categorical features using a mixture regression modelling framework
- Authors:
- Ng, Shu-Kay
Tawiah, Richard
McLachlan, Geoffrey J.
- Abstract:
- Highlights: Cluster analysis of mixed-feature data imposes challenges in mixture modelling. Comorbid-condition groups inform potential shared biologic processes among diseases. Individuals with heterogeneous comorbidity patterns show different risk features. Regression models improve clustering results by adjustment of relevant risk factors. This method is applicable for more general mixed data, via consensus clustering.
Abstract: In the present era of "Big Data", data collection involving massive amount of features with a mix of variable types is commonplace. Mixture model-based techniques for statistical cluster analysis of mixed numerical and categorical feature data have their limitations, due to the difficulty in specifying appropriate component-densities when common multivariate distributions become invalid. This problem is particularly apparent in applications where the outcome feature variables are in a categorical form. An example of such an application is the analysis of binary morbidity data in national health survey, where the aims are to quantify heterogeneous comorbidity patterns of health conditions and identify (risk)-features of individuals that explain the heterogeneity. In this paper, we propose an unsupervised mixture regression model of multivariate generalised Bernoulli distributions for cluster analysis on the basis of categorical outcome features and mixed risk features. The proposed method is illustrated using simulated data and two real data sets concerning comorbidity patterns among 20,788 Australians who participated in the 2007–2008 National Health Survey (NHS) and among 470 patients who were recruited in a randomised controlled trial of a health intervention about in-patient detoxification from alcohol, heroin or cocaine in Boston. The method is also readily applicable to cluster more general mixed-feature data via the framework of consensus clustering.
- Is Part Of:
- Pattern recognition. Volume 88(2019:Apr.)
- Journal:
- Pattern recognition
- Issue:
- Volume 88(2019:Apr.)
- Issue Display:
- Volume 88 (2019)
- Year:
- 2019
- Volume:
- 88
- Issue Sort Value:
- 2019-0088-0000-0000
- Page Start:
- 261
- Page End:
- 271
- Publication Date:
- 2019-04
- Subjects:
- Mixture model -- Mixed feature -- Cluster analysis -- Comorbidity -- Generalised Bernoulli distribution
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2018.11.022
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 9397.xml