Multiple scaled contaminated normal distribution and its application in clustering. (August 2021)
- Record Type:
- Journal Article
- Title:
- Multiple scaled contaminated normal distribution and its application in clustering. (August 2021)
- Main Title:
- Multiple scaled contaminated normal distribution and its application in clustering
- Authors:
- Punzo, Antonio
Tortora, Cristina
- Abstract:
- The multivariate contaminated normal (MCN) distribution represents a simple heavy-tailed generalization of the multivariate normal (MN) distribution to model elliptical contoured scatters in the presence of mild outliers (also referred to as 'bad' points herein) and automatically detect bad points. The price of these advantages is two additional parameters: proportion of good observations and degree of contamination. However, in a multivariate setting, only one proportion of good observations and only one degree of contamination may be limiting. To overcome this limitation, we propose a multiple scaled contaminated normal (MSCN) distribution. Among its parameters, we have an orthogonal matrix Γ. In the space spanned by the vectors (principal components) of Γ, there is a proportion of good observations and a degree of contamination for each component. Moreover, each observation has a posterior probability of being good with respect to each principal component. Thanks to this probability, the method provides directional robust estimates of the parameters of the nested MN and automatic directional detection of bad points. The term 'directional' is added to specify that the method works separately for each principal component. Mixtures of MSCN distributions are also proposed, and an expectation-maximization algorithm is used for parameter estimation. Real and simulated data are considered to show the usefulness of our mixture with respect to well-established mixtures of symmetric distributions with heavy tails.
- Is Part Of:
- Statistical modelling. Volume 21:Number 4(2021)
- Journal:
- Statistical modelling
- Issue:
- Volume 21:Number 4(2021)
- Issue Display:
- Volume 21, Issue 4 (2021)
- Year:
- 2021
- Volume:
- 21
- Issue:
- 4
- Issue Sort Value:
- 2021-0021-0004-0000
- Page Start:
- 332
- Page End:
- 358
- Publication Date:
- 2021-08
- Subjects:
- contaminated normal distribution -- heavy-tailed distributions -- multiple scaled distributions -- EM algorithm -- mixture models -- model-based clustering
Linear models (Statistics) -- Periodicals
Mathematical models -- Periodicals
Modèles linéaires (Statistique) -- Périodiques
Modèles mathématiques -- Périodiques
Modèle statistique
Modèle linéaire
Modélisation statistique
Périodique électronique (Descripteur de forme)
Ressource Internet (Descripteur de forme)
519.5011
- Journal URLs:
- http://www.uk.sagepub.com/home.nav
http://firstsearch.oclc.org
http://firstsearch.oclc.org/journal=1471-082x;screen=info;ECOIP
- DOI:
- 10.1177/1471082X19890935
- Languages:
- English
- ISSNs:
- 1471-082X
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 16184.xml
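
The MCN distribution summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' code. It assumes the usual two-component form, in which a proportion `alpha` of good observations follows the nested MN and the remainder follows an MN with the same mean and a covariance inflated by a degree of contamination `eta` greater than 1. The names `mvn_pdf`, `mcn_pdf`, `prob_good`, `alpha` and `eta` are hypothetical and do not appear in the record.

```python
import numpy as np


def mvn_pdf(x, mu, sigma):
    """Density of a multivariate normal N(mu, sigma) at a single point x."""
    d = mu.shape[0]
    diff = x - mu
    # Squared Mahalanobis distance, computed without an explicit inverse.
    maha = diff @ np.linalg.solve(sigma, diff)
    log_det = np.linalg.slogdet(sigma)[1]
    return np.exp(-0.5 * (d * np.log(2.0 * np.pi) + log_det + maha))


def mcn_pdf(x, mu, sigma, alpha, eta):
    """Assumed MCN density: alpha * N(mu, sigma) + (1 - alpha) * N(mu, eta * sigma)."""
    good = alpha * mvn_pdf(x, mu, sigma)
    bad = (1.0 - alpha) * mvn_pdf(x, mu, eta * sigma)
    return good + bad


def prob_good(x, mu, sigma, alpha, eta):
    """Posterior probability that x is a good point under the assumed MCN."""
    return alpha * mvn_pdf(x, mu, sigma) / mcn_pdf(x, mu, sigma, alpha, eta)


# Illustrative values only; they are not taken from the article.
mu = np.zeros(2)
sigma = np.eye(2)
alpha, eta = 0.9, 10.0

print(prob_good(np.array([0.0, 0.0]), mu, sigma, alpha, eta))  # point at the centre
print(prob_good(np.array([5.0, 5.0]), mu, sigma, alpha, eta))  # point far from the centre
```

A point would be flagged as bad when its posterior probability of being good falls below 0.5. The MSCN distribution described in the abstract applies this idea separately along each principal component of Γ, which this sketch does not attempt.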