A Novel Scaled Dirichlet-based statistical framework for count data modeling: Unsupervised learning and exponential approximation. (November 2019)
- Record Type:
- Journal Article
- Title:
- A Novel Scaled Dirichlet-based statistical framework for count data modeling: Unsupervised learning and exponential approximation. (November 2019)
- Main Title:
- A Novel Scaled Dirichlet-based statistical framework for count data modeling: Unsupervised learning and exponential approximation
- Authors:
- Zamzami, Nuha
Bouguila, Nizar
- Abstract:
- Highlights: We propose a novel model called the Multinomial Scaled Dirichlet (MSD) for modeling count data. We derive a new family of distributions, which we call EMSD, that approximate MSD distributions in order to handle high-dimensional and sparse data. We develop a minimum message length (MML) criterion for determining the number of components in an EMSD mixture model. We evaluate the performance of both approaches through a set of extensive empirical experiments on challenging real-world applications. Results revealed that both MSD and EMSD capture the burstiness phenomenon successfully and correctly, and EMSD is many times faster than MSD.
Abstract: The multinomial distribution and the Dirichlet Compound Multinomial (DCM) are widely accepted to model count data. However, recent research showed that the Dirichlet is not the best choice as a prior to the multinomial. We propose a novel model called the Multinomial Scaled Dirichlet (MSD) distribution that is the composition of the scaled Dirichlet distribution and the multinomial. Moreover, to improve the computational efficiency in high-dimensional spaces, we propose to approximate the MSD as a member of the exponential family. The performance evaluation of the proposed models is conducted through a set of extensive empirical experiments on challenging applications, namely, text classification, facial expression recognition, and texture image clustering. The results show that the proposed model, and its approximation, strive to achieve higher accuracy compared to the state-of-the-art generative models for count data clustering, while the approximation EMSD is many times faster than the corresponding MSD.
- Is Part Of:
- Pattern recognition. Volume 95(2019:Nov.)
- Journal:
- Pattern recognition
- Issue:
- Volume 95(2019:Nov.)
- Issue Display:
- Volume 95 (2019)
- Year:
- 2019
- Volume:
- 95
- Issue Sort Value:
- 2019-0095-0000-0000
- Page Start:
- 36
- Page End:
- 47
- Publication Date:
- 2019-11
- Subjects:
- Count data -- Burstiness -- DAEM -- Multinomial -- Scaled Dirichlet -- Finite mixture models -- Exponential family approximation -- Model selection -- Text collection -- Image databases
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2019.05.038
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 11157.xml