Design of mixture of GMMs for Query-by-Example Spoken Term Detection. (November 2018)
- Record Type:
- Journal Article
- Title:
- Design of mixture of GMMs for Query-by-Example Spoken Term Detection. (November 2018)
- Main Title:
- Design of mixture of GMMs for Query-by-Example Spoken Term Detection
- Authors:
- Madhavi, Maulik C.
Patil, Hemant A.
- Abstract:
- Highlights: Mixture of GMMs is presented, where the prior probabilities of mixtures are set by broad phoneme posterior probabilities. Posteriorgram of a mixture of GMMs is computed with different cepstral representations. Posteriorgram of a mixture of GMMs is effective under different evaluation factors for TIMIT and SWS 2013 corpora. Broad phoneme posterior probabilities obtained with limited labeled data are also exploited for training.
Abstract: This paper presents the design of a mixture of Gaussian Mixture Models (GMMs) for Query-by-Example Spoken Term Detection (QbE-STD). The speech data governs acoustically similar broad phonetic structures. To capture broad phonetic structure, we exploit additional information of broad phoneme classes (such as vowels, semi-vowels, nasals, fricatives, and plosives) for the training of the GMM. The mixture of GMMs is tied with GMMs of these broad phoneme classes, i.e., each GMM expresses the probability density function (pdf) of a broad phoneme category. The Expectation Maximization (EM) algorithm is used to obtain the GMM for each broad phoneme class. Thus, a mixture of GMMs represents the spoken query with the broad phonetic constraints. These constraints restrict the posterior probability within the broad class, which results in a better posteriorgram design. The novelty of our work lies in prior probability assignments (as weights of the mixture of GMMs) for better Gaussian posteriorgram design. The proposed simple yet effective posteriorgram outperforms the Gaussian posteriorgram because of the implicit constraints supplied by broad phonetic posteriors. The Maximum Term Weighted Value (MTWV) for the SWS 2013 dataset is improved by 0.052 and 0.051 w.r.t. the Gaussian posteriorgram for Mel Frequency Cepstral Coefficients (MFCC) and Perceptual Linear Prediction (PLP), respectively. We found that the proposed mixture of GMMs approach gave consistently better performance than the Gaussian posteriorgram across various evaluation factors, such as different cepstral representations, the number of Gaussian components, the number of spoken examples per query, and the amount of labeled data used for broad phoneme posterior computation.
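Illustrative note: the abstract describes weighting per-class GMMs by broad phoneme posteriors so that posterior mass stays within each broad class. The sketch below is one possible reading of that idea, not the authors' implementation. It assumes diagonal-covariance Gaussians, assumes the broad-class posteriors for a frame are already available from some classifier, and uses hypothetical names (`class_gmms`, `class_posteriors`, `mixture_posteriorgram_frame`) and toy parameters that do not come from the paper.

```python
import math


def diag_gauss_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at frame x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def class_component_posteriors(x, gmm):
    """Posterior over the components of one class GMM, given frame x.

    gmm is a list of (weight, mean, variance) tuples (assumed layout).
    """
    logs = [math.log(w) + diag_gauss_logpdf(x, m, v) for w, m, v in gmm]
    peak = max(logs)  # subtract the max before exponentiating, for stability
    exps = [math.exp(value - peak) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_posteriorgram_frame(x, class_gmms, class_posteriors):
    """One posteriorgram frame for a mixture of per-class GMMs.

    Assumption: each class GMM's component posteriors are scaled by that
    class's broad phoneme posterior, so the mass assigned to a class equals
    its broad-class posterior.
    """
    frame = []
    for name, gmm in class_gmms.items():
        prior = class_posteriors[name]
        frame.extend(prior * p for p in class_component_posteriors(x, gmm))
    return frame


# Toy two-class example with made-up parameters (not from the paper).
class_gmms = {
    "vowel": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "fricative": [(0.7, [4.0, 4.0], [1.0, 1.0]), (0.3, [5.0, 3.0], [1.0, 1.0])],
}
class_posteriors = {"vowel": 0.9, "fricative": 0.1}
frame = mixture_posteriorgram_frame([0.2, -0.1], class_gmms, class_posteriors)
```

In this sketch the frame vector sums to one whenever the broad-class posteriors do, and the entries belonging to each class sum to that class's posterior. Stacking such vectors over all frames of an utterance would give a posteriorgram under these assumptions.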
- Is Part Of:
- Computer speech & language. Volume 52(2018)
- Journal:
- Computer speech & language
- Issue:
- Volume 52(2018)
- Issue Display:
- Volume 52, Issue 2018 (2018)
- Year:
- 2018
- Volume:
- 52
- Issue:
- 2018
- Issue Sort Value:
- 2018-0052-2018-0000
- Page Start:
- 41
- Page End:
- 55
- Publication Date:
- 2018-11
- Subjects:
- Query-by-Example Spoken Term Detection -- Phone posteriorgram -- Mixture of GMMs -- Gaussian posteriorgram
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2018.04.006
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 17055.xml