Vocal Tract Length Normalization using a Gaussian mixture model framework for query-by-example spoken term detection. (November 2019)

Record Type:: Journal Article
Title:: Vocal Tract Length Normalization using a Gaussian mixture model framework for query-by-example spoken term detection. (November 2019)
Main Title:: Vocal Tract Length Normalization using a Gaussian mixture model framework for query-by-example spoken term detection
Authors:: Madhavi, Maulik C.
Patil, Hemant A.
Abstract:: The speech spectrum is known to change with the length of a speaker's vocal tract, because speech formants are inversely related to the vocal tract length (VTL). The process of compensating for spectral variation due to the length of the vocal tract is known as Vocal Tract Length Normalization (VTLN). VTLN is an important speaker normalization technique for speech recognition and related tasks. In this paper, we used the Gaussian Posteriorgram (GP) of VTL-warped spectral features for a Query-by-Example Spoken Term Detection (QbE-STD) task. This paper presents the use of a Gaussian Mixture Model (GMM) framework for VTLN warping factor estimation. In particular, the presented GMM framework does not require phoneme-level transcription. We observed a correlation between the VTLN warping factor estimates obtained via a supervised HMM-based approach and an unsupervised GMM-based approach. In addition, phoneme recognition and speaker de-identification tasks were conducted using GMM-based VTLN warping factor estimates. For QbE-STD, we considered three spectral features, namely, Mel Frequency Cepstral Coefficients (MFCC), Perceptual Linear Prediction (PLP), and MFCC-TMP (which uses the Teager Energy Operator (TEO) to implicitly exploit magnitude and phase information in the MFCC framework). Linear frequency scaling variations for the VTLN warping factor are incorporated into these three cepstral representations for the QbE-STD task. …
Is Part Of:: Computer speech & language. Volume 58(2019)
Journal:: Computer speech & language
Issue:: Volume 58(2019)
Issue Display:: Volume 58, Issue 2019 (2019)
Year:: 2019
Volume:: 58
Issue:: 2019
Issue Sort Value:: 2019-0058-2019-0000
Page Start:: 175
Page End:: 202
Publication Date:: 2019-11
Subjects:: Vocal Tract Length Normalization -- Query-by-example spoken term detection -- Spoken web search task -- Gaussian posteriorgrams -- Dynamic time warping
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
Journal URLs:: http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
DOI:: 10.1016/j.csl.2019.03.005
Languages:: English
ISSNs:: 0885-2308
Deposit Type:: Legal deposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
Ingest File:: 11148.xml