Assessment of pitch-adaptive front-end signal processing for children's speech recognition. (March 2018)
- Record Type:
- Journal Article
- Title:
- Assessment of pitch-adaptive front-end signal processing for children's speech recognition. (March 2018)
- Main Title:
- Assessment of pitch-adaptive front-end signal processing for children's speech recognition
- Authors:
- Sinha, Rohit
Shahnawazuddin, S.
- Abstract:
- Highlights: Studying the need for pitch normalization during the front-end speech parameterization step in the case of children's speech recognition systems. Analyzing the reasons behind the pitch sensitivity of MFCC features. Exploring the effectiveness of STRAIGHT-based MFCCs in the context of children's ASR. A novel approach based on adaptive-liftering to smooth out the pitch-induced distortions in the magnitude spectra of the speech signal. Exploring the effectiveness of the explored pitch-adaptive approaches for improving the recognition of children's speech under acoustically mismatched conditions on a DNN-based ASR system.
Abstract: On account of large acoustic mismatch, automatic speech recognition (ASR) systems trained using adults' speech data yield poor recognition performance when evaluated on children's speech data. Despite the use of common speaker normalization techniques like feature-space maximum likelihood regression (fMLLR) and vocal tract length normalization (VTLN), a significant gap remains between the recognition rates for matched and mismatched testing. Our earlier works have already highlighted the sensitivity of salient front-end features, including the popular Mel-frequency cepstral coefficient (MFCC), to gross pitch variation across adult and child speakers. Motivated by that, in this work, we explore pitch-adaptive front-end signal processing in deriving the MFCC features to reduce the sensitivity to pitch variation. For this purpose, first an existing vocoder approach known as STRAIGHT spectral analysis is employed for obtaining the smoothed spectrum devoid of pitch harmonics. Secondly, a much simpler spectrum smoothing approach exploiting pitch-adaptive liftering is also presented. The proposed approach is noted to be less sensitive to errors in the pitch estimation than the STRAIGHT-based approach. Both these approaches result in significant improvements for children's mismatched ASR. The effectiveness of the proposed adaptive-liftering-based approach is also demonstrated in the context of acoustic modeling paradigms based on the subspace Gaussian mixture model (SGMM) and the deep neural network (DNN). Further, it has been shown that the effectiveness of existing speaker normalization techniques remains intact even with the use of the proposed pitch-adaptive MFCCs, thus leading to additional gains.
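Illustrative note: the abstract does not give the details of the adaptive-liftering step, so the following is only a minimal sketch of the general idea of low-time cepstral liftering with a pitch-dependent cutoff. The function name `pitch_adaptive_lifter`, the `cutoff_fraction` parameter, and its value of 0.8 are assumptions made for illustration and are not taken from the article.

```python
import numpy as np

def pitch_adaptive_lifter(frame, f0_hz, sample_rate, n_fft=512, cutoff_fraction=0.8):
    """Smooth the log-magnitude spectrum of one windowed frame.

    Hypothetical sketch: cepstral coefficients at or above a cutoff
    quefrency, chosen as a fraction of the pitch period, are zeroed so
    that the pitch-harmonic ripple is removed from the spectrum.
    """
    # Log-magnitude spectrum (small floor avoids log(0)).
    log_mag = np.log(np.abs(np.fft.fft(frame, n_fft)) + 1e-10)
    # Real cepstrum: inverse FFT of the log-magnitude spectrum.
    cepstrum = np.fft.ifft(log_mag).real
    # Pitch period in samples; the cutoff below it is an assumed choice.
    pitch_period = sample_rate / f0_hz
    cutoff = max(1, min(int(cutoff_fraction * pitch_period), n_fft // 2))
    # Symmetric low-time lifter (keeps quefrencies 0..cutoff-1 and their mirror).
    lifter = np.zeros(n_fft)
    lifter[:cutoff] = 1.0
    lifter[n_fft - cutoff + 1:] = 1.0
    # Back to the log-spectral domain; return the non-negative frequencies.
    smoothed = np.fft.fft(cepstrum * lifter).real
    return smoothed[: n_fft // 2 + 1]

# Usage on a synthetic high-pitched frame (300 Hz harmonics at 16 kHz).
sr = 16000
t = np.arange(400) / sr
frame = sum(np.sin(2 * np.pi * 300 * k * t) / k for k in range(1, 6)) * np.hamming(400)
smoothed = pitch_adaptive_lifter(frame, f0_hz=300.0, sample_rate=sr)
```

In a sketch like this, a higher pitch gives a shorter pitch period and therefore a lower cutoff, which is what makes the smoothing pitch-adaptive; MFCCs would then be computed from the smoothed spectrum rather than the raw one.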
- Is Part Of:
- Computer speech & language. Volume 48(2018)
- Journal:
- Computer speech & language
- Issue:
- Volume 48(2018)
- Issue Display:
- Volume 48, Issue 2018 (2018)
- Year:
- 2018
- Volume:
- 48
- Issue:
- 2018
- Issue Sort Value:
- 2018-0048-2018-0000
- Page Start:
- 103
- Page End:
- 121
- Publication Date:
- 2018-03
- Subjects:
- Children's ASR -- Acoustic mismatch -- Pitch-adaptive features -- STRAIGHT MFCC -- SGMM -- DNN
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2017.10.007
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 5454.xml