End-to-end neural systems for automatic children speech recognition: An empirical study. (March 2022)
- Record Type:
- Journal Article
- Title:
- End-to-end neural systems for automatic children speech recognition: An empirical study. (March 2022)
- Main Title:
- End-to-end neural systems for automatic children speech recognition: An empirical study
- Authors:
- Gurunath Shivakumar, Prashanth
Narayanan, Shrikanth
- Abstract:
- A key desiderata for inclusive and accessible speech recognition technology is ensuring its robust performance to children's speech. Notably, this includes the rapidly advancing neural network based end-to-end speech recognition systems. Children speech recognition is more challenging due to the larger intra-inter speaker variability in terms of acoustic and linguistic characteristics compared to adult speech. Furthermore, the lack of adequate and appropriate children speech resources adds to the challenge of designing robust end-to-end neural architectures. This study provides a critical assessment of automatic children speech recognition through an empirical study of contemporary state-of-the-art end-to-end speech recognition systems. Insights are provided on the aspects of training data requirements, adaptation on children data, and the effect of children age, utterance lengths, different architectures and loss functions for end-to-end systems and role of language models on the speech recognition performance.
Highlights:
Empirical study of end-to-end deep learning based children speech ASR
Training/Adaptation comparisons for various adult and children speech dataset sizes
Acoustic Model Comparison: DNN-HMM hybrid vs. End-to-End ASR
DNN Architecture Comparison: ResNets vs. Time-depth separable CNN vs. Transformers
Loss Function Comparison: Connectionist Temporal Classification vs. Seq-to-Seq
Language Model Comparison: 4gram vs. 6gram vs. Gated-CNN LM; word vs. word-piece LM
Error Analysis: Letter Errors vs. Word Errors; Utterance Lengths; Confusion Analysis …
- Is Part Of:
- Computer speech & language. Volume 72(2022)
- Journal:
- Computer speech & language
- Issue:
- Volume 72(2022)
- Issue Display:
- Volume 72, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 72
- Issue:
- 2022
- Issue Sort Value:
- 2022-0072-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-03
- Subjects:
- Children speech recognition -- End-to-end speech recognition -- Residual network -- Time depth separable convolutional network -- Transformer
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2021.101289
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 20051.xml