Advances in subword-based HMM-DNN speech recognition across languages. (March 2021)
- Record Type:
- Journal Article
- Title:
- Advances in subword-based HMM-DNN speech recognition across languages. (March 2021)
- Main Title:
- Advances in subword-based HMM-DNN speech recognition across languages
- Authors:
- Smit, Peter
Virpioja, Sami
Kurimo, Mikko
- Abstract:
- Highlights: Subword modeling is shown to work for Finnish, Swedish, Arabic, and English. Proper subword modeling outperforms word-based modeling for these languages. Subword models can be combined to improve speech recognition systems further. The best published results on 4 datasets using hybrid HMM-DNN speech recognition.
Abstract: We describe a novel way to implement subword language models in speech recognition systems based on weighted finite state transducers, hidden Markov models, and deep neural networks. The acoustic models are built on graphemes in a way that no pronunciation dictionaries are needed, and they can be used together with any type of subword language model, including character models. The advantages of short subword units are good lexical coverage, reduced data sparsity, and avoiding vocabulary mismatches in adaptation. Moreover, constructing neural network language models (NNLMs) is more practical, because the input and output layers are small. We also propose methods for combining the benefits of different types of language model units by reconstructing and combining the recognition lattices. We present an extensive evaluation of various subword units on speech datasets of four languages: Finnish, Swedish, Arabic, and English. The results show that the benefits of short subwords are even more consistent with NNLMs than with traditional n-gram language models. Combination across different acoustic models and language models with various units improves the results further. For all four datasets we obtain the best results published so far. Our approach performs well even for English, where phoneme-based acoustic models and word-based language models typically dominate: the phoneme-based baseline performance can be reached and improved by 4% using graphemes alone, when several grapheme-based models are combined. Furthermore, combining both grapheme and phoneme models yields the state-of-the-art error rate of 15.9% for the MGB 2018 dev17b test. For all four languages we also show that the language models perform reasonably well when only limited training data is available.
- Is Part Of:
- Computer speech & language. Volume 66(2021)
- Journal:
- Computer speech & language
- Issue:
- Volume 66(2021)
- Issue Display:
- Volume 66, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 66
- Issue:
- 2021
- Issue Sort Value:
- 2021-0066-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-03
- Subjects:
- Large vocabulary speech recognition -- Subword units -- Character units -- Recurrent neural network language models
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2020.101158
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 15413.xml