Building DNN acoustic models for large vocabulary speech recognition. (January 2017)
- Record Type:
- Journal Article
- Title:
- Building DNN acoustic models for large vocabulary speech recognition. (January 2017)
- Main Title:
- Building DNN acoustic models for large vocabulary speech recognition
- Authors:
- Maas, Andrew L.
Qi, Peng
Xie, Ziang
Hannun, Awni Y.
Lengerich, Christopher T.
Jurafsky, Daniel
Ng, Andrew Y.
- Abstract:
- Highlights: Empirical investigation of neural network acoustic models for speech recognition. Evaluation of DNNs up to ten times larger than those used in previous works. Comparison of densely connected, convolutional, and locally-connected untied neural networks. Results on Switchboard and a combined 2100 hour corpus. Explanation of combined corpus baseline system training recipe which is now a part of the Kaldi speech toolkit.
Abstract: Understanding architectural choices for deep neural networks (DNNs) is crucial to improving state-of-the-art speech recognition systems. We investigate which aspects of DNN acoustic model design are most important for speech recognition system performance, focusing on feed-forward networks. We study the effects of parameters like model size (number of layers, total parameters), architecture (convolutional networks), and training details (loss function, regularization methods) on DNN classifier performance and speech recognizer word error rates. On the Switchboard benchmark corpus we compare standard DNNs to convolutional networks, and present the first experiments using locally-connected, untied neural networks for acoustic modeling. Using a much larger 2100-hour training corpus (combining Switchboard and Fisher) we examine the performance of very large DNN models – with up to ten times more parameters than those typically used in speech recognition systems. The results suggest that a relatively simple DNN architecture and optimization technique give strong performance, and we offer intuitions about architectural choices like network depth over breadth. Our findings extend previous works to help establish a set of best practices for building DNN hybrid speech recognition systems and constitute an important first step toward analyzing more complex recurrent, sequence-discriminative, and HMM-free architectures.
- Is Part Of:
- Computer speech & language. Volume 41(2016)
- Journal:
- Computer speech & language
- Issue:
- Volume 41(2016)
- Issue Display:
- Volume 41, Issue 2016 (2016)
- Year:
- 2016
- Volume:
- 41
- Issue:
- 2016
- Issue Sort Value:
- 2016-0041-2016-0000
- Page Start:
- 195
- Page End:
- 213
- Publication Date:
- 2017-01
- Subjects:
- Hidden Markov model deep neural network (HMM-DNN) -- Neural networks -- Acoustic modeling -- Speech recognition -- Large vocabulary continuous speech recognition (LVCSR)
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2016.06.007
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 2481.xml