Multi-microphone speech recognition integrating beamforming, robust feature extraction, and advanced DNN/RNN backend. (November 2017)

Record Type:: Journal Article
Title:: Multi-microphone speech recognition integrating beamforming, robust feature extraction, and advanced DNN/RNN backend. (November 2017)
Main Title:: Multi-microphone speech recognition integrating beamforming, robust feature extraction, and advanced DNN/RNN backend
Authors:: Hori, Takaaki
Chen, Zhuo
Erdogan, Hakan
Hershey, John R.
Le Roux, Jonathan
Mitra, Vikramjit
Watanabe, Shinji
Abstract:: Highlights: An in-depth presentation of our multi-microphone ASR system for the CHiME-3 challenge. A new architecture with different beamforming and robust feature extraction methods. Pervasive use of DNNs and RNNs for speech enhancement and acoustic/language models. Achieved 5.05% WER for noisy speech data in highly challenging real environment. Abstract: This paper gives an in-depth presentation of the multi-microphone speech recognition system we submitted to the 3rd CHiME speech separation and recognition challenge (CHiME-3) and its extension. The proposed system takes advantage of recurrent neural networks (RNNs) throughout the model from the front-end speech enhancement to the language modeling. Three different types of beamforming are used to combine multi-microphone signals to obtain a single higher-quality signal. The beamformed signal is further processed by a single-channel long short-term memory (LSTM) enhancement network, which is used to extract stacked mel-frequency cepstral coefficients (MFCC) features. In addition, the beamformed signal is processed by two proposed noise-robust feature extraction methods. All features are used for decoding in speech recognition systems with deep neural network (DNN) based acoustic models and large-scale RNN language models to achieve high recognition accuracy in noisy environments. Our training methodology includes multi-channel noisy data training and speaker adaptive training, whereas at test time model combination is used …
Is Part Of:: Computer speech & language. Volume 46(2017)
Journal:: Computer speech & language
Issue:: Volume 46(2017)
Issue Display:: Volume 46, Issue 2017 (2017)
Year:: 2017
Volume:: 46
Issue:: 2017
Issue Sort Value:: 2017-0046-2017-0000
Page Start:: 401
Page End:: 418
Publication Date:: 2017-11
Subjects:: CHiME-3 -- Robust speech recognition -- Beamforming -- Noise robust feature -- System combination
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
Journal URLs:: http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
DOI:: 10.1016/j.csl.2017.01.013
Languages:: English
ISSNs:: 0885-2308
Deposit Type:: Legal deposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
Ingest File:: 4753.xml