Environmentally robust ASR front-end for deep neural network acoustic models. (May 2015)
- Record Type:
- Journal Article
- Title:
- Environmentally robust ASR front-end for deep neural network acoustic models. (May 2015)
- Main Title:
- Environmentally robust ASR front-end for deep neural network acoustic models
- Authors:
- Yoshioka, T.
Gales, M.J.F.
- Abstract:
- Highlights: Effects of various front-end schemes are examined using DNN acoustic models. Meeting transcription experiments are conducted using a single distant microphone. Both speaker independent/adaptive configurations are considered. A pipeline is proposed to integrate different classes of front-end schemes. The pipeline is used to analyse the way in which different schemes interact.
Abstract: This paper examines the individual and combined impacts of various front-end approaches on the performance of deep neural network (DNN) based speech recognition systems in distant talking situations, where acoustic environmental distortion degrades the recognition performance. Training of a DNN-based acoustic model consists of generation of state alignments followed by learning the network parameters. This paper first shows that the network parameters are more sensitive to the speech quality than the alignments and thus this stage requires improvement. Then, various front-end robustness approaches to addressing this problem are categorised based on functionality. The degree to which each class of approaches impacts the performance of DNN-based acoustic models is examined experimentally. Based on the results, a front-end processing pipeline is proposed for efficiently combining different classes of approaches. Using this front-end, the combined effects of different classes of approaches are further evaluated in a single distant microphone-based meeting transcription task with both speaker independent (SI) and speaker adaptive training (SAT) set-ups. By combining multiple speech enhancement results, multiple types of features, and feature transformation, the front-end shows relative performance gains of 7.24% and 9.83% in the SI and SAT scenarios, respectively, over competitive DNN-based systems using log mel-filter bank features.
- Is Part Of:
- Computer speech & language. Volume 31(2015)
- Journal:
- Computer speech & language
- Issue:
- Volume 31(2015)
- Issue Display:
- Volume 31, Issue 2015 (2015)
- Year:
- 2015
- Volume:
- 31
- Issue:
- 2015
- Issue Sort Value:
- 2015-0031-2015-0000
- Page Start:
- 65
- Page End:
- 86
- Publication Date:
- 2015-05
- Subjects:
- Environmental robustness -- Deep neural network -- Front-end -- Meeting transcription
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2014.11.008
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 5432.xml