Hybrid-task learning for robust automatic speech recognition. (November 2020)
- Record Type:
- Journal Article
- Title:
- Hybrid-task learning for robust automatic speech recognition. (November 2020)
- Main Title:
- Hybrid-task learning for robust automatic speech recognition
- Authors:
- Pironkov, Gueorgui
Wood, Sean UN
Dupont, Stéphane
- Abstract:
- In order to properly train an automatic speech recognition system, speech with its annotated transcriptions is most often required. The amount of real annotated data recorded in noisy and reverberant conditions is extremely limited, especially compared to the amount of data that can be simulated by adding noise to clean annotated speech. Thus, using both real and simulated data is important in order to improve robust speech recognition, as this increases the amount and diversity of training data (thanks to the simulated data) while also benefiting from a reduced mismatch between training and operation of the system (thanks to the real data). Another promising method applied to speech recognition in noisy and reverberant conditions is multi-task learning. The idea is to train one acoustic model to solve simultaneously at least two tasks that are different but related, with speech recognition being the main task. A successful auxiliary task consists of generating clean speech features using a regression loss (as a denoising auto-encoder). This auxiliary task, however, uses clean speech as targets, which implies that real data cannot be used. In order to tackle this problem, a Hybrid-Task Learning system is proposed. This system switches frequently between multi-task and single-task learning depending on whether the input is simulated or real data, respectively. Having a hybrid architecture allows us to benefit from both real and simulated data while using a denoising auto-encoder as the auxiliary task of a multi-task setup. We show that the relative improvement brought by the proposed hybrid-task learning architecture can reach up to 4.4% compared to the traditional single-task learning approach on the CHiME4 database. We also demonstrate the benefits of the hybrid approach compared to multi-task learning or adaptation.
- Is Part Of:
- Computer speech & language. Volume 64(2020)
- Journal:
- Computer speech & language
- Issue:
- Volume 64(2020)
- Issue Display:
- Volume 64, Issue 2020 (2020)
- Year:
- 2020
- Volume:
- 64
- Issue:
- 2020
- Issue Sort Value:
- 2020-0064-2020-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-11
- Subjects:
- Multi-task learning -- Robust speech recognition -- Hybrid-task learning -- Denoising auto-Encoder -- Real & simulated data training -- Noise & reverberation -- CHiME4
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2020.101103
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 13431.xml
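
Illustrative note on the abstract: the switching idea it describes (a denoising auxiliary loss used only when a clean-speech target exists, i.e. for simulated data) can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the per-frame formulation, and the `aux_weight` parameter are all hypothetical, and the record does not say how the two losses are actually combined.

```python
import math
from typing import Optional, Sequence


def cross_entropy(posteriors: Sequence[float], target_index: int) -> float:
    """Main-task (speech recognition) loss for one frame:
    negative log-probability assigned to the target class."""
    return -math.log(posteriors[target_index])


def mean_squared_error(predicted: Sequence[float], clean: Sequence[float]) -> float:
    """Auxiliary-task (denoising) regression loss for one frame."""
    return sum((p - c) ** 2 for p, c in zip(predicted, clean)) / len(clean)


def hybrid_task_loss(
    posteriors: Sequence[float],
    target_index: int,
    denoised: Optional[Sequence[float]] = None,
    clean: Optional[Sequence[float]] = None,
    aux_weight: float = 0.1,  # hypothetical weighting, not taken from the record
) -> float:
    """Single-task loss when no clean target exists (real data),
    multi-task loss when one does (simulated data)."""
    loss = cross_entropy(posteriors, target_index)
    if clean is not None:
        if denoised is None:
            raise ValueError("a denoised estimate is needed when a clean target is given")
        loss += aux_weight * mean_squared_error(denoised, clean)
    return loss


# Real recording: no clean reference, so only the recognition loss applies.
real_loss = hybrid_task_loss([0.5, 0.5], 0)

# Simulated recording: the clean reference is known, so the denoising term is added.
simulated_loss = hybrid_task_loss(
    [0.5, 0.5], 0, denoised=[1.0, 2.0], clean=[1.0, 0.0], aux_weight=0.5
)
```

In this sketch the choice between single-task and multi-task training is made per example, which is one simple way to read "switches frequently"; the actual scheduling used in the article may differ.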