An investigation of neural uncertainty estimation for target speaker extraction equipped RNN transducer. (May 2022)

Record Type:: Journal Article
Title:: An investigation of neural uncertainty estimation for target speaker extraction equipped RNN transducer. (May 2022)
Main Title:: An investigation of neural uncertainty estimation for target speaker extraction equipped RNN transducer
Authors:: Shi, Jiatong
Zhang, Chunlei
Weng, Chao
Watanabe, Shinji
Yu, Meng
Yu, Dong
Abstract:: Target-speaker speech recognition aims to recognize the speech of an enrolled speaker in an environment with background noise and interfering speakers. This study presents a joint framework that combines time-domain target speaker extraction with a recurrent neural network transducer (RNN-T) for speech recognition. To alleviate the adverse effects of residual noise and artifacts that the target speaker extraction module introduces into the speech recognition back-end, we explore training the target speaker extraction module and the RNN-T jointly. We find that a multi-stage training strategy, which pre-trains and fine-tunes each module before joint training, is crucial for stabilizing the training process. In addition, we propose a novel neural uncertainty estimation that leverages useful information from the target speaker extraction module (i.e., speaker identity uncertainty and speech enhancement uncertainty) to further improve the back-end speech recognizer. Compared to a recognizer with a target speech extraction front-end, our experiments show that joint training and the neural uncertainty module reduce the character error rate (CER) on multi-talker simulation data by 7% and 17% relative, respectively. The multi-condition experiments indicate that our method can reduce CER by 9% relative in the noisy condition without losing performance in the clean condition. We also observe consistent improvements in further evaluation on real-world vehicular speech data. Highlights: A framework …
Is Part Of:: Computer speech & language. Volume 73 (2022)
Journal:: Computer speech & language
Issue:: Volume 73 (2022)
Issue Display:: Volume 73, Issue 2022 (2022)
Year:: 2022
Volume:: 73
Issue:: 2022
Issue Sort Value:: 2022-0073-2022-0000
Page Start::
Page End::
Publication Date:: 2022-05
Subjects:: Target-speaker speech recognition -- Target-speaker speech extraction -- Uncertainty estimation
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
Journal URLs:: http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
DOI:: 10.1016/j.csl.2021.101327
Languages:: English
ISSNs:: 0885-2308
Deposit Type:: Legal deposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
Ingest File:: 20459.xml