Train from scratch: Single-stage joint training of speech separation and recognition. (November 2022)
- Record Type:
- Journal Article
- Title:
- Train from scratch: Single-stage joint training of speech separation and recognition. (November 2022)
- Main Title:
- Train from scratch: Single-stage joint training of speech separation and recognition
- Authors:
- Shi, Jing
Chang, Xuankai
Watanabe, Shinji
Xu, Bo
- Abstract:
- Multi-speaker speech separation and recognition has recently gained much attention in the speech community. Previously, most studies trained the front-end separation module and the back-end recognition module individually. After training, the two modules were combined either in a hybrid structure or by fine-tuning the resulting model. In this work, we present a unified and flexible multi-speaker end-to-end ASR model. In contrast to previous studies, our proposed model is trained from scratch in a single stage, rather than in multiple stages of pre-training followed by fine-tuning. Our model can handle either single-channel or multi-channel speech input. Moreover, the proposed model can be trained with or without clean source speech references. We evaluate the proposed model on the WSJ0-2mix dataset in both single-channel and spatialized multi-channel conditions. The experiments demonstrate that the proposed methods can improve the performance of the end-to-end model in recognizing the separated streams without much degradation in speech separation, achieving a new state of the art on the WSJ0-2mix dataset. Furthermore, we systematically assess the impact of various features on the success of the joint-training model and will release all our code, which may provide new guidance for integrating the front-end and back-end in complex auditory scenes.
- Is Part Of:
- Computer speech & language. Volume 76(2022)
- Journal:
- Computer speech & language
- Issue:
- Volume 76(2022)
- Issue Display:
- Volume 76, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 76
- Issue:
- 2022
- Issue Sort Value:
- 2022-0076-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-11
- Subjects:
- Cocktail party problem -- Speech separation -- Multi-speaker speech recognition -- End-to-end -- Joint-training
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2022.101387
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 21757.xml