Cross-lingual multi-speaker speech synthesis with limited bilingual training data. (January 2023)

Record Type:: Journal Article
Title:: Cross-lingual multi-speaker speech synthesis with limited bilingual training data. (January 2023)
Main Title:: Cross-lingual multi-speaker speech synthesis with limited bilingual training data
Authors:: Cai, Zexin
Yang, Yaogen
Li, Ming
Abstract:: Modeling voices for multiple speakers and multiple languages with one speech synthesis system has long been a challenge, especially in low-resource cases. This paper presents two approaches to cross-lingual multi-speaker text-to-speech (TTS) and code-switching synthesis under two training scenarios: (1) cross-lingual synthesis with sufficient data per speaker, and (2) cross-lingual synthesis with limited data per speaker. Accordingly, a novel TTS model and a non-autoregressive multi-speaker voice conversion model are proposed. The TTS model, designed for the sufficient-data scenario, has a Tacotron-based structure that uses shared phonemic representations associated with numeric language ID codes. For the data-limited scenario, we adopt a framework that cascades several speech modules. In particular, we propose a non-autoregressive many-to-many voice conversion module to address multi-speaker synthesis in data-limited cases. Experimental results on speaker similarity show that the proposed voice conversion module maintains voice characteristics well in data-limited cases. Both approaches use limited bilingual data and demonstrate impressive performance in cross-lingual synthesis, delivering fluent foreign-language speech and even code-switching speech for monolingual speakers.
Is Part Of:: Computer speech & language. Volume 77 (2023)
Journal:: Computer speech & language
Issue:: Volume 77 (2023)
Issue Display:: Volume 77, Issue 2023 (2023)
Year:: 2023
Volume:: 77
Issue:: 2023
Issue Sort Value:: 2023-0077-2023-0000
Page Start:
Page End:
Publication Date:: 2023-01
Subjects:: Text-to-speech -- Multi-speaker speech synthesis -- Multilingual speech synthesis -- Cross-lingual speech synthesis -- Voice conversion -- Code-switching
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
Journal URLs:: http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
DOI:: 10.1016/j.csl.2022.101427
Languages:: English
ISSNs:: 0885-2308
Deposit Type:: Legal deposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
Ingest File:: 23321.xml