On the localness modeling for the self-attention based end-to-end speech synthesis. (May 2020)
- Record Type:
- Journal Article
- Title:
- On the localness modeling for the self-attention based end-to-end speech synthesis. (May 2020)
- Main Title:
- On the localness modeling for the self-attention based end-to-end speech synthesis
- Authors:
- Yang, Shan
Lu, Heng
Kang, Shiyin
Xue, Liumeng
Xiao, Jinba
Su, Dan
Xie, Lei
Yu, Dong
- Abstract:
- Attention based end-to-end speech synthesis achieves better performance in both prosody and quality compared to the conventional "front-end"–"back-end" structure. However, training such an end-to-end framework is usually time-consuming because of the use of recurrent neural networks. To enable parallel computation and long-range dependency modeling, a solely self-attention based framework named Transformer was recently proposed in the end-to-end family. However, it lacks position information in sequential modeling, so extra position representations are crucial to achieving good performance. Besides, the weighted-sum form of self-attention is conducted over the whole input sequence when computing latent representations, which may disperse the attention across the whole input sequence rather than focusing on the more important neighboring input states, resulting in generation errors. In this paper, we introduce two localness modeling methods to enhance the self-attention based representation for speech synthesis, which maintain the abilities of parallel computation and global-range dependency modeling in self-attention while improving generation stability. We systematically analyze the solely self-attention based end-to-end speech synthesis framework and unveil the importance of local context. We then add the proposed relative-position-aware method to enhance local edges and experiment with different architectures to examine the effectiveness of localness modeling. To achieve a query-specific window and discard the hyper-parameter of the relative-position-aware approach, we further introduce a Gaussian-based bias to enhance localness. Experimental results indicate that the two proposed localness-enhanced methods can both improve the performance of the self-attention model, especially when applied to the encoder part, and that the query-specific window of the Gaussian bias approach is more robust than the fixed relative edges.
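The abstract names two localness mechanisms: relative-position-aware edges with a fixed window, and a Gaussian bias that softly concentrates attention around each query. Below is a minimal, hypothetical PyTorch sketch of both ideas, not the authors' implementation: the relative-position term is simplified to a learned scalar bias per clipped distance (Shaw et al.'s original formulation adds learned embeddings to the keys and values), the Gaussian center is fixed to the query index with a constant sigma (the paper instead predicts a query-specific center and window), and the class name, single head, and unbatched shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalnessSelfAttention(nn.Module):
    """Single-head self-attention with two optional localness biases:
    a learned relative-position bias clipped to a fixed window (a
    simplified stand-in for relative-position-aware attention) and a
    Gaussian bias centered on each query position."""

    def __init__(self, d_model, max_rel_dist=16, sigma=3.0):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.scale = d_model ** -0.5
        self.max_rel_dist = max_rel_dist
        # One learned scalar bias per clipped relative distance.
        self.rel_bias = nn.Parameter(torch.zeros(2 * max_rel_dist + 1))
        self.sigma = sigma  # assumed fixed; the paper makes it query-specific

    def forward(self, x, use_gaussian=True):
        # x: (T, d_model); batch dimension omitted for clarity.
        T = x.size(0)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        logits = (q @ k.transpose(0, 1)) * self.scale        # (T, T)

        pos = torch.arange(T, device=x.device)
        rel = pos.unsqueeze(0) - pos.unsqueeze(1)            # j - i, (T, T)

        # Relative-position-aware term: fixed window, learned per distance.
        clipped = rel.clamp(-self.max_rel_dist, self.max_rel_dist)
        logits = logits + self.rel_bias[clipped + self.max_rel_dist]

        if use_gaussian:
            # Gaussian localness: a soft window around each query index.
            logits = logits - rel.float() ** 2 / (2 * self.sigma ** 2)

        return F.softmax(logits, dim=-1) @ v
```

Because both terms enter the pre-softmax logits additively, parallel computation over the whole sequence is preserved, and distant positions are re-weighted rather than masked, consistent with the abstract's claim that global-range dependency modeling is maintained.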
- Is Part Of:
- Neural networks. Volume 125 (2020)
- Journal:
- Neural networks
- Issue:
- Volume 125 (2020)
- Issue Display:
- Volume 125 (2020)
- Year:
- 2020
- Volume:
- 125
- Issue:
- 2020
- Issue Sort Value:
- 2020-0125-2020-0000
- Page Start:
- 121
- Page End:
- 130
- Publication Date:
- 2020-05
- Subjects:
- Speech synthesis -- Self attention -- Localness modeling -- Relative-position-aware -- Gaussian bias
Neural computers -- Periodicals
Neural networks (Computer science) -- Periodicals
Neural networks (Neurobiology) -- Periodicals
Nervous System -- Periodicals
Ordinateurs neuronaux -- Périodiques
Réseaux neuronaux (Informatique) -- Périodiques
Réseaux neuronaux (Neurobiologie) -- Périodiques
Neural computers
Neural networks (Computer science)
Neural networks (Neurobiology)
Periodicals
006.32
- Journal URLs:
- http://www.sciencedirect.com/science/journal/08936080
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.neunet.2020.01.034
- Languages:
- English
- ISSNs:
- 0893-6080
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 6081.280800
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 13372.xml