Synthesizing Talking Faces from Text and Audio: An Autoencoder and Sequence-to-Sequence Convolutional Neural Network. (June 2020)

Record Type:: Journal Article
Title:: Synthesizing Talking Faces from Text and Audio: An Autoencoder and Sequence-to-Sequence Convolutional Neural Network. (June 2020)
Main Title:: Synthesizing Talking Faces from Text and Audio: An Autoencoder and Sequence-to-Sequence Convolutional Neural Network
Authors:: Liu, Na
Zhou, Tao
Ji, Yunfeng
Zhao, Ziyi
Wan, Lihong
Abstract:: Highlights: An effective landmark localization pipeline based on landmark detection, optical flow estimation, and Kalman filtering is proposed to avoid face shake. A part-based autoencoder is introduced to learn low-dimensional representations of different face regions. A sequence-to-sequence convolutional neural network with residual units is proposed to learn the mapping from phonemes to facial codes. Tests on two public audio-visual datasets and a new dataset called Chinese CCTV News demonstrate the effectiveness of the proposed method against other state-of-the-art methods.
Abstract: Synthesizing talking faces from text and audio is an increasingly active direction in human-machine and face-to-face interactions. Although progress has been made, several existing methods either model co-articulation unsatisfactorily or ignore relations between adjacent inputs. Moreover, some of these methods train models on shaky head videos or use linear face parameterization strategies, which further decrease synthesis quality. To address these issues, this study proposes a sequence-to-sequence convolutional neural network to automatically synthesize talking face video with accurate lip sync. First, an advanced landmark localization pipeline is used to accurately locate the facial landmarks, which effectively reduces landmark shake. Then, a part-based autoencoder is presented to encode face images into a low-dimensional space and obtain compact …
Is Part Of:: Pattern recognition. Volume 102(2020:Jun.)
Journal:: Pattern recognition
Issue:: Volume 102(2020:Jun.)
Issue Display:: Volume 102 (2020)
Year:: 2020
Volume:: 102
Issue Sort Value:: 2020-0102-0000-0000
Page Start:
Page End:
Publication Date:: 2020-06
Subjects:: Convolutional neural network -- Autoencoder -- Regression -- Face landmark -- Face tracking -- Lip sync -- Video -- Audio
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
Journal URLs:: http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
DOI:: 10.1016/j.patcog.2020.107231
Languages:: English
ISSNs:: 0031-3203
Deposit Type:: Legal deposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
Ingest File:: 12955.xml
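
Illustrative note: The abstract mentions a Kalman filter as one stage of the landmark localization pipeline used to avoid face shake. The record gives no details of the authors' filter design, so the following is only a minimal sketch of the general idea, assuming a scalar random-walk model applied independently to each landmark coordinate. The function names (kalman_smooth, smooth_landmark_track) and the noise variances are hypothetical and are not taken from the paper; the landmark detection and optical flow stages are not shown.

```python
def kalman_smooth(measurements, process_var=1e-3, measurement_var=1e-1):
    """Smooth a 1-D sequence with a scalar random-walk Kalman filter.

    Assumed model (not from the paper): the true coordinate drifts slowly
    (variance process_var per frame) and each detection adds independent
    noise (variance measurement_var).
    """
    if not measurements:
        return []
    estimate = float(measurements[0])  # start from the first detection
    error_var = 1.0                    # initial uncertainty of the estimate
    smoothed = [estimate]
    for z in measurements[1:]:
        # Predict: the state is carried over, its uncertainty grows.
        error_var += process_var
        # Update: blend prediction and new detection by the Kalman gain.
        gain = error_var / (error_var + measurement_var)
        estimate += gain * (z - estimate)
        error_var *= 1.0 - gain
        smoothed.append(estimate)
    return smoothed


def smooth_landmark_track(track, **kwargs):
    """Smooth one landmark's per-frame (x, y) positions coordinate-wise."""
    xs = kalman_smooth([p[0] for p in track], **kwargs)
    ys = kalman_smooth([p[1] for p in track], **kwargs)
    return list(zip(xs, ys))


if __name__ == "__main__":
    # Synthetic jittery detections around (100, 50); values are made up.
    raw = [(100 + (-1) ** i * 2, 50 + (-1) ** i) for i in range(50)]
    for frame, point in enumerate(smooth_landmark_track(raw)[-3:], start=47):
        print(frame, point)
```

A smaller process_var relative to measurement_var gives stronger smoothing at the cost of slower response to genuine head motion; a constant-velocity state model would be a natural extension under the same assumptions.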