Data driven articulatory synthesis with deep neural networks. (March 2016)

Record Type:: Journal Article
Title:: Data driven articulatory synthesis with deep neural networks. (March 2016)
Main Title:: Data driven articulatory synthesis with deep neural networks
Authors:: Aryal, Sandesh
Gutierrez-Osuna, Ricardo
Abstract:: Highlights: We present an articulatory-to-acoustic mapping for real-time articulatory synthesis. The method uses a deep neural network with a tapped-delay input line. Tapped-delay line efficiently captures dynamics in articulatory trajectories. The model achieved higher accuracy than competing models based on Gaussian mixtures. The improvement was also found perceivable in a subjective listening test.
Abstract: The conventional approach for data-driven articulatory synthesis consists of modeling the joint acoustic-articulatory distribution with a Gaussian mixture model (GMM), followed by a post-processing step that optimizes the resulting acoustic trajectories. This final step can significantly improve the accuracy of the GMM frame-by-frame mapping but is computationally intensive and requires that the entire utterance be synthesized beforehand, making it unsuited for real-time synthesis. To address this issue, we present a deep neural network (DNN) articulatory synthesizer that uses a tapped-delay input line, allowing the model to capture context information in the articulatory trajectory without the need for post-processing. We characterize the DNN as a function of the context size and number of hidden layers, and compare it against two GMM articulatory synthesizers, a baseline model that performs a simple frame-by-frame mapping, and a second model that also performs trajectory optimization. Our results show that a DNN with a 60-ms context window and two 512-neuron hidden …
Is Part Of:: Computer speech & language. Volume 36 (2016)
Journal:: Computer speech & language
Issue:: Volume 36 (2016)
Issue Display:: Volume 36, Issue 2016 (2016)
Year:: 2016
Volume:: 36
Issue:: 2016
Issue Sort Value:: 2016-0036-2016-0000
Page Start:: 260
Page End:: 273
Publication Date:: 2016-03
Subjects:: Articulatory synthesis -- Electromagnetic articulography -- Deep learning -- Gaussian mixture models
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
Journal URLs:: http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
DOI:: 10.1016/j.csl.2015.02.003
Languages:: English
ISSNs:: 0885-2308
Deposit Type:: Legal deposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
Ingest File:: 528.xml