The impact of speaking rate on acoustic-to-articulatory inversion. (January 2020)
- Record Type:
- Journal Article
- Title:
- The impact of speaking rate on acoustic-to-articulatory inversion. (January 2020)
- Main Title:
- The impact of speaking rate on acoustic-to-articulatory inversion
- Authors:
- Illa, Aravind
Ghosh, Prasanta Kumar
- Abstract:
- Acoustic characteristics and articulatory movements are known to vary with speaking rate. This study investigates the effect of speaking rate on acoustic-to-articulatory inversion (AAI) performance using deep neural networks (DNNs). Since a fast speaking rate causes fast articulatory motion as well as changes in the spectro-temporal characteristics of the speech signal, the articulatory-acoustic map at a fast speaking rate could differ from that at a slow speaking rate. We examine how these differences alter the accuracy with which different articulatory positions can be recovered from the acoustics. AAI experiments are performed in both matched and mismatched train-test conditions using data from five subjects at three different rates: normal, fast and slow (fast and slow rates are at least 1.3 times faster and slower than the normal rate). Experiments in matched conditions reveal that the errors in estimating the vertical motion of sensors on the tongue articulators from acoustics at a fast speaking rate are significantly higher than those at a slow speaking rate. Experiments in mismatched conditions reveal a consistent drop in AAI performance compared to the matched condition. Further experiments, performed by training AAI with acoustic-articulatory data pooled from different speaking rates, reveal that a single DNN-based AAI model is capable of learning multiple rate-specific mappings.
Highlights: We carry out an experimental study to systematically investigate how AAI performance changes across speaking rates. For this purpose, acoustic-articulatory data are collected from 5 subjects at different speaking rates (slow, normal and fast) using EMA. Analysis of acoustic-to-articulatory inversion (AAI) using an information-theoretic approach. Experimental validation of AAI performance using deep neural networks. Experimental results are discussed for different train-test conditions: (a) matched, (b) mismatched, (c) generic AAI. …
- Is Part Of:
- Computer speech & language. Volume 59(2020)
- Journal:
- Computer speech & language
- Issue:
- Volume 59(2020)
- Issue Display:
- Volume 59, Issue 2020 (2020)
- Year:
- 2020
- Volume:
- 59
- Issue:
- 2020
- Issue Sort Value:
- 2020-0059-2020-0000
- Page Start:
- 75
- Page End:
- 90
- Publication Date:
- 2020-01
- Subjects:
- Acoustic-to-articulatory inversion -- Speaking rate -- Electromagnetic articulograph
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2019.05.004
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 11888.xml