Directly data-derived articulatory gesture-like representations retain discriminatory information about phone categories. (March 2016)
- Record Type:
- Journal Article
- Title:
- Directly data-derived articulatory gesture-like representations retain discriminatory information about phone categories. (March 2016)
- Main Title:
- Directly data-derived articulatory gesture-like representations retain discriminatory information about phone categories
- Authors:
- Ramanarayanan, Vikram
Van Segbroeck, Maarten
Narayanan, Shrikanth S.
- Abstract:
- Highlights: Proposed a method to extract sparse gesture-like primitives from articulatory data. Learnt primitive movements for different phonemes using a weak supervision step. Demonstrated that primitives for different phones are linguistically interpretable. Proposed and evaluated features on an interval-based phone classification task. Showed that purely production-based primitives perform well for phone classification.
Abstract: How the speech production and perception systems evolved in humans still remains a mystery today. Previous research suggests that human auditory systems are able, and have possibly evolved, to preserve maximal information about the speaker's articulatory gestures. This paper attempts an initial step toward answering the complementary question of whether speakers' articulatory mechanisms have also evolved to produce sounds that can be optimally discriminated by the listener's auditory system. To this end we explicitly model, using computational methods, the extent to which derived representations of "primitive movements" of speech articulation can be used to discriminate between broad phone categories. We extract interpretable spatio-temporal primitive movements as recurring patterns in a data matrix of human speech articulation, i.e., representing the trajectories of vocal tract articulators over time. To this end, we propose a weakly-supervised learning method that attempts to find a part-based representation of the data in terms of recurring basis trajectory units (or primitives) and their corresponding activations over time. For each phone interval, we then derive a feature representation that captures the co-occurrences between the activations of the various bases over different time-lags. We show that this feature, derived entirely from activations of these primitive movements, is able to achieve a greater discrimination relative to using conventional features on an interval-based phone classification task. We discuss the implications of these findings in furthering our understanding of speech signal representations and the links between speech production and perception systems.
- Is Part Of:
- Computer speech & language. Volume 36(2016)
- Journal:
- Computer speech & language
- Issue:
- Volume 36(2016)
- Issue Display:
- Volume 36, Issue 2016 (2016)
- Year:
- 2016
- Volume:
- 36
- Issue:
- 2016
- Issue Sort Value:
- 2016-0036-2016-0000
- Page Start:
- 330
- Page End:
- 346
- Publication Date:
- 2016-03
- Subjects:
- Speech communication -- Movement primitives -- Phone classification -- Motor theory -- Information transfer
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2015.03.004
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 528.xml