Acoustic and lexical representations for affect prediction in spontaneous conversations. (January 2015)
- Record Type:
- Journal Article
- Title:
- Acoustic and lexical representations for affect prediction in spontaneous conversations. (January 2015)
- Main Title:
- Acoustic and lexical representations for affect prediction in spontaneous conversations
- Authors:
- Cao, Houwei
Savran, Arman
Verma, Ragini
Nenkova, Ani
- Abstract:
- Highlights: We study a variety of representations of lexical usage and acoustics for continuous affect recognition in spontaneous speech. We perform acoustic analysis on different regions of interest (ROI) and show substantial improvements. A sparse lexical representation that weights words by their mutual information with each affect dimension is powerful. Our proposed acoustic and lexical representations are complementary and show potential for improvement through fusion. On the AVEC 2012 dataset, our approach outperforms the other participants and the baselines by a large margin.
Abstract: In this article we investigate which representations of acoustics and word usage are most suitable for predicting the dimensions of affect (arousal, valence, power and expectancy) in spontaneous interactions. Our experiments are based on the AVEC 2012 challenge dataset. For lexical representations, we compare corpus-independent features based on psychological word norms of emotional dimensions with corpus-dependent representations. We find that a corpus-dependent bag-of-words approach, weighted by the mutual information between words and emotion dimensions, is by far the best representation. For the analysis of acoustics, we zero in on the question of granularity. We confirm on our corpus that utterance-level features are more predictive than word-level features. Further, we study more detailed representations in which the utterance is divided into regions of interest (ROI), each with a separate representation. We introduce two ROI representations, which significantly outperform less informed approaches. In addition, we show that acoustic models of emotion can be improved considerably by taking annotator agreement into account and training the model on a smaller but reliable dataset. Finally, we discuss the potential for improving prediction by combining the lexical and acoustic modalities. Simple fusion methods do not lead to consistent improvements over lexical classifiers alone, but they do improve over acoustic models.
- Is Part Of:
- Computer speech & language. Volume 29 (2015)
- Journal:
- Computer speech & language
- Issue:
- Volume 29 (2015)
- Issue Display:
- Volume 29, Issue 2015 (2015)
- Year:
- 2015
- Volume:
- 29
- Issue:
- 2015
- Issue Sort Value:
- 2015-0029-2015-0000
- Page Start:
- 203
- Page End:
- 217
- Publication Date:
- 2015-01
- Subjects:
- Emotion -- Affect -- Spontaneous speech -- Lexical features -- Acoustics
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Automatic speech processing -- Periodicals
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2014.04.002
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 5426.xml