Multi-resolution modulation-filtered cochleagram feature for LSTM-based dimensional emotion recognition from speech. (August 2021)
- Record Type:
- Journal Article
- Title:
- Multi-resolution modulation-filtered cochleagram feature for LSTM-based dimensional emotion recognition from speech. (August 2021)
- Main Title:
- Multi-resolution modulation-filtered cochleagram feature for LSTM-based dimensional emotion recognition from speech
- Authors:
- Peng, Zhichao
Dang, Jianwu
Unoki, Masashi
Akagi, Masato - Abstract:
- Abstract: Continuous dimensional emotion recognition from speech helps robots or virtual agents capture the temporal dynamics of a speaker's emotional state in natural human–robot interactions. Temporal modulation cues obtained directly from the time-domain model of auditory perception can better reflect temporal dynamics than the acoustic features usually processed in the frequency domain. Feature extraction, which can reflect temporal dynamics of emotion from temporal modulation cues, is challenging because of the complexity and diversity of the auditory perception model. A recent neuroscientific study suggests that human brains derive multi-resolution representations through temporal modulation analysis. This study investigates multi-resolution representations of an auditory perception model and proposes a novel feature called multi-resolution modulation-filtered cochleagram (MMCG) for predicting valence and arousal values of emotional primitives. The MMCG is constructed by combining four modulation-filtered cochleagrams at different resolutions to capture various temporal and contextual modulation information. In addition, to model the multi-temporal dependencies of the MMCG, we designed a parallel long short-term memory (LSTM) architecture. The results of extensive experiments on the RECOLA and SEWA datasets demonstrate that MMCG provides the best recognition performance in both datasets among all evaluated features. The results also show that the parallel LSTM canAbstract: Continuous dimensional emotion recognition from speech helps robots or virtual agents capture the temporal dynamics of a speaker's emotional state in natural human–robot interactions. Temporal modulation cues obtained directly from the time-domain model of auditory perception can better reflect temporal dynamics than the acoustic features usually processed in the frequency domain. Feature extraction, which can reflect temporal dynamics of emotion from temporal modulation cues, is challenging because of the complexity and diversity of the auditory perception model. A recent neuroscientific study suggests that human brains derive multi-resolution representations through temporal modulation analysis. This study investigates multi-resolution representations of an auditory perception model and proposes a novel feature called multi-resolution modulation-filtered cochleagram (MMCG) for predicting valence and arousal values of emotional primitives. The MMCG is constructed by combining four modulation-filtered cochleagrams at different resolutions to capture various temporal and contextual modulation information. In addition, to model the multi-temporal dependencies of the MMCG, we designed a parallel long short-term memory (LSTM) architecture. The results of extensive experiments on the RECOLA and SEWA datasets demonstrate that MMCG provides the best recognition performance in both datasets among all evaluated features. The results also show that the parallel LSTM can build multi-temporal dependencies from the MMCG features, and the performance on valence and arousal prediction is better than that of a plain LSTM method. … (more)
- Is Part Of:
- Neural networks. Volume 140(2021)
- Journal:
- Neural networks
- Issue:
- Volume 140(2021)
- Issue Display:
- Volume 140, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 140
- Issue:
- 2021
- Issue Sort Value:
- 2021-0140-2021-0000
- Page Start:
- 261
- Page End:
- 273
- Publication Date:
- 2021-08
- Subjects:
- Dimensional emotion -- Temporal modulation -- Multi-resolution modulation-filtered cochleagram -- Parallel long short-term memory network
Neural computers -- Periodicals
Neural networks (Computer science) -- Periodicals
Neural networks (Neurobiology) -- Periodicals
Nervous System -- Periodicals
Ordinateurs neuronaux -- Périodiques
Réseaux neuronaux (Informatique) -- Périodiques
Réseaux neuronaux (Neurobiologie) -- Périodiques
Neural computers
Neural networks (Computer science)
Neural networks (Neurobiology)
Periodicals
006.32 - Journal URLs:
- http://www.sciencedirect.com/science/journal/08936080 ↗
http://www.elsevier.com/journals ↗ - DOI:
- 10.1016/j.neunet.2021.03.027 ↗
- Languages:
- English
- ISSNs:
- 0893-6080
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms) ↗
- Physical Locations:
- British Library DSC - 6081.280800
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store - Ingest File:
- 22548.xml