Investigation of multilingual and mixed-lingual emotion recognition using enhanced cues with data augmentation. (15th December 2020)
- Record Type:
- Journal Article
- Title:
- Investigation of multilingual and mixed-lingual emotion recognition using enhanced cues with data augmentation. (15th December 2020)
- Main Title:
- Investigation of multilingual and mixed-lingual emotion recognition using enhanced cues with data augmentation
- Authors:
- Lalitha, S.
Gupta, Deepa
Zakariah, Mohammed
Alotaibi, Yousef Ajami
- Abstract:
- In the past decade, research for improving man–machine communication has focused on emotion recognition using audio cues. Several effective monolingual, multilingual, and cross-corpus speech emotion recognition (SER) systems have been developed; however, they are limited to recognizing emotions from databases of monolingual discourse, primarily in either categorical or dimensional emotion space. For multilingual countries and federations such as India, Russia, and the European Union, these limitations can be problematic. Furthermore, in an environment of mixed diversified languages, the performance of existing models is unclear. To address these issues, we propose an innovative mixed-lingual SER system that considers five diverse languages, including dialect variability. Mixed-lingual corpora are developed from available standard speech emotion databases. Furthermore, a compact feature set having a unique set of speech feature functionals with a distinctive set of enhanced perceptual features and modified H-coefficients is proposed. Against existing large feature sets, the proposed compact feature set is robust and effective to perform the dual task of significantly recognizing different emotions from multilingual SER systems also, along with the mixed-lingual SER systems in both emotion spaces of categorical and dimensional. In the proposed SER system, to overcome the skewness of SER system performance for recognizing certain emotions, a data augmentation method is then incorporated. Furthermore, the proposed SER system is designed to efficiently recognize even the extreme emotions of boredom, disgust, sadness, and surprise in both emotion spaces. The proposed SER system is compared to existing SER systems, and the comparison results demonstrate that the proposed system outperformed existing systems.
- Is Part Of:
- Applied acoustics. Volume 170(2020)
- Journal:
- Applied acoustics
- Issue:
- Volume 170(2020)
- Issue Display:
- Volume 170, Issue 2020 (2020)
- Year:
- 2020
- Volume:
- 170
- Issue:
- 2020
- Issue Sort Value:
- 2020-0170-2020-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-12-15
- Subjects:
- Arousal -- Categorical -- Cross-validation -- Data augmentation -- Emotion -- Mel frequency -- Multilingual -- Mixed-lingual -- SMOTE -- Valence
Acoustical engineering -- Periodicals
Periodicals
620.2
- Journal URLs:
- http://www.sciencedirect.com/science/journal/0003682X
http://www.elsevier.com/journals
http://www.elsevier.com/homepage/elecserv.htt
- DOI:
- 10.1016/j.apacoust.2020.107519
- Languages:
- English
- ISSNs:
- 0003-682X
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 1571.400000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 13930.xml