Multi-cultural speech emotion recognition using language and speaker cues. (May 2023)

Record Type:: Journal Article
Title:: Multi-cultural speech emotion recognition using language and speaker cues. (May 2023)
Main Title:: Multi-cultural speech emotion recognition using language and speaker cues
Authors:: Pandey, Sandeep Kumar
Shekhawat, Hanumant Singh
Prasanna, S.R.M.
Abstract:: Speech Emotion Recognition (SER) has been an active area of research to make Human–Computer Interaction (HCI) smoother and more natural. However, because the emotions expressed in an utterance depend on factors such as culture and speaker, the robustness of SER systems in a multi-cultural setting remains a topic of discussion among researchers. Both the universality and the cultural specificity of emotions are debated in the literature. Thus we propose two methods, one incorporating cultural specificity and another demonstrating the universal nature of emotions across cultures. In this work, we propose a novel method to build a multi-cultural SER system by incorporating impactful factors such as speaker and language as markers of cultural distinctiveness. We develop a language model and a speaker model to obtain language and speaker embeddings, and propose a multi-modal fusion architecture to fuse this information with emotional cues. Moreover, a triplet-loss-based multi-cultural SER system is proposed, which tries to normalize speaker and cultural variabilities and focuses on learning emotions irrespective of culture. Experiments conducted on a collection of emotion datasets in five languages show the robustness of the proposed technique in predicting emotions in a leave-one-language-out setting. The design of the triplet-loss-based system allows for the incorporation of a new language and speaker without the need to retrain the whole system. Highlights: We …
Is Part Of:: Biomedical signal processing and control. Volume 83(2023)
Journal:: Biomedical signal processing and control
Issue:: Volume 83(2023)
Issue Display:: Volume 83, Issue 2023 (2023)
Year:: 2023
Volume:: 83
Issue:: 2023
Issue Sort Value:: 2023-0083-2023-0000
Page Start:
Page End:
Publication Date:: 2023-05
Subjects:: Tensor factorized neural network -- Speech emotion recognition -- Multi-cultural -- Multi-modal -- Language model -- Speaker model -- Metric learning -- Triplet loss
Signal processing -- Periodicals
Biomedical engineering -- Periodicals
Signal Processing, Computer-Assisted -- Periodicals
Image Processing, Computer-Assisted -- Periodicals
Biomedical Engineering -- Periodicals
610.28
Journal URLs:: http://www.sciencedirect.com/science/journal/17468094
http://www.elsevier.com/journals
http://www.sciencedirect.com/science?_ob=PublicationURL&_tockey=%23TOC%2329675%232006%23999989998%23626449%23FLA%23&_cdi=29675&_pubType=J&_auth=y&_acct=C000045259&_version=1&_urlVersion=0&_userid=836873&md5=664b5cf9a57fc91971a17faf20c32ec1
DOI:: 10.1016/j.bspc.2023.104679
Languages:: English
ISSNs:: 1746-8094
Deposit Type:: Legaldeposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - 2087.880400
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
Ingest File:: 26143.xml