Pseudo-colored rate map representation for speech emotion recognition. (April 2021)
- Record Type:
- Journal Article
- Title:
- Pseudo-colored rate map representation for speech emotion recognition. (April 2021)
- Main Title:
- Pseudo-colored rate map representation for speech emotion recognition
- Authors:
- OZER, Ilyas
- Abstract:
- Highlights: To combine the spectro-temporal representation of the auditory nerve firing with pseudo-colouring. Log-power rate map representations are created. Focus on regions where values are stronger on these representations. Classification performed with the convolutional neural network. The proposed approach provided a very good result on two separate data sets.
Abstract: Speech emotion recognition (SER) is an exciting topic in the field of human-machine interaction. Several handcrafted features are used for SER. However, determining these features is both a difficult and time-consuming process. Instead, the use of features generated by convolutional neural networks (CNNs) with spectrograms and Mel-spectrograms has gained momentum in recent years. These CNNs are widely employed in image applications. Therefore, the audio signals must be represented in the best way as images. The spectrogram presents evenly spaced frequency components. However, this is not desirable when the spectral energy lies mostly at low frequencies. The Mel-filter provides benefits, but several studies have shown that its performance is inferior to biologically inspired models. In addition, the high variance between features negatively affects its classification performance. In this study, log-power rate map features are suggested as an auditory model for the SER task. In addition, we have proposed the use of a threshold function to focus on regions with high spectral energy. A rate map provides better resolution in the low-frequency region. In addition, smoothing reduces the variance between features, and focusing on spectral peaks reduces the effect of user-dependent features. The proposed approach was tested, independent of subject and gender, on the EMO-DB and EMOVO datasets, which are widely used in the literature. In the EMO-DB dataset, an increase of 2.42 % was achieved, with a classification performance of 91.32 %. In the EMOVO dataset, an increase of 4.95 % was achieved, with a classification performance of 68.93 %. …
- Is Part Of:
- Biomedical signal processing and control. Volume 66(2021)
- Journal:
- Biomedical signal processing and control
- Issue:
- Volume 66(2021)
- Issue Display:
- Volume 66, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 66
- Issue:
- 2021
- Issue Sort Value:
- 2021-0066-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-04
- Subjects:
- Speech emotion recognition -- Rate map -- Convolutional neural network -- Pseudo-coloration
Signal processing -- Periodicals
Biomedical engineering -- Periodicals
Signal Processing, Computer-Assisted -- Periodicals
Image Processing, Computer-Assisted -- Periodicals
Biomedical Engineering -- Periodicals
610.28
- Journal URLs:
- http://www.sciencedirect.com/science/journal/17468094
http://www.elsevier.com/journals
http://www.sciencedirect.com/science?_ob=PublicationURL&_tockey=%23TOC%2329675%232006%23999989998%23626449%23FLA%23&_cdi=29675&_pubType=J&_auth=y&_acct=C000045259&_version=1&_urlVersion=0&_userid=836873&md5=664b5cf9a57fc91971a17faf20c32ec1
- DOI:
- 10.1016/j.bspc.2021.102502
- Languages:
- English
- ISSNs:
- 1746-8094
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 2087.880400
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 23779.xml