Speech emotion recognition with deep convolutional neural networks. (May 2020)
- Record Type:
- Journal Article
- Title:
- Speech emotion recognition with deep convolutional neural networks. (May 2020)
- Main Title:
- Speech emotion recognition with deep convolutional neural networks
- Authors:
- Issa, Dias
Fatih Demirci, M.
Yazici, Adnan - Abstract:
- Highlights: Sound files are represented effectively by combining various features. The framework sets the new SOTA on two datasets for speech emotion recognition. For the third dataset (EMO-DB), the framework obtains the second highest accuracy. The advantages of the framework are its simplicity, applicability, and generality. Abstract: The speech emotion recognition (or, classification) is one of the most challenging topics in data science. In this work, we introduce a new architecture, which extracts mel-frequency cepstral coefficients, chromagram, mel-scale spectrogram, Tonnetz representation, and spectral contrast features from sound files and uses them as inputs for the one-dimensional Convolutional Neural Network for the identification of emotions using samples from the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), Berlin (EMO-DB), and Interactive Emotional Dyadic Motion Capture (IEMOCAP) datasets. We utilize an incremental method for modifying our initial model in order to improve classification accuracy. All of the proposed models work directly with raw sound data without the need for conversion to visual representations, unlike some previous approaches. Based on experimental results, our best-performing model outperforms existing frameworks for RAVDESS and IEMOCAP, thus setting the new state-of-the-art. For the EMO-DB dataset, it outperforms all previous works except one but compares favorably with that one in terms of generality, simplicity,Highlights: Sound files are represented effectively by combining various features. The framework sets the new SOTA on two datasets for speech emotion recognition. For the third dataset (EMO-DB), the framework obtains the second highest accuracy. The advantages of the framework are its simplicity, applicability, and generality. Abstract: The speech emotion recognition (or, classification) is one of the most challenging topics in data science. In this work, we introduce a new architecture, which extracts mel-frequency cepstral coefficients, chromagram, mel-scale spectrogram, Tonnetz representation, and spectral contrast features from sound files and uses them as inputs for the one-dimensional Convolutional Neural Network for the identification of emotions using samples from the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), Berlin (EMO-DB), and Interactive Emotional Dyadic Motion Capture (IEMOCAP) datasets. We utilize an incremental method for modifying our initial model in order to improve classification accuracy. All of the proposed models work directly with raw sound data without the need for conversion to visual representations, unlike some previous approaches. Based on experimental results, our best-performing model outperforms existing frameworks for RAVDESS and IEMOCAP, thus setting the new state-of-the-art. For the EMO-DB dataset, it outperforms all previous works except one but compares favorably with that one in terms of generality, simplicity, and applicability. Specifically, the proposed framework obtains 71.61% for RAVDESS with 8 classes, 86.1% for EMO-DB with 535 samples in 7 classes, 95.71% for EMO-DB with 520 samples in 7 classes, and 64.3% for IEMOCAP with 4 classes in speaker-independent audio classification tasks. … (more)
- Is Part Of:
- Biomedical signal processing and control. Volume 59(2020)
- Journal:
- Biomedical signal processing and control
- Issue:
- Volume 59(2020)
- Issue Display:
- Volume 59, Issue 2020 (2020)
- Year:
- 2020
- Volume:
- 59
- Issue:
- 2020
- Issue Sort Value:
- 2020-0059-2020-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-05
- Subjects:
- Speech emotion recognition -- Deep learning -- Signal processing
Signal processing -- Periodicals
Biomedical engineering -- Periodicals
Signal Processing, Computer-Assisted -- Periodicals
Image Processing, Computer-Assisted -- Periodicals
Biomedical Engineering -- Periodicals
610.28 - Journal URLs:
- http://www.sciencedirect.com/science/journal/17468094 ↗
http://www.elsevier.com/journals ↗
http://www.sciencedirect.com/science?_ob=PublicationURL&_tockey=%23TOC%2329675%232006%23999989998%23626449%23FLA%23&_cdi=29675&_pubType=J&_auth=y&_acct=C000045259&_version=1&_urlVersion=0&_userid=836873&md5=664b5cf9a57fc91971a17faf20c32ec1 ↗ - DOI:
- 10.1016/j.bspc.2020.101894 ↗
- Languages:
- English
- ISSNs:
- 1746-8094
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms) ↗
- Physical Locations:
- British Library DSC - 2087.880400
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store - Ingest File:
- 13502.xml