Representation transfer learning from deep end-to-end speech recognition networks for the classification of health states from speech. (July 2021)
- Record Type:
- Journal Article
- Title:
- Representation transfer learning from deep end-to-end speech recognition networks for the classification of health states from speech. (July 2021)
- Main Title:
- Representation transfer learning from deep end-to-end speech recognition networks for the classification of health states from speech
- Authors:
- Sertolli, Benjamin
Ren, Zhao
Schuller, Björn W.
Cummins, Nicholas
- Abstract:
- Highlights: Use of pre-trained automatic speech recognition models for transfer learning. Compact Bilinear Pooling as a fusion technique for computational paralinguistics. Presentation of comprehensive results on two speech-health classification tasks. Demonstrate comparable results to other state-of-the-art approaches.
Abstract: Representation transfer learning has been widely used across a range of machine learning tasks. One such notable approach seen in the speech literature is the use of Convolutional Neural Networks, pre-trained for image classification tasks, to extract features from spectrograms of speech signals. Interestingly, despite the strong performance of such approaches, there have been minimal research efforts exploring the suitability of using speech-specific networks to perform feature extraction. In this regard, a novel feature representation learning framework is presented herein. This approach comprises the use of Automatic Speech Recognition (ASR) deep neural networks as feature extractors, the fusion of several extracted feature representations using Compact Bilinear Pooling (CBP), and finally inference via a specially optimised Recurrent Neural Network (RNN) classifier. To determine the usefulness of these feature representations, they are comprehensively tested on two representative speech-health classification tasks, namely the food-type being eaten and speaker intoxication. Key results indicate the promise of the extracted features, demonstrating comparable results to other state-of-the-art approaches in the literature.
- Is Part Of:
- Computer speech & language. Volume 68(2021)
- Journal:
- Computer speech & language
- Issue:
- Volume 68(2021)
- Issue Display:
- Volume 68, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 68
- Issue:
- 2021
- Issue Sort Value:
- 2021-0068-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-07
- Subjects:
- Transfer learning -- Representation learning -- Automatic speech recognition -- Compact bilinear pooling -- Computational paralinguistics -- Recurrent neural networks
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2021.101204
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 16008.xml