Automatic classification of speech overlaps: Feature representation and algorithms. (May 2019)
- Record Type:
- Journal Article
- Title:
- Automatic classification of speech overlaps: Feature representation and algorithms. (May 2019)
- Main Title:
- Automatic classification of speech overlaps: Feature representation and algorithms
- Authors:
- Chowdhury, Shammur Absar
Stepanov, Evgeny A.
Danieli, Morena
Riccardi, Giuseppe
- Abstract:
- Highlights: The annotation of speech overlap categories over a large amount of ecological conversational data. Investigation of different lexical feature representation (n-grams, word-embeddings). Study the effect of acoustic and lexical feature combination techniques – feature space and hidden space combination. Comparative analysis using SVM, Feed-Forward Neural Network, Convolutional network, and Long-Short-Term-Memory. Modeled the concurrent speech information from both speakers for classification.
Abstract: Overlapping speech is a natural and frequently occurring phenomenon in human–human conversations with an underlying purpose. Speech overlap events may be categorized as competitive and non-competitive. While the former is an attempt to grab the floor, the latter is an attempt to assist the speaker to continue the turn. The presence and distribution of these categories are indicative of the speakers' states during the conversation. Therefore, understanding these manifestations is crucial for conversational analysis and for modeling human–machine dialogs. The goal of this study is to design computational models to classify overlapping speech segments of dyadic conversations into competitive vs. non-competitive acts using lexical and acoustic cues, as well as their surrounding context. The designed overlap representations are evaluated in both linear – Support Vector Machines (SVM) – and non-linear – feed-forward (FFNN), convolutional (CNN) and long short-term memory (LSTM) neural network – models. We experiment with lexical and acoustic representations and their combinations from both speaker channels in feature and hidden space. We observe that lexical word-embedding features significantly increase the overall F1-measure compared to both acoustic and bag-of-ngrams lexical representations, suggesting that lexical information can be utilized as a powerful cue for overlap classification. Our comparative study shows that the best computational architecture is an FFNN along with a combination of word embeddings and acoustic features. …
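Note: the abstract distinguishes combining acoustic and lexical representations in feature space from combining them in hidden space. The sketch below is only an illustration of that distinction under stated assumptions, not the authors' architecture: the vector sizes, the layer width, the untrained random weights, and all function names are hypothetical.

```python
import math
import random


def relu(v):
    return max(v, 0.0)


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def dense(rng, n_in, n_out):
    """Hypothetical layer: small random (untrained) weights, zero biases."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_out)] for _ in range(n_in)]
    return weights, [0.0] * n_out


def forward(x, layer, activation):
    """Apply one fully connected layer to a single input vector."""
    weights, bias = layer
    return [
        activation(sum(xi * row[j] for xi, row in zip(x, weights)) + bias[j])
        for j in range(len(bias))
    ]


def feature_space_combination(acoustic, lexical, rng, hidden=16):
    """Concatenate the raw feature vectors, then feed one shared hidden layer."""
    x = acoustic + lexical  # list concatenation = feature-level fusion
    h = forward(x, dense(rng, len(x), hidden), relu)
    return forward(h, dense(rng, hidden, 1), sigmoid)[0]


def hidden_space_combination(acoustic, lexical, rng, hidden=16):
    """Give each modality its own hidden layer, then concatenate the activations."""
    h_acoustic = forward(acoustic, dense(rng, len(acoustic), hidden), relu)
    h_lexical = forward(lexical, dense(rng, len(lexical), hidden), relu)
    h = h_acoustic + h_lexical  # fusion happens after the hidden layers
    return forward(h, dense(rng, 2 * hidden, 1), sigmoid)[0]


rng = random.Random(0)
acoustic = [rng.gauss(0.0, 1.0) for _ in range(8)]   # assumed acoustic vector size
lexical = [rng.gauss(0.0, 1.0) for _ in range(50)]   # assumed embedding size

# Each call returns a score in (0, 1), read here as P(competitive overlap).
p_feature = feature_space_combination(acoustic, lexical, rng)
p_hidden = hidden_space_combination(acoustic, lexical, rng)
```

In the first function the two modalities share every hidden unit; in the second each modality is transformed separately before the representations are joined. With trained weights, that difference is what a feature-space versus hidden-space comparison would measure.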
- Is Part Of:
- Computer speech & language. Volume 55(2019)
- Journal:
- Computer speech & language
- Issue:
- Volume 55(2019)
- Issue Display:
- Volume 55, Issue 2019 (2019)
- Year:
- 2019
- Volume:
- 55
- Issue:
- 2019
- Issue Sort Value:
- 2019-0055-2019-0000
- Page Start:
- 145
- Page End:
- 167
- Publication Date:
- 2019-05
- Subjects:
- Overlap -- Acoustic -- Lexical -- Deep learning -- Spoken conversation
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2018.12.001
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 10143.xml