Detecting dementia from speech and transcripts using transformers. (April 2023)
- Record Type:
- Journal Article
- Title:
- Detecting dementia from speech and transcripts using transformers. (April 2023)
- Main Title:
- Detecting dementia from speech and transcripts using transformers
- Authors:
- Ilias, Loukas
Askounis, Dimitris
Psarras, John
- Abstract:
- Alzheimer's disease (AD) is a neurodegenerative disease with serious consequences for people's everyday lives if it is not diagnosed early, since no cure is available. Alzheimer's is the most common cause of dementia, a general term for loss of memory. Because dementia affects speech, existing research focuses on detecting dementia from spontaneous speech. However, little work has converted speech data to log-Mel spectrograms and Mel-frequency cepstral coefficients (MFCCs) or employed pretrained models. Likewise, little work has used transformer networks or examined how the two modalities, i.e., speech and transcripts, should be combined in a single neural network. To address these limitations, we first represent the speech signal as an image and employ several pretrained models, of which the Vision Transformer (ViT) achieves the highest evaluation results. Second, we propose multimodal models that include a Gated Multimodal Unit, which controls the influence of each modality on the final classification, and crossmodal attention, which captures the relationships between the two modalities. Extensive experiments on the ADReSS Challenge dataset demonstrate the effectiveness of the proposed models and their superiority over state-of-the-art approaches.
Highlights: We convert audio to log-Mel spectrograms (and MFCCs), their delta, and delta-delta. We employ several pretrained models from the domain of computer vision. We propose multimodal deep learning models to detect AD patients. We introduce a multimodal gate mechanism. We introduce crossmodal attention.
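The abstract names two fusion mechanisms, the Gated Multimodal Unit and crossmodal attention, without spelling out how they work. The following is a minimal sketch, assuming the standard Gated Multimodal Unit formulation of Arevalo et al. (2017) and plain scaled dot-product attention with the learned query/key/value projections omitted. The paper's actual layer sizes, projections, and training setup are not given in this record, and every function name, dimension, and random weight below is hypothetical, standing in for trained parameters.

```python
import math
import random

# Hypothetical helpers: plain-Python linear algebra so the sketch needs no
# third-party packages. A real model would use a deep learning framework.

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def rand_matrix(rows, cols, rng):
    """Random weights standing in for trained parameters (assumption)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def gated_multimodal_unit(x_audio, x_text, W_a, W_t, W_z):
    """GMU fusion in the formulation of Arevalo et al. (2017):

        h_a = tanh(W_a x_a)            audio representation
        h_t = tanh(W_t x_t)            transcript representation
        z   = sigmoid(W_z [x_a; x_t])  per-dimension gate
        h   = z * h_a + (1 - z) * h_t  (element-wise)

    Returns the fused vector h and the gate z.
    """
    h_a = [math.tanh(v) for v in matvec(W_a, x_audio)]
    h_t = [math.tanh(v) for v in matvec(W_t, x_text)]
    z = [sigmoid(v) for v in matvec(W_z, x_audio + x_text)]  # list concat = [x_a; x_t]
    fused = [zi * a + (1.0 - zi) * t for zi, a, t in zip(z, h_a, h_t)]
    return fused, z


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def crossmodal_attention(queries, keys, values):
    """Scaled dot-product attention across modalities: each query (e.g. a
    transcript token) attends over keys/values from the other modality
    (e.g. audio frames). Learned Q/K/V projections are omitted here."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs


if __name__ == "__main__":
    rng = random.Random(0)
    d_audio, d_text, d_hidden = 4, 3, 2  # toy sizes, not the paper's

    x_audio = [rng.uniform(-1, 1) for _ in range(d_audio)]
    x_text = [rng.uniform(-1, 1) for _ in range(d_text)]
    W_a = rand_matrix(d_hidden, d_audio, rng)
    W_t = rand_matrix(d_hidden, d_text, rng)
    W_z = rand_matrix(d_hidden, d_audio + d_text, rng)

    fused, gate = gated_multimodal_unit(x_audio, x_text, W_a, W_t, W_z)
    print("gate:", gate)  # values in (0, 1): share given to the audio branch
    print("fused:", fused)

    text_tokens = [[rng.uniform(-1, 1) for _ in range(d_hidden)] for _ in range(3)]
    audio_frames = [[rng.uniform(-1, 1) for _ in range(d_hidden)] for _ in range(5)]
    attended = crossmodal_attention(text_tokens, audio_frames, audio_frames)
    print(len(attended), len(attended[0]))  # 3 2
```

The gate z is computed per hidden dimension from both inputs, so the network can lean on the audio branch for some features and on the transcript branch for others. In the attention sketch, swapping the roles (audio frames as queries, transcript tokens as keys and values) gives the other direction of crossmodal attention.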
- Is Part Of:
- Computer speech & language. Volume 79 (2023)
- Journal:
- Computer speech & language
- Issue:
- Volume 79 (2023)
- Issue Display:
- Volume 79, Issue 2023 (2023)
- Year:
- 2023
- Volume:
- 79
- Issue:
- 2023
- Issue Sort Value:
- 2023-0079-2023-0000
- Page Start:
- Page End:
- Publication Date:
- 2023-04
- Subjects:
- Dementia -- Speech -- log-Mel spectrogram -- Mel-frequency cepstral coefficients -- Vision transformer -- Gated multimodal unit -- Crossmodal attention
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2023.101485
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 25994.xml