Automatic assessment of intelligibility in speakers with dysarthria from coded telephone speech using glottal features. (January 2021)
- Record Type:
- Journal Article
- Title:
- Automatic assessment of intelligibility in speakers with dysarthria from coded telephone speech using glottal features. (January 2021)
- Main Title:
- Automatic assessment of intelligibility in speakers with dysarthria from coded telephone speech using glottal features
- Authors:
- Narendra, N.P.
Alku, Paavo
- Abstract:
- Highlights: The proposed method predicts four levels of intelligibility in speakers with dysarthria from coded telephone speech using glottal features. In addition to the glottal features, the openSMILE features are used as acoustic baseline features. Multiclass support vector machine (SVM) classifiers are trained using features obtained from coded speech utterances and the corresponding intelligibility level labels. Coded telephone speech is generated with the adaptive multi-rate codec with two operational bandwidths (narrowband and wideband). Experimental results showed good classification accuracy for the glottal features, indicating their effectiveness in the intelligibility level assessment in speakers with dysarthria.
Abstract: In clinical practice, assessment of intelligibility in speakers with dysarthria is performed by speech-language pathologists through auditory perceptual tests, which require the patient's presence at the hospital and involve time-consuming examinations. Frequent clinical monitoring can be costly and logistically inconvenient for both patients and medical experts. Here, we aim to automate the assessment of intelligibility in dysarthric speakers with an objective, speech-based method that can be employed in a telescreening application. The proposed method predicts the level of intelligibility in dysarthric speakers using four levels of speech intelligibility (very low, low, mediocre and high). The study compares several automatic methods to assess the intelligibility level in speakers with dysarthria by utilizing information generated at the level of the vocal folds through glottal features and by using coded telephone speech (i.e. speech that is used in telescreening applications). In addition to the glottal features, the openSMILE features are used as acoustic baseline features. Using features obtained from coded speech utterances and the corresponding intelligibility level labels, multiclass support vector machine (SVM) classifiers are trained. Separate multiclass SVMs are trained using the individual glottal and acoustic features as well as their combinations. Coded telephone speech is generated with the adaptive multi-rate codec with two operational bandwidths (narrowband and wideband), from utterances of an open database of dysarthric speech (Universal Access-Speech). Experimental results showed good classification accuracies for the glottal features, indicating their effectiveness in the intelligibility level assessment in speakers with dysarthria even in the challenging coded condition. Improvement in classification accuracy was obtained when the glottal features were combined with the openSMILE acoustic features, which validates the complementary nature of the glottal features.
- Is Part Of:
- Computer speech & language. Volume 65(2021)
- Journal:
- Computer speech & language
- Issue:
- Volume 65(2021)
- Issue Display:
- Volume 65, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 65
- Issue:
- 2021
- Issue Sort Value:
- 2021-0065-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-01
- Subjects:
- Dysarthric speech -- Glottal features -- Glottal inverse filtering -- Glottal source estimation -- openSMILE -- Support vector machine -- Coded telephone speech
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2020.101117
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 16886.xml
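
- Illustrative sketch:
- The abstract above outlines a pipeline in which glottal and acoustic features are combined per utterance, a classifier assigns one of four intelligibility levels, and classification accuracy is reported. The following is a minimal sketch of that flow and is not the authors' implementation. Several things here are assumptions: the features are fused by simple concatenation (the abstract does not say how they were combined), a nearest-centroid classifier stands in for the multiclass SVM so that the example needs only the Python standard library, and all numbers and function names are invented for illustration.

```python
from collections import defaultdict
from math import dist

# The four intelligibility levels named in the abstract.
LEVELS = ("very low", "low", "mediocre", "high")


def combine(glottal, acoustic):
    """Fuse two per-utterance feature vectors by concatenation (assumed scheme)."""
    return list(glottal) + list(acoustic)


def fit_centroids(features, labels):
    """Average the training vectors of each level (stand-in for SVM training)."""
    grouped = defaultdict(list)
    for vec, label in zip(features, labels):
        grouped[label].append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }


def predict(centroids, vec):
    """Return the level whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], vec))


def accuracy(true_labels, predicted_labels):
    """Fraction of utterances assigned the correct level."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


# Toy, made-up numbers: two glottal values and one acoustic value per utterance.
train_glottal = [[0.1, 0.2], [0.2, 0.1], [1.0, 1.1], [1.1, 1.0],
                 [2.0, 2.1], [2.1, 2.0], [3.0, 3.1], [3.1, 3.0]]
train_acoustic = [[0.0], [0.1], [1.0], [1.1], [2.0], [2.1], [3.0], [3.1]]
train_labels = [level for level in LEVELS for _ in range(2)]

train = [combine(g, a) for g, a in zip(train_glottal, train_acoustic)]
centroids = fit_centroids(train, train_labels)

# Two unseen toy utterances, one near each end of the scale.
test = [combine([0.15, 0.15], [0.05]), combine([2.9, 3.0], [3.0])]
predictions = [predict(centroids, vec) for vec in test]
print(predictions, accuracy(["very low", "high"], predictions))
```

In a real experiment the toy vectors would be replaced by glottal and openSMILE features extracted from the coded utterances, and the stand-in classifier by a multiclass SVM from a machine learning library.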