Phonetic posteriorgram-based voice conversion system to improve speech intelligibility of dysarthric patients. (March 2022)
- Record Type:
- Journal Article
- Title:
- Phonetic posteriorgram-based voice conversion system to improve speech intelligibility of dysarthric patients. (March 2022)
- Main Title:
- Phonetic posteriorgram-based voice conversion system to improve speech intelligibility of dysarthric patients
- Authors:
- Zheng, Wei-Zhong
Han, Ji-Yan
Lee, Chen-Kai
Lin, Yu-Yi
Chang, Shu-Han
Lai, Ying-Hui
- Abstract:
- Highlights: Proposed a dysarthria voice conversion system that can effectively improve dysarthria speech intelligibility. The proposed system achieves higher speech intelligibility performance than the baseline system in challenging situations. The proposed system has achieved functional intelligibility for listeners.
Abstract: Background and Objective: Most dysarthric patients encounter communication problems due to unintelligible speech. Currently, there are many voice-driven systems aimed at improving their speech intelligibility; however, the intelligibility performance of these systems is affected by challenging application conditions (e.g., time variance of the patient's speech and background noise). To alleviate these problems, we proposed a dysarthria voice conversion (DVC) system for dysarthric patients and investigated its benefits under challenging application conditions.
Method: A deep learning-based voice conversion system with phonetic posteriorgram (PPG) features, called the DVC-PPG system, was proposed in this study. An objective-evaluation metric based on the Google automatic speech recognition (Google ASR) system and a listening test were used to demonstrate the speech intelligibility benefits of DVC-PPG under quiet and noisy test conditions; in addition, a well-known voice conversion system using mel-spectrogram features, DVC-Mels, was used for comparison to verify the benefits of the proposed DVC-PPG system.
Results: Averaged over two subjects, the Google ASR objective-evaluation metric showed that, in the duplicate and outside test conditions, the DVC-PPG system provided higher speech recognition rates (83.2% and 67.5%) than dysarthric speech (36.5% and 26.9%) and DVC-Mels (52.9% and 33.8%) under quiet conditions. Moreover, the DVC-PPG system provided more stable performance than DVC-Mels under noisy test conditions. In addition, the results of the listening test showed that the speech-intelligibility performance of DVC-PPG was better than that obtained with the dysarthric speech and DVC-Mels under both the duplicate and outside conditions.
Conclusions: The objective-evaluation metric and listening test results showed that the recognition rate of the proposed DVC-PPG system was significantly higher than those obtained with the original dysarthric speech and the DVC-Mels system. Therefore, it can be inferred from our study that the DVC-PPG system can improve the ability of dysarthric patients to communicate with people under challenging application conditions.
- Is Part Of:
- Computer methods and programs in biomedicine. Volume 215(2022)
- Journal:
- Computer methods and programs in biomedicine
- Issue:
- Volume 215(2022)
- Issue Display:
- Volume 215, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 215
- Issue:
- 2022
- Issue Sort Value:
- 2022-0215-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-03
- Subjects:
- Dysarthric patient -- Deep learning -- Voice conversion -- Phonetic posteriorgram
Medicine -- Computer programs -- Periodicals
Biology -- Computer programs -- Periodicals
Computers -- Periodicals
Medicine -- Periodicals
Médecine -- Logiciels -- Périodiques
Biologie -- Logiciels -- Périodiques
Biology -- Computer programs
Medicine -- Computer programs
Periodicals
Electronic journals
610.28
- Journal URLs:
- http://www.sciencedirect.com/science/journal/01692607
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.cmpb.2021.106602
- Languages:
- English
- ISSNs:
- 0169-2607
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.095000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 20850.xml