Statistical conversion of silent articulation into audible speech using full-covariance HMM. (March 2016)
- Record Type:
- Journal Article
- Title:
- Statistical conversion of silent articulation into audible speech using full-covariance HMM. (March 2016)
- Main Title:
- Statistical conversion of silent articulation into audible speech using full-covariance HMM
- Authors:
- Hueber, Thomas
Bailly, Gérard
- Abstract:
- Highlights: Conversion of silent articulation captured by ultrasound and video to modal speech. Comparison of GMM and full-covariance phonetic HMM without vocabulary limitation. HMM-based approach allows the use of linguistic information for regularization. Objective evaluation showed a lower but more fluctuant spectral distortion for HMM. Perceptual evaluation showed a better intelligibility for HMM on consonants.
Abstract: This article investigates the use of statistical mapping techniques for the conversion of articulatory movements into audible speech with no restriction on the vocabulary, in the context of a silent speech interface driven by ultrasound and video imaging. As a baseline, we first evaluated the GMM-based mapping considering dynamic features, proposed by Toda et al. (2007) for voice conversion. Then, we proposed a 'phonetically-informed' version of this technique, based on full-covariance HMM. This approach aims (1) at modeling explicitly the articulatory timing for each phonetic class, and (2) at exploiting linguistic knowledge to regularize the problem of silent speech conversion. Both techniques were compared on continuous speech, for two French speakers (one male, one female). For modal speech, the HMM-based technique showed a lower spectral distortion (objective evaluation). However, perceptual tests (transcription and XAB discrimination tests) showed a better intelligibility of the GMM-based technique, probably related to its less fluctuant quality. For silent speech, a perceptual identification test revealed a better segmental intelligibility for the HMM-based technique on consonants.
- Is Part Of:
- Computer speech & language. Volume 36(2016)
- Journal:
- Computer speech & language
- Issue:
- Volume 36(2016)
- Issue Display:
- Volume 36, Issue 2016 (2016)
- Year:
- 2016
- Volume:
- 36
- Issue:
- 2016
- Issue Sort Value:
- 2016-0036-2016-0000
- Page Start:
- 274
- Page End:
- 293
- Publication Date:
- 2016-03
- Subjects:
- Silent speech interface -- GMM -- HMM -- Ultrasound -- Articulatory–acoustic mapping
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2015.03.005
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 528.xml