Generation of creaky voice for improving the quality of HMM-based speech synthesis. (March 2017)
- Record Type:
- Journal Article
- Title:
- Generation of creaky voice for improving the quality of HMM-based speech synthesis. (March 2017)
- Main Title:
- Generation of creaky voice for improving the quality of HMM-based speech synthesis
- Authors:
- Narendra, N.P.
Sreenivasa Rao, K.
- Abstract:
- Highlights: An HMM-based speech synthesis system capable of generating creaky voice is developed. Two main issues involved in the synthesis of creaky voice are addressed. A creaky voice detection method is proposed based on the analysis of variation of epoch parameters for different voicing regions. A hybrid source model which is an extension of recently developed time-domain deterministic plus noise model is proposed for modelling creaky excitation signal.
Abstract: This paper aims at developing an HMM-based speech synthesis system capable of generating creaky voice in addition to modal voice. Generation of creaky voice is carried out by addressing two main issues, namely, an automatic prediction of creaky voice and appropriate modelling of the excitation signal of creaky voice. An automatic creaky voice detection method is proposed based on the analysis of variation of epoch parameters for different voicing regions. A neural network classifier is trained using the variances of epoch parameters for detection of creaky regions. A hybrid source model which is an extension of recently developed time-domain deterministic plus noise model is proposed for modelling creaky excitation signal. In the proposed hybrid source model, the pitch-synchronous analysis is performed on the creaky excitation signal of every phone. From the creaky residual frames of every phonetic class, the deterministic and noise components are estimated. The creaky deterministic components of all phonetic classes are stored in the database. The noise components are parameterized in terms of spectral and amplitude envelopes and are modelled by HMMs. During synthesis, the appropriate deterministic component is selected from the database, and the noise component is constructed from the parameters generated from HMMs. The creaky deterministic and noise components are pitch-synchronously overlap-added to produce the creaky excitation signal. Subjective evaluation results indicate that the incorporation of creaky voice has improved the naturalness of the synthetic speech of two male speakers, and the quality is slightly better than the basic time-domain deterministic plus noise model meant for only modal excitation.
- Is Part Of:
- Computer speech & language. Volume 42(2017)
- Journal:
- Computer speech & language
- Issue:
- Volume 42(2017)
- Issue Display:
- Volume 42, Issue 2017 (2017)
- Year:
- 2017
- Volume:
- 42
- Issue:
- 2017
- Issue Sort Value:
- 2017-0042-2017-0000
- Page Start:
- 38
- Page End:
- 58
- Publication Date:
- 2017-03
- Subjects:
- HMM-based speech synthesis -- Detection of creaky voice -- Zero-frequency filtering -- Epoch parameters -- Synthesis of creaky voice -- Deterministic plus noise model -- Hybrid source modelling
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2016.08.002
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 704.xml