Concept-to-Speech generation with knowledge sharing for acoustic modelling and utterance filtering. (July 2016)
- Record Type:
- Journal Article
- Title:
- Concept-to-Speech generation with knowledge sharing for acoustic modelling and utterance filtering. (July 2016)
- Main Title:
- Concept-to-Speech generation with knowledge sharing for acoustic modelling and utterance filtering
- Authors:
- Wang, Xin
Ling, Zhen-Hua
Dai, Li-Rong
- Abstract:
- Highlights: We present two knowledge sharing approaches for CTS generation. Syntactic features replace prosody phrasing features in HMM-based speech synthesis. Acoustic features are used to filter synthetic utterances for one input concept. The HMM-based acoustic model yields comparable results without prosodic phrasing. Utterance filtering can remove inferior synthetic utterances for the input concept.
Abstract: A Concept-to-Speech (CTS) system converts the conceptual representation of a sentence-to-be-spoken into speech. While some CTS systems consist of independently built text generation and Text-to-Speech (TTS) modules, the majority of the existing CTS systems enhance the connection between these two modules with a prosodic prediction module that utilizes linguistic knowledge from the text generator to predict prosodic features for TTS generation. However, knowledge embodied within the individual modules has the potential to be shared in more ways. This paper describes knowledge sharing for acoustic modelling and utterance filtering in a Mandarin CTS system. First, syntactic information generated by the text generator is propagated to a hidden Markov model (HMM) based acoustic model within the TTS module and replaces the symbolic prosodic phrasing features therein. Our experimental results show that this approach alleviates the local hard-decision problem in automatic prosodic phrasing for Mandarin CTS systems and achieves a comparable performance to the traditional approach without explicit prosodic phrasing. Second, the acoustic features of multiple synthetic utterances expressing the same input concept are utilized to evaluate the utterance candidates. With this 'post-processing' mechanism, our CTS system is able to filter out inferior synthetic utterances and find an acceptable candidate to express the input concept.
- Is Part Of:
- Computer speech & language. Volume 38(2016)
- Journal:
- Computer speech & language
- Issue:
- Volume 38(2016)
- Issue Display:
- Volume 38, Issue 2016 (2016)
- Year:
- 2016
- Volume:
- 38
- Issue:
- 2016
- Issue Sort Value:
- 2016-0038-2016-0000
- Page Start:
- 46
- Page End:
- 67
- Publication Date:
- 2016-07
- Subjects:
- Concept-to-Speech -- Speech synthesis -- Hidden Markov model -- Natural language generation
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2015.12.003
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 2379.xml