Speech understanding for spoken dialogue systems: From corpus harvesting to grammar rule induction. (January 2018)
- Record Type:
- Journal Article
- Title:
- Speech understanding for spoken dialogue systems: From corpus harvesting to grammar rule induction. (January 2018)
- Main Title:
- Speech understanding for spoken dialogue systems: From corpus harvesting to grammar rule induction
- Authors:
- Iosif, Elias
Klasinas, Ioannis
Athanasopoulou, Georgia
Palogiannidi, Elisavet
Georgiladakis, Spiros
Louka, Katerina
Potamianos, Alexandros
- Abstract:
- Highlights: We investigate algorithms for inducing grammars for spoken dialogue systems. Main tasks: creation of text corpora; induction of low- and high-level grammars. The proposed algorithms and features are portable across languages and domains. Different features should be applied for low- and high-level grammar rules. Web data harvesting is a plausible approach for corpora creation.
Abstract: We investigate algorithms and tools for the semi-automatic authoring of grammars for spoken dialogue systems (SDS) and propose a framework that spans from corpora creation to grammar induction algorithms. A realistic human-in-the-loop approach is followed, balancing automation and human intervention to optimize the cost-to-performance ratio of grammar development. Web harvesting is the main approach investigated for eliciting spoken dialogue textual data, while crowdsourcing is also proposed as an alternative method. Several techniques are presented for constructing web queries and filtering the acquired corpora. We also investigate how the harvested corpora can be used for the automatic and semi-automatic (human-in-the-loop) induction of grammar rules. SDS grammar rules and induction algorithms are grouped into two types, namely low- and high-level. Two families of algorithms are investigated for rule induction: one based on semantic similarity and distributional semantic models, and the other using more traditional statistical modeling approaches (e.g., slot-filling algorithms using Conditional Random Fields). Evaluation results are presented for two domains and languages. Precision scores of up to 60% are obtained for high-level induction. The results support the portability of the proposed features and algorithms across languages and domains.
- Is Part Of:
- Computer speech & language. Volume 47 (2018)
- Journal:
- Computer speech & language
- Issue:
- Volume 47 (2018)
- Issue Display:
- Volume 47, Issue 2018 (2018)
- Year:
- 2018
- Volume:
- 47
- Issue:
- 2018
- Issue Sort Value:
- 2018-0047-2018-0000
- Page Start:
- 272
- Page End:
- 297
- Publication Date:
- 2018-01
- Subjects:
- Spoken dialogue systems -- Grammar induction -- Corpora creation -- Semantic similarity -- Web mining -- Crowdsourcing
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Automatic speech processing -- Periodicals
Automatic speech recognition -- Periodicals
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2017.08.002
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 20786.xml