Stochastic Data-to-Text Generation Using Syntactic Dependency Information. (November 2022)
- Record Type:
- Journal Article
- Title:
- Stochastic Data-to-Text Generation Using Syntactic Dependency Information. (November 2022)
- Main Title:
- Stochastic Data-to-Text Generation Using Syntactic Dependency Information
- Authors:
- Seifossadat, Elham
Sameti, Hossein
- Abstract:
- Data-to-Text Generation (D2T) is one of the most important sub-fields of Natural Language Generation where structured data is transcribed into natural language text. Several solutions have been proposed for D2T so far with relative success, including template-based, phrase structure grammar-based, and neural attention models. However, these methods also have problems such as grammatical flaws, limited naturalness, and semantic deficiencies.
In this work, we propose a stochastic corpus-based model for the data-to-text generation that produces a tree-form structure for sentences based on dependency information. This information includes the dependency relations between words and meaning labels extracted from the aligned training sentences parsed with a dependency parser. By combining the dependency relations and meaning labels to construct a tree structure in an up-down manner, each word is placed into the output sentence based on its preceding and succeeding words. This results in fluent sentences with correct grammatical structures. This approach also ensures that all required semantic information is present in the output sentences while irrelevant or redundant labels are avoided. In addition, by using beam search in producing the structure of sentences, the proposed model can generate highly diverse sentences. We test our model on eight domains in tabular, dialogue act, and RDF formats. Our model improves the BLEU by 30% compared to the corpus-based state-of-the-art methods trained on the tabular datasets and also achieves comparable results with the neural network-based approaches trained on dialogue act, E2E, and WebNLG datasets in the BLEU evaluation metric. Furthermore, the value of ERR metric for our results is always zero; that means our model generates sentences without losing any information. Human evaluations show that our model produces high-quality utterances in aspects of informativeness and naturalness as well as quality.
- Is Part Of:
- Computer speech & language. Volume 76(2022)
- Journal:
- Computer speech & language
- Issue:
- Volume 76(2022)
- Issue Display:
- Volume 76, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 76
- Issue:
- 2022
- Issue Sort Value:
- 2022-0076-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-11
- Subjects:
- Data-to-text Generation -- Natural Language Generation -- Syntactic Dependency
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2022.101388
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 21757.xml