A benchmark dataset for Turkish data-to-text generation. (January 2023)
- Record Type:
- Journal Article
- Title:
- A benchmark dataset for Turkish data-to-text generation. (January 2023)
- Main Title:
- A benchmark dataset for Turkish data-to-text generation
- Authors:
- Demir, Seniz
Oktem, Seza
- Abstract:
- In the last decades, data-to-text (D2T) systems that directly learn from data have gained a lot of attention in natural language generation. These systems need data with high quality and large volume, but unfortunately some natural languages suffer from the lack of readily available generation datasets. This article describes our efforts to create a new Turkish dataset (Tr-D2T) that consists of meaning representation and reference sentence pairs without fine-grained word alignments. We utilize Turkish web resources and existing datasets in other languages for producing meaning representations and collect reference sentences by crowdsourcing native speakers. We particularly focus on the generation of single-sentence biographies and dining venue descriptions. In order to motivate future Turkish D2T studies, we present detailed benchmarking results of different sequence-to-sequence neural models trained on this dataset. To the best of our knowledge, this work is the first of its kind that provides preliminary findings and lessons learned from the creation of a new Turkish D2T dataset. Moreover, our work is the first extensive study that presents generation performances of transformer and recurrent neural network models from meaning representations in this morphologically-rich language.
- Is Part Of:
- Computer speech & language. Volume 77(2023)
- Journal:
- Computer speech & language
- Issue:
- Volume 77(2023)
- Issue Display:
- Volume 77, Issue 2023 (2023)
- Year:
- 2023
- Volume:
- 77
- Issue:
- 2023
- Issue Sort Value:
- 2023-0077-2023-0000
- Page Start:
- Page End:
- Publication Date:
- 2023-01
- Subjects:
- Data-to-text generation -- Neural Models -- Biography domain -- Dining venue domain -- Turkish -- Crowdsourcing
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2022.101433
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 23321.xml