PQuAD: A Persian question answering dataset. (May 2023)
- Record Type:
- Journal Article
- Title:
- PQuAD: A Persian question answering dataset. (May 2023)
- Main Title:
- PQuAD: A Persian question answering dataset
- Authors:
- Darvishi, Kasra
Shahbodaghkhan, Newsha
Abbasiantaeb, Zahra
Momtazi, Saeedeh
- Abstract:
- We present the Persian Question Answering Dataset (PQuAD), a crowdsourced reading comprehension dataset built on Persian Wikipedia articles. It includes 80,000 questions along with their answers, with 25% of the questions being adversarially unanswerable. We examine various properties of the dataset to show its diversity and its level of difficulty as a machine reading comprehension (MRC) benchmark. By releasing this dataset, we aim to facilitate research on Persian reading comprehension and the development of Persian question answering systems. Our experiments with different state-of-the-art pre-trained contextualized language models show 74.8% Exact Match (EM) and 87.6% F1-score, which can serve as baseline results for further research on Persian QA.
Highlights:
Providing a large dataset for Persian machine reading comprehension
Analyzing various attributes of the dataset to indicate its degree of difficulty
Presenting baseline results using state-of-the-art transformer-based models
Providing an estimate of human performance on the dataset, along with its analysis
- Is Part Of:
- Computer speech & language. Volume 80(2023)
- Journal:
- Computer speech & language
- Issue:
- Volume 80(2023)
- Issue Display:
- Volume 80, Issue 2023 (2023)
- Year:
- 2023
- Volume:
- 80
- Issue:
- 2023
- Issue Sort Value:
- 2023-0080-2023-0000
- Page Start:
- Page End:
- Publication Date:
- 2023-05
- Subjects:
- Machine reading comprehension -- Natural language processing -- Persian dataset -- Question answering
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2023.101486
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 26173.xml