Arabic machine reading comprehension on the Holy Qur'an using CL-AraBERT. Issue 6 (November 2022)

Record Type:: Journal Article
Title:: Arabic machine reading comprehension on the Holy Qur'an using CL-AraBERT. Issue 6 (November 2022)
Main Title:: Arabic machine reading comprehension on the Holy Qur'an using CL-AraBERT
Authors:: Malhas, Rana
Elsayed, Tamer
Abstract:: In this work, we tackle the problem of machine reading comprehension (MRC) on the Holy Qur'an to address the lack of Arabic datasets and systems for this important task. We construct QRCD as the first Qur'anic Reading Comprehension Dataset, composed of 1,337 question-passage-answer triplets for 1,093 question-passage pairs, of which 14% are multi-answer questions. We then introduce CLassical-AraBERT (CL-AraBERT for short), a new AraBERT-based pre-trained model, which is further pre-trained on a Classical Arabic (CA) dataset of about 1.0B words, to complement the Modern Standard Arabic (MSA) resources used in pre-training the initial model, and make it a better fit for the task. Finally, we leverage cross-lingual transfer learning from MSA to CA, and fine-tune CL-AraBERT as a reader using two MSA-based MRC datasets followed by our QRCD dataset to constitute the first (to the best of our knowledge) MRC system on the Holy Qur'an. To evaluate our system, we introduce Partial Average Precision (pAP) as an adapted version of the traditional rank-based Average Precision measure, which integrates partial matching in the evaluation over multi-answer and single-answer MSA questions. Adopting two experimental evaluation setups (hold-out and cross validation (CV)), we empirically show that the fine-tuned CL-AraBERT reader model significantly outperforms the baseline fine-tuned AraBERT reader model by 6.12 and 3.75 points in pAP scores, in the hold-out and CV setups, …
Is Part Of:: Information processing & management. Volume 59:Issue 6(2022)
Journal:: Information processing & management
Issue:: Volume 59:Issue 6(2022)
Issue Display:: Volume 59, Issue 6 (2022)
Year:: 2022
Volume:: 59
Issue:: 6
Issue Sort Value:: 2022-0059-0006-0000
Publication Date:: 2022-11
Subjects:: Classical Arabic -- Reading comprehension -- Answer extraction -- Partial matching evaluation -- Pre-trained language models -- Cross-lingual transfer learning
Information storage and retrieval systems -- Periodicals
Information science -- Periodicals
Systèmes d'information -- Périodiques
Sciences de l'information -- Périodiques
Information science
Information storage and retrieval systems
Periodicals
658.4038
Journal URLs:: http://www.sciencedirect.com/science/journal/03064573
http://www.elsevier.com/journals
DOI:: 10.1016/j.ipm.2022.103068
Languages:: English
ISSNs:: 0306-4573
Deposit Type:: Legal deposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - 4493.893000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
Ingest File:: 24125.xml