A cooperative crowdsourcing framework for knowledge extraction in digital humanities – cases on Tang poetry. Issue 2 (23rd February 2020)
- Record Type:
- Journal Article
- Title:
- A cooperative crowdsourcing framework for knowledge extraction in digital humanities – cases on Tang poetry. Issue 2 (23rd February 2020)
- Main Title:
- A cooperative crowdsourcing framework for knowledge extraction in digital humanities – cases on Tang poetry
- Authors:
- Hong, Liang
Hou, Wenjun
Wu, Zonghui
Han, Huijie
- Abstract:
- Purpose: The purpose of this paper is to propose a knowledge extraction framework that extracts knowledge, including entities and the relationships between them, from unstructured texts in digital humanities (DH).
Design/methodology/approach: The proposed cooperative crowdsourcing framework (CCF) uses both human–computer cooperation and crowdsourcing to achieve high-quality and scalable knowledge extraction. CCF integrates active learning with a novel category-based crowdsourcing mechanism to help domain experts label and verify extracted knowledge.
Findings: The case study shows that CCF can effectively and efficiently extract knowledge from multi-sourced heterogeneous data in the field of Tang poetry. Specifically, CCF achieves higher knowledge extraction accuracy than state-of-the-art methods; the active learning mechanism can maximize the contribution of feedback to the training model; and the proposed category-based crowdsourcing mechanism can scale up effective human–computer collaboration by taking into account workers' specialization in different categories of tasks.
Research limitations/implications: This research proposes CCF to enable high-quality and scalable knowledge extraction in the field of Tang poetry. CCF can be generalized to other fields of DH by introducing domain knowledge and experts.
Practical implications: The extracted knowledge is machine-understandable and can support research on Tang poetry and knowledge-driven intelligent applications in DH.
Originality/value: CCF is the first human-in-the-loop knowledge extraction framework that integrates active learning and crowdsourcing mechanisms; the human–computer cooperation method uses the feedback of domain experts through the active learning mechanism; and the category-based crowdsourcing mechanism considers the match between categories of DH data and workers, especially domain experts.
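The abstract names two cooperating mechanisms: active learning, which decides which extracted items are worth sending to humans, and category-based crowdsourcing, which decides who should review them. The Python sketch below illustrates that division of labor under our own assumptions; it is not the authors' implementation. It uses entropy-based uncertainty sampling as a stand-in for the paper's (unspecified here) selection criterion, and routes each selected item to the worker with the highest recorded accuracy in that item's category. The function names, the per-category accuracy table, and the example categories ("person", "place") are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy of a predicted label distribution (higher = less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_uncertain(predictions, k):
    """Uncertainty sampling (an assumed stand-in for CCF's active learning step).

    predictions maps item id -> list of class probabilities from the extractor.
    Returns the ids of the k items the model is least certain about.
    """
    ranked = sorted(predictions, key=lambda item_id: entropy(predictions[item_id]), reverse=True)
    return ranked[:k]


def assign_by_category(task_ids, task_category, worker_accuracy):
    """Hypothetical category-based routing: each task goes to the worker with
    the best recorded accuracy in that task's category.

    worker_accuracy maps worker -> {category: accuracy}; a worker with no
    record for a category scores 0.0 there. Ties go to the earlier worker.
    """
    assignment = {}
    for task_id in task_ids:
        category = task_category[task_id]
        best = max(worker_accuracy, key=lambda w: worker_accuracy[w].get(category, 0.0))
        assignment[task_id] = best
    return assignment


if __name__ == "__main__":
    # Hypothetical extractor outputs for three candidate entities.
    predictions = {"s1": [0.9, 0.1], "s2": [0.5, 0.5], "s3": [0.6, 0.4]}
    selected = select_uncertain(predictions, 2)
    task_category = {"s2": "person", "s3": "place"}
    worker_accuracy = {
        "alice": {"person": 0.9, "place": 0.6},
        "bob": {"person": 0.7, "place": 0.8},
    }
    print(selected, assign_by_category(selected, task_category, worker_accuracy))
```

Running the script selects `s2` and `s3` (the two least confident predictions) and sends the "person" item to `alice` and the "place" item to `bob`. Accuracy per category is only one plausible way to model worker specialization; a real system would also need to update these scores from verified feedback and balance workload.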
- Is Part Of:
- Aslib journal of information management. Volume 72:Issue 2(2020)
- Journal:
- Aslib journal of information management
- Issue:
- Volume 72:Issue 2(2020)
- Issue Display:
- Volume 72, Issue 2 (2020)
- Year:
- 2020
- Volume:
- 72
- Issue:
- 2
- Issue Sort Value:
- 2020-0072-0002-0000
- Page Start:
- 243
- Page End:
- 261
- Publication Date:
- 2020-02-23
- Subjects:
- Crowdsourcing -- Human–computer cooperation -- Knowledge extraction -- Digital humanities -- Tang poetry
Information science -- Periodicals
Library science -- Periodicals
020.5
- Journal URLs:
- http://www.emeraldinsight.com/journals.htm?issn=2050-3806
http://www.emeraldinsight.com/
- DOI:
- 10.1108/AJIM-07-2019-0192
- Languages:
- English
- ISSNs:
- 2050-3806
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 13108.xml