Extracting structured data from publications in the Art Conservation Domain. (12th February 2014)
- Record Type:
- Journal Article
- Title:
- Extracting structured data from publications in the Art Conservation Domain. (12th February 2014)
- Main Title:
- Extracting structured data from publications in the Art Conservation Domain
- Authors:
- Odat, Suleiman
Groza, Tudor
Hunter, Jane
- Abstract:
- The most common method of publishing new discoveries about art conservation techniques and research has been through traditional full-text publications. Such corpora typically only support searching via metadata (e.g. title, authors, or keywords) and full-text. In particular, it is difficult to discover valuable information about the chemical processes, experimental results, or preservation treatments associated with the conservation of paintings from a specific genre. This article addresses this problem by focusing on the extraction of structured data (that complies with a pre-defined ontology) from a distributed corpus of publications about painting conservation. Our specific extraction method involves a unique combination of named entity recognition (using gazetteer-based and machine learning-based methods) followed by relationship extraction (using rule-based and machine learning-based methods). The resulting structured data are stored in a resource description framework triple store, and a Web-based graphical user interface enables the SPARQL querying, retrieval, and display of the search results. The results from applying our techniques to a corpus of publications on art conservation indicate that our approach achieves higher quality precision and recall in extracting named entities and relations from publications, relative to alternative existing approaches.
- Is Part Of:
- Digital scholarship in the humanities. Volume 30: Number 2 (2015)
- Journal:
- Digital scholarship in the humanities
- Issue:
- Volume 30: Number 2 (2015)
- Issue Display:
- Volume 30, Issue 2 (2015)
- Year:
- 2015
- Volume:
- 30
- Issue:
- 2
- Issue Sort Value:
- 2015-0030-0002-0000
- Page Start:
- 225
- Page End:
- 245
- Publication Date:
- 2014-02-12
- Subjects:
- Philology -- Data processing -- Periodicals
Computational linguistics -- Periodicals
410.285
- Journal URLs:
- http://www.oxfordjournals.org/
http://dsh.oxfordjournals.org/
- DOI:
- 10.1093/llc/fqu002
- Languages:
- English
- ISSNs:
- 2055-768X
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 25394.xml
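
The abstract above describes a pipeline of named entity recognition followed by relationship extraction, with the results stored as triples. A minimal sketch of the gazetteer-based and rule-based parts of such a pipeline might look like the following. It is not the authors' implementation: the gazetteer entries, entity types, rules, predicate names, and function names are all hypothetical, and the machine learning components, the ontology, and the SPARQL store are left out.

```python
import re

# Hypothetical gazetteer (surface form -> entity type); not taken from the article.
GAZETTEER = {
    "lead white": "Pigment",
    "linseed oil": "Binder",
    "yellowing": "Degradation",
}

# Hypothetical rules: (subject type, cue phrase, object type, predicate).
RULES = [
    ("Pigment", "mixed with", "Binder", "mixedWith"),
    ("Binder", "causes", "Degradation", "causes"),
]


def find_entities(sentence):
    """Gazetteer lookup: return sorted (start, end, term, type) tuples."""
    lowered = sentence.lower()
    found = []
    for term, entity_type in GAZETTEER.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, lowered):
            found.append((m.start(), m.end(), term, entity_type))
    return sorted(found)


def extract_triples(sentence):
    """Rule-based relation extraction over ordered pairs of entities.

    A rule fires when the entity types match and the cue phrase appears
    in the text between the two mentions.
    """
    lowered = sentence.lower()
    entities = find_entities(sentence)
    triples = []
    for i, (_, end1, term1, type1) in enumerate(entities):
        for start2, _, term2, type2 in entities[i + 1:]:
            between = lowered[end1:start2]
            for subj_type, cue, obj_type, predicate in RULES:
                if type1 == subj_type and type2 == obj_type and cue in between:
                    triples.append((term1, predicate, term2))
    return triples


def match(triples, subject=None, predicate=None, obj=None):
    """Simple triple-pattern lookup; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


sentence = "Lead white was mixed with linseed oil, which causes yellowing over time."
triples = extract_triples(sentence)
for triple in triples:
    print(triple)
```

The `match` helper only imitates a single triple pattern; in a setup like the one the abstract describes, the triples would instead be loaded into an RDF store and queried with SPARQL.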