Automatic metadata extraction via image processing using Migne's Patrologia Graeca. (25th May 2021)
- Record Type:
- Journal Article
- Title:
- Automatic metadata extraction via image processing using Migne's Patrologia Graeca. (25th May 2021)
- Main Title:
- Automatic metadata extraction via image processing using Migne's Patrologia Graeca
- Authors:
- Varthis, Evagelos
Poulos, Marios
Giarenis, Ilias
Papavlasopoulos, Sozon
- Abstract:
- A wealth of knowledge is kept in libraries and cultural institutions in various digital forms without, however, the possibility of a simple term search, let alone of a substantial semantic search. In this study, a novel approach is proposed which strives to recognise words and automatically generate metadata from large machine-printed corpora such as Migne's Patrologia Graeca (PG). The proposed framework first applies an efficient word segmentation and then transforms the word-images into special compact shapes. For the comparison, we use Hu's invariant moments to discard unlikely matches, Shape Context (SC) for contour similarity and Pearson's Correlation Coefficient (PCC) for final verification. Comparative results are presented by using the Long Short-Term Memory (LSTM) Neural Network (NN) engine of the Tesseract Optical Character Recognition (OCR) system instead of PCC. In addition, an intelligent scenario is proposed for automatic generation of PG metadata by librarians.
- Is Part Of:
- International journal of metadata, semantics and ontologies. Volume 14:Number 4(2020)
- Journal:
- International journal of metadata, semantics and ontologies
- Issue:
- Volume 14:Number 4(2020)
- Issue Display:
- Volume 14, Issue 4 (2020)
- Year:
- 2020
- Volume:
- 14
- Issue:
- 4
- Issue Sort Value:
- 2020-0014-0004-0000
- Page Start:
- 265
- Page End:
- 278
- Publication Date:
- 2021-05-25
- Subjects:
- Patrologia Graeca -- word spotting -- shape context -- time series -- metadata extraction -- semantic enrichment -- digital librarian
Metadata -- Periodicals
Semantic Web -- Periodicals
Ontologies (Information retrieval) -- Periodicals
Data structures (Computer science) -- Periodicals
Information theory -- Periodicals
005.74
- Journal URLs:
- http://www.inderscience.com/browse/index.php?journalID=152
http://www.inderscience.com/
- Languages:
- English
- ISSNs:
- 1744-2621
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 15634.xml
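
The abstract names Pearson's Correlation Coefficient (PCC) as the final verification step when comparing word-images. The sketch below illustrates only that one step, under assumptions that are not taken from the record: each word-image is a small binary grid, it is reduced to a one-dimensional series by counting ink pixels per column, and a match is accepted above a threshold of 0.9. The names `column_profile`, `pearson` and `is_match`, the profile representation and the threshold value are all hypothetical; the article's actual compact shapes, Hu-moment filter and Shape Context stage are not reproduced here.

```python
from math import sqrt


def column_profile(image):
    """Count ink pixels (1s) in each column of a binary word-image.

    `image` is a list of equal-length rows. This series representation is
    an assumption made for illustration, not the article's own transform.
    """
    return [sum(column) for column in zip(*image)]


def pearson(x, y):
    """Return Pearson's correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dev_x = [a - mean_x for a in x]
    dev_y = [b - mean_y for b in y]
    numerator = sum(a * b for a, b in zip(dev_x, dev_y))
    denominator = sqrt(sum(a * a for a in dev_x) * sum(b * b for b in dev_y))
    if denominator == 0:
        # A constant series has no defined correlation; treat it as no match.
        return 0.0
    return numerator / denominator


def is_match(query, candidate, threshold=0.9):
    """Accept the candidate when the profile correlation reaches the threshold.

    The threshold of 0.9 is a hypothetical value chosen for this sketch.
    """
    score = pearson(column_profile(query), column_profile(candidate))
    return score >= threshold


# Two toy 3x4 word-images: `query` and a differently shaped `other`.
query = [[0, 1, 1, 0],
         [1, 1, 1, 0],
         [0, 1, 0, 0]]
other = [[1, 0, 0, 1],
         [0, 0, 0, 1],
         [1, 0, 1, 1]]
```

With these toy grids, `is_match(query, query)` accepts the image against itself and `is_match(query, other)` rejects the differently shaped one. In a fuller pipeline of the kind the abstract describes, a check like this would run only on candidates that had already passed the cheaper filters.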