A set of benchmarks for Handwritten Text Recognition on historical documents. (October 2019)
- Record Type:
- Journal Article
- Title:
- A set of benchmarks for Handwritten Text Recognition on historical documents. (October 2019)
- Main Title:
- A set of benchmarks for Handwritten Text Recognition on historical documents
- Authors:
- Sánchez, Joan Andreu
Romero, Verónica
Toselli, Alejandro H.
Villegas, Mauricio
Vidal, Enrique
- Abstract:
- Highlights: Handwritten Text Recognition is researched in this paper with a set of freely available benchmarks. Freely available tools are provided for Handwritten Text Recognition. Competitive results are provided with Convolutional Recurrent Neural Networks and N-gram language models. New challenges related to Handwritten Text Recognition are described.
Abstract: Handwritten Text Recognition is an important requirement for making visible the contents of the myriad historical documents residing in public and private archives and libraries worldwide. Automatic Handwritten Text Recognition (HTR) is a challenging problem that requires a careful combination of several advanced Pattern Recognition techniques, including but not limited to Image Processing, Document Image Analysis, Feature Extraction, Neural Network approaches and Language Modeling. The progress of this kind of system is strongly bound by the availability of adequate benchmarking datasets, software tools and reproducible results achieved using the corresponding tools and datasets. Based on English and German historical documents proposed in recent open competitions at ICDAR and ICFHR conferences between 2014 and 2017, this paper introduces four HTR benchmarks in order of increasing complexity from several points of view. For each benchmark, a specific system is proposed which improves on the results published so far under comparable conditions. Therefore, this paper establishes new state-of-the-art baseline systems and results which aim at becoming new challenges that would hopefully drive further improvement of HTR technologies. Both the datasets and the software tools used to implement the baseline systems are made freely accessible for research purposes.
- Is Part Of:
- Pattern recognition. Volume 94(2019:Oct.)
- Journal:
- Pattern recognition
- Issue:
- Volume 94(2019:Oct.)
- Issue Display:
- Volume 94 (2019)
- Year:
- 2019
- Volume:
- 94
- Issue Sort Value:
- 2019-0094-0000-0000
- Page Start:
- 122
- Page End:
- 134
- Publication Date:
- 2019-10
- Subjects:
- Historical handwritten text recognition -- Hidden Markov models -- Convolutional neural networks -- Recurrent neural networks -- Language modeling
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2019.05.025
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 10924.xml