Text baseline detection, a single page trained system. (October 2019)
- Record Type:
- Journal Article
- Title:
- Text baseline detection, a single page trained system. (October 2019)
- Main Title:
- Text baseline detection, a single page trained system
- Authors:
- Pastor, Moisés
- Abstract:
- Highlights: A fast text baseline detection method that is robust to noisy manuscripts is proposed. The local minima of the text contours are considered as interest points. An Extremely Randomized Trees forest is used to classify the interest points. A modified version of DBSCAN is used to cluster these points into baselines. Automatically estimating the baselines of a page takes roughly 3 s on average. Abstract: Nowadays, a large number of page images are available, and the scanning process is well resolved and can be carried out industrially. On the other hand, handwritten text recognition (HTR) systems can only deal with single text line images. Segmenting pages into single text line images is a very expensive process which has traditionally been done manually. This bottleneck is holding back massive industrial document processing. A baseline detection method is presented here. The initial problem is reformulated as a clustering problem over a set of interest points. The method is designed to be fast and to resist the noise and artifacts that usually appear in historical manuscripts: variable interline spacing, overlapping and touching words in adjacent lines, humidity spots, etc. Results show that this system can be used to detect text lines in pages on a massive scale. Highlight: This system reached second place in the ICDAR 2017 Competition on Baseline Detection (see Table 1).
- Is Part Of:
- Pattern recognition. Volume 94(2019:Oct.)
- Journal:
- Pattern recognition
- Issue:
- Volume 94(2019:Oct.)
- Issue Display:
- Volume 94 (2019)
- Year:
- 2019
- Volume:
- 94
- Issue Sort Value:
- 2019-0094-0000-0000
- Page Start:
- 149
- Page End:
- 161
- Publication Date:
- 2019-10
- Subjects:
- Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2019.05.031
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 10924.xml
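
The abstract describes grouping interest points into baselines with a modified version of DBSCAN. The record does not say what the modification is, so the following is only a minimal sketch of plain DBSCAN, with one assumption added for illustration: vertical distances are weighted more heavily than horizontal ones (the hypothetical `y_weight` parameter) so that points on the same text line cluster together. The function names, parameters, and default values are all hypothetical and are not taken from the article.

```python
from collections import deque


def region_query(points, i, eps, y_weight):
    """Indices of all points within eps of points[i] (including i itself).

    Assumption for illustration: vertical offsets are multiplied by
    y_weight, so neighbours are searched in a flat, wide ellipse.
    """
    xi, yi = points[i]
    return [
        j
        for j, (xj, yj) in enumerate(points)
        if (xj - xi) ** 2 + (y_weight * (yj - yi)) ** 2 <= eps ** 2
    ]


def dbscan(points, eps=30.0, min_pts=2, y_weight=4.0):
    """Plain DBSCAN; returns one label per point, -1 meaning noise."""
    labels = [None] * len(points)  # None = not visited yet
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps, y_weight)
        if len(neighbours) < min_pts:
            labels[i] = -1  # provisionally noise
            continue
        labels[i] = cluster
        queue = deque(neighbours)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps, y_weight)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is a core point: keep growing
        cluster += 1
    return labels


def baselines(points, labels):
    """Group clustered points into polylines sorted left to right."""
    lines = {}
    for point, label in zip(points, labels):
        if label != -1:
            lines.setdefault(label, []).append(point)
    return [sorted(line) for _, line in sorted(lines.items())]


# Two synthetic text lines plus one isolated point (e.g. a stain).
pts = [(0, 100), (20, 101), (40, 99), (60, 100),
       (0, 200), (20, 201), (40, 199), (60, 200),
       (500, 150)]
lbls = dbscan(pts)
result = baselines(pts, lbls)
```

With these synthetic points, the two rows are far apart vertically once weighted, so each row forms its own cluster, while the isolated point has no neighbours and is left as noise. A real pipeline would first filter the interest points with a classifier, as the abstract describes.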