Out of vocabulary word detection and recovery in Arabic handwritten text recognition. (September 2019)
- Record Type:
- Journal Article
- Title:
- Out of vocabulary word detection and recovery in Arabic handwritten text recognition. (September 2019)
- Main Title:
- Out of vocabulary word detection and recovery in Arabic handwritten text recognition
- Authors:
- Jemni, Sana Khamekhem
Kessentini, Yousri
Kanoun, Slim
- Abstract:
- Highlights: A novel two-step OOV words detection and recovery method is proposed. The proposed method is generic and independent of the recognition engine. The proposed method uses various sub-lexical modeling to improve the detection step. The recovery process relies on dynamic lexicons built from large text corpora. The proposed method significantly improves the recognition results.
Abstract: Today's Arabic Handwriting recognition systems are able to recognize arbitrary words over a large but finite vocabulary. Systems operating with a fixed vocabulary are bound to encounter so-called out-of-vocabulary (OOV) words. The aim of this research is to propose a two-step approach that tackles the problem of OOV words in Arabic handwriting. In the first step, we exploit different types of sub-word units to detect the potential OOVs. In the recovery stage, a dynamic dictionary is built to extend the initial static word lexicon in order to cope with the detected OOVs. The recovery includes a selection step in which the best word candidates extracted from the external resource are kept. Experiments were conducted on the public benchmarking KHATT and AHTID/MW databases. The obtained results revealed that sub-word modeling could give cues for improving the detection and that the use of a dynamic dictionary significantly improves the recognition performance compared to one-step approaches that are based on a large static dictionary or the combination of different sub-word units. We achieve the state of the art results on the KHATT dataset. …
- Is Part Of:
- Pattern recognition. Volume 93(2019:Sep.)
- Journal:
- Pattern recognition
- Issue:
- Volume 93(2019:Sep.)
- Issue Display:
- Volume 93 (2019)
- Year:
- 2019
- Volume:
- 93
- Issue Sort Value:
- 2019-0093-0000-0000
- Page Start:
- 507
- Page End:
- 520
- Publication Date:
- 2019-09
- Subjects:
- Arabic Handwriting recognition -- Out of vocabulary detection and recovery -- Static lexicon -- Dynamic lexicon -- Statistical language model -- Deep learning -- Multi-dimensional long short term memory network
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2019.05.003
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 22198.xml