A deep learning model for Ottoman OCR. (6th April 2022)
- Record Type:
- Journal Article
- Title:
- A deep learning model for Ottoman OCR. (6th April 2022)
- Main Title:
- A deep learning model for Ottoman OCR
- Authors:
- Dölek, İshak
Kurt, Atakan
- Other Names:
- Wright, Steven A. (guest editor)
Solak, Serdar (guest editor)
Kilimci, Zeynep Hilal (guest editor)
Eken, Süleyman (guest editor)
Fernandes, Steven (guest editor)
Zhang, Yu‐Dong (guest editor)
Tavares, João Manuel R.S. (guest editor)
- Abstract:
- Ottoman OCR remains an open problem because OCR models for Arabic do not perform well on Ottoman text, and models trained specifically on Ottoman documents have not produced satisfactory results either. We present a deep learning model, and an OCR tool built on that model, for printed Ottoman documents in the naksh font. We propose an end-to-end trainable CRNN architecture consisting of CNN, RNN (LSTM), and CTC layers for the Ottoman OCR problem. This model, called Osmanlica.com, was compared experimentally with the Tesseract Arabic, Tesseract Persian, ABBYY FineReader, Miletos, and Google Docs OCR tools or models on a test data set of 21 pages of original documents. With character recognition accuracies of 88.86% on raw text, 96.12% on normalized text, and 97.37% on joined text, the Osmanlica.com Hybrid model outperforms the others by a marked difference. Our model outperforms the next best model by a clear margin of 4%, a significant improvement considering the difficulty of the Ottoman OCR problem and the huge size of the Ottoman archives to be processed. The hybrid model also achieves 58% word recognition accuracy on normalized text, the only rate above 50%.
- Is Part Of:
- Concurrency and computation. Volume 34:Number 20(2022)
- Journal:
- Concurrency and computation
- Issue:
- Volume 34:Number 20(2022)
- Issue Display:
- Volume 34, Issue 20 (2022)
- Year:
- 2022
- Volume:
- 34
- Issue:
- 20
- Issue Sort Value:
- 2022-0034-0020-0000
- Page Start:
- n/a
- Page End:
- n/a
- Publication Date:
- 2022-04-06
- Subjects:
- CNN -- CTC -- deep neural networks -- LSTM -- OCR -- Ottoman -- printed naksh font -- RNN
Parallel processing (Electronic computers) -- Periodicals
Parallel computers -- Periodicals
004.35
- Journal URLs:
- http://onlinelibrary.wiley.com/
- DOI:
- 10.1002/cpe.6937
- Languages:
- English
- ISSNs:
- 1532-0626
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3405.622000
British Library DSC - BLDSS-3PM
British Library STI - ELD Digital store
- Ingest File:
- 22986.xml
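
Note on the abstract: the CTC output layer and the character recognition accuracy figures it mentions can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes the standard CTC best-path rule (collapse repeated labels, then drop blanks) and assumes accuracy is computed as 1 minus the Levenshtein distance divided by the reference length; the record does not give the paper's exact formula. The names `BLANK`, `ctc_greedy_decode`, `edit_distance`, and `char_accuracy` are hypothetical.

```python
from typing import List, Sequence

BLANK = 0  # assumption: label index 0 is reserved for the CTC blank symbol


def ctc_greedy_decode(frame_labels: Sequence[int], blank: int = BLANK) -> List[int]:
    """Collapse consecutive repeated labels, then drop blanks (CTC best-path rule)."""
    decoded: List[int] = []
    previous = None
    for label in frame_labels:
        # Keep a label only when it starts a new run and is not the blank.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    prev_row = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        row = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            row.append(min(prev_row[j] + 1,          # deletion
                           row[j - 1] + 1,           # insertion
                           prev_row[j - 1] + cost))  # substitution or match
        prev_row = row
    return prev_row[-1]


def char_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed metric: 1 - edit_distance / len(reference), clamped at 0."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    return max(0.0, 1.0 - edit_distance(reference, hypothesis) / len(reference))
```

For example, the per-frame label sequence `[0, 1, 1, 0, 1, 2, 2, 0]` decodes to `[1, 1, 2]`: the blank between the two runs of `1` is what allows a doubled character to survive the collapse step. Under the assumed metric, a four-character reference with one substituted character scores 0.75.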