PSI: Patch-based script identification using non-negative matrix factorization. (July 2017)
- Record Type:
- Journal Article
- Title:
- PSI: Patch-based script identification using non-negative matrix factorization. (July 2017)
- Main Title:
- PSI: Patch-based script identification using non-negative matrix factorization
- Authors:
- Arabnejad, Ehsan
Farrahi Moghaddam, Reza
Cheriet, Mohamed
- Abstract:
- Highlights: A novel method for script identification of ancient manuscripts is proposed. Image patches are selected and extracted as the lowest level of information. This level of information allows for a representation that is robust against noise and at the same time captures local properties of objects. Non-Negative Matrix Factorization is used to learn features that perform better than hand-designed features. The proposed method is versatile and can be applied at different levels of layout.
Abstract: Script identification is an important step in automatic understanding of ancient manuscripts because there is no universal script-independent understanding tool available. Unlike machine-printed and modern documents, ancient manuscripts are highly unconstrained in structure and layout and suffer from various types of degradation and noise. These challenges make automatic script identification of ancient manuscripts a difficult task. In this paper, a novel method for script identification of ancient manuscripts is proposed which represents images by a set of overlapping patches and considers the patches as the lowest unit of representation (objects). Non-Negative Matrix Factorization (NMF), motivated by the structure of the patches and the non-negative nature of images, is used as a feature extraction method to create a low-dimensional representation for the patches and also to learn a dictionary. This dictionary is used to project all of the patches to a low-dimensional space. A second dictionary is learned using the K-means algorithm for the purpose of speeding up the algorithm. These two dictionaries are used for classification of new data. The proposed method is robust with respect to degradation and needs less normalization. The performance and reliability of the proposed method have been evaluated against state-of-the-art methods on an ancient manuscripts dataset with promising results.
- Is Part Of:
- Pattern recognition. Volume 67(2017:Jul.)
- Journal:
- Pattern recognition
- Issue:
- Volume 67(2017:Jul.)
- Issue Display:
- Volume 67 (2017)
- Year:
- 2017
- Volume:
- 67
- Issue Sort Value:
- 2017-0067-0000-0000
- Page Start:
- 328
- Page End:
- 339
- Publication Date:
- 2017-07
- Subjects:
- Script identification -- Non-negative matrix factorization -- Patch representation -- Clustering
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2017.02.020
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 1166.xml
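
Illustrative note on the abstract: the pipeline it describes (overlapping patches, an NMF dictionary, projection to a low-dimensional space, then a second dictionary from K-means) can be sketched roughly as below. This is only a minimal sketch under stated assumptions, not the authors' implementation. The abstract does not say which NMF solver is used, so Lee-Seung multiplicative updates are swapped in here; the patch size, stride, rank, number of clusters, and all function names are hypothetical, and the input is a random synthetic image. The classification step is left out because the record gives no detail on it.

```python
import numpy as np

def extract_patches(image, size=8, stride=4):
    """Return overlapping size x size patches as columns of a matrix.
    size and stride are assumed values, not taken from the paper."""
    h, w = image.shape
    cols = [image[r:r + size, c:c + size].ravel()
            for r in range(0, h - size + 1, stride)
            for c in range(0, w - size + 1, stride)]
    return np.array(cols, dtype=float).T  # shape: (size*size, n_patches)

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor V ~= W @ H with W, H >= 0 (multiplicative updates, an assumed solver)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def project(V, W, n_iter=100, eps=1e-9):
    """Encode patches against a fixed dictionary W by updating the codes only."""
    H = np.full((W.shape[1], V.shape[1]), 0.5)
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on the rows of X; returns the k centroids."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave an empty cluster's centroid unchanged
                centers[j] = X[labels == j].mean(axis=0)
    return centers

# Synthetic non-negative "image" standing in for a manuscript scan.
image = np.random.default_rng(1).random((32, 32))
V = extract_patches(image)          # patches as columns
W, H = nmf(V, rank=5)               # first dictionary W and training codes H
codes = project(V, W)               # low-dimensional codes for the patches
centers = kmeans(codes.T, k=3)      # second dictionary: cluster centroids
print(V.shape, W.shape, codes.shape, centers.shape)
```

Keeping the dictionary fixed in `project` is one simple way to encode unseen patches with a learned NMF basis; the K-means centroids then act as a compact summary of the code space, which matches the abstract's stated purpose of speeding up the algorithm.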