Script identification in the wild via discriminative convolutional neural network. (April 2016)
- Record Type:
- Journal Article
- Title:
- Script identification in the wild via discriminative convolutional neural network. (April 2016)
- Main Title:
- Script identification in the wild via discriminative convolutional neural network
- Authors:
- Shi, Baoguang
Bai, Xiang
Yao, Cong
- Abstract:
- Script identification facilitates many important applications in document/video analysis. This paper investigates a relatively new problem: identifying scripts in natural images. The basic idea is to combine deep features and mid-level representations into a globally trainable deep model. Specifically, a set of deep feature maps is first extracted from the input images by a pre-trained CNN model, from which local deep features are densely collected. Then, discriminative clustering is performed to learn a set of discriminative patterns based on these local features. A mid-level representation is obtained by encoding the local features based on the learned discriminative patterns (codebook). Finally, the mid-level representations and the deep features are jointly optimized in a deep network. Benefiting from such a fine-grained classification strategy, the optimized deep model, termed Discriminative Convolutional Neural Network (DisCNN), is capable of effectively revealing the subtle differences among scripts that are difficult to distinguish, e.g. Chinese and Japanese. In addition, a large-scale dataset containing 16,291 in-the-wild text images in 13 scripts, namely SIW-13, is created for evaluation. Our method is not limited to identifying text images, and performs effectively on video and document scripts as well, without requiring any preprocessing such as binarization or segmentation, or any hand-crafted features. The experimental comparisons on datasets including SIW-13, CVSI-2015 and Multi-Script consistently demonstrate that DisCNN is a state-of-the-art approach for script identification.
Highlights: We study a new and important topic: script identification in scene text images. The proposed DisCNN combines deep features and the mid-level representation. DisCNN learns special characteristics of scripts from training data automatically. DisCNN achieves state-of-the-art performance on scene, video and document scripts. A large-scale in-the-wild script identification dataset is proposed.
- Is Part Of:
- Pattern recognition. Volume 52(2016:Apr.)
- Journal:
- Pattern recognition
- Issue:
- Volume 52(2016:Apr.)
- Issue Display:
- Volume 52 (2016)
- Year:
- 2016
- Volume:
- 52
- Issue Sort Value:
- 2016-0052-0000-0000
- Page Start:
- 448
- Page End:
- 458
- Publication Date:
- 2016-04
- Subjects:
- Script identification -- Convolutional neural network -- Mid-level representation -- Discriminative clustering -- Dataset
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2015.11.005
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 1075.xml