Compressing CNN-DBLSTM models for OCR with teacher-student learning and Tucker decomposition. (December 2019)
- Record Type:
- Journal Article
- Title:
- Compressing CNN-DBLSTM models for OCR with teacher-student learning and Tucker decomposition. (December 2019)
- Main Title:
- Compressing CNN-DBLSTM models for OCR with teacher-student learning and Tucker decomposition
- Authors:
- Ding, Haisong
Chen, Kai
Huo, Qiang
- Abstract:
- Highlights: We investigate teacher-student learning and Tucker decomposition to compress and accelerate the convolutional layers within CTC-trained CNN-DBLSTM models for OCR. To the best of our knowledge, we are the first to address this problem. Based on the architecture of the CNN-DBLSTM model, we propose an objective function for teacher-student learning that directly matches the feature sequences extracted by the CNNs of the teacher and student models under the guidance of the succeeding LSTM layers. Experimental results on large-scale handwritten and printed OCR tasks show that a student model trained with the proposed criterion outperforms one trained with a standard KL-divergence criterion. We explore the effectiveness of combining teacher-student learning and Tucker decomposition: teacher-student learning transfers the knowledge of a large-size teacher model to a small-size compact student model, and Tucker decomposition then compresses and accelerates it further. Our results show that this method yields a very compact CNN-DBLSTM model, significantly reducing both the footprint and the computation cost with no or only a small degradation in recognition accuracy.
Abstract: Integrated convolutional neural network (CNN) and deep bidirectional long short-term memory (DBLSTM) based character models have achieved excellent recognition accuracies on optical character recognition (OCR) tasks, along with a large number of model parameters and massive computation cost. To deploy a CNN-DBLSTM model in products with a CPU server, there is an urgent need to compress and accelerate it as much as possible, especially the CNN part, which dominates both parameters and computation. In this paper, we study teacher-student learning and Tucker decomposition methods to reduce model size and runtime latency for CNN-DBLSTM based character models for OCR. We use teacher-student learning to transfer the knowledge of a large-size teacher model to a small-size compact student model, followed by Tucker decomposition to further compress the student model. For teacher-student learning, we design a novel learning criterion that brings in the guidance of the succeeding LSTM layer when matching the CNN-extracted feature sequences of the large teacher and small student models.
Experimental results on large-scale handwritten and printed OCR tasks show that using teacher-student learning alone achieves a 9.90× footprint reduction and a 15.23× inference speedup without degrading recognition accuracy. Combined with the Tucker decomposition method, the model can be compressed and accelerated further: the decomposed model achieves an 11.89× footprint reduction and a 22.16× inference speedup while suffering no or only a small recognition accuracy degradation against the large-size baseline model. … (more)
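The teacher-student criterion described in the abstract matches the CNN-extracted feature sequences of the teacher and student models. The record does not give the exact form of the LSTM-guided weighting, so the sketch below shows only the plain feature-sequence matching idea as a simplified stand-in; the function name and shapes are hypothetical, not from the paper.

```python
import numpy as np

def feature_match_loss(student_feats, teacher_feats):
    """Mean-squared-error match between the CNN-extracted feature
    sequences (arrays of shape (T, D)) of a student and a teacher model.
    The paper's criterion additionally weights this match under the
    guidance of the succeeding LSTM layers; that weighting is omitted
    here because its exact form is not given in this record."""
    assert student_feats.shape == teacher_feats.shape
    # Average the squared per-frame, per-dimension differences.
    return float(np.mean((student_feats - teacher_feats) ** 2))
```

In training, this loss would be minimized with respect to the student CNN's parameters while the teacher's features are held fixed.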
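The Tucker decomposition step can likewise be sketched. The snippet below (an illustration, not code from the paper) performs a truncated-HOSVD Tucker-2 decomposition of a 4-D convolution kernel along its output- and input-channel modes; this is the factorization commonly used to replace one k×k convolution with a 1×1 → k×k → 1×1 chain of smaller convolutions. The ranks and shapes are hypothetical examples.

```python
import numpy as np

def unfold(tensor, mode):
    # Mode-n unfolding: move axis `mode` to the front, flatten the rest.
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def tucker2(W, r_out, r_in):
    """Tucker-2 decomposition of a conv kernel W of shape
    (C_out, C_in, kH, kW) along the two channel modes, via truncated
    higher-order SVD. Returns a small core plus two factor matrices."""
    U_out, _, _ = np.linalg.svd(unfold(W, 0), full_matrices=False)
    U_in, _, _ = np.linalg.svd(unfold(W, 1), full_matrices=False)
    U_out, U_in = U_out[:, :r_out], U_in[:, :r_in]
    # core = W  x1 U_out^T  x2 U_in^T  (contract the channel modes)
    core = np.tensordot(W, U_out, axes=(0, 0))    # (C_in, kH, kW, r_out)
    core = np.tensordot(core, U_in, axes=(0, 0))  # (kH, kW, r_out, r_in)
    core = np.moveaxis(core, (2, 3), (0, 1))      # (r_out, r_in, kH, kW)
    return core, U_out, U_in

def reconstruct(core, U_out, U_in):
    """Rebuild the (approximate) full kernel from the Tucker-2 factors."""
    W = np.tensordot(U_out, core, axes=(1, 0))    # (C_out, r_in, kH, kW)
    W = np.tensordot(U_in, W, axes=(1, 1))        # (C_in, C_out, kH, kW)
    return np.moveaxis(W, 0, 1)                   # (C_out, C_in, kH, kW)
```

With ranks equal to the channel counts the reconstruction is exact; choosing smaller ranks trades a little approximation error for fewer parameters and less computation, which is the compression knob the paper exploits.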
- Is Part Of:
- Pattern recognition. Volume 96 (2019: Dec.)
- Journal:
- Pattern recognition
- Issue:
- Volume 96 (2019: Dec.)
- Issue Display:
- Volume 96 (2019)
- Year:
- 2019
- Volume:
- 96
- Issue Sort Value:
- 2019-0096-0000-0000
- Page Start:
- Page End:
- Publication Date:
- 2019-12
- Subjects:
- Optical character recognition -- CNN-DBLSTM Character model -- Model compression -- Teacher-student learning -- Tucker decomposition
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2019.07.002
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 11627.xml