Image caption generation with dual attention mechanism. Issue 2 (March 2020)
- Record Type:
- Journal Article
- Title:
- Image caption generation with dual attention mechanism. Issue 2 (March 2020)
- Main Title:
- Image caption generation with dual attention mechanism
- Authors:
- Liu, Maofu
Li, Lingjun
Hu, Huijun
Guan, Weili
Tian, Jing
- Abstract:
- Highlights: We combine visual attention and textual attention to form a dual attention mechanism to guide the image caption generation. We adopt FCN to predict image tags and fuse tag generation and image caption generation to train the encoder-decoder model. Our proposed model achieves state-of-the-art performance on the AIC-ICC Chinese image caption dataset.
Abstract: As a field at the intersection of computer vision and natural language processing, image caption generation has been an active research topic in recent years, contributing to multimodal social media translation from unstructured image data to structured text data. Previous research has proposed a series of image captioning methods, such as template-based, retrieval-based, and encoder-decoder approaches. Among these, the encoder-decoder framework is the most widely used in image caption generation: the encoder extracts image features with a Convolutional Neural Network (CNN), and the decoder adopts a Recurrent Neural Network (RNN) to generate the image description. The Neural Image Caption (NIC) model has achieved good performance in image captioning; however, some challenges remain to be addressed. To tackle the lack of image information and the deviation from the core content of the image, our proposed model explores visual attention to deepen the understanding of the image, incorporating the image labels generated by a Fully Convolutional Network (FCN) into the generation of the image caption. Furthermore, our proposed model exploits textual attention to increase the integrity of the information. Finally, the label generation, attached to the textual attention mechanism, and the image caption generation are merged to form an end-to-end trainable framework. Extensive experiments have been carried out on the AIC-ICC image caption benchmark dataset, and the experimental results show that our proposed model is effective and feasible for image caption generation.
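The dual attention described in the abstract can be sketched in a few lines: at each decoding step, the decoder's hidden state attends once over CNN region features (visual attention) and once over embeddings of the FCN-predicted tags (textual attention), and the two context vectors are fused. This is a minimal NumPy illustration under assumed shapes and scaled dot-product scoring, not the authors' actual implementation; all names (`attend`, `dual_attention_context`) and dimensions are hypothetical.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a 1-D score vector
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(query, keys):
    # scaled dot-product attention: score each key against the query,
    # normalize to weights, and return the weighted sum as the context
    scores = keys @ query / np.sqrt(query.size)
    weights = softmax(scores)
    return weights @ keys, weights

def dual_attention_context(hidden, region_feats, tag_embs):
    # visual attention over CNN region features
    v_ctx, _ = attend(hidden, region_feats)
    # textual attention over embeddings of FCN-predicted tags
    t_ctx, _ = attend(hidden, tag_embs)
    # fuse the two contexts to guide the decoder at this time step
    return np.concatenate([v_ctx, t_ctx])

rng = np.random.default_rng(0)
h = rng.normal(size=8)               # decoder hidden state (toy size)
regions = rng.normal(size=(49, 8))   # e.g. a 7x7 CNN feature map, flattened
tags = rng.normal(size=(5, 8))       # embeddings of 5 predicted tags
ctx = dual_attention_context(h, regions, tags)
print(ctx.shape)  # (16,) -- visual and textual contexts concatenated
```

In the paper's full model this fused context would condition the RNN decoder's next word prediction, and the tag-generation and caption-generation losses are trained jointly end to end.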
- Is Part Of:
- Information processing & management. Volume 57:Issue 2(2020:Mar.)
- Journal:
- Information processing & management
- Issue:
- Volume 57:Issue 2(2020:Mar.)
- Issue Display:
- Volume 57, Issue 2 (2020)
- Year:
- 2020
- Volume:
- 57
- Issue:
- 2
- Issue Sort Value:
- 2020-0057-0002-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-03
- Subjects:
- Image caption generation -- Textual attention -- Visual attention -- Dual attention -- Fully convolutional network
Information storage and retrieval systems -- Periodicals
Information science -- Periodicals
Systèmes d'information -- Périodiques
Sciences de l'information -- Périodiques
Information science
Information storage and retrieval systems
Periodicals
658.4038
- Journal URLs:
- http://www.sciencedirect.com/science/journal/03064573
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.ipm.2019.102178
- Languages:
- English
- ISSNs:
- 0306-4573
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4493.893000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 12552.xml