Generating image descriptions with multidirectional 2D long short‐term memory. Issue 1 (7th November 2016)
- Record Type:
- Journal Article
- Title:
- Generating image descriptions with multidirectional 2D long short‐term memory. Issue 1 (7th November 2016)
- Main Title:
- Generating image descriptions with multidirectional 2D long short‐term memory
- Authors:
- Li, Shuohao
Zhang, Jun
Guo, Qiang
Lei, Jun
Tu, Dan
- Abstract:
- Connecting visual imagery with descriptive language is a challenge for computer vision and machine translation. To approach this problem, the authors propose a novel end‐to‐end model that generates descriptions for images. Some earlier works used a convolutional neural network and long short‐term memory (CNN‐LSTM) model to describe an image, where a CNN encodes the input image into a feature vector and an LSTM decodes the feature vector into a description. Since a two‐dimensional LSTM (2DLSTM) has the property of translation invariance and can encode the relationships between regions in an image, the authors not only apply a CNN to extract global features of an image, but also use a multidirectional 2DLSTM to encode the feature maps extracted by the CNN into structural local features. Their model is trained by maximising the likelihood of the target description sentence from the training dataset. Experiments on two challenging datasets show the accuracy of the model and the fluency of the language it learns. The authors compare the bilingual evaluation understudy (BLEU) scores and retrieval metrics of their results with current state‐of‐the‐art scores and show improvements on Flickr30k and MS COCO.
- Is Part Of:
- IET computer vision. Volume 11:Issue 1(2017)
- Journal:
- IET computer vision
- Issue:
- Volume 11:Issue 1(2017)
- Issue Display:
- Volume 11, Issue 1 (2017)
- Year:
- 2017
- Volume:
- 11
- Issue:
- 1
- Issue Sort Value:
- 2017-0011-0001-0000
- Page Start:
- 104
- Page End:
- 111
- Publication Date:
- 2016-11-07
- Subjects:
- computer vision -- maximum likelihood estimation -- feature extraction -- neural nets
image descriptions -- multidirectional 2D long-short-term memory -- visual imagery -- computer vision -- machine translation -- convolutional neural network-long-short-term memory model -- CNN-LSTM model -- feature vector -- two-dimensional LSTM -- translation invariance -- multidirectional 2DLSTM -- feature map extraction -- structural local features
Computer vision -- Periodicals
Pattern recognition systems -- Periodicals
006.37
- Journal URLs:
- http://digital-library.theiet.org/content/journals/iet-cvi
http://www.ietdl.org/IET-CVI
https://ietresearch.onlinelibrary.wiley.com/journal/17519640
http://www.theiet.org/
- DOI:
- 10.1049/iet-cvi.2015.0473
- Languages:
- English
- ISSNs:
- 1751-9632
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 4363.252250
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 16690.xml
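The abstract above describes the multidirectional 2DLSTM encoder only at a high level. The following is a minimal NumPy sketch of how such an encoder could turn a CNN feature map into per-location local features. It assumes the common Graves-style multidimensional LSTM cell (one input gate, one forget gate per predecessor direction, one output gate) and four scans that each start from a different corner of the grid. It is not the authors' implementation: the weights are random, the per-direction outputs are simply concatenated, and all function names (`init_params`, `lstm2d_scan`, `multidirectional_2dlstm`) are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(in_dim, hid, rng):
    """Hypothetical random weights for one scan direction.

    Gate blocks, in order: input, forget (from the cell above),
    forget (from the cell to the left), output, candidate.
    """
    scale = 0.1
    return {
        "W": rng.normal(0.0, scale, (5 * hid, in_dim)),
        "U_up": rng.normal(0.0, scale, (5 * hid, hid)),
        "U_left": rng.normal(0.0, scale, (5 * hid, hid)),
        "b": np.zeros(5 * hid),
    }


def lstm2d_scan(x, p, hid):
    """One top-left to bottom-right 2D LSTM pass over x of shape (H, W, D)."""
    H, W, _ = x.shape
    # Padded by one row and column so the borders see a zero state.
    h = np.zeros((H + 1, W + 1, hid))
    c = np.zeros((H + 1, W + 1, hid))
    for i in range(1, H + 1):
        for j in range(1, W + 1):
            z = (p["W"] @ x[i - 1, j - 1]
                 + p["U_up"] @ h[i - 1, j]
                 + p["U_left"] @ h[i, j - 1]
                 + p["b"])
            ig, f_up, f_left, og, g = np.split(z, 5)
            # Cell state mixes the new candidate with both predecessor cells.
            c[i, j] = (sigmoid(ig) * np.tanh(g)
                       + sigmoid(f_up) * c[i - 1, j]
                       + sigmoid(f_left) * c[i, j - 1])
            h[i, j] = sigmoid(og) * np.tanh(c[i, j])
    return h[1:, 1:]


def multidirectional_2dlstm(x, hid=8, seed=0):
    """Run four 2D LSTM scans (one per corner) and concatenate their outputs."""
    rng = np.random.default_rng(seed)
    outputs = []
    for flip_rows in (False, True):
        for flip_cols in (False, True):
            # Flipping the grid lets one scan routine start from any corner.
            xs = x[::-1] if flip_rows else x
            xs = xs[:, ::-1] if flip_cols else xs
            params = init_params(x.shape[2], hid, rng)
            hs = lstm2d_scan(xs, params, hid)
            # Undo the flips so every direction is aligned with the input grid.
            hs = hs[::-1] if flip_rows else hs
            hs = hs[:, ::-1] if flip_cols else hs
            outputs.append(hs)
    return np.concatenate(outputs, axis=-1)


if __name__ == "__main__":
    # Random stand-in for a 7x7 CNN feature map with 16 channels.
    feature_map = np.random.default_rng(1).normal(size=(7, 7, 16))
    local_features = multidirectional_2dlstm(feature_map, hid=8)
    print(local_features.shape)  # (7, 7, 32)
```

Scanning from all four corners means that, after concatenation, each location's feature vector depends on every region of the feature map rather than only on the regions above and to its left, which is one plausible reading of "structural local features" in the abstract.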