BERT-hLSTMs: BERT and hierarchical LSTMs for visual storytelling. (May 2021)
- Record Type:
- Journal Article
- Title:
- BERT-hLSTMs: BERT and hierarchical LSTMs for visual storytelling. (May 2021)
- Main Title:
- BERT-hLSTMs: BERT and hierarchical LSTMs for visual storytelling
- Authors:
- Su, Jing
Dai, Qingyun
Guerin, Frank
Zhou, Mian
- Abstract:
- Highlights: We propose a novel end-to-end BERT-dcLSTM framework to automatically generate coherent descriptions for sequential images. Firstly, we have exploited a pre-training (BERT) model to obtain sentence representation and word representation of the dataset which can efficiently enrich the meaning of sentences. Furthermore, a dual chain LSTM (dcLSTM) has been used to learn the relations between sequential images and corresponding descriptions and generate more coherent descriptions for sequential images. We have evaluated the performance of our proposed approach on the Sequential Image Narrative Dataset (SIND). Experimental results show that our proposed model outperforms the other closely related baselines under the automatic metrics BLEU and CIDEr, and can generate more consistent descriptions and efficiently learn the dependencies between images.
Abstract: Visual storytelling is a creative and challenging task, aiming to automatically generate a story-like description for a sequence of images. The descriptions generated by previous visual storytelling approaches lack coherence because they use word-level sequence generation methods and do not adequately consider sentence-level dependencies. To tackle this problem, we propose a novel hierarchical visual storytelling framework which separately models sentence-level and word-level semantics. We use the transformer-based BERT to obtain embeddings for sentences and words. We then employ a hierarchical LSTM network: the bottom LSTM receives as input the sentence vector representation from BERT, to learn the dependencies between the sentences corresponding to images, and the top LSTM is responsible for generating the corresponding word vector representations, taking input from the bottom LSTM. Experimental results demonstrate that our model outperforms most closely related baselines under automatic evaluation metrics BLEU and CIDEr, and also show the effectiveness of our method with human evaluation.
- Is Part Of:
- Computer speech & language. Volume 67(2021)
- Journal:
- Computer speech & language
- Issue:
- Volume 67(2021)
- Issue Display:
- Volume 67, Issue 2021 (2021)
- Year:
- 2021
- Volume:
- 67
- Issue:
- 2021
- Issue Sort Value:
- 2021-0067-2021-0000
- Page Start:
- Page End:
- Publication Date:
- 2021-05
- Subjects:
- Visual storytelling -- BERT -- Hierarchical LSTMs -- Sentence vector
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2020.101169
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 15407.xml