Transfer learning for multimodal dialog. (November 2020)
- Record Type:
- Journal Article
- Title:
- Transfer learning for multimodal dialog. (November 2020)
- Main Title:
- Transfer learning for multimodal dialog
- Authors:
- Palaskar, Shruti
Sanabria, Ramon
Metze, Florian
- Abstract:
- Audio-Visual Scene-Aware Dialog (AVSD) is best understood as an extension of Visual Question Answering, the task of generating a textual answer in response to a textual question on multi-media content. In AVSD, the answer-relevant "context" is expanded to include past dialog turns, which we view as a specialized form of extra textual knowledge (in addition to the standard video features). We have developed a framework that uses hierarchical attention to fuse contributions from different modalities, and have shown how it can be used to generate textual summaries from multi-modal sources, specifically videos with accompanying commentary. In this paper, we transfer the algorithmic approach, models, and data from this background corpus of 2000 h of how-to videos to the AVSD task, and report our findings. Our approach uses dialog context, but makes no assumption about the ordering of the history. Our system achieves the best performance in both automatic and human evaluations in the 7th Dialog State Tracking Challenge (AVSD).
- Is Part Of:
- Computer speech & language. Volume 64(2020)
- Journal:
- Computer speech & language
- Issue:
- Volume 64(2020)
- Issue Display:
- Volume 64, Issue 2020 (2020)
- Year:
- 2020
- Volume:
- 64
- Issue:
- 2020
- Issue Sort Value:
- 2020-0064-2020-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-11
- Subjects:
- Multimodal dialog -- Video question answering -- Transfer learning
Speech processing systems -- Periodicals
Automatic speech recognition -- Periodicals
Computers -- Periodicals
Linguistics -- Periodicals
Speech-Language Pathology -- Periodicals
Traitement automatique de la parole -- Périodiques
Reconnaissance automatique de la parole -- Périodiques
Automatic speech recognition
Speech processing systems
Electronic journals
Periodicals
006.454
- Journal URLs:
- http://www.journals.elsevier.com/computer-speech-and-language/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.csl.2020.101093
- Languages:
- English
- ISSNs:
- 0885-2308
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.276600
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 13412.xml