HVLM: Exploring Human-Like Visual Cognition and Language-Memory Network for Visual Dialog. Issue 5 (September 2022)

Record Type:: Journal Article
Title:: HVLM: Exploring Human-Like Visual Cognition and Language-Memory Network for Visual Dialog. Issue 5 (September 2022)
Main Title:: HVLM: Exploring Human-Like Visual Cognition and Language-Memory Network for Visual Dialog
Authors:: Sun, Kaili
Guo, Chi
Zhang, Huyin
Li, Yuan
Abstract:: Visual dialog, a visual-language task, enables an AI agent to engage in conversation with humans grounded in a given image. To generate appropriate answers for a series of questions in the dialog, the agent is required to understand the comprehensive visual content of an image and the fine-grained textual context of the dialog. However, previous studies typically utilized object-level visual features to represent a whole image, which focuses only on the local perspective of an image and ignores the importance of its global information. In this paper, we propose a novel model, the Human-Like Visual Cognitive and Language-Memory Network for Visual Dialog (HVLM), to simulate global and local dual-perspective cognition in the human visual system and understand an image comprehensively. HVLM consists of two key modules, Local-to-Global Graph Convolutional Visual Cognition (LG-GCVC) and Question-guided Language Topic Memory (T-Mem). Specifically, in the LG-GCVC module, we design a question-guided dual-perspective reasoning scheme to jointly learn visual content from both local and global perspectives through a simple spectral graph convolution network. Furthermore, in the T-Mem module, we design an iterative learning strategy to gradually enhance fine-grained textual context details via an attention mechanism. Experimental results demonstrate the superiority of our proposed model, which obtains comparable performance on benchmark datasets VisDial v1.0 and …
Is Part Of:: Information processing & management. Volume 59:Issue 5(2022)
Journal:: Information processing & management
Issue:: Volume 59:Issue 5(2022)
Issue Display:: Volume 59, Issue 5 (2022)
Year:: 2022
Volume:: 59
Issue:: 5
Issue Sort Value:: 2022-0059-0005-0000
Page Start:
Page End:
Publication Date:: 2022-09
Subjects:: Visual Dialog -- Visual-language understanding -- Dual-perspective reasoning -- Simple spectral graph convolution network
Information storage and retrieval systems -- Periodicals
Information science -- Periodicals
Systèmes d'information -- Périodiques
Sciences de l'information -- Périodiques
Information science
Information storage and retrieval systems
Periodicals
658.4038
Journal URLs:: http://www.sciencedirect.com/science/journal/03064573
http://www.elsevier.com/journals
DOI:: 10.1016/j.ipm.2022.103008
Languages:: English
ISSNs:: 0306-4573
Deposit Type:: Legaldeposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - 4493.893000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
Ingest File:: 23283.xml