Dynamic self-attention with vision synchronization networks for video question answering. (December 2022)

Record Type:: Journal Article
Title:: Dynamic self-attention with vision synchronization networks for video question answering. (December 2022)
Main Title:: Dynamic self-attention with vision synchronization networks for video question answering
Authors:: Liu, Yun
Zhang, Xiaoming
Huang, Feiran
Shen, Shixun
Tian, Peng
Li, Lang
Li, Zhoujun
Abstract:: Highlights: A novel token selection mechanism based on the dynamic self-attention network is proposed to automatically extract important video features. A vision synchronization network is proposed to align appearance and motion features at the time-slice level. Extensive experiments and analysis confirm the superiority of the proposed model, DSAVS. Abstract: Video Question Answering (VideoQA) has gained increasing attention as an important task for understanding rich spatio-temporal content, i.e., the appearance and motion in a video. However, existing approaches mainly use the question to learn attention over all the sampled appearance and motion features separately, and thus neglect two properties of VideoQA: (1) the answer to a question is often reflected in only a few frames and video clips, while most of the video content is superfluous; (2) appearance and motion features usually co-occur and complement each other over time. In this paper, we propose a novel VideoQA model, Dynamic Self-Attention with Vision Synchronization Networks (DSAVS), to address these problems. Specifically, a gated token selection mechanism is proposed to dynamically select the important tokens from the appearance and motion sequences. The selected tokens are fed into a self-attention mechanism that models their internal dependencies for more effective representation learning. To capture the correlation between appearance and motion features, a vision synchronization block is …
Is Part Of:: Pattern recognition. Volume 132 (2022)
Journal:: Pattern recognition
Issue:: Volume 132 (2022)
Issue Display:: Volume 132, Issue 2022 (2022)
Year:: 2022
Volume:: 132
Issue:: 2022
Issue Sort Value:: 2022-0132-2022-0000
Publication Date:: 2022-12
Subjects:: Video question answering -- Dynamic self-attention -- Vision synchronization
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
Journal URLs:: http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
DOI:: 10.1016/j.patcog.2022.108959
Languages:: English
ISSNs:: 0031-3203
Deposit Type:: Legal deposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
Ingest File:: 23281.xml