TASTA: Text‐Assisted Spatial and Temporal Attention Network for Video Question Answering. (22nd February 2023)
- Record Type:
- Journal Article
- Title:
- TASTA: Text‐Assisted Spatial and Temporal Attention Network for Video Question Answering. (22nd February 2023)
- Main Title:
- TASTA: Text‐Assisted Spatial and Temporal Attention Network for Video Question Answering
- Authors:
- Wang, Tian
Hou, Boyao
Li, Jiakun
Shi, Peng
Zhang, Baochang
Snoussi, Hichem
- Abstract:
- Video question answering (VideoQA) is a typical task that integrates language and vision. The key for VideoQA is to extract relevant and effective visual information for answering a specific question. Information selection is believed to be necessary for this task due to the large amount of irrelevant information in the video, and explicitly learning an attention model can be a reasonable and effective solution for the selection. Herein, a novel VideoQA model called Text‐Assisted Spatial and Temporal Attention Network (TASTA) is proposed, which shows the great potential of explicitly modeling attention. TASTA is made to be simple, small, clean, and efficient for clear performance justification and possible easy extension. Its success is mainly from two new strategies of better using the textual information. Experimental results on a large and most representative dataset, TGIF‐QA, show the significant superiority of TASTA w.r.t. the state‐of‐the‐art and demonstrate the effectiveness of its key components via ablation studies.
This study proposes Text‐Assisted Spatial and Temporal Attention Network (TASTA) for video question answering. Carefully designed textual guidance from question–answer pairs and attentively fusing of spatial and temporal information contribute to the outstanding performance. Model effectivity and generality are verified by ablation studies. Extensive use in intelligent manufacturing for querying system status in natural language based on videos is promising.
- Is Part Of:
- Advanced intelligent systems. Volume 5:Number 4(2023)
- Journal:
- Advanced intelligent systems
- Issue:
- Volume 5:Number 4(2023)
- Issue Display:
- Volume 5, Issue 4 (2023)
- Year:
- 2023
- Volume:
- 5
- Issue:
- 4
- Issue Sort Value:
- 2023-0005-0004-0000
- Page Start:
- n/a
- Page End:
- n/a
- Publication Date:
- 2023-02-22
- Subjects:
- attention mechanism -- video question answering -- visual question answering
Artificial intelligence -- Periodicals
Robotics -- Periodicals
Control theory -- Periodicals
006.3
- Journal URLs:
- http://onlinelibrary.wiley.com/
https://onlinelibrary.wiley.com/journal/26404567
- DOI:
- 10.1002/aisy.202200131
- Languages:
- English
- ISSNs:
- 2640-4567
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 27021.xml