Text-instance graph: Exploring the relational semantics for text-based visual question answering. (April 2022)
- Record Type:
- Journal Article
- Title:
- Text-instance graph: Exploring the relational semantics for text-based visual question answering. (April 2022)
- Main Title:
- Text-instance graph: Exploring the relational semantics for text-based visual question answering
- Authors:
- Li, Xiangpeng
Wu, Bo
Song, Jingkuan
Gao, Lianli
Zeng, Pengpeng
Gan, Chuang
- Abstract:
- It is time to stop neglecting the text in the world around you. In VQA, the surrounding text helps humans understand complete visual scenes and reason about question semantics efficiently. Here, we address the challenging Text-based Visual Question Answering (TextVQA) problem, which requires a model to answer VQA questions by reading text. Existing TextVQA methods mainly focus on the latent relationships between detected object instances and scene texts given the question, but ignore spatial location relationships and complex relational semantics between visual object instances and OCR texts (e.g. the A of B on C). To deal with these challenges, we propose a novel Text-Instance Graph (TIG) network for TextVQA. The TIG builds an OCR-OBJ graph for modeling overlapping relationships, where each node of the graph is updated using related objects or OCR texts. To deal with questions with complex logic, we propose a dynamic OCR-OBJ graph network to extend the perception space of graph nodes, which captures the features of nodes that are not directly adjacent. Considering a scene about "the brand of the computer on the table", the model would build correlations between "brand" and "table" using the "computer" node as the intermediate node. Extensive experiments on three benchmarks demonstrate the effectiveness and superiority of the proposed method. In addition, our TIG achieves 0.505 ANLS on the ST-VQA challenge leaderboard and sets a new state of the art.
- Is Part Of:
- Pattern recognition. Volume 124(2022)
- Journal:
- Pattern recognition
- Issue:
- Volume 124(2022)
- Issue Display:
- Volume 124, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 124
- Issue:
- 2022
- Issue Sort Value:
- 2022-0124-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-04
- Subjects:
- Text-based visual question answering -- Spatial overlapping -- Text-Instance graph -- Copy mechanism
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2021.108455
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 22256.xml