Rethinking referring relationships from a perspective of mask-level relational reasoning. (January 2023)

Record Type:: Journal Article
Title:: Rethinking referring relationships from a perspective of mask-level relational reasoning. (January 2023)
Main Title:: Rethinking referring relationships from a perspective of mask-level relational reasoning
Authors:: Li, Chengyang
Zhu, Liping
Tian, Gangyi
Hou, Yi
Zhou, Heng
Abstract:: Highlights: We rethink the RR task from the perspective of mask-level relational reasoning, which makes the proposed method more interpretable and extensible. We design two modules, Mask Generate and Mask Transfer, which jointly help the model learn more language priors and multimodal information. We introduce an unsupervised image-to-text relational reasoning module that improves the generalization ability of the multimodal model. Our method achieves state-of-the-art accuracy on two challenging datasets, VRD and Visual Genome. Abstract: The referring relationship task aims to localize the subject and object entities in an image according to a text triple <subject, predicate, object>. Previous methods model the predicate by using iterative attention to shift between image regions. However, the predicate is sometimes implicit and difficult to represent in the image domain, and expressing it through convolutional modeling is overly simple and inappropriate. In addition, the relational reasoning information in the text itself is not fully exploited. To this end, we rethink referring relationships from a mask-level relational reasoning perspective to improve model interpretability. For text-to-image reasoning, we design the Mask Generate and Mask Transfer modules to fully integrate text priors into mask reasoning and prediction. For image-to-text reasoning, we propose an unsupervised triple reconstruction method to guide text-to-image reasoning and improve multimodal generalization. By …
Is Part Of:: Pattern recognition. Volume 133 (2023)
Journal:: Pattern recognition
Issue:: Volume 133 (2023)
Issue Display:: Volume 133, Issue 2023 (2023)
Year:: 2023
Volume:: 133
Issue:: 2023
Issue Sort Value:: 2023-0133-2023-0000
Publication Date:: 2023-01
Subjects:: Referring relationship -- Multimodal learning -- Image and text -- Visual grounding -- Deep learning
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
Journal URLs:: http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
DOI:: 10.1016/j.patcog.2022.109044
Languages:: English
ISSNs:: 0031-3203
Deposit Type:: Legal deposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
Ingest File:: 24024.xml