CycleMatch: A cycle-consistent embedding network for image-text matching. (September 2019)
- Record Type:
- Journal Article
- Title:
- CycleMatch: A cycle-consistent embedding network for image-text matching. (September 2019)
- Main Title:
- CycleMatch: A cycle-consistent embedding network for image-text matching
- Authors:
- Liu, Yu
Guo, Yanming
Liu, Li
Bakker, Erwin M.
Lew, Michael S.
- Abstract:
- Highlights: A novel deep cycle-consistent embedding network (CycleMatch) for image-text matching. Maintains both inter-modal correlations and intra-modal consistency. Late-fusion approaches that efficiently integrate the matching scores of multiple embedding features. Performance competitive with the state of the art on two well-known multimodal benchmarks.
Abstract: In numerous multimedia and multi-modal tasks, from image and video retrieval to zero-shot recognition to multimedia question answering, bridging image and text representations plays an important and, in some cases, indispensable role. To narrow the modality gap between vision and language, prior approaches attempt to discover their correlated semantics in a common feature space. However, these approaches neglect intra-modal semantic consistency when learning the inter-modal correlations. To address this problem, we propose cycle-consistent embeddings in a deep neural network for matching visual and textual representations. Our approach, named CycleMatch, maintains both inter-modal correlations and intra-modal consistency by cascading dual mappings and reconstructed mappings in a cyclic fashion. Moreover, to achieve robust inference, we employ two late-fusion approaches: average fusion and adaptive fusion. Both effectively integrate the matching scores of different embedding features without increasing network complexity or training time. In experiments on cross-modal retrieval, we present comprehensive results that verify the effectiveness of the proposed approach. Our approach achieves state-of-the-art performance on two well-known multi-modal datasets, Flickr30K and MSCOCO.
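The abstract names two mechanisms: a cycle in which one modality is mapped into the other and back (so that intra-modal content is preserved), and average late fusion of matching scores from several embedding features. The toy sketch below illustrates both ideas under loudly stated assumptions; it is not the authors' implementation. The linear maps, the squared-error reconstruction penalty, and every function name (`cycle_loss`, `average_fusion`, `rank_texts`) are hypothetical. CycleMatch itself uses deep networks, and its adaptive fusion variant is not modelled here.

```python
"""Toy sketch (not the authors' code) of two ideas named in the abstract:
a generic cycle-reconstruction penalty and average late fusion of
image-text matching scores. All names here are hypothetical."""
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Apply a linear map, given as a row-major matrix, to a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def cycle_loss(x: Vector, to_other: Matrix, back: Matrix) -> float:
    """Squared error between x and its reconstruction back(to_other(x)).

    Assumption: the forward and reconstructing mappings are plain linear
    maps; the paper uses deep networks whose exact form is not given here.
    """
    x_rec = matvec(back, matvec(to_other, x))
    return sum((a - b) ** 2 for a, b in zip(x, x_rec))


def average_fusion(score_mats: List[Matrix]) -> Matrix:
    """Element-wise mean of several (images x texts) score matrices,
    one matrix per embedding feature."""
    n = len(score_mats)
    rows, cols = len(score_mats[0]), len(score_mats[0][0])
    return [[sum(m[i][j] for m in score_mats) / n for j in range(cols)]
            for i in range(rows)]


def rank_texts(scores: Matrix, image: int) -> List[int]:
    """Text indices sorted by descending fused score for one image query."""
    row = scores[image]
    return sorted(range(len(row)), key=lambda j: row[j], reverse=True)


if __name__ == "__main__":
    # Identity maps reconstruct the input exactly, so the cycle loss is zero.
    eye = [[1.0, 0.0], [0.0, 1.0]]
    print(cycle_loss([0.3, -0.7], eye, eye))  # 0.0

    # Two made-up 2x3 score matrices from different embedding features.
    s1 = [[0.9, 0.1, 0.4], [0.2, 0.8, 0.3]]
    s2 = [[0.5, 0.3, 0.8], [0.1, 0.6, 0.7]]
    fused = average_fusion([s1, s2])
    print(rank_texts(fused, 0))  # [0, 2, 1]
```

Averaging is done on the score matrices after each embedding has been scored independently, which is why this style of fusion adds no parameters to the network and no extra training cost.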
- Is Part Of:
- Pattern recognition. Volume 93(2019:Sep.)
- Journal:
- Pattern recognition
- Issue:
- Volume 93(2019:Sep.)
- Issue Display:
- Volume 93 (2019)
- Year:
- 2019
- Volume:
- 93
- Issue Sort Value:
- 2019-0093-0000-0000
- Page Start:
- 365
- Page End:
- 379
- Publication Date:
- 2019-09
- Subjects:
- Image-text matching -- Embedding -- Deep neural networks -- Late-fusion inference
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2019.05.008
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 22198.xml