Learning visual and textual representations for multimodal matching and classification. (December 2018)
- Record Type:
- Journal Article
- Title:
- Learning visual and textual representations for multimodal matching and classification. (December 2018)
- Main Title:
- Learning visual and textual representations for multimodal matching and classification
- Authors:
- Liu, Yu
Liu, Li
Guo, Yanming
Lew, Michael S.
- Abstract:
- Highlights: A unified network for image-text matching and classification. Seamless integration of the matching and classification components. A multi-stage training algorithm that combines the matching and classification losses. A comprehensive study of the effectiveness of the proposed approach. Comparisons on four well-known multimodal benchmarks.
Abstract: Multimodal learning, which aims to bridge the modality gap between heterogeneous representations such as vision and language, has been an important and challenging problem for decades. Unlike many current approaches that focus only on either multimodal matching or classification, we propose a unified network that jointly learns multimodal matching and classification (MMC-Net) between images and texts. The proposed MMC-Net model seamlessly integrates the matching and classification components. It first learns visual and textual embedding features in the matching component and then generates discriminative multimodal representations in the classification component. Combining the two components in a unified model can help improve the performance of both. Moreover, we present a multi-stage training algorithm that minimizes both the matching and the classification loss functions. Experimental results on four well-known multimodal benchmarks demonstrate the effectiveness and efficiency of the proposed approach, which achieves competitive performance for multimodal matching and classification compared to state-of-the-art approaches.
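The abstract describes training by minimizing a matching loss and a classification loss together, but this record does not state what those losses are. A minimal sketch, assuming a bidirectional hinge-based ranking loss for image-text matching and softmax cross-entropy for classification (common choices, not confirmed as the MMC-Net losses), combined with a hypothetical weight `lam`:

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def ranking_loss(img: List[Vector], txt: List[Vector], margin: float = 0.2) -> float:
    """Bidirectional hinge ranking loss (assumed form).

    img[i] and txt[i] are treated as a matched image-text pair; every other
    pairing in the batch is treated as a mismatch.
    """
    n = len(img)
    loss = 0.0
    for i in range(n):
        pos = cosine(img[i], txt[i])
        for j in range(n):
            if j == i:
                continue
            # image-to-text: a mismatched text should score at least `margin` lower
            loss += max(0.0, margin - pos + cosine(img[i], txt[j]))
            # text-to-image: a mismatched image should score at least `margin` lower
            loss += max(0.0, margin - pos + cosine(img[j], txt[i]))
    return loss / n


def cross_entropy(logits: Vector, label: int) -> float:
    """Softmax cross-entropy for one sample, computed with the log-sum-exp trick."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def joint_loss(img: List[Vector], txt: List[Vector], logits: List[Vector],
               labels: List[int], lam: float = 1.0, margin: float = 0.2) -> float:
    """Matching loss plus lam times the mean classification loss (weighting is hypothetical)."""
    match = ranking_loss(img, txt, margin)
    cls = sum(cross_entropy(z, y) for z, y in zip(logits, labels)) / len(labels)
    return match + lam * cls
```

Setting `lam = 0` optimizes only the matching term. How MMC-Net actually weights or schedules the two terms across its training stages is not stated in this record.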
- Is Part Of:
- Pattern recognition. Volume 84(2018:Dec.)
- Journal:
- Pattern recognition
- Issue:
- Volume 84(2018:Dec.)
- Issue Display:
- Volume 84 (2018)
- Year:
- 2018
- Volume:
- 84
- Issue Sort Value:
- 2018-0084-0000-0000
- Page Start:
- 51
- Page End:
- 67
- Publication Date:
- 2018-12
- Subjects:
- Vision and language -- Multimodal matching -- Multimodal classification -- Deep learning
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2018.07.001
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 16664.xml