Learning visual and textual representations for multimodal matching and classification. (December 2018)

Record Type:: Journal Article
Title:: Learning visual and textual representations for multimodal matching and classification. (December 2018)
Main Title:: Learning visual and textual representations for multimodal matching and classification
Authors:: Liu, Yu
Liu, Li
Guo, Yanming
Lew, Michael S.
Abstract:: Highlights: A unified network for image-text matching and classification. Seamless integration of the matching and classification components. A multi-stage training algorithm that combines the matching and classification losses. A comprehensive study of the effectiveness of the proposed approach. Comparisons on four well-known multimodal benchmarks. Abstract: Multimodal learning, which aims to bridge the modality gap between heterogeneous representations such as vision and language, has been an important and challenging problem for decades. Unlike many current approaches, which focus on either multimodal matching or multimodal classification alone, we propose a unified network that jointly learns multimodal matching and classification (MMC-Net) between images and texts. The proposed MMC-Net model seamlessly integrates the matching and classification components: it first learns visual and textual embedding features in the matching component, and then generates discriminative multimodal representations in the classification component. Combining the two components in a unified model can improve the performance of both. Moreover, we present a multi-stage training algorithm that minimizes both the matching and the classification loss functions. Experimental results on four well-known multimodal benchmarks demonstrate the effectiveness and efficiency of the proposed approach, which achieves competitive performance in multimodal matching and classification compared with state-of-the-art approaches.
Is Part Of:: Pattern recognition. Volume 84(2018:Dec.)
Journal:: Pattern recognition
Issue:: Volume 84(2018:Dec.)
Issue Display:: Volume 84 (2018)
Year:: 2018
Volume:: 84
Issue Sort Value:: 2018-0084-0000-0000
Page Start:: 51
Page End:: 67
Publication Date:: 2018-12
Subjects:: Vision and language -- Multimodal matching -- Multimodal classification -- Deep learning
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
Journal URLs:: http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
DOI:: 10.1016/j.patcog.2018.07.001
Languages:: English
ISSNs:: 0031-3203
Deposit Type:: Legal deposit
View Content:: Available online (eLD content is only available in our Reading Rooms)
Physical Locations:: British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
Ingest File:: 16664.xml