TETFN: A text enhanced transformer fusion network for multimodal sentiment analysis. (April 2023)
- Record Type:
- Journal Article
- Title:
- TETFN: A text enhanced transformer fusion network for multimodal sentiment analysis. (April 2023)
- Main Title:
- TETFN: A text enhanced transformer fusion network for multimodal sentiment analysis
- Authors:
- Wang, Di
Guo, Xutong
Tian, Yumin
Liu, Jinhui
He, LiHuo
Luo, Xuemei
- Abstract:
- Highlights: A novel multimodal sentiment analysis network, TETFN, is proposed. It adds sentiment-related information to non-linguistic modalities through text-based multi-head attention. It simultaneously learns the consistency and differentiated information of different modalities. ViT is used to extract visual features, which contain both local and global information. Extensive experimental results demonstrate its superior performance.
Abstract: Multimodal sentiment analysis (MSA), which aims to recognize sentiment expressed by speakers in videos utilizing textual, visual and acoustic cues, has attracted extensive research attention in recent years. However, textual, visual and acoustic modalities often contribute differently to sentiment analysis. In general, text contains more intuitive sentiment-related information and outperforms nonlinguistic modalities in MSA. Seeking a strategy to take advantage of this property to obtain a fusion representation containing more sentiment-related information and simultaneously preserving inter- and intra-modality relationships becomes a significant challenge. To this end, we propose a novel method named Text Enhanced Transformer Fusion Network (TETFN), which learns text-oriented pairwise cross-modal mappings for obtaining effective unified multimodal representations. In particular, it incorporates textual information in learning sentiment-related nonlinguistic representations through text-based multi-head attention. In addition to preserving consistency information by cross-modal mappings, it also retains the differentiated information among modalities through unimodal label prediction. Furthermore, the vision pre-trained model Vision-Transformer is utilized to extract visual features from the original videos to preserve both global and local information of a human face. Extensive experiments on benchmark datasets CMU-MOSI and CMU-MOSEI demonstrate the superior performance of the proposed TETFN over state-of-the-art methods.
- Is Part Of:
- Pattern recognition. Volume 136(2023)
- Journal:
- Pattern recognition
- Issue:
- Volume 136(2023)
- Issue Display:
- Volume 136, Issue 2023 (2023)
- Year:
- 2023
- Volume:
- 136
- Issue:
- 2023
- Issue Sort Value:
- 2023-0136-2023-0000
- Page Start:
- Page End:
- Publication Date:
- 2023-04
- Subjects:
- Multimodal sentiment analysis -- Transformer -- Text-oriented pairwise cross-modal mappings
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2022.109259
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 25681.xml