A uniform transformer-based structure for feature fusion and enhancement for RGB-D saliency detection. (August 2023)
- Record Type:
- Journal Article
- Title:
- A uniform transformer-based structure for feature fusion and enhancement for RGB-D saliency detection. (August 2023)
- Main Title:
- A uniform transformer-based structure for feature fusion and enhancement for RGB-D saliency detection
- Authors:
- Wang, Yue
Jia, Xu
Zhang, Lu
Li, Yuke
Elder, James H.
Lu, Huchuan
- Abstract:
- Highlights: We design a novel transformer-based framework for RGB-D saliency detection which simultaneously and globally integrates features across modalities and scales. The proposed RGB-D saliency detector uses only transformers as a uniform operation for both feature fusion and feature enhancement, which shows the potential of transformer in this task and simplifies the model design. The proposed algorithm generally performs favorably against most state-of-the-art RGB-D saliency detection methods. Meanwhile, the proposed model is efficient for having relatively smaller FLOPs and model size compared with other methods.
Abstract: RGB-D saliency detection integrates information from both RGB images and depth maps to improve the prediction of salient regions under challenging conditions. The key to RGB-D saliency detection is to fully mine and fuse information at multiple scales across the two modalities. Previous approaches tend to apply the multi-scale and multi-modal fusion separately via local operations, which fails to capture long-range dependencies. Here we propose a transformer-based structure to address this issue. The proposed architecture is composed of two modules: an Intra-modality Feature Enhancement Module (IFEM) and an Inter-modality Feature Fusion Module (IFFM). IFFM conducts a sufficient feature fusion by integrating features from multiple scales and two modalities over all positions simultaneously. IFEM enhances feature on each scale by selecting and integrating complementary information from other scales within the same modality before IFFM. We show that transformer is a uniform operation which presents great efficacy in both feature fusion and feature enhancement, and simplifies the model design. Extensive experimental results on five benchmark datasets demonstrate that our proposed network performs favorably against most state-of-the-art RGB-D saliency detection methods. Furthermore, our model is efficient for having relatively smaller FLOPs and model size compared with other methods.
- Is Part Of:
- Pattern recognition. Volume 140(2023)
- Journal:
- Pattern recognition
- Issue:
- Volume 140(2023)
- Issue Display:
- Volume 140, Issue 2023 (2023)
- Year:
- 2023
- Volume:
- 140
- Issue:
- 2023
- Issue Sort Value:
- 2023-0140-2023-0000
- Page Start:
- Page End:
- Publication Date:
- 2023-08
- Subjects:
- Saliency detection -- RGB-D image -- Transformer -- Attention
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2023.109516
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 27043.xml
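
Illustrative note on the abstract: the record's abstract says IFFM fuses features "from multiple scales and two modalities over all positions simultaneously". One way to picture that idea is a single self-attention pass over the concatenated tokens of every scale and both modalities. The sketch below is only an illustration under assumptions, not the authors' implementation: the function names, the single-head attention, the token counts, and the random weights are all hypothetical, and the record does not describe the actual layer design.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, w_q, w_k, w_v):
    # Plain single-head scaled dot-product self-attention.
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

def fuse_all_scales_and_modalities(rgb_feats, depth_feats, w_q, w_k, w_v):
    """Hypothetical joint fusion step.

    rgb_feats, depth_feats: lists with one (num_positions, channels)
    array per scale. All tokens are concatenated so that every position
    can attend to every other position, scale, and modality at once.
    """
    groups = rgb_feats + depth_feats
    lengths = [g.shape[0] for g in groups]
    tokens = np.concatenate(groups, axis=0)
    fused, weights = self_attention(tokens, w_q, w_k, w_v)
    # Split the fused tokens back into their original per-scale groups.
    boundaries = np.cumsum(lengths)[:-1]
    return np.split(fused, boundaries, axis=0), weights

# Toy input (assumed sizes): two scales with 16 and 4 positions, 8 channels.
rng = np.random.default_rng(0)
channels = 8
positions_per_scale = [16, 4]
rgb = [rng.standard_normal((n, channels)) for n in positions_per_scale]
depth = [rng.standard_normal((n, channels)) for n in positions_per_scale]
w_q, w_k, w_v = (rng.standard_normal((channels, channels)) * 0.1 for _ in range(3))

fused, weights = fuse_all_scales_and_modalities(rgb, depth, w_q, w_k, w_v)
```

In this toy setup the attention matrix covers all 40 tokens jointly, which is the contrast the abstract draws with approaches that fuse scales and modalities separately through local operations.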