Adaptive enhanced swin transformer with U-net for remote sensing image segmentation. (September 2022)
- Record Type:
- Journal Article
- Title:
- Adaptive enhanced swin transformer with U-net for remote sensing image segmentation. (September 2022)
- Main Title:
- Adaptive enhanced swin transformer with U-net for remote sensing image segmentation
- Authors:
- Gu, Xingjian
Li, Sizhe
Ren, Shougang
Zheng, Hengbiao
Fan, Chengcheng
Xu, Huanliang
- Abstract:
- Highlights: In this paper, a UAV remote sensing segmentation method based on CNN and transformer is proposed. On the basis of the U-Net structure, we introduce a CNN-transformer hybrid encoder and a symmetrical CNN decoder, which effectively extract and use global and local semantic information to produce accurate segmentation while reducing the transformer's computation. In addition, we construct an adaptive multiscale transformer module and strengthen its multi-head self-attention to boost the performance of AESwin-UNet. Experimental results on two UAV remote sensing datasets show that our AESwin-UNet has excellent performance. Our contributions can be summarized as follows. Based on a hybrid CNN-Transformer, a U-shaped encoder-decoder model with skip connections is proposed, which realizes pixel-level segmentation prediction by fusing local and global features, while reducing the scale of pre-training. An enhanced swin transformer block with an attention module is constructed, which enhances the extraction of effective features by reducing the redundancy in multi-head self-attention (MHSA). A deformable adaptive patch merging layer is proposed to assign appropriate receptive fields to different targets while achieving down-sampling.
Abstract: Semantic segmentation of remote sensing images often faces complex situations, such as variable-scale objects, large intra-class differences, and imbalanced distribution among classes. Convolutional Neural Network (CNN) based models have been widely used in remote sensing image segmentation tasks for their powerful feature extraction capability. Due to the intrinsic locality of CNN architectures, however, it is difficult for them to capture long-range dependencies among image patches. Recently, the transformer, which leverages long-range dependencies, has performed well in computer vision tasks. To take advantage of both CNN and Transformer, a novel Adaptive Enhanced Swin Transformer with U-Net (AESwin-UNet) is proposed for remote sensing segmentation. AESwin-UNet uses a hybrid Transformer-based U-type Encoder-Decoder architecture with skip connections to extract local and global semantic features. Specifically, the Enhanced Swin Transformer (E-Swin Transformer) in the encoder contains Enhanced Multi-head Self-Attention and a Deformable Adaptive Patch Merging layer. A symmetric cascaded decoder is designed for up-sampling to obtain higher resolution feature maps. Experiments on two public benchmark datasets, WHDLD and LoveDA, demonstrate that the proposed AESwin-UNet performs well in semantic segmentation.
- Is Part Of:
- Computers & electrical engineering. Volume 102(2022)
- Journal:
- Computers & electrical engineering
- Issue:
- Volume 102(2022)
- Issue Display:
- Volume 102, Issue 2022 (2022)
- Year:
- 2022
- Volume:
- 102
- Issue:
- 2022
- Issue Sort Value:
- 2022-0102-2022-0000
- Page Start:
- Page End:
- Publication Date:
- 2022-09
- Subjects:
- Remote sensing -- Semantic segmentation -- Unet -- Transformer -- CNN
Computer engineering -- Periodicals
Electrical engineering -- Periodicals
Electrical engineering -- Data processing -- Periodicals
Ordinateurs -- Conception et construction -- Périodiques
Électrotechnique -- Périodiques
Électrotechnique -- Informatique -- Périodiques
Computer engineering
Electrical engineering
Electrical engineering -- Data processing
Periodicals
Electronic journals
621.302854
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00457906/
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.compeleceng.2022.108223
- Languages:
- English
- ISSNs:
- 0045-7906
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.680000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 23294.xml