UAVformer: A Composite Transformer Network for Urban Scene Segmentation of UAV Images. (January 2023)
- Record Type:
- Journal Article
- Title:
- UAVformer: A Composite Transformer Network for Urban Scene Segmentation of UAV Images. (January 2023)
- Main Title:
- UAVformer: A Composite Transformer Network for Urban Scene Segmentation of UAV Images
- Authors:
- Yi, Shi
Liu, Xi
Li, Junjie
Chen, Ling
- Abstract:
- Highlights: A novel transformer-based semantic segmentation network with a composite structure backbone is proposed for urban scene segmentation of UAV images. Adaptive fusion modules (AFM) are implemented to adaptively fuse the extracted multi-level features. An aggregation window multi-head self-attention (AWMSA) mechanism is designed in the transformer block to accurately segment objects of varying scale in UAV images. A V-shaped decoder with the capacity to fully utilise multi-level features is proposed to preserve the boundaries of segmented objects.
Abstract: Urban scene segmentation from the UAV (unmanned aerial vehicle) view is a fundamental task for smart city applications such as city planning, land use monitoring, traffic monitoring, and crowd estimation. However, urban scenes in UAV images are characterised by large variation in object scale and by complex backgrounds, which pose challenges to urban scene segmentation. The feature extraction backbones of existing networks cannot extract the complex features of UAV images effectively, which limits segmentation performance. To design a segmentation network capable of extracting features of urban ground scenes with large scale variation, this study proposes a novel composite transformer network for urban scene segmentation of UAV images. A composite backbone with aggregation window multi-head self-attention transformer blocks is proposed to make the extracted features more representative through adaptive multi-level feature fusion and the full utilisation of contextual and local information. Position attention modules are inserted at each stage between the encoder and decoder to further enhance the spatial attention of the extracted feature maps. Finally, a V-shaped decoder capable of utilising multi-level features is designed to obtain accurate dense predictions. In this way, the accuracy of urban scene segmentation is significantly enhanced, and objects of widely varying scale in UAV views are segmented successfully. Extensive ablation and comparative experiments for the proposed network have been conducted on publicly available urban scene segmentation datasets for UAV imagery. The experimental results demonstrate the effectiveness of the designed network structure and the superiority of the proposed network over state-of-the-art methods. Specifically, the proposed network reached 53.2% mIoU on the UAVid dataset and 77.6% mIoU on the UDD6 dataset.
- Is Part Of:
- Pattern recognition. Volume 133(2023)
- Journal:
- Pattern recognition
- Issue:
- Volume 133(2023)
- Issue Display:
- Volume 133, Issue 2023 (2023)
- Year:
- 2023
- Volume:
- 133
- Issue:
- 2023
- Issue Sort Value:
- 2023-0133-2023-0000
- Page Start:
- Page End:
- Publication Date:
- 2023-01
- Subjects:
- Urban scenes segmentation -- UAV image -- Composite backbone -- Aggregation windows multi-head self-attention transformer block -- V-shaped decoder
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2022.109019
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 24024.xml