Video semantic segmentation via feature propagation with holistic attention. (August 2020)
- Record Type:
- Journal Article
- Title:
- Video semantic segmentation via feature propagation with holistic attention. (August 2020)
- Main Title:
- Video semantic segmentation via feature propagation with holistic attention
- Authors:
- Wu, Junrong
Wen, Zongzheng
Zhao, Sanyuan
Huang, Kele
- Abstract:
- Highlights: Propose a Light, Efficient and Real-time network (denoted as LERNet) as a strong backbone network for per-frame processing. Efficient feature propagation across redundant video frames with key frame selection scheduling. Use temporal holistic attention to imply spatial correlations between key frames and non-key frames. Achieve a speed of 131 fps on the CityScapes dataset.
Abstract: Since the frames of a video are inherently contiguous, information redundancy is ubiquitous. Unlike previous works that densely process each frame of a video, in this paper we present a novel method that focuses on efficient feature propagation across frames to tackle the challenging video semantic segmentation task. Firstly, we propose a Light, Efficient and Real-time network (denoted as LERNet) as a strong backbone network for per-frame processing. Then we mine rich features within a key frame and propagate the across-frame consistency information by calculating a temporal holistic attention with the following non-key frame. Each element of the attention matrix represents the global correlation between pixels of a non-key frame and the previous key frame. Concretely, we propose a brand-new attention module to capture the spatial consistency on low-level features along the temporal dimension. Then we employ the attention weights as a spatial transition guidance for directly generating high-level features of the current non-key frame from the weighted corresponding key frame. Finally, we efficiently fuse the hierarchical features of the non-key frame and obtain the final segmentation result. Extensive experiments on two popular datasets, CityScapes and CamVid, demonstrate that the proposed approach achieves a remarkable balance between inference speed and accuracy.
- Is Part Of:
- Pattern recognition. Volume 104(2020:Aug.)
- Journal:
- Pattern recognition
- Issue:
- Volume 104(2020:Aug.)
- Issue Display:
- Volume 104 (2020)
- Year:
- 2020
- Volume:
- 104
- Issue Sort Value:
- 2020-0104-0000-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-08
- Subjects:
- Real-time -- Attention mechanism -- Feature propagation -- Video semantic segmentation
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2020.107268
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 13393.xml