Spatio-temporal deformable 3D ConvNets with attention for action recognition. (February 2020)
- Record Type:
- Journal Article
- Title:
- Spatio-temporal deformable 3D ConvNets with attention for action recognition. (February 2020)
- Main Title:
- Spatio-temporal deformable 3D ConvNets with attention for action recognition
- Authors:
- Li, Jun
Liu, Xianglong
Zhang, Mingyuan
Wang, Deqing
- Abstract:
- Highlights: We are the first to propose a spatio-temporal deformable 3D convolutions with an attention mechanism (STDA for short). The proposed module serves as a generic module for many 3D CNNs, and in practice it is only needed to append at the later convolution layer without increasing too much computational cost. Our attention mechanism can exploit both long-range temporal dependencies across multiple frames and long-distance spatial dependencies inside each frame, and thus helps extract the discriminative global information at both inter-frame level and intra-frame level. Experiments validate the superior performances and efficiency of the proposed approach.
Abstract: The irregularity of human actions poses great challenges in video action recognition. Recently, 3D ConvNet methods have shown promising performance at modelling the motion and appearance information. However, the fixed geometric structure of 3D convolution filters largely limits the learning capacity for video action recognition. To address this problem, this paper proposes a spatio-temporal deformable ConvNet module with an attention mechanism, which takes into consideration the mutual correlations in both temporal and spatial domains, to effectively capture the long-range and long-distance dependencies in the video actions. Our attention based deformable module, as a generic module for 3D ConvNets, can adaptively learn more accurate spatio-temporal offsets to model the action irregularity. The experiments on two popular datasets (UCF-101 and HMDB-51) demonstrate that our module significantly outperforms the state-of-the-art methods.
- Is Part Of:
- Pattern recognition. Volume 98(2020:Feb.)
- Journal:
- Pattern recognition
- Issue:
- Volume 98(2020:Feb.)
- Issue Display:
- Volume 98 (2020)
- Year:
- 2020
- Volume:
- 98
- Issue Sort Value:
- 2020-0098-0000-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-02
- Subjects:
- Action recognition -- Spatio-temporal deformable -- Attention mechanism -- 3D ConvNets
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2019.107037
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 12059.xml