Recognizing actions in images by fusing multiple body structure cues. (August 2020)
- Record Type:
- Journal Article
- Title:
- Recognizing actions in images by fusing multiple body structure cues. (August 2020)
- Main Title:
- Recognizing actions in images by fusing multiple body structure cues
- Authors:
- Li, Yang
Li, Kan
Wang, Xinxin
- Abstract:
- Highlights: We propose a unified model for recognizing human actions in static images. It explicitly investigates body structure information and integrates the body structure exploration and action classification tasks into a single model. We also design a two-step learning technique in which keypoint estimation provides intermediate supervision for learning human action representations. We design two body structure cues, Structural Body Parts (SBPs) and the Limb Angle Descriptor (LAD), to fully explore the structure of human bodies from local and global perspectives. To construct body parts at different scales in unconstrained images, we propose a technique that uses human keypoint heatmaps to generate scale-adaptive SBPs, which extract fine-grained local human features. We further propose a technique that automatically determines the most discriminative body part of each action category for identifying the ongoing action. To extract global, high-level body structure features, we propose the LAD, which models the spatial angle relationship between pairs of human limbs. The LAD is more robust and performs better than a distance-based skeleton descriptor. We evaluate our model on two challenging image-based action datasets, and the results show that our method achieves state-of-the-art performance.
Abstract: Although Convolutional Neural Networks (CNNs) have made substantial improvements in many computer vision tasks, there remains room for improvement in image-based action recognition because of their limited capability to exploit body structure information. In this work, we propose a unified deep model that explicitly explores body structure information and fuses multiple body structure cues for robust action recognition in images. To fully explore the body structure information, we design the Body Structure Exploration sub-network. It generates two novel body structure cues, Structural Body Parts and the Limb Angle Descriptor, which capture the structure of human bodies from the local and global perspectives, respectively. Then, we design the Action Classification sub-network, which fuses the predictions from the multiple body structure cues to obtain precise results. Moreover, we integrate the two sub-networks into a unified model by sharing the bottom convolutional layers, which improves computational efficiency in both the training and testing stages. We comprehensively evaluate our network on two challenging image-based human action datasets, Pascal VOC 2012 Action and Stanford40. Our approach achieves 93.5% and 93.8% mAP, respectively, outperforming all recent approaches in this field.
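The abstract says only that the LAD models the spatial angle relationship between pairs of human limbs; it does not give the exact formulation. As a minimal sketch under stated assumptions, taking 2D keypoint coordinates and a hypothetical limb list (the keypoint order, `LIMBS`, and `limb_angle_descriptor` are illustrative names, not the authors' definition), pairwise limb angles could be computed in Python like this:

```python
import itertools

import numpy as np

# Hypothetical limb definition: each limb is a (start, end) pair of keypoint
# indices. The keypoint order and this limb list are illustrative assumptions,
# not the skeleton used in the paper.
LIMBS = [
    (0, 1),  # e.g. left shoulder -> left elbow
    (1, 2),  # e.g. left elbow -> left wrist
    (3, 4),  # e.g. right shoulder -> right elbow
    (4, 5),  # e.g. right elbow -> right wrist
]


def limb_angle_descriptor(keypoints, limbs=LIMBS):
    """Return the unsigned angle (radians, in [0, pi]) between every
    unordered pair of limbs.

    keypoints: array-like of shape (K, 2) holding (x, y) image coordinates.
    """
    kp = np.asarray(keypoints, dtype=float)
    # Direction vector of each limb: end point minus start point.
    vecs = np.array([kp[end] - kp[start] for start, end in limbs])
    norms = np.linalg.norm(vecs, axis=1)
    angles = []
    for i, j in itertools.combinations(range(len(limbs)), 2):
        # Guard against zero-length limbs (e.g. overlapping keypoints).
        denom = max(norms[i] * norms[j], 1e-12)
        cos = np.dot(vecs[i], vecs[j]) / denom
        # Clip so rounding error cannot push the value outside [-1, 1].
        angles.append(np.arccos(np.clip(cos, -1.0, 1.0)))
    return np.array(angles)


if __name__ == "__main__":
    pose = np.array([[0, 0], [1, 0], [1, 1], [3, 0], [4, 0], [5, 0]], dtype=float)
    print(limb_angle_descriptor(pose))
    # Scaling and shifting the whole pose leaves the angles unchanged.
    print(limb_angle_descriptor(2.5 * pose + np.array([10.0, -3.0])))
```

Angles between limb vectors do not change when the whole pose is translated or uniformly scaled, whereas pairwise keypoint distances grow with the person's apparent size in the image. That is one plausible reason an angle-based descriptor could be more robust than a distance-based one, although the paper's own explanation may differ.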
- Is Part Of:
- Pattern recognition. Volume 104(2020:Aug.)
- Journal:
- Pattern recognition
- Issue:
- Volume 104(2020:Aug.)
- Issue Display:
- Volume 104 (2020)
- Year:
- 2020
- Volume:
- 104
- Issue Sort Value:
- 2020-0104-0000-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-08
- Subjects:
- Image-based action recognition -- Convolutional neural network -- Body structure cues
00-01 -- 99-00
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2020.107341
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 13393.xml