Region-sequence based six-stream CNN features for general and fine-grained human action recognition in videos. (April 2018)
- Record Type:
- Journal Article
- Title:
- Region-sequence based six-stream CNN features for general and fine-grained human action recognition in videos. (April 2018)
- Main Title:
- Region-sequence based six-stream CNN features for general and fine-grained human action recognition in videos
- Authors:
- Ma, Miao
Marturi, Naresh
Li, Yibin
Leonardis, Ales
Stolkin, Rustam
- Abstract:
- Highlights: We proposed a new method for human fine-grained action recognition in videos. Our method uses a coarse pose estimation method to cut video frames and get human body foreground patch sequence. Our method focuses on human lower arm area to enhance effective pixels for fine-grained actions. We propose an encoding method to process the last pooling layer features of CNN structure.
Abstract: This paper addresses the problems of both general and also fine-grained human action recognition in video sequences. Compared with general human actions, fine-grained action information is more difficult to detect and occupies relatively small-scale image regions. Our work seeks to improve fine-grained action discrimination, while also retaining the ability to perform general action recognition. Our method first estimates human pose and human parts positions in video sequences by extending our recent work on human pose tracking, and crops different scaled patches to obtain richer action information in a variety of different scales of appearance and motion cues. We then utilize a Convolutional Neural Network (CNN) to process each such image patch. Instead of using the output one dimension feature from the full-connection layer, we utilize the outputs of the pooling layer of CNN structure, which contains more spatial information. Then the high dimension of the pooling features is reduced by encoding, to generate the final human action descriptors for classification. Our method reduces feature dimension while also effectively combining appearance and motion information in a unified framework. We have carried out empirical experiments using two publicly available human action datasets, comparing the human action recognition result of our algorithm against six recent state-of-the-art methods from the literature. The results suggest comparatively strong performance of our method.
- Is Part Of:
- Pattern recognition. Volume 76(2018:Apr.)
- Journal:
- Pattern recognition
- Issue:
- Volume 76(2018:Apr.)
- Issue Display:
- Volume 76 (2018)
- Year:
- 2018
- Volume:
- 76
- Issue Sort Value:
- 2018-0076-0000-0000
- Page Start:
- 506
- Page End:
- 521
- Publication Date:
- 2018-04
- Subjects:
- Human pose -- Action recognition -- Video understanding
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2017.11.026
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 11368.xml