Three-stream fusion network for first-person interaction recognition. (July 2020)
- Record Type:
- Journal Article
- Title:
- Three-stream fusion network for first-person interaction recognition. (July 2020)
- Main Title:
- Three-stream fusion network for first-person interaction recognition
- Authors:
- Kim, Ye-Ji
Lee, Dong-Gyu
Lee, Seong-Whan
- Abstract:
- Highlights: We propose a novel three-stream framework for first-person interaction recognition. The three-stream correlation fusion can consider correlations between a target and a camera wearer. Superior performance by using the proposed three-stream architecture with the fusion method.
Abstract: First-person interaction recognition is a challenging task because of unstable video conditions resulting from the camera wearer's movement. For human interaction recognition from a first-person viewpoint, this paper proposes a three-stream fusion network with two main parts: three-stream architecture and three-stream correlation fusion. The three-stream architecture captures the characteristics of the target appearance, target motion, and camera ego-motion. Meanwhile the three-stream correlation fusion combines the feature map of each of the three streams to consider the correlations among the target appearance, target motion, and camera ego-motion. The fused feature vector is robust to the camera movement and compensates for the noise of the camera ego-motion. Short-term intervals are modeled using the fused feature vector, and a long short-term memory (LSTM) model considers the temporal dynamics of the video. We evaluated the proposed method on two public benchmark datasets to validate the effectiveness of our approach. The experimental results show that the proposed fusion method successfully generated a discriminative feature vector, and our network outperformed all competing activity recognition methods in first-person videos where considerable camera ego-motion occurs.
- Is Part Of:
- Pattern recognition. Volume 103(2020:Jul.)
- Journal:
- Pattern recognition
- Issue:
- Volume 103(2020:Jul.)
- Issue Display:
- Volume 103 (2020)
- Year:
- 2020
- Volume:
- 103
- Issue Sort Value:
- 2020-0103-0000-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-07
- Subjects:
- First-person vision -- First-person interaction recognition -- Three-stream fusion network -- Three-stream correlation fusion -- Camera ego-motion
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2020.107279
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legal deposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 13507.xml