Detecting and grouping keypoints for multi-person pose estimation using instance-aware attention. (April 2023)
- Record Type:
- Journal Article
- Title:
- Detecting and grouping keypoints for multi-person pose estimation using instance-aware attention. (April 2023)
- Main Title:
- Detecting and grouping keypoints for multi-person pose estimation using instance-aware attention
- Authors:
- Yang, Sen
Feng, Ze
Wang, Zhicheng
Li, Yanjie
Zhang, Shoukui
Quan, Zhibin
Xia, Shu-tao
Yang, Wankou
- Abstract:
- Highlights: Exploiting the pairwise attention scores between keypoints as the criterion to judge whether they belong to the same person or not. Using the instance masks to supervise the self-attention to ensure the instance-discriminative characteristics for the use of keypoint grouping. Using a very simple architecture design to simultaneously detect and group instance-agnostic keypoints into person skeletons. The instance segmentation results of any number of people can be directly and simply obtained from the supervised attention matrix.
Abstract: Bottom-up human pose estimation models detect keypoints and learn associative information between keypoints, usually requiring human-predefined offset fields or embeddings for keypoint grouping (clustering). In this paper, we present a brand new method based on Transformer that can entirely solve these problems, making the grouping process free of human-defined associative signals. Specifically, the self-attention in vision Transformer measures feature similarity between any pair of locations, which provides a metric space to associate keypoints together into corresponding human instances. However, the naive attention patterns formed in Transformer are still not subjectively controlled, so there is no guarantee that the keypoints only attend to the instances to which they belong. To address this, we propose a novel approach of supervising self-attention to be instance-aware, simultaneously accomplishing multi-person keypoint detection and clustering. By doing so, we can group the detected keypoints to their corresponding instances, according to the pairwise attention scores. An additional benefit of our method is that the instance segmentation results of any number of people can be directly obtained from the supervised attention matrix, thereby simplifying the pixel assignment pipeline. The qualitative and quantitative results on the COCO dataset show that, with a very simple architecture design, our method can achieve comparable performance against the CNN-based bottom-up counterparts with fewer parameters, which also demonstrates a promising way to control self-attention mechanism behavior for specific purposes.
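The grouping idea in the abstract (keypoints are assigned to the same person when their pairwise attention scores are high) can be illustrated with a minimal sketch. This record does not describe the authors' actual grouping procedure, so everything below is an assumption made for illustration: the function name `group_keypoints`, the symmetrised score, the fixed `threshold`, the union-find merging, and the toy matrix `TOY_ATTENTION` (hand-picked scores, not real softmax output).

```python
from typing import Dict, List, Sequence

# Hypothetical toy scores: entry [i][j] is how strongly keypoint i attends to
# keypoint j. Keypoints 0 and 1 are meant to lie on one person, 2 and 3 on another.
TOY_ATTENTION = [
    [1.0, 0.9, 0.1, 0.0],
    [0.8, 1.0, 0.2, 0.1],
    [0.1, 0.0, 1.0, 0.7],
    [0.0, 0.2, 0.9, 1.0],
]


def group_keypoints(
    attention: Sequence[Sequence[float]], threshold: float = 0.5
) -> List[List[int]]:
    """Group keypoint indices whose mutual attention exceeds `threshold`.

    Illustrative stand-in only: pairs are linked when the average of the two
    directed scores is above the threshold, and linked pairs are merged with
    union-find, so each returned group is one connected component.
    """
    n = len(attention)
    parent = list(range(n))

    def find(i: int) -> int:
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            score = 0.5 * (attention[i][j] + attention[j][i])
            if score > threshold:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    # Keep the smaller index as the root for stable output.
                    parent[max(root_i, root_j)] = min(root_i, root_j)

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    print(group_keypoints(TOY_ATTENTION))
```

With the toy matrix, keypoints 0 and 1 are linked, as are 2 and 3, giving two groups. Raising the threshold above every pairwise score leaves each keypoint in its own group, which shows how sensitive a thresholded scheme like this is to the chosen cut-off.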
- Is Part Of:
- Pattern recognition. Volume 136(2023)
- Journal:
- Pattern recognition
- Issue:
- Volume 136(2023)
- Issue Display:
- Volume 136, Issue 2023 (2023)
- Year:
- 2023
- Volume:
- 136
- Issue:
- 2023
- Issue Sort Value:
- 2023-0136-2023-0000
- Page Start:
- Page End:
- Publication Date:
- 2023-04
- Subjects:
- Multi-person human pose estimation -- Self-attention -- Bottom-up -- Transformer -- Grouping -- Keypoints association
Pattern perception -- Periodicals
Perception des structures -- Périodiques
Patroonherkenning
006.4
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00313203
http://www.sciencedirect.com/
- DOI:
- 10.1016/j.patcog.2022.109232
- Languages:
- English
- ISSNs:
- 0031-3203
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 25681.xml