Bi-directional Graph Structure Information Model for Multi-Person Pose Estimation
This work addresses pose estimation for multiple people in images, offering an incremental improvement over existing methods.
The paper tackles multi-person pose estimation by proposing a network with two branches: one predicts joint confidence maps with geometric propagation between neighboring joints, and the other uses a bi-directional graph model to encode context and infer occlusions. It reports an average precision of 62.9 on the COCO Keypoint Challenge dataset and 77.6 on the MPII multi-person dataset, competitive results obtained without extra training.
In this paper, we propose a novel multi-stage network architecture with two branches in each stage to estimate multi-person poses in images. The first branch predicts the confidence maps of joints and uses a geometrical transform kernel to propagate information between neighboring joints at the confidence level. The second branch uses a bi-directional graph structure information model (BGSIM) to encode rich contextual information and to infer the occlusion relationships among different joints. We dynamically select the joint with the highest response in the confidence maps as the base point for message passing in BGSIM. With the proposed network structure, we achieve an average precision of 62.9 on the COCO Keypoint Challenge dataset and 77.6 on the MPII (multi-person) dataset. Compared with other state-of-the-art methods, our method achieves highly promising results on the selected multi-person datasets without extra training.
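The dynamic choice of a base point for message passing can be illustrated with a minimal sketch. It is not the paper's implementation: the five-joint skeleton, the function names `select_base_joint` and `message_passing_order`, and the breadth-first ordering are all assumptions made for illustration. The sketch picks the joint whose confidence map has the highest peak, then derives an outward edge order from that joint and the reversed inward order, which together give one plausible schedule for a bi-directional pass over a tree-shaped skeleton.

```python
from collections import deque

import numpy as np

# Hypothetical 5-joint tree skeleton (illustrative only, not the paper's joint set).
SKELETON_EDGES = [(0, 1), (1, 2), (1, 3), (3, 4)]


def select_base_joint(conf_maps):
    """Return the index of the joint whose confidence map has the highest peak.

    conf_maps: array of shape (num_joints, height, width).
    """
    peaks = conf_maps.reshape(conf_maps.shape[0], -1).max(axis=1)
    return int(np.argmax(peaks))


def message_passing_order(base, edges, num_joints):
    """Return (outward, inward) lists of directed edges for a two-way pass.

    outward: breadth-first edges leading away from the base joint.
    inward: the same edges reversed, ordered from the leaves back to the base.
    """
    adjacency = [[] for _ in range(num_joints)]
    for a, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)

    visited = {base}
    queue = deque([base])
    outward = []
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in visited:
                visited.add(v)
                outward.append((u, v))
                queue.append(v)

    inward = [(v, u) for (u, v) in reversed(outward)]
    return outward, inward


# Usage: synthetic confidence maps in which joint 3 has the strongest response.
conf_maps = np.zeros((5, 4, 4))
conf_maps[0, 1, 1] = 0.5
conf_maps[3, 2, 2] = 0.9
base = select_base_joint(conf_maps)
outward, inward = message_passing_order(base, SKELETON_EDGES, num_joints=5)
```

Starting from the most confident joint means that messages flow from the best-localized evidence toward less certain joints, such as occluded ones, before the inward pass returns context to the base. How BGSIM actually computes and combines messages is not specified in this abstract, so the sketch covers only the ordering.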