Multi-Stage HRNet: Multiple Stage High-Resolution Network for Human Pose Estimation
This work addresses pose estimation for applications like action recognition, but it appears incremental as it builds on existing top-down pipelines and high-resolution representations.
The paper tackles multi-person pose estimation in images by proposing a Multi-Stage HRNet that refines keypoint positions through multiple stages and cross-stage feature aggregation, achieving a 77.1 AP score on the COCO test-dev dataset.
Human pose estimation is important for visual understanding tasks such as action recognition and human-computer interaction. In this work, we present a Multiple Stage High-Resolution Network (Multi-Stage HRNet) to tackle the problem of multi-person pose estimation in images. Specifically, we follow the top-down pipeline and maintain high-resolution representations during single-person pose estimation. In addition, a multi-stage network and cross-stage feature aggregation are adopted to further refine the keypoint positions. The resulting approach achieves promising results on the COCO dataset. Our single-model, single-scale test configuration obtains 77.1 AP on test-dev using publicly available training data.
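To make the top-down pipeline concrete, the following is a minimal sketch of its final step under stated assumptions: a person detector has already produced a bounding box, and a single-person network has produced one heatmap per keypoint for the cropped box. The names `argmax_2d` and `decode_keypoints`, the `(x, y, width, height)` box layout, and the plain argmax decoding are illustrative assumptions and are not taken from the paper, which does not specify its decoding procedure here.

```python
from typing import List, Tuple

Heatmap = List[List[float]]               # rows x cols grid of keypoint scores
Box = Tuple[float, float, float, float]   # hypothetical layout: (x, y, width, height)


def argmax_2d(heatmap: Heatmap) -> Tuple[int, int, float]:
    """Return (row, col, score) of the highest-scoring heatmap cell."""
    best_row, best_col, best_score = 0, 0, heatmap[0][0]
    for r, row in enumerate(heatmap):
        for c, score in enumerate(row):
            if score > best_score:
                best_row, best_col, best_score = r, c, score
    return best_row, best_col, best_score


def decode_keypoints(heatmaps: List[Heatmap], box: Box) -> List[Tuple[float, float, float]]:
    """Map each heatmap peak from crop coordinates back to image coordinates.

    Assumes each heatmap covers the whole person box, so the center of a
    heatmap cell corresponds to a proportional position inside the box.
    """
    x0, y0, width, height = box
    keypoints = []
    for heatmap in heatmaps:
        rows, cols = len(heatmap), len(heatmap[0])
        row, col, score = argmax_2d(heatmap)
        # Use the cell center (+0.5) and rescale to the box size.
        x = x0 + (col + 0.5) * width / cols
        y = y0 + (row + 0.5) * height / rows
        keypoints.append((x, y, score))
    return keypoints
```

In a multi-stage design, a function like this would typically be applied to the heatmaps of the last stage, since later stages are meant to refine the earlier predictions; that choice is likewise an assumption of this sketch.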