CV · Sep 8, 2021

Learning Local-Global Contextual Adaptation for Multi-Person Pose Estimation

Nan Xue, Tianfu Wu, Gui-Song Xia, Liangpei Zhang

arXiv:2109.03622v29.441 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses multi-person pose estimation for computer vision applications, representing an incremental improvement over existing bottom-up methods.

The paper tackles multi-person pose estimation by proposing LOGO-CAP, which learns local-global contextual adaptation to improve accuracy, achieving state-of-the-art performance on the COCO benchmark and outperforming prior methods on the OCHuman dataset.

This paper studies the problem of multi-person pose estimation in a bottom-up fashion. With a new and strong observation that the localization issue of the center-offset formulation can be remedied in a local-window search scheme in an ideal situation, we propose a multi-person pose estimation approach, dubbed LOGO-CAP, by learning the LOcal-GlObal Contextual Adaptation for human Pose. Specifically, our approach first learns the keypoint attraction maps (KAMs) from the local keypoints expansion maps (KEMs) in small local windows. The KAMs are subsequently treated as dynamic convolutional kernels on the keypoints-focused global heatmaps for contextual adaptation, achieving accurate multi-person pose estimation. Our method is end-to-end trainable with near real-time inference speed in a single forward pass, obtaining state-of-the-art performance on the COCO keypoint benchmark for bottom-up human pose estimation. With the COCO-trained model, our method also outperforms prior art by a large margin on the challenging OCHuman dataset.
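
The local-window idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function `refine_keypoint`, its parameters, and the hand-written kernel are hypothetical, and a fixed kernel stands in for a learned KAM. The sketch only shows how a small per-keypoint kernel, slid over a local window of a global heatmap, can move a coarse keypoint guess toward a nearby peak.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def refine_keypoint(heatmap, guess, kernel, search=5):
    """Refine a coarse (row, col) guess inside a local search window.

    heatmap: 2D array standing in for a global keypoint heatmap.
    guess:   coarse (row, col) location, e.g. from a center-offset prediction.
    kernel:  small odd-sized square array; a stand-in for a learned,
             per-keypoint kernel (assumption, not the paper's KAM).
    search:  odd side length of the local search window (assumption).
    """
    k = kernel.shape[0]
    r_search, r_kernel = search // 2, k // 2
    pad = r_search + r_kernel

    # Zero-pad so that windows near the image border keep their full size.
    padded = np.pad(heatmap, pad)
    gy, gx = guess
    size = search + k - 1
    patch = padded[gy:gy + size, gx:gx + size]

    # Correlate the kernel with every k x k sub-window of the local patch.
    windows = sliding_window_view(patch, (k, k))  # (search, search, k, k)
    response = np.einsum("ijkl,kl->ij", windows, kernel)

    # Pick the best response and map it back to heatmap coordinates.
    dy, dx = np.unravel_index(np.argmax(response), response.shape)
    return int(gy + dy - r_search), int(gx + dx - r_search)


# Toy example: the true peak is slightly off the coarse guess.
heatmap = np.zeros((11, 11))
heatmap[6, 7] = 1.0
kernel = np.array([[0.05, 0.1, 0.05],
                   [0.1, 0.4, 0.1],
                   [0.05, 0.1, 0.05]])
refined = refine_keypoint(heatmap, (5, 5), kernel)
```

In this sketch the kernel is the same for every keypoint. In the approach described above, the kernels are predicted per keypoint from local context, which is what makes them "dynamic".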
