CV · Sep 8, 2021

Learning Local-Global Contextual Adaptation for Multi-Person Pose Estimation

arXiv:2109.03622v2 · 41 citations
AI Analysis

This work addresses multi-person pose estimation for computer vision applications, representing an incremental improvement over existing bottom-up methods.

The paper tackles multi-person pose estimation by proposing LOGO-CAP, which learns local-global contextual adaptation to improve accuracy, achieving state-of-the-art performance on the COCO benchmark and outperforming prior methods on the OCHuman dataset.

This paper studies the problem of multi-person pose estimation in a bottom-up fashion. Building on a new and strong observation that, in an ideal situation, the localization issue of the center-offset formulation can be remedied by a local-window search scheme, we propose a multi-person pose estimation approach, dubbed LOGO-CAP, that learns the LOcal-GlObal Contextual Adaptation for human Pose. Specifically, our approach first learns keypoint attraction maps (KAMs) from local keypoint expansion maps (KEMs) in small local windows. The KAMs are subsequently treated as dynamic convolutional kernels on the keypoint-focused global heatmaps for contextual adaptation, achieving accurate multi-person pose estimation. Our method is end-to-end trainable with near real-time inference speed in a single forward pass, obtaining state-of-the-art performance on the COCO keypoint benchmark for bottom-up human pose estimation. With the COCO-trained model, our method also outperforms prior art by a large margin on the challenging OCHuman dataset.
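The local-window search idea in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names `gaussian_kernel` and `refine_keypoint` are hypothetical, the window size of 11 is an assumed value, and a fixed Gaussian stands in for the learned keypoint attraction map (KAM), which in the paper is predicted per keypoint by the network. The sketch only shows how a per-keypoint kernel, applied to the global heatmap inside a small window around a coarse center-offset estimate, can pull that estimate toward the heatmap peak.

```python
import numpy as np


def gaussian_kernel(size, sigma):
    """Fixed Gaussian used as a stand-in for a learned KAM (assumption)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def refine_keypoint(heatmap, init_xy, kernel, window=11):
    """Search a local window around init_xy for the best kernel response.

    heatmap: 2D array (H, W), global heatmap for one keypoint type.
    init_xy: (x, y) integer estimate, e.g. from center-offset regression.
    kernel:  2D odd-sized array acting as this keypoint's dynamic kernel.
    """
    h, w = heatmap.shape
    r = window // 2                 # search radius around the initial guess
    kr = kernel.shape[0] // 2       # kernel radius
    pad = r + kr
    padded = np.pad(heatmap, pad, mode="constant")

    def score_at(x, y):
        # Correlate the kernel with the heatmap patch centered at (x, y).
        cx, cy = x + pad, y + pad
        patch = padded[cy - kr:cy + kr + 1, cx - kr:cx + kr + 1]
        return float((patch * kernel).sum())

    x0, y0 = init_xy
    best_xy, best_score = (x0, y0), score_at(x0, y0)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            x, y = x0 + dx, y0 + dy
            if not (0 <= x < w and 0 <= y < h):
                continue            # ignore candidates outside the image
            s = score_at(x, y)
            if s > best_score:
                best_xy, best_score = (x, y), s
    return best_xy, best_score


# Toy example: a synthetic heatmap peaking at (x=30, y=20) and a coarse
# initial estimate that is a few pixels off.
yy, xx = np.mgrid[0:48, 0:64]
heatmap = np.exp(-((xx - 30) ** 2 + (yy - 20) ** 2) / (2.0 * 2.0 ** 2))
refined_xy, score = refine_keypoint(heatmap, (27, 23), gaussian_kernel(5, 1.0))
print(refined_xy, score)
```

In this toy setting the true peak lies inside the search window, which mirrors the "ideal situation" the abstract refers to. The learned KAMs replace the fixed Gaussian so that the kernel adapts to each person and keypoint.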

Code Implementations: 1 repo