CV · Sep 25, 2023

BoIR: Box-Supervised Instance Representation for Multi-Person Pose Estimation

Uyoung Jeong, Seungryul Baek, Hyung Jin Chang, Kwang In Kim

arXiv:2309.14072v23.92 citations · h-index: 32 · Has Code

Originality: Incremental advance

AI Analysis

This paper addresses accurate multi-person pose estimation in crowded scenes, a recurring challenge in computer vision applications, and represents an incremental advance over existing methods.

The paper tackles the problem of disentangling features by individual instances in crowded scenes for multi-person pose estimation, proposing BoIR which achieves state-of-the-art performance with improvements of 0.8 AP on COCO val, 0.5 AP on COCO test-dev, 4.9 AP on CrowdPose, and 3.5 AP on OCHuman.

Single-stage multi-person human pose estimation (MPPE) methods have shown great performance improvements, but existing methods fail to disentangle features by individual instances in crowded scenes. In this paper, we propose a bounding-box-level instance representation learning method called BoIR, which simultaneously solves the instance detection, instance disentanglement, and instance-keypoint association problems. Our new instance embedding loss provides a learning signal over the entire area of the image using bounding box annotations, achieving a globally consistent and disentangled instance representation. Our method exploits multi-task learning of bottom-up keypoint estimation, bounding box regression, and contrastive instance embedding learning, without additional computational cost during inference. BoIR is effective for crowded scenes, outperforming the state of the art on COCO val (by 0.8 AP), COCO test-dev (by 0.5 AP), CrowdPose (by 4.9 AP), and OCHuman (by 3.5 AP). Code will be available at https://github.com/uyoung-jeong/BoIR
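To make the idea of a box-supervised contrastive embedding objective more concrete, the sketch below shows a generic pull/push loss computed from bounding boxes alone. It is not the BoIR loss from the paper: the function name `box_embedding_loss`, the `margin` parameter, and the mean-embedding "center" per box are all assumptions chosen for illustration. Unlike the loss described in the abstract, this toy version only supervises pixels inside boxes and does nothing special for overlapping boxes.

```python
import numpy as np

def box_embedding_loss(emb, boxes, margin=1.0):
    """Hypothetical pull/push embedding loss supervised only by boxes.

    emb:   (H, W, D) array of per-pixel embeddings.
    boxes: list of (x0, y0, x1, y1) integer boxes, end-exclusive, non-empty.
    Returns (pull, push) as two floats.
    """
    centers = []
    pull = 0.0
    for x0, y0, x1, y1 in boxes:
        # Gather every embedding vector that falls inside this box.
        region = emb[y0:y1, x0:x1].reshape(-1, emb.shape[-1])
        center = region.mean(axis=0)
        # Pull term: pixels of one instance should sit near its center.
        pull += float(np.mean(np.sum((region - center) ** 2, axis=1)))
        centers.append(center)
    pull /= max(len(boxes), 1)

    push = 0.0
    pairs = 0
    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            # Push term: centers of different instances should be at
            # least `margin` apart (squared hinge penalty otherwise).
            dist = float(np.linalg.norm(centers[i] - centers[j]))
            push += max(0.0, margin - dist) ** 2
            pairs += 1
    push /= max(pairs, 1)
    return pull, push
```

With two side-by-side boxes whose embeddings are already well separated, both terms vanish; with identical embeddings everywhere, the pull term stays at zero while the push term equals `margin ** 2`. In a multi-task setup such as the one the abstract describes, a term like this would be added to the keypoint and box regression losses during training and dropped at inference.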
