CV · Sep 25, 2023

BoIR: Box-Supervised Instance Representation for Multi-Person Pose Estimation

arXiv:2309.14072v2 · 2 citations · h-index: 32 · Has Code
Originality: Incremental advance
AI Analysis

This paper addresses accurate multi-person pose estimation in crowded scenes, a computer vision problem, and represents an incremental advance over existing methods.

The paper tackles the problem of disentangling features by individual instances in crowded scenes for multi-person pose estimation, proposing BoIR which achieves state-of-the-art performance with improvements of 0.8 AP on COCO val, 0.5 AP on COCO test-dev, 4.9 AP on CrowdPose, and 3.5 AP on OCHuman.

Single-stage multi-person human pose estimation (MPPE) methods have shown great performance improvements, but existing methods fail to disentangle features by individual instances under crowded scenes. In this paper, we propose a bounding box-level instance representation learning method called BoIR, which simultaneously solves instance detection, instance disentanglement, and instance-keypoint association problems. Our new instance embedding loss provides a learning signal on the entire area of the image with bounding box annotations, achieving globally consistent and disentangled instance representation. Our method exploits multi-task learning of bottom-up keypoint estimation, bounding box regression, and contrastive instance embedding learning, without additional computational cost during inference. BoIR is effective for crowded scenes, outperforming the state of the art by 0.8 AP on COCO val, 0.5 AP on COCO test-dev, 4.9 AP on CrowdPose, and 3.5 AP on OCHuman. Code will be available at https://github.com/uyoung-jeong/BoIR
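
To make the idea of a box-supervised instance embedding loss more concrete, here is a minimal NumPy sketch. It is not the loss from the paper: it swaps in a generic pull/push formulation in the style of associative embedding, where pixels inside a box are pulled toward that box's mean embedding and the means of different boxes are pushed at least a margin apart. The function names (`box_masks`, `pull_push_loss`), the `margin` parameter, and the `(x0, y0, x1, y1)` box layout are all assumptions made for illustration. Overlapping boxes simply share their pixels here, and every box is assumed to cover at least one pixel.

```python
import numpy as np


def box_masks(boxes, height, width):
    """Rasterize (x0, y0, x1, y1) boxes into boolean masks of shape (N, H, W)."""
    ys, xs = np.mgrid[0:height, 0:width]
    return np.stack([
        (xs >= x0) & (xs < x1) & (ys >= y0) & (ys < y1)
        for x0, y0, x1, y1 in boxes
    ])


def pull_push_loss(embeddings, boxes, margin=1.0):
    """Illustrative pull/push loss (an assumption, not the BoIR loss).

    embeddings: (H, W, D) per-pixel embedding map.
    boxes: list of (x0, y0, x1, y1) pixel boxes, each covering >= 1 pixel.
    Returns (pull, push) as Python floats.
    """
    height, width, _ = embeddings.shape
    masks = box_masks(boxes, height, width)

    # One mean embedding ("center") per box, shape (N, D).
    centers = np.stack([embeddings[m].mean(axis=0) for m in masks])

    # Pull: mean squared distance of each box's pixels to its own center.
    pull = np.mean([
        np.mean(np.sum((embeddings[m] - c) ** 2, axis=1))
        for m, c in zip(masks, centers)
    ])

    n = len(centers)
    if n < 2:
        return float(pull), 0.0

    # Push: squared hinge on pairwise center distances below the margin.
    dists = np.linalg.norm(centers[:, None] - centers[None, :], axis=-1)
    off_diag = ~np.eye(n, dtype=bool)
    push = np.mean(np.maximum(0.0, margin - dists[off_diag]) ** 2)
    return float(pull), float(push)


# Toy example: two side-by-side boxes on a 4x4 map with 2-D embeddings.
boxes = [(0, 0, 2, 4), (2, 0, 4, 4)]
separated = np.zeros((4, 4, 2))
separated[:, 2:, 0] = 5.0           # right box gets a distinct embedding
collapsed = np.zeros((4, 4, 2))     # both boxes share one embedding

print(pull_push_loss(separated, boxes))
print(pull_push_loss(collapsed, boxes))
```

In the toy example, the well-separated map incurs no push penalty, while the collapsed map is penalized because both box centers coincide. In a multi-task setup like the one the abstract describes, a term of this kind would be added to the keypoint and box regression losses during training only, which is consistent with the claim of no extra inference cost.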

Code Implementations: 1 repo