CV · Apr 26, 2019

Unifying Part Detection and Association for Recurrent Multi-Person Pose Estimation

arXiv:1904.11864v1
Originality: Incremental advance
AI Analysis

This paper addresses the problem of accurately estimating the poses of multiple people in an image, particularly under challenging conditions such as occlusion. The contribution appears incremental, as it builds on existing two-stage and RNN approaches.

The paper tackles multi-person pose estimation by proposing a model that unifies joint detection and association in an end-to-end framework, removing several heuristic assumptions otherwise needed for association. On the MSCOCO dataset, it improved over the baseline, especially in scenes with occlusions.

We propose a joint model of human joint detection and association for 2D multi-person pose estimation (MPPE). The approach unifies the training of joint detection and association, with no need for further processing or sophisticated heuristics to assign joints to individual people. It consists of two stages. The first stage extracts joint detection heatmaps and association features. The second stage takes these extracted features as input and uses a recurrent neural network (RNN) that predicts the heatmaps of a single person's joints in each iteration. In addition, the network learns a stopping criterion so that it halts once it has identified all individuals in the image. This approach allowed us to eliminate several heuristic assumptions and parameters needed for association, which do not necessarily hold true. Moreover, such an end-to-end approach allows the final objective to be known and optimized directly during training. We evaluated our model on the challenging MSCOCO dataset and obtained an improvement over the baseline, particularly in scenes with occlusions.
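The second-stage loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: `decode_people`, `step_fn`, `toy_step`, `stop_threshold`, and `max_people` are hypothetical names introduced here. The abstract states only that the network learns a stopping criterion; thresholding a stop probability, and checking it before keeping the current output, are assumptions made for illustration.

```python
from typing import Any, Callable, List, Tuple

# Hypothetical layout: heatmaps[joint][row][col]
Heatmaps = List[List[List[float]]]


def decode_people(
    features: Any,
    step_fn: Callable[[Any, Any], Tuple[Heatmaps, float, Any]],
    init_state: Any = None,
    stop_threshold: float = 0.5,  # assumed: stop signal is a probability
    max_people: int = 20,         # assumed safety cap, not from the paper
) -> List[Heatmaps]:
    """Call a recurrent step once per person until it signals a stop."""
    people: List[Heatmaps] = []
    state = init_state
    for _ in range(max_people):
        # One iteration yields one person's joint heatmaps plus a stop score.
        heatmaps, stop_prob, state = step_fn(features, state)
        if stop_prob >= stop_threshold:
            break
        people.append(heatmaps)
    return people


def toy_step(features: Any, state: Any) -> Tuple[Heatmaps, float, Any]:
    """Stand-in for the RNN: emits one stored person per call, then stops."""
    idx = 0 if state is None else state
    stored = features["people"]
    if idx >= len(stored):
        return [], 1.0, idx  # nobody left: signal stop
    return stored[idx], 0.0, idx + 1


# Two "people", each with one joint and a 1x1 heatmap.
demo_features = {"people": [[[[1.0]]], [[[0.5]]]]}
result = decode_people(demo_features, toy_step)
```

In a real model, `step_fn` would wrap the trained recurrent network and `features` would hold the first-stage heatmaps and association features. The toy step function only demonstrates the control flow.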

Code Implementations: 1 repo
