CV | Jul 21, 2020

Multi-person 3D Pose Estimation in Crowded Scenes Based on Multi-View Geometry

arXiv:2007.10986v1 | 47 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of accurate 3D pose estimation in dense crowds, with applications such as surveillance and sports analysis. It is rated as an incremental improvement over prior methods.

The paper tackles multi-person 3D pose estimation in crowded scenes, where existing methods based on epipolar constraints often fail because of joint mismatches and a non-robust least squares formulation. The proposed method reformulates the task as crowd pose estimation, using a graph model for cross-view matching and a MAP estimator for 3D reconstruction, and reports superior results on four benchmark datasets.

Epipolar constraints are at the core of feature matching and depth estimation in current multi-person multi-camera 3D human pose estimation methods. Despite the satisfactory performance of this formulation in sparser crowd scenes, its effectiveness is frequently challenged under denser crowd circumstances mainly due to two sources of ambiguity. The first is the mismatch of human joints resulting from the simple cues provided by the Euclidean distances between joints and epipolar lines. The second is the lack of robustness from the naive formulation of the problem as a least squares minimization. In this paper, we depart from the multi-person 3D pose estimation formulation, and instead reformulate it as crowd pose estimation. Our method consists of two key components: a graph model for fast cross-view matching, and a maximum a posteriori (MAP) estimator for the reconstruction of the 3D human poses. We demonstrate the effectiveness and superiority of our proposed method on four benchmark datasets.
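
For orientation, the baseline formulation that the abstract criticizes can be sketched in a few lines. This is a minimal illustration, not the paper's method: it shows the point-to-epipolar-line distance used as a matching cue and a plain linear least squares (DLT) triangulation. The function names `epipolar_distance` and `triangulate_dlt` are hypothetical, and the fundamental matrix `F` is assumed to follow the convention x2^T F x1 = 0.

```python
import numpy as np


def epipolar_distance(x1, x2, F):
    """Pixel distance from point x2 (view 2) to the epipolar line of x1 (view 1).

    Assumes the convention x2^T F x1 = 0 for corresponding points.
    """
    p1 = np.array([x1[0], x1[1], 1.0])
    p2 = np.array([x2[0], x2[1], 1.0])
    line = F @ p1  # epipolar line a*x + b*y + c = 0 in view 2
    return abs(p2 @ line) / np.hypot(line[0], line[1])


def triangulate_dlt(points_2d, projections):
    """Linear least squares (DLT) triangulation of one joint from N >= 2 views.

    points_2d:   list of (u, v) observations, one per view
    projections: list of 3x4 camera projection matrices, same order
    """
    rows = []
    for (u, v), P in zip(points_2d, projections):
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]
```

In a dense crowd, several candidate joints can lie almost equally close to the same epipolar line, so the distance alone does not disambiguate them, and a single wrong match feeds directly into the unweighted least squares solve. These are the two sources of ambiguity the abstract names.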

Code Implementations: 1 repo