CV · Sep 13, 2021

Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images

arXiv:2109.05885v1 · 70 citations
Originality: Incremental advance
AI Analysis

This paper addresses accurate 3D pose estimation for multiple people in multi-view settings, which matters for applications such as surveillance and sports analysis. The advance is incremental: it builds on the existing top-down paradigm and adds task-specific graph networks.

The paper tackles 3D multi-person pose estimation from multi-view images by decomposing it into a localization stage and a pose estimation stage, with three graph neural networks handling message passing. It reports state-of-the-art performance on the CMU Panoptic and Shelf datasets at lower computational complexity.

This paper studies the task of estimating the 3D human poses of multiple persons from multiple calibrated camera views. Following the top-down paradigm, we decompose the task into two stages, i.e. person localization and pose estimation. Both stages are processed in a coarse-to-fine manner, and we propose three task-specific graph neural networks for effective message passing. For 3D person localization, we first use the Multi-view Matching Graph Module (MMG) to learn the cross-view association and recover coarse human proposals. The Center Refinement Graph Module (CRG) further refines the results via flexible point-based prediction. For 3D pose estimation, the Pose Regression Graph Module (PRG) learns both the multi-view geometry and the structural relations between human joints. Our approach achieves state-of-the-art performance on the CMU Panoptic and Shelf datasets with significantly lower computational complexity.
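
As a rough illustration of how a coarse 3D human proposal can be recovered once 2D person centers have been associated across calibrated views, the sketch below uses standard linear (DLT) triangulation. This is an assumption made for illustration, not the paper's code: the abstract does not specify how proposals are recovered, and the names `triangulate_dlt`, `make_projection`, and `project`, along with the camera parameters, are hypothetical.

```python
import numpy as np

def make_projection(K, R, t):
    """Build a 3x4 projection matrix P = K [R | t] (hypothetical helper)."""
    return K @ np.hstack([R, t.reshape(3, 1)])

def project(P, X):
    """Project a 3D point X into pixel coordinates with projection matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def triangulate_dlt(projections, points_2d):
    """Linear (DLT) triangulation of one 3D point from two or more views.

    Each view contributes two rows to a homogeneous system A X = 0;
    the solution is the right singular vector with the smallest singular value.
    """
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # de-homogenize

# Two toy cameras with shared intrinsics (illustrative values only).
K = np.array([[1000.0, 0.0, 500.0],
              [0.0, 1000.0, 500.0],
              [0.0, 0.0, 1.0]])
P1 = make_projection(K, np.eye(3), np.zeros(3))
P2 = make_projection(K, np.eye(3), np.array([-1.0, 0.0, 0.0]))

# A made-up 3D person center and its noiseless 2D observations.
center = np.array([0.2, -0.1, 4.0])
observations = [project(P1, center), project(P2, center)]

estimate = triangulate_dlt([P1, P2], observations)
```

With noiseless observations the estimate matches the original center up to floating-point error. With real 2D detections the result would only be a coarse proposal, which is the kind of estimate a refinement stage such as the CRG is described as improving.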

Code Implementations: 1 repo
