View Consistency Aware Holistic Triangulation for 3D Human Pose Estimation
This work addresses multi-view 3D human pose estimation. It is an incremental improvement that adds new components targeting two specific bottlenecks: 2D detection outliers and implausible 3D poses.
The paper tackles two challenges in multi-view 3D human pose estimation: outliers in 2D detection and implausible 3D poses. It introduces a Multi-View Fusion module and Holistic Triangulation with an anatomy prior, and reports state-of-the-art precision and plausibility, the latter assessed by a new metric.
The rapid development of multi-view 3D human pose estimation (HPE) owes much to the maturation of monocular 2D HPE and to the geometry of 3D reconstruction. Two challenges remain, however: 2D detection outliers in occluded views, caused by neglecting view consistency, and implausible 3D poses, caused by a lack of pose coherence. To address them, we introduce a Multi-View Fusion module that refines the 2D results by establishing correlations between views. We then propose Holistic Triangulation, which infers the whole pose as a single entity, and inject an anatomy prior to maintain pose coherence and improve plausibility. The anatomy prior is extracted by PCA applied to skeletal structure features, which factors out global context and joint-by-joint relationships, from abstract to concrete. Because the solution is closed-form, the whole framework is trained end-to-end. Our method outperforms the state of the art in both precision and plausibility, the latter assessed by a new metric.
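To make the idea of a closed-form, whole-pose triangulation more concrete, the following NumPy sketch may help. It is not the authors' formulation. It assumes calibrated 3x4 projection matrices, uses standard linear (DLT) triangulation for a single joint, and then stacks all joints into one least-squares problem with a quadratic penalty that pulls the pose toward a linear subspace. That penalty is a hypothetical stand-in for the anatomy prior: here the subspace lives directly in joint coordinates, whereas the abstract derives its prior from skeletal structure features. The function names, the `basis` and `mean_pose` arguments, and the weight `lam` are all illustrative assumptions.

```python
import numpy as np


def build_dlt_system(proj_mats, points_2d, weights=None):
    """Build the linear system A x = b for one joint seen in V views.

    proj_mats: (V, 3, 4) projection matrices, points_2d: (V, 2) detections,
    weights: optional (V,) per-view confidences.
    """
    proj_mats = np.asarray(proj_mats, dtype=float)
    points_2d = np.asarray(points_2d, dtype=float)
    if weights is None:
        weights = np.ones(proj_mats.shape[0])
    w = np.asarray(weights, dtype=float)[:, None]
    # Each view contributes two equations: (u * P[2] - P[0]) X = 0 and
    # (v * P[2] - P[1]) X = 0, with X = [x, y, z, 1].
    rows_u = points_2d[:, 0:1] * proj_mats[:, 2, :] - proj_mats[:, 0, :]
    rows_v = points_2d[:, 1:2] * proj_mats[:, 2, :] - proj_mats[:, 1, :]
    M = np.concatenate([w * rows_u, w * rows_v], axis=0)  # (2V, 4)
    # Move the homogeneous column to the right-hand side.
    return M[:, :3], -M[:, 3]


def triangulate_joint(proj_mats, points_2d, weights=None):
    """Conventional per-joint linear triangulation (least squares)."""
    A, b = build_dlt_system(proj_mats, points_2d, weights)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x


def triangulate_pose_with_prior(proj_mats, pose_2d, basis, mean_pose,
                                lam=1.0, weights=None):
    """Solve for all J joints at once with a hypothetical subspace prior.

    Minimizes ||A x - b||^2 + lam * ||Q (x - mu)||^2, where Q projects onto
    the complement of the subspace spanned by the orthonormal columns of
    `basis` (shape (3J, k)). pose_2d: (V, J, 2), weights: optional (V, J).
    """
    pose_2d = np.asarray(pose_2d, dtype=float)
    n_joints = pose_2d.shape[1]
    dim = 3 * n_joints
    AtA = np.zeros((dim, dim))
    Atb = np.zeros(dim)
    for j in range(n_joints):
        w_j = None if weights is None else np.asarray(weights)[:, j]
        A, b = build_dlt_system(proj_mats, pose_2d[:, j], w_j)
        s = slice(3 * j, 3 * j + 3)
        AtA[s, s] = A.T @ A  # reprojection terms are block-diagonal
        Atb[s] = A.T @ b
    Q = np.eye(dim) - basis @ basis.T  # symmetric, idempotent projector
    mu = np.asarray(mean_pose, dtype=float).reshape(dim)
    # Closed-form normal equations; the prior term couples the joints.
    x = np.linalg.solve(AtA + lam * Q, Atb + lam * Q @ mu)
    return x.reshape(n_joints, 3)
```

With `lam=0` the second function reduces to independent per-joint triangulation. With `lam > 0` the prior term couples the joints, so an outlier detection on one joint can be partly corrected by the rest of the pose. Since the estimate is the solution of a single linear system, it is differentiable with respect to the 2D inputs and weights, which is the kind of property that makes end-to-end training possible.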