CV | Feb 28, 2024

Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting

arXiv:2402.18330v1 | 15 citations | h-index: 3 | Has Code | CVPR
Originality: Incremental advance
AI Analysis

This work addresses the challenge of accurate 3D pose estimation for applications like virtual reality or robotics, but it is incremental as it builds on existing heatmap-based approaches.

The paper tackles inaccurate 3D pose estimation from egocentric camera views, where self-occlusion and out-of-view limbs hide parts of the body, and reports a 23.9% reduction in MPJPE (mean per-joint position error) compared to the previous state-of-the-art methods.

We present EgoTAP, a heatmap-to-3D pose lifting method for highly accurate stereo egocentric 3D pose estimation. Severe self-occlusion and out-of-view limbs in egocentric camera views make accurate pose estimation a challenging problem. To address the challenge, prior methods employ joint heatmaps (probabilistic 2D representations of the body pose), but heatmap-to-3D pose conversion still remains an inaccurate process. We propose a novel heatmap-to-3D lifting method composed of the Grid ViT Encoder and the Propagation Network. The Grid ViT Encoder summarizes joint heatmaps into an effective feature embedding using self-attention. Then, the Propagation Network estimates the 3D pose by utilizing skeletal information to better estimate the position of obscured joints. Our method significantly outperforms the previous state of the art both qualitatively and quantitatively, as demonstrated by a 23.9% reduction in MPJPE. Our source code is available on GitHub.
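The headline result is stated in terms of MPJPE, so a small sketch of how that metric and a relative error reduction are typically computed may help. This is a minimal illustration, not the paper's evaluation code: the function names `mpjpe` and `relative_reduction`, the array shapes, and the toy poses are all assumptions made for the example, and the numbers are synthetic rather than results from the paper.

```python
import numpy as np


def mpjpe(pred, gt):
    """Mean per-joint position error.

    pred, gt: arrays of shape (..., J, 3) holding 3D joint positions.
    Returns the Euclidean distance between matching joints, averaged
    over all joints (and over any leading batch dimensions).
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    if pred.shape != gt.shape:
        raise ValueError("pred and gt must have the same shape")
    per_joint_error = np.linalg.norm(pred - gt, axis=-1)
    return float(per_joint_error.mean())


def relative_reduction(baseline_error, new_error):
    """Percentage by which new_error is lower than baseline_error."""
    return (baseline_error - new_error) / baseline_error * 100.0


# Synthetic toy data (NOT from the paper): 2 frames, 3 joints each.
gt = np.zeros((2, 3, 3))

baseline_pred = gt.copy()
baseline_pred[..., 0] += 10.0  # every joint is off by 10 units along x

improved_pred = gt.copy()
improved_pred[..., 0] += 7.5   # every joint is off by 7.5 units along x

baseline_mpjpe = mpjpe(baseline_pred, gt)
improved_mpjpe = mpjpe(improved_pred, gt)
reduction_pct = relative_reduction(baseline_mpjpe, improved_mpjpe)

print(baseline_mpjpe, improved_mpjpe, reduction_pct)
```

A reduction reported as a percentage, like the 23.9% in the abstract, is relative to the baseline's error, so the same percentage can correspond to very different absolute improvements depending on the dataset and units.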

Code Implementations: 1 repo
