MPL: Lifting 3D Human Pose from Multi-view 2D Poses
This work addresses the challenge of generalizing 3D pose estimation to real-world scenarios by leveraging synthetic data, but the contribution is incremental: it builds on existing multi-view and lifting approaches.
The paper tackles 3D human pose estimation from 2D images by combining 2D pose estimation with a transformer-based network for 2D-to-3D lifting, achieving up to a 45% reduction in mean per-joint position error (MPJPE) compared to triangulating the 2D poses.
Estimating 3D human poses from 2D images is challenging due to occlusions and projective acquisition. Learning-based approaches have been extensively studied to address this challenge, in both single-view and multi-view setups. These solutions, however, fail to generalize to real-world cases due to the lack of (multi-view) 'in-the-wild' images paired with 3D poses for training. For this reason, we propose combining 2D pose estimation, for which large and rich training datasets exist, with 2D-to-3D pose lifting, using a transformer-based network that can be trained from synthetic 2D-3D pose pairs. Our experiments demonstrate reductions of up to 45% in MPJPE compared to the 3D pose obtained by triangulating the 2D poses. The framework's source code is available at https://github.com/aghasemzadeh/OpenMPL.
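
To make the triangulation baseline and the MPJPE metric concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes calibrated cameras given as 3x4 projection matrices and uses standard linear (DLT) triangulation, which may differ from the triangulation variant used in the paper. The function names (`triangulate_point`, `triangulate_pose`, `mpjpe`, `project`) are hypothetical and do not come from the OpenMPL repository.

```python
import numpy as np


def triangulate_point(proj_mats, points_2d):
    """Linear (DLT) triangulation of one joint seen in V views.

    proj_mats: (V, 3, 4) camera projection matrices (assumed known).
    points_2d: (V, 2) pixel coordinates of the joint in each view.
    Returns the 3D point as a (3,) array.
    """
    rows = []
    for P, (u, v) in zip(proj_mats, points_2d):
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)                 # (2V, 4)
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                         # right singular vector of the smallest singular value
    return X[:3] / X[3]                # de-homogenize


def triangulate_pose(proj_mats, pose_2d):
    """Triangulate every joint of a multi-view 2D pose: (V, J, 2) -> (J, 3)."""
    n_joints = pose_2d.shape[1]
    return np.stack(
        [triangulate_point(proj_mats, pose_2d[:, j]) for j in range(n_joints)]
    )


def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance over joints.

    pred, gt: (..., J, 3) arrays in the same unit (e.g. millimetres).
    """
    return float(np.linalg.norm(pred - gt, axis=-1).mean())


def project(proj_mats, pose_3d):
    """Project a (J, 3) pose into V views, giving (V, J, 2) pixel coordinates."""
    homog = np.hstack([pose_3d, np.ones((pose_3d.shape[0], 1))])   # (J, 4)
    x = np.einsum("vij,kj->vki", proj_mats, homog)                 # (V, J, 3)
    return x[..., :2] / x[..., 2:3]
```

With noise-free 2D poses, `triangulate_pose(proj_mats, project(proj_mats, gt))` recovers `gt` up to numerical precision, so `mpjpe` is close to zero. The gap that a learned lifting network aims to close appears once the 2D detections are noisy or joints are occluded in some views, which is the setting the abstract describes.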